R Tips-n-Tricks

How to transfer your library when updating R

https://rstats.wtf/maintaining-r.html#how-to-transfer-your-library-when-updating-r
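
A minimal base R sketch of the general idea, not necessarily the approach described on the linked page: compare the package folders in the old library with those in the new one and reinstall what is missing. The helper name `packages_to_reinstall()` and the example path are assumptions for illustration.

```r
# Hypothetical helper: names of packages present in an old library folder
# but not yet present in the new one.
packages_to_reinstall <- function(old_lib, new_lib = .libPaths()[1]) {
  # Each installed package is a sub-directory of the library folder
  old_pkgs <- list.dirs(old_lib, full.names = FALSE, recursive = FALSE)
  new_pkgs <- list.dirs(new_lib, full.names = FALSE, recursive = FALSE)
  setdiff(old_pkgs, new_pkgs)
}

# Usage (assumed path, adjust to your system):
# old_lib <- "~/Library/R/4.2/library"
# install.packages(packages_to_reinstall(old_lib))
```

Reinstalling, rather than copying the folders, means the packages are rebuilt for the new R version.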

Useful/interesting R packages

  • validate
  • lemon
  • naniar
    • gg_miss_var()
    • gg_miss_upset() (instead of venn diagrams)
    • geom_miss_point() (jitter of missing)
    • bind_shadow()
    • vis_miss() (Credit: Iris Minichmayr)
  • ggpmisc (linear regression equation)
  • ggstatsplot
  • future (parallel and cluster computing in R)
  • furrr (future-enabled versions of the purrr functions)
  • targets
  • renv
  • roxygen2
  • usethis
  • devtools
  • latticeExtra
  • tidymodels

Max Kuhn Keynote on tidymodels

Useful functions

Show ALL duplicate values

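library(dplyr)

# Keep every row whose COV value occurs more than once within its ID,
# including the first occurrence
df |>
  group_by(ID) |>
  filter(
    duplicated(COV) |
      duplicated(COV, fromLast = TRUE)
  )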
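
Why both directions are needed, as a small base R sketch (toy data made up for illustration; the ID/COV pair is checked as one key instead of using group_by()):

```r
# Toy data: within ID 1 the value "a" occurs twice
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2, 2),
  COV = c("a", "b", "a", "a", "c", "d"),
  stringsAsFactors = FALSE
)
key <- df[c("ID", "COV")]

# duplicated() alone flags only the second and later occurrences
only_later <- duplicated(key)

# Combining it with fromLast = TRUE also flags the first occurrence
all_dups <- duplicated(key) | duplicated(key, fromLast = TRUE)

which(only_later)  # 3
which(all_dups)    # 1 3
df[all_dups, ]
```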

Find first occurrence of a non-NA

# Value of col1 in the first row where col2 is not NA
df$col1[which.max(!is.na(df$col2))]
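
A small worked example with made-up data. Note the edge case: if the column is entirely NA, which.max() still returns 1, so which(...)[1] may be the safer variant.

```r
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(NA, NA, 5, 7),
  stringsAsFactors = FALSE
)

# !is.na() gives FALSE FALSE TRUE TRUE; which.max() returns the first TRUE
first_idx <- which.max(!is.na(df$col2))
first_idx           # 3
df$col1[first_idx]  # "c"

# Edge case: all values missing
all_na <- c(NA, NA)
idx_max   <- which.max(!is.na(all_na))  # 1, which is misleading
idx_which <- which(!is.na(all_na))[1]   # NA, which signals "not found"
```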

Pipe into View(), with title

df |>
  filter() |>    # add filter conditions here
  group_by() |>  # add grouping columns here
  View("df")

Generate a frequency table (1-, 2-, or 3-way)

janitor::tabyl(mtcars, cyl)
#>  cyl  n percent
#>    4 11 0.34375
#>    6  7 0.21875
#>    8 14 0.43750

A fully-featured alternative to table(). Results are data.frames and can be formatted and enhanced with janitor’s family of adorn_ functions.
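
A short sketch of a 2-way table with some of the adorn_ functions, assuming the janitor package is installed (the choice of cyl and gear is just for illustration):

```r
library(janitor)

# 2-way frequency table: rows = cyl, columns = gear
tab <- tabyl(mtcars, cyl, gear)

tab_fmt <- tab |>
  adorn_totals("row") |>               # add a "Total" row
  adorn_percentages("row") |>          # convert counts to row proportions
  adorn_pct_formatting(digits = 1) |>  # format proportions as percentages
  adorn_ns()                           # append the raw counts

tab_fmt
```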

Extract R code from Rmd

knitr::purl(
    input = "my_document.Rmd",
    output = "my_script.R"
)

Skim a data frame, getting useful summary statistics

skimr::skim(iris)
#> ── Data Summary ────────────────────────
#>                            Values
#> Name                       iris  
#> Number of rows             150   
#> Number of columns          5     
#> _______________________          
#> Column type frequency:           
#>   factor                   1     
#>   numeric                  4     
#> ________________________         
#> Group variables            None  
#> 
#> ── Variable type: factor ───────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate ordered n_unique
#> 1 Species               0             1 FALSE          3
#>   top_counts               
#> 1 set: 50, ver: 50, vir: 50
#> 
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
#> 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
#> 2 Sepal.Width           0             1 3.06 0.436 2   2.8 3    3.3  4.4 ▁▆▇▂▁
#> 3 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
#> 4 Petal.Width           0             1 1.20 0.762 0.1 0.3 1.3  1.8  2.5 ▇▁▇▅▃

Correlation Analysis

library(correlation)
rez <- correlation(mtcars)

x <- cor_sort(as.matrix(rez))
layers <- visualisation_recipe(x)
plot(layers)

Rmarkdown

R-Style

  • 80 characters per line
  • 150 lines per script

Misc

  • Use sink() to record output of script
  • Use relocate() instead of select(col, everything())
  • R-tips
  • txtProgressBar
    • (style = 3)
  • If you write files on a cluster for use in the next step
    • Sys.sleep(3)
  • Bang Bang – How to program with dplyr
  • Fundamentals of Data Visualization, Claus O. Wilke
  • David Robinson:
    • libr
    • ggplot:
      • theme_set(theme_bw())
      • mutate(a_col = fct_reorder(a_col, by)) %>% ggplot()
      • ggplot(aes(fill = x)) + theme(legend.position = "none")
      • expand_limits(y = 0) #to include 0 on the y axis
      • scale_y_continuous(labels = scales::dollar_format())
        • scales::percent_format()
      • ggplot(aes(label = a_column)), then plotly::ggplotly(ggplot_object)
  • library(Hmisc, include.only = '%nin%')
    • use %in% instead of == so that comparing with NA gives FALSE, not NA (1 == NA is NA; 1 %in% NA is FALSE)
    • Also check for NAs
      • any: NA %in% vector
      • sum: vector %in% NA %>% sum
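
The NA behaviour of == versus %in% from the last bullets, as a base R sketch. The `%nin%` defined here is a base R stand-in built with Negate(), used only so the example runs without Hmisc; it is not Hmisc's own definition.

```r
x <- c(1, NA, 3)

eq_na <- 1 == NA     # NA: == propagates missingness
in_na <- 1 %in% NA   # FALSE: %in% always returns TRUE or FALSE

x == 1               # TRUE NA FALSE
x %in% 1             # TRUE FALSE FALSE

# Checking for NAs
any_na <- NA %in% x       # TRUE if x contains at least one NA
n_na   <- sum(x %in% NA)  # number of NAs, same as sum(is.na(x))

# Stand-in for Hmisc's %nin% ("not in"), an assumption for illustration
`%nin%` <- Negate(`%in%`)
not_in <- 2 %nin% x       # TRUE
```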