R Tips-n-Tricks

How to transfer your library when updating R

https://rstats.wtf/maintaining-r.html#how-to-transfer-your-library-when-updating-r
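
One common way to do this is to save the names of the installed packages before updating and reinstall the missing ones afterwards. The sketch below is a minimal illustration of that idea, not necessarily the approach the linked page recommends. The file location `pkg_file` is a hypothetical placeholder, and packages that did not come from CRAN would need separate handling.

```r
# Sketch: carry a package list across an R update.
# `pkg_file` is a hypothetical location; tempdir() is used only so the demo
# runs anywhere. In practice, pick a path that survives the update.
pkg_file <- file.path(tempdir(), "installed_pkgs.rds")

# Step 1 (old R version): save the names of all installed packages.
old_pkgs <- rownames(installed.packages())
saveRDS(old_pkgs, pkg_file)

# Step 2 (new R version): find packages that are not installed yet.
old_pkgs <- readRDS(pkg_file)
to_install <- setdiff(old_pkgs, rownames(installed.packages()))

# Step 3: reinstall them from CRAN (uncomment to run).
# install.packages(to_install)
```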

Useful/interesting R packages

Modeling

nonmem2mrgsolve
tidymodels
- A Gentle Introduction to tidymodels

Max Kuhn Keynote on tidymodels

Plotting

validate
lemon
naniar
- gg_miss_var()
- gg_miss_upset() (instead of venn diagrams)
- geom_miss_point() (jitter of missing)
- bind_shadow()
- vis_miss() (Credit: Iris Minichmayr)
ggpmisc
- Linear regression equation
ggstatsplot
latticeExtra
- Manipulate Xpose4 plots

Development

targets
renv
roxygen2
usethis
devtools
future: R package for cluster and parallel computing
furrr: future-enabled versions of the purrr functions

Useful functions

Show ALL duplicate values

library(dplyr)

# Keep every row whose COV value occurs more than once within an ID
df |>
  group_by(ID) |>
  filter(
    duplicated(COV) |
      duplicated(COV, fromLast = TRUE)
  )
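
Why duplicated() is called twice: on its own it leaves out the first occurrence of each repeated value, while fromLast = TRUE leaves out the last one instead. Combining both with | flags every copy. A small base R sketch with a made-up vector:

```r
# Made-up example vector: "a" occurs three times, "b" twice, "c" once.
x <- c("a", "b", "a", "c", "b", "a")

# Flags repeats scanning left to right (first occurrences are FALSE).
dup_forward <- duplicated(x)

# Flags repeats scanning right to left (last occurrences are FALSE).
dup_backward <- duplicated(x, fromLast = TRUE)

# A value is a duplicate if either scan flags it.
all_dups <- dup_forward | dup_backward

x[all_dups]
#> [1] "a" "b" "a" "b" "a"
```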

Find first occurrence of a non-NA

# Value of col1 at the first row where col2 is not NA
df$col1[which.max(!is.na(df$col2))]
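
A small worked example with made-up data, including one caveat: when every value is NA, which.max() still returns 1, so the result can be misleading. Using which(...)[1] returns NA in that case.

```r
# Made-up data: col2 has its first non-NA value in row 3.
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(NA, NA, 5, 7)
)

# !is.na() gives FALSE FALSE TRUE TRUE; which.max() picks the first TRUE.
first_idx <- which.max(!is.na(df$col2))
first_value <- df$col1[first_idx]
first_value
#> [1] "c"

# Caveat: with only NAs there is no TRUE, and which.max() falls back to 1.
all_na <- c(NA, NA)
which.max(!is.na(all_na))
#> [1] 1

# which(...)[1] signals the "not found" case with NA instead.
which(!is.na(all_na))[1]
#> [1] NA
```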

Pipe into View(), with title

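df |>
  filter() |>    # add filter conditions here
  group_by() |>  # add grouping columns here
  View("df")     # the string becomes the viewer tab title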

Generate a frequency table (1-, 2-, or 3-way)

janitor::tabyl(mtcars, cyl)
#>  cyl  n percent
#>    4 11 0.34375
#>    6  7 0.21875
#>    8 14 0.43750

A fully-featured alternative to table(). Results are data.frames and can be formatted and enhanced with janitor’s family of adorn_ functions.
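
A sketch of a 2-way table combined with some of the adorn_ functions, using the built-in mtcars data. The particular choice of totals, row percentages and one decimal place is only an illustration.

```r
library(janitor)

# 2-way frequency table: one row per cyl value, one column per gear value.
two_way <- tabyl(mtcars, cyl, gear)

# Add a totals row, convert counts to row percentages, format them,
# and append the underlying counts in parentheses.
formatted <- two_way |>
  adorn_totals("row") |>
  adorn_percentages("row") |>
  adorn_pct_formatting(digits = 1) |>
  adorn_ns()

formatted
```

Because the result is still a data.frame, it can be passed on to the usual export or table-formatting tools.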

Extract R code from Rmd

knitr::purl(
    input = "my_document.Rmd",
    output = "my_script.R"
)
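
A self-contained sketch that writes a tiny Rmd file to a temporary location and extracts its code. The file contents are made up for the demo. The `documentation` argument controls how much non-code text is kept: 0 gives pure code, 1 (the default) adds chunk headers as comments, and 2 also keeps the prose as roxygen-style comments.

```r
# Build a small Rmd file in a temporary location (made-up content).
fence <- strrep("`", 3)
rmd_file <- tempfile(fileext = ".Rmd")
r_file <- tempfile(fileext = ".R")

writeLines(
  c(
    "Some prose that should not end up in the script.",
    "",
    paste0(fence, "{r}"),
    "x <- 1 + 1",
    fence
  ),
  rmd_file
)

# Extract only the R code; documentation = 0 drops chunk headers and prose.
knitr::purl(
  input = rmd_file,
  output = r_file,
  documentation = 0,
  quiet = TRUE
)

script <- readLines(r_file)
script
```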

Skim a data frame, getting useful summary statistics

skimr::skim(iris)
#> ── Data Summary ────────────────────────
#>                            Values
#> Name                       iris  
#> Number of rows             150   
#> Number of columns          5     
#> _______________________          
#> Column type frequency:           
#>   factor                   1     
#>   numeric                  4     
#> ________________________         
#> Group variables            None  
#> 
#> ── Variable type: factor ───────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate ordered n_unique
#> 1 Species               0             1 FALSE          3
#>   top_counts               
#> 1 set: 50, ver: 50, vir: 50
#> 
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist 
#> 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
#> 2 Sepal.Width           0             1 3.06 0.436 2   2.8 3    3.3  4.4 ▁▆▇▂▁
#> 3 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
#> 4 Petal.Width           0             1 1.20 0.762 0.1 0.3 1.3  1.8  2.5 ▇▁▇▅▃

Correlation Analysis

library(correlation)
library(see)  # provides the plot() method used below
rez <- correlation(mtcars)

x <- cor_sort(as.matrix(rez))
layers <- visualisation_recipe(x)
plot(layers)

Rmarkdown

ref.label=c()
- https://bookdown.org/yihui/rmarkdown-cookbook/reuse-chunks.html#ref-label

R-Style

80 characters per line
150 lines per script

Misc

Use sink() to record output of script
Use relocate(col) instead of select(col, everything())
R-tips
- Paul van der Laken
- Benjamin Guiastrennec
txtProgressBar
- (style = 3)
When writing files on a cluster for use in the next step
- Sys.sleep(3)
Bang Bang – How to program with dplyr
- Markus Berroth: How to program with dplyr (2019)
Fundamentals of Data Visualization, Claus O. Wilke
- serialmentor.com
David Robinson:
- libr
- ggplot:
  - theme_set(theme_bw())
  - mutate(a_col = fct_reorder(a_col, by)) %>% ggplot()
  - ggplot(aes(fill = x)) + theme(legend.position = "none")
  - expand_limits(y = 0) # to include 0 on the y axis
  - scale_y_continuous(labels = scales::dollar_format())
    - scales::percent_format()
  - ggplot(aes(label = a_column)), then plotly::ggplotly(ggplot_object)
library(Hmisc, include.only = '%nin%')
- use %in% instead of == so that comparing 1 with NA gives FALSE and not NA
- Also check for NAs
  - any: NA %in% vector
  - sum: vector %in% NA %>% sum
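
The NA behaviour of == versus %in% can be checked with a small base R sketch (made-up vector). The base functions anyNA() and is.na() are shown as equivalents of the two NA checks.

```r
# Made-up vector with one missing value.
x <- c(1, NA, 3)

# == propagates NA, %in% never returns NA.
1 == NA
#> [1] NA
1 %in% NA
#> [1] FALSE
x == 1
#> [1]  TRUE    NA FALSE
x %in% 1
#> [1]  TRUE FALSE FALSE

# Any NA in the vector?
any_na <- NA %in% x
any_na
#> [1] TRUE

# How many NAs?
n_na <- sum(x %in% NA)
n_na
#> [1] 1

# Base R equivalents of the two checks.
anyNA(x)
#> [1] TRUE
sum(is.na(x))
#> [1] 1
```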