R Tips-n-Tricks
How to transfer your library when updating R
https://rstats.wtf/maintaining-r.html#how-to-transfer-your-library-when-updating-r
Useful/interesting R packages
- validate
- lemon
- naniar
gg_miss_var()
gg_miss_upset() (instead of Venn diagrams)
geom_miss_point() (jitter of missing values)
bind_shadow()
vis_miss()
(Credit: Iris Minichmayr)
- ggpmisc (linear regression equation)
- ggstatsplot
- "future": R package for cluster and parallel computing in R
- "furrr": future-ize the purrr functions
- targets
- renv
- roxygen2
- usethis
- devtools
- latticeExtra
- tidymodels
Max Kuhn Keynote on tidymodels
Useful functions
Show ALL duplicate values
df |>
  group_by(ID) |>
  filter(
    duplicated(COV) |
      duplicated(COV, fromLast = TRUE)
  )
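To see why both duplicated() calls are needed, here is a minimal base-R sketch on made-up data. The data frame and its values are hypothetical, and unlike the snippet above it checks the whole column rather than grouping by ID.

```r
# Toy data: hypothetical IDs and covariate values
df <- data.frame(
  ID  = c(1, 1, 1, 2, 2),
  COV = c("a", "b", "a", "c", "d"),
  stringsAsFactors = FALSE
)

# duplicated() alone flags only the second and later copies of a value
later_copies <- duplicated(df$COV)

# OR-ing with a scan from the end also flags the first copy
all_dups <- duplicated(df$COV) | duplicated(df$COV, fromLast = TRUE)

# Every row whose COV value occurs more than once
df[all_dups, ]
```

With this data, later_copies marks only row 3, while all_dups marks rows 1 and 3.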
Find first occurrence of a non-NA
df$col1[which.max(!is.na(df$col2))]
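A small sketch of how this behaves, using a hypothetical data frame. It also shows one edge case worth knowing: when every value is NA, which.max() still returns 1, so a which()-based variant may be the safer choice.

```r
# Hypothetical data: col2 has its first non-NA value in row 3
df <- data.frame(
  col1 = c("a", "b", "c", "d"),
  col2 = c(NA, NA, 5, 7),
  stringsAsFactors = FALSE
)

# !is.na() gives FALSE/TRUE; which.max() returns the index of the first TRUE
idx <- which.max(!is.na(df$col2))
first_val <- df$col1[idx]

# Edge case: with no non-NA value at all, which.max() falls back to index 1
all_na <- c(NA, NA)
idx_all_na <- which.max(!is.na(all_na))

# Alternative that yields NA instead of a misleading index
idx_safe <- which(!is.na(all_na))[1]
```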
Pipe into View(), with title
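df |>
  filter() |>    # add filter conditions here
  group_by() |>  # add grouping columns here
  View("df")     # the string sets the title of the viewer tab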
Generate a frequency table (1-, 2-, or 3-way)
janitor::tabyl(mtcars, cyl)
#>  cyl  n percent
#>    4 11 0.34375
#>    6  7 0.21875
#>    8 14 0.43750
A fully-featured alternative to table(). Results are data.frames and can be formatted and enhanced with janitor's family of adorn_ functions.
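As an illustration of the adorn_ functions, the sketch below builds a 2-way table and decorates it. The choice of gear as the second variable and the particular adornments are assumptions picked for the example, not part of the original tip.

```r
library(janitor)

# 2-way frequency table of cylinders by gears
two_way <- tabyl(mtcars, cyl, gear) |>
  adorn_totals("row") |>                # append a "Total" row
  adorn_percentages("row") |>           # convert counts to row proportions
  adorn_pct_formatting(digits = 1) |>   # format proportions as percentages
  adorn_ns()                            # add the raw counts back in parentheses

two_way
```

The result is still a data.frame, so it can be passed on to other table or reporting functions.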
Extract R code from Rmd
knitr::purl(
  input = "my_document.Rmd",
  output = "my_script.R"
)
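To try this without an existing document, the sketch below writes a throwaway Rmd file to a temp directory and extracts its code. The file content is hypothetical. The documentation argument controls how much non-code text is kept: 0 keeps only the code, while the default of 1 also writes chunk headers as comments.

```r
library(knitr)

# Build a throwaway R Markdown file (hypothetical content) in a temp directory
fence <- strrep("`", 3)  # three backticks, built here to avoid nesting fences
rmd <- tempfile(fileext = ".Rmd")
writeLines(
  c("# Demo", "", paste0(fence, "{r}"), "x <- 1 + 1", fence),
  rmd
)

# Extract only the code from the chunks into an R script
script <- tempfile(fileext = ".R")
purl(input = rmd, output = script, documentation = 0, quiet = TRUE)

code_lines <- readLines(script)
code_lines
```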
Skim a data frame, getting useful summary statistics
skimr::skim(iris)
#> ── Data Summary ────────────────────────
#>                            Values
#> Name                       iris
#> Number of rows             150
#> Number of columns          5
#> _______________________
#> Column type frequency:
#>   factor                   1
#>   numeric                  4
#> ________________________
#> Group variables            None
#>
#> ── Variable type: factor ───────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate ordered n_unique
#> 1 Species               0             1 FALSE          3
#>   top_counts
#> 1 set: 50, ver: 50, vir: 50
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate mean    sd  p0 p25  p50 p75 p100 hist
#> 1 Sepal.Length          0             1 5.84 0.828 4.3 5.1 5.8  6.4  7.9 ▆▇▇▅▂
#> 2 Sepal.Width           0             1 3.06 0.436 2   2.8 3    3.3  4.4 ▁▆▇▂▁
#> 3 Petal.Length          0             1 3.76 1.77  1   1.6 4.35 5.1  6.9 ▇▁▆▇▂
#> 4 Petal.Width           0             1 1.20 0.762 0.1 0.3 1.3  1.8  2.5 ▇▁▇▅▃
Correlation Analysis
library(correlation)

rez <- correlation(mtcars)

x <- cor_sort(as.matrix(rez))
layers <- visualisation_recipe(x)
plot(layers)
Rmarkdown
R-Style
- 80 characters per line
- 150 lines per script
Misc
- Use sink() to record the output of a script
- Use relocate() instead of select(col, everything())
- R-tips
- txtProgressBar
- (single=3)
- If you write files on a cluster for use in the next step, pause with Sys.sleep(3)
- Bang Bang: How to program with dplyr
- Fundamentals of Data Visualization, Claus O. Wilke
- David Robinson:
- libr
- ggplot:
theme_set(theme_bw())
mutate(a_col = fct_reorder(a_col, by)) %>% ggplot()
ggplot(aes(fill = x)) + theme(legend.position = "none")
expand_limits(y = 0) #to include 0 on the y axis
scale_y_continuous(labels = scales::dollar_format())
scales::percent_format()
ggplot(label=a_column)
plotly::ggplotly(ggplot_object)
library(Hmisc, include.only = '%nin%')
- use %in% to make 1 == NA equal FALSE and not NA
- Also check for NAs
  - any: NA %in% vector
  - sum: vector %in% NA %>% sum
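The NA behaviour of %in% noted in the list above can be checked with a short base-R sketch. The vector x is made up for illustration; anyNA() and is.na() are shown as the more conventional equivalents.

```r
# Hypothetical vector with two missing values
x <- c(1, NA, 3, NA)

eq_result <- 1 == NA    # NA: a comparison with NA is unknown
in_result <- 1 %in% NA  # FALSE: %in% never returns NA

# Any NA present?
has_na <- NA %in% x

# How many NAs?
n_na <- sum(x %in% NA)

# Conventional equivalents give the same answers
has_na_base <- anyNA(x)
n_na_base <- sum(is.na(x))
```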