Reproducible Research

Published

June 19, 2025

Under construction

Reproducibility is defined as obtaining consistent results using the same data and code as the original study (synonymous with computational reproducibility).
Replicability means obtaining consistent results across studies aimed at answering the same scientific question using new data or other new computational methods.

Reproducibility and Replicability in Research

Use version control (git or svn)

Do not track model development in git: it is too messy and clutters the git history.
- Use rsync if needed
- Track the Rmd-file for the report
- The report file then records which models matter. The model files themselves stay in the “messy” folder.
  - base_model <- run25.mod
  - covariate_model <- run63.mod
  - final_model <- run67.mod
  - simulation_model <- run68.mod
- Runrecord
  - runno
  - based on
  - OFV
  - dOFV
  - Condition number (CN)
Do not track produced PDFs in git
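
As a minimal sketch of the run record listed above, the table could be kept as a data frame with dOFV derived from the reference run. The column names (basedOn, ofv, cn, dOfv) and all OFV and CN values below are made-up placeholders, not results from real runs.

```r
# Hypothetical run record; OFV and CN values are placeholders for illustration
runRecord <- data.frame(
  runno   = c(25, 63, 67),
  basedOn = c(NA, 25, 63),
  ofv     = c(1250.4, 1231.9, 1228.3),
  cn      = c(45.2, 88.7, 102.5)
)

# dOFV: OFV of each run minus the OFV of the run it is based on
# (NA when the run has no reference run)
referenceOfv <- runRecord$ofv[match(runRecord$basedOn, runRecord$runno)]
runRecord$dOfv <- runRecord$ofv - referenceOfv

print(runRecord)
```

Deriving dOFV from the "based on" column, rather than typing it in by hand, keeps the record consistent when a run is re-estimated.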

File/folder naming

Name files/folders using only A-Z, a-z, 0-9, -, _.
- Start folder names with a number for sorting purposes.
In general, use kebab-case for naming (easier to read than snake_case).
- If a name has multiple parts (e.g., a description, a date, and an author), use an underscore to separate the parts and kebab-case within each part (e.g., descriptive-name_2025-01-08_viktor-rognas.ext)
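
The naming rule above can be sketched as a small helper. The function name makeFileName and its arguments are hypothetical, chosen only for this illustration; it joins kebab-case parts with underscores and rejects characters outside A-Z, a-z, 0-9, and -.

```r
# Hypothetical helper: builds a file name from kebab-case parts.
# Each part may contain only letters, digits, and "-";
# "_" is reserved as the separator between parts.
makeFileName <- function(parts, extension) {
  isValidPart <- grepl("^[A-Za-z0-9-]+$", parts)
  if (!all(isValidPart)) {
    stop("Invalid name part(s): ", paste(parts[!isValidPart], collapse = ", "))
  }
  paste0(paste(parts, collapse = "_"), ".", extension)
}

exampleFileName <- makeFileName(
  c("descriptive-name", "2025-01-08", "viktor-rognas"),
  "ext"
)
print(exampleFileName)
# "descriptive-name_2025-01-08_viktor-rognas.ext"
```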

Folder structure:

project/
  - README.md       # Project description
  - input/
    - data/         # All input data files
      - raw_data/   # Untouched original data files
        - raw_data.csv
      - dat1.csv
      - dat2.csv
  - R/              # R-scripts
    - dat1.R
    - dat2.R
  - NONMEM/
    - model/        # Model files
      - pk/
        - run001.mod
      - pd/
        - run002.mod
  - output/         # Results
    - report/
      - 1a/
        - .tex
        - .pdf
      - 1b/
        - .tex
        - .pdf
      - 1/
        - .tex
        - .pdf
    - presentation/ # Communication
      - slides.pptx
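
A sketch of how the skeleton above might be created from R, assuming the folder names shown in the tree. The function name createProjectSkeleton is hypothetical, and the example writes to a temporary directory so that nothing in a real project is touched.

```r
# Hypothetical helper: creates the folder skeleton under `root`
createProjectSkeleton <- function(root) {
  folders <- file.path(root, c(
    "input/data/raw_data",
    "R",
    "NONMEM/model/pk",
    "NONMEM/model/pd",
    "output/report",
    "output/presentation"
  ))
  for (folder in folders) {
    dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  }
  # Add an empty README.md only if none exists, so nothing is overwritten
  readmePath <- file.path(root, "README.md")
  if (!file.exists(readmePath)) {
    file.create(readmePath)
  }
  invisible(folders)
}

# Usage: build the skeleton in a temporary directory
projectRoot <- file.path(tempdir(), "project")
createdFolders <- createProjectSkeleton(projectRoot)
print(list.dirs(projectRoot, full.names = FALSE))
```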

Coding: language agnostic

Function naming

Strive to use verbs for function names: to, add, remove, do, get, make, take, find, use, call, try, have, has, give, ask, go, put, let, help, move, turn, run, hold, write, read, include, set, change, watch, stop, start, create, open, close, save, build, wait, require, kill, pull, push, pass, stay, etc…

Use (lower) camelCase for self-defined functions that are not to be exported outside your project.

Class names, on the other hand, should use PascalCase.

This makes a clear distinction between self-defined and imported functions.

# Class

Parameter <- R6Class("Parameter", ....)

# Variable

parameterToDelete <- ...

# Method and function

performSimulation <- function (...)

Don’t hesitate to choose lengthy names for test functions.

Unlike regular functions, long names are less problematic for test functions because:

- They are not visible or accessible to users.
- They are not called repeatedly throughout the codebase.
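
As an illustration of a lengthy test function name, here is a sketch in base R. Both function names are hypothetical, and a testing package could be used instead of stopifnot.

```r
# Hypothetical function under test
convertPercentToFraction <- function(percent) {
  percent / 100
}

# The long name states what is tested and which result is expected
testConvertPercentToFractionReturnsOneHalfForFiftyPercent <- function() {
  stopifnot(isTRUE(all.equal(convertPercentToFraction(50), 0.5)))
  invisible(TRUE)
}

testResult <- testConvertPercentToFractionReturnsOneHalfForFiftyPercent()
```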

Variable naming

Variable names should be nouns.

True constants should use ALL_CAPS casing.

# Constant variables

DEFAULT_PERCENTILE <- 0.5

Names for Boolean variables or functions should make clear what true and false mean. This can be done using prefixes (is, has, can, etc.).

# not great
if (child) {
  if (parentSupervision) {
    watchHorrorMovie <- TRUE
  }
}

# better
if (isChild) {
  if (hasParentSupervision) {
    canWatchHorrorMovie <- TRUE
  }
}
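
The same prefixes work for functions that return a Boolean. The sketch below is illustrative only: the age threshold and the rule that adults may always watch are assumptions, not part of the guideline.

```r
# Hypothetical threshold, for illustration only
ADULT_AGE_YEARS <- 18

# The "is" prefix signals that the function returns TRUE or FALSE
isChild <- function(ageYears) {
  ageYears < ADULT_AGE_YEARS
}

# Assumed rule: adults may always watch, children only with supervision
canWatchHorrorMovie <- function(ageYears, hasParentSupervision) {
  !isChild(ageYears) || hasParentSupervision
}

print(canWatchHorrorMovie(ageYears = 12, hasParentSupervision = FALSE))
print(canWatchHorrorMovie(ageYears = 12, hasParentSupervision = TRUE))
```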

Use positive terms for Booleans since they are easier to process.

# double negation - difficult
is_firewall_disabled <- FALSE

# better
is_firewall_enabled <- TRUE

Coding: language specific

R

Script all plots.
Script the report with Quarto, and record the software versions in it:
- R.version
- rstudioapi::versionInfo()
- .packages()
- devtools::session_info(pkgs = "attached")
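
A minimal sketch of a chunk that could sit at the end of a report to record the environment. It uses base R where possible. The variable name sessionSummary is hypothetical, and the RStudio version is only added when the rstudioapi package is installed and the code runs inside RStudio.

```r
# Collect version information for the report
sessionSummary <- list(
  rVersion         = R.version.string,
  attachedPackages = .packages(),
  sessionInfo      = utils::sessionInfo()
)

# rstudioapi::versionInfo() only works inside RStudio, so guard the call
if (requireNamespace("rstudioapi", quietly = TRUE) && rstudioapi::isAvailable()) {
  sessionSummary$rstudioVersion <- as.character(rstudioapi::versionInfo()$version)
}

print(sessionSummary)
```

Guarding the RStudio call lets the same report render from the command line without errors.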

NONMEM

When using Monte Carlo estimation methods (e.g., SAEM, IMP, or FOCE with MCETA), always specify the SEED option and RANMETHOD=P. It is also recommended to set the RANMETHOD option according to the method:
- For SAEM and IMP: RANMETHOD=3S2P
- For MCETA: RANMETHOD=4P ($SIMULATION uses this method by default)