Reproducible Research
Caution: Under construction
- Reproducibility is defined as obtaining consistent results using the same data and code as the original study (synonymous with computational reproducibility).
- Replicability means obtaining consistent results across studies aimed at answering the same scientific question using new data or other new computational methods.
Reproducibility and Replicability in Research
Use version control (git or svn)
- Do not track model development in git: it is too messy and clutters the git history.
- Use `rsync` if needed
- Track the Rmd-file for the report
- This tracks the key models by name; the model files themselves stay in the “messy” folder.
- base_model <- run25.mod
- covariate_model <- run63.mod
- final_model <- run67.mod
- simulation_model <- run68.mod
- Runrecord
- runno
- based on
- OFV
- dOFV
- Condition number (CN)
- Use
- Do not track produced PDFs in git
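One way to set this up is sketched below. It assumes a hypothetical layout in which model development lives in a `messy/` folder inside the repository, and it assumes that `rsync` is used to keep a backup copy of that untracked folder. The helper names `ignore_once` and `backup_model_dev` and all paths are illustrative only.

```shell
# Hypothetical helper: add a pattern to <repo>/.gitignore, but only once.
ignore_once() {
    repo=$1
    pattern=$2
    grep -qxF -- "$pattern" "$repo/.gitignore" 2>/dev/null ||
        printf '%s\n' "$pattern" >> "$repo/.gitignore"
}

# Hypothetical helper: mirror the untracked development folder to a
# backup location, since git is not keeping a history of it.
backup_model_dev() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    if command -v rsync >/dev/null 2>&1; then
        rsync -a "$src/" "$dest/"   # -a preserves times and permissions
    else
        cp -Rp "$src/." "$dest/"    # fallback when rsync is not installed
    fi
}

# Example calls (paths are assumptions):
#   ignore_once . 'messy/'   # keep model development out of git
#   ignore_once . '*.pdf'    # keep produced PDFs out of git
#   backup_model_dev messy /path/to/backup/messy
```

With this arrangement the git history only records the report source and the run record, while the development folder is still protected against accidental loss.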
When and how to commit
Commit often and in small, contained chunks.
The seven rules of a great Git commit message [1]
- Separate subject from body with a blank line
- Limit the subject line to 50 characters
- Capitalize the subject line
- Do not end the subject line with a period
- Use the imperative mood in the subject line. Git itself uses the imperative whenever it creates a commit on your behalf. A properly formed Git commit subject line should always be able to complete the following sentence: "If applied, this commit will *your subject line here*".
- Wrap the body at 72 characters
- Use the body to explain what and why vs. how
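Several of these rules are mechanical enough to check automatically. The following is a minimal sketch of such a check; the function name `check_commit_msg` is hypothetical, and it only covers the blank line, the 50-character limit, capitalization, the trailing period, and the 72-character wrap. The imperative mood and the content of the body still need a human reader.

```shell
# Hypothetical checker: takes the path to a file holding a commit message,
# prints one line per violated rule, and returns non-zero if any was found.
check_commit_msg() {
    file=$1
    status=0
    subject=$(sed -n '1p' "$file")
    second=$(sed -n '2p' "$file")

    # Limit the subject line to 50 characters
    [ "${#subject}" -le 50 ] || { echo "subject longer than 50 characters"; status=1; }

    # Capitalize the subject line
    case $subject in
        [[:upper:]]*) ;;
        *) echo "subject is not capitalized"; status=1 ;;
    esac

    # Do not end the subject line with a period
    case $subject in
        *.) echo "subject ends with a period"; status=1 ;;
    esac

    # Separate subject from body with a blank line
    [ -z "$second" ] || { echo "no blank line after subject"; status=1; }

    # Wrap the body at 72 characters
    if awk 'NR > 2 && length($0) > 72 { found = 1 } END { exit !found }' "$file"; then
        echo "body line longer than 72 characters"; status=1
    fi

    return $status
}
```

Git passes the path of the message file as the first argument to a `commit-msg` hook, so a function like this could be called from `.git/hooks/commit-msg` to reject messages that break the rules.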
File/folder naming
- Name files/folders using only A-Z, a-z, 0-9, -, _.
- Start folder names with a number for sorting purposes.
- In general, use kebab-case for naming (easier to read than snake_case).
- If there are multiple parts to a name (e.g., a description, a date, and an author), use an underscore (as in snake_case) to separate the parts, and kebab-case within each part (e.g., `descriptive-name_2025-01-08_viktor-rognas.ext`).
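The character rule can be checked with a short shell function. This is a sketch under one assumption that the rules do not state explicitly: a single trailing file extension (the `.ext` in the example name) is allowed, and the character rule applies to the rest of the name. The function name `is_valid_name` is hypothetical.

```shell
# Hypothetical check: succeeds if the name, apart from one optional
# ".ext" suffix, contains only A-Z, a-z, 0-9, "-" and "_".
is_valid_name() (
    LC_ALL=C                 # make the character ranges plain ASCII
    name=$1
    stem=${name%.*}          # drop the last ".ext", if there is one
    ext=
    [ "$stem" = "$name" ] || ext=${name##*.}
    case $stem$ext in
        '' | *[!A-Za-z0-9_-]*) return 1 ;;   # empty or forbidden character
        *) return 0 ;;
    esac
)

# Example calls:
#   is_valid_name 'descriptive-name_2025-01-08_viktor-rognas.ext'  # succeeds
#   is_valid_name 'my report (final).pdf'                          # fails
```

Names with spaces, parentheses, or more than one dot are rejected, which keeps them safe to use unquoted in scripts and on all common file systems.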
Folder structure:
project/
- README.md # Project description
- input/
- data/ # All input data files
- raw_data/ # Untouched original data files
- raw_data.csv
- dat1.csv
- dat2.csv
- R/ # R-scripts
- dat1.R
- dat2.R
- NONMEM/
- model/ # Model files
- pk/
- run001.mod
- pd/
- run002.mod
- output/ # Results
- report/
- 1a/
- .tex
- .pdf
- 1b/
- .tex
- .pdf
- 1/
- .tex
- .pdf
- presentation/ # Communication
- slides.pptx
Coding: language specific
R
- Script all plots.
- Quarto-scripted report.
- Record the software environment (R, RStudio, and attached packages) with:
- `R.version`
- `rstudioapi::versionInfo()`
- `.packages()`
- `devtools::session_info(pkgs = "attached")`
NONMEM
When using Monte Carlo estimation methods (e.g., `SAEM`, `IMP`, or `FOCE MCETA`), always specify the `SEED` option and `RANMETHOD=P`. Also, it is recommended to specify the `RANMETHOD` option accordingly:
- For `SAEM` and `IMP`: `RANMETHOD=3S2P`
- For `MCETA`: `RANMETHOD=4P` (`$SIMULATION` uses this method by default)
Footnotes
1. https://cbea.ms/git-commit/