NONMEM Tips-n-Tricks
log-transform both sides: cognigen.com/nonmem/nm/99apr232002.html
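A minimal sketch of the log-transform-both-sides error block (assumes `DV` has been log-transformed in the dataset):

```
$ERROR
IPRED = LOG(F + 1E-10) ; small constant guards against F = 0
Y     = IPRED + EPS(1) ; additive error on the log scale, roughly proportional on the normal scale
```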
Dataset
- Tab-separated (a tab is less likely to be present in the actual data)
- Start every dataset with a “Comment” column; `IGNORE=@` will then ignore a row if there is a text comment! (See the sketch after this list.)
- The second column should be a `REF` column with each line number.
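A minimal sketch of these two dataset conventions in the control stream (file and column names are illustrative):

```
; data.csv columns: C,REF,ID,TIME,AMT,DV
$DATA  data.csv IGNORE=@ ; skips any row whose first field starts with a letter
$INPUT C=DROP REF ID TIME AMT DV
```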
Sparse vs Rich data
“Richness” of data can mean different things. It can relate to ETAs and RSE/RUV. It can also relate to where on the PK curve samples are taken: if samples are taken in the absorption phase (pre-Cmax), around Cmax, in the elimination phase, and at Ctrough (pre-dose), then the sampling can be considered “rich”.
- PK Studies:
- “Intensive studies” (Phase 1/2)
- Few subjects (homogeneous)
- Single dose
- Frequent (rich) sampling
- “Population studies” (Phase 3/4)
- Many subjects (heterogeneous)
- Multiple dose
- Sparse sampling
Modeling
Coding
- When you use `F`, `CMT` matters (Credit: Maria Kjellsson)
- `ETADER=3`: Get out of a local minimum (Credit: Rikard Nordgren)
- Use `$COVARIANCE MATRIX=R` for consistency between statistical software.
- Use `EXIT(1)` instead of `p=0` / `+1E-15` (Credit: Gunnar Yngman)
- Always code with MU-modeling if possible (see the sketch after this list).
- Always add an ETA, even if you don’t intend to use it; then fix it to 0. This allows for easy estimation of the variability in case you need it, without changing the code.
- Avoid temporary variables in the `$DES` block. The more code you have there, the slower your model will run.
- Proportional RUV is usually only needed if your measurements cover more than 3 orders of magnitude.
- PD biomarkers usually only cover 1–2 orders of magnitude, so only additive RUV is needed.
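A minimal sketch combining the MU-modeling and fixed-ETA tips above (one-compartment-style parameter block; THETA/OMEGA numbering and values are illustrative):

```
$PK
; MU_k must be a linear function of the THETAs (log scale is typical)
MU_1 = LOG(THETA(1))
CL   = EXP(MU_1 + ETA(1))
MU_2 = LOG(THETA(2))
V    = EXP(MU_2 + ETA(2))
MU_3 = LOG(THETA(3))
KA   = EXP(MU_3 + ETA(3))

$OMEGA 0.1   ; IIV on CL
$OMEGA 0.1   ; IIV on V
$OMEGA 0 FIX ; placeholder ETA on KA; unfix if the variability is needed
```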
Time-to-event
- Non-parametric analysis: Log-rank test
- Semi-parametric analysis: Cox PH (does not assume a distribution)
- Parametric: Weibull, Gompertz, etc (fit a distribution)
- Weibull*COV forces proportional hazards (Credit: Andrew Hooker); see the sketch after this list.
- TTE simulations can be done either with a dataset containing all possible event times, or using `MTIME` (Credit: Joakim Nyberg)
- logistic = expit = inverse logit
- logit ≈ probit (scaled)
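A sketch of a parametric Weibull model with a proportional-hazards covariate effect, following the common NONMEM TTE pattern (the covariate `COV`, the THETA numbering, and `TOL` are illustrative):

```
$SUBROUTINES ADVAN13 TOL=9
$MODEL COMP=(HAZARD)
$PK
LAM  = THETA(1)*EXP(ETA(1)) ; Weibull scale
GAM  = THETA(2)             ; Weibull shape
HCOV = EXP(THETA(3)*COV)    ; covariate scales the hazard -> proportional hazards
$DES
DADT(1) = LAM*GAM*(LAM*T + 1E-10)**(GAM - 1)*HCOV ; hazard h(t)
$ERROR
CHZ = A(1)      ; cumulative hazard H(t)
SUR = EXP(-CHZ) ; survival S(t)
HAZ = LAM*GAM*(LAM*TIME + 1E-10)**(GAM - 1)*HCOV
IF (DV.EQ.0) Y = SUR     ; censored: S(t)
IF (DV.EQ.1) Y = SUR*HAZ ; event: S(t)*h(t)
$ESTIMATION METHOD=1 LAPLACE LIKELIHOOD MAXEVAL=9999
```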
Estimation
In general, always start with FOCE, then turn to EM. For example, if LAPLACE causes issues, go to EM. When building a relatively complex PKPD model (e.g. 47 parameters and 11 differential equations), there can be issues using FOCE, so an EM method is warranted. See Bach et al. (2021) for a case study of FOCE vs EM methods.
Classical methods
Classical methods want to be “statistically fair” to all possibilities.
Robustness
- `SADDLE_RESET=1`
- `MCETA` (most important during the first iterations)
- If `MCETA` ends up in a local minimum, increase its value, or use two sequential records:

```
$ESTIMATION METHOD=1 MCETA=1000 MAXEVAL=200
$ESTIMATION METHOD=1 MCETA=1 MAXEVAL=9999
```
Reproducibility
```
RANMETHOD=4P SEED=123456789
```
First-order conditional estimation (FOCE)
Always use the `FAST` option; there are very few cases where `FAST` is suboptimal. The `FAST` option uses MU-modeling to be effective.
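For example (a minimal record; the `MAXEVAL` value is arbitrary):

```
$ESTIMATION METHOD=1 INTERACTION FAST MAXEVAL=9999
```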
EM-methods
Reproducibility
```
RANMETHOD=3S2P SEED=123456789
```
Iterative two-stage (ITS)
Importance sampling (IMP)
- `RANMETHOD=S2` can use fewer ISAMPLEs for IMP
- `METHOD=IMPMAP` is equivalent to `METHOD=IMP MAPITER=1 MAPINTER=1`
- `CINTERVAL=1` is fine for IMP
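Spelling out the equivalence noted above as control-stream records:

```
; these two records request the same algorithm:
$ESTIMATION METHOD=IMPMAP INTERACTION
$ESTIMATION METHOD=IMP INTERACTION MAPITER=1 MAPINTER=1
```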
Stochastic noise
```
$ESTIMATION METHOD=ITS INTERACTION LAPLACE NITER=200 SIG=3 PRINT=1
            SIGL=6 NOHABORT CTYPE=3 NUMERICAL SLOW
$ESTIMATION METHOD=IMPMAP INTERACTION LAPLACE ISAMPLE=1000 NITER=1000
            SIG=3 PRINT=1 SIGL=6 NOHABORT CTYPE=3 IACCEPT=0.4
            MAPITER=0 RANMETHOD=3S2
$COVARIANCE UNCONDITIONAL MATRIX=S TOL=12 SIGL=12 SLOW
```
The iterations for the initial ITS step seem, in this case, to be quite stable with some artefacts:
```
iteration 175 OBJ= 4693.4674554341409
iteration 176 OBJ= 4694.2296104065535
iteration 177 OBJ= 4693.7753507970829
iteration 178 OBJ= 4693.9600270372885
iteration 179 OBJ= 4693.5732455834705
iteration 180 OBJ= 4693.6386423202493
iteration 181 OBJ= 4693.6215390721527
iteration 182 OBJ= 4693.6006496138452
iteration 183 OBJ= 4693.7877620448235
iteration 184 OBJ= 4694.1591757809929
iteration 185 OBJ= 4693.2614956897451
iteration 186 OBJ= 4693.5641640401127
iteration 187 OBJ= 4693.5575289919379
iteration 188 OBJ= 4495.6489907149398
iteration 189 OBJ= 4693.7711764252363
iteration 190 OBJ= 4693.6281175153035
iteration 191 OBJ= 4694.1171774559862
iteration 192 OBJ= 4693.7908707845536
iteration 193 OBJ= 4693.7709264605819
iteration 194 OBJ= 4495.9262902940209
iteration 195 OBJ= 4693.3321354894242
iteration 196 OBJ= 4694.3177205227348
iteration 197 OBJ= 4694.1301486616576
iteration 198 OBJ= 4694.2898587322170
iteration 199 OBJ= 4693.8304358341920
iteration 200 OBJ= 4691.6818293505230
 :
#TERM OPTIMIZATION WAS NOT COMPLETED
```
The IMP step can be less stable:
```
iteration 120 OBJ= 4314.8310660241377 eff.= 446. Smpl.= 1000. Fit.= 0.96389
iteration 121 OBJ= 4326.9079856676717 eff.= 448. Smpl.= 1000. Fit.= 0.96409
iteration 122 OBJ= 4164.6649529423103 eff.= 479. Smpl.= 1000. Fit.= 0.96392
iteration 123 OBJ= 4299.9887619753636 eff.= 432. Smpl.= 1000. Fit.= 0.96395
iteration 124 OBJ= 4303.9571213327054 eff.= 399. Smpl.= 1000. Fit.= 0.96349
iteration 125 OBJ= 4328.9835950930074 eff.= 417. Smpl.= 1000. Fit.= 0.96423
iteration 126 OBJ= 4304.3861595488252 eff.= 550. Smpl.= 1000. Fit.= 0.96392
iteration 127 OBJ= 4291.0862736663648 eff.= 422. Smpl.= 1000. Fit.= 0.96430
iteration 128 OBJ= 4326.2378678645500 eff.= 407. Smpl.= 1000. Fit.= 0.96409
iteration 129 OBJ= 4157.5352046539456 eff.= 406. Smpl.= 1000. Fit.= 0.96404
iteration 130 OBJ= 4332.6894073732456 eff.= 399. Smpl.= 1000. Fit.= 0.96399
iteration 131 OBJ= 4357.5343346793761 eff.= 493. Smpl.= 1000. Fit.= 0.96414
Convergence achieved
iteration 131 OBJ= 4336.1893012015007 eff.= 417. Smpl.= 1000. Fit.= 0.96369
 :
#TERM OPTIMIZATION WAS COMPLETED
```
If `eff.` divided by `Smpl.` is not approximately equal to `IACCEPT`, then tweak something. (In the log above, `eff.`/`Smpl.` ≈ 417/1000 ≈ 0.42, close to `IACCEPT=0.4`.) `Fit.` tells you how Gaussian the profile is. If `Fit.` is approximately 1, you might as well use FOCE. If `Fit.` is more like 0.6, then IMP is a good choice.
It is not unusual to see this large Monte Carlo noise in the OFV estimate. It indicates that the number of samples (`ISAMPLE`) may not be high enough.
When using the IMP method, it is advised to include two sequential `$ESTIMATION` records. The first record performs optimization of the parameter estimates until a minimum is found. The second record then takes those parameter estimates and calculates a more precise estimate of the objective function value: it has a higher `ISAMPLE` to reduce the Monte Carlo noise, and `EONLY=1` (no optimization of population parameters). IMP with `EONLY=1` needs a few iterations to stabilize because of internal processes.
In this case, perform another run with the parameter values set to their final estimates, and with:
```
$ESTIMATION METHOD=IMP INTERACTION LAPLACE
            ISAMPLE=10000 ; Number of samples
            NITER=5       ; Number of iterations
            SIG=3 PRINT=1 SIGL=6
            EONLY=1       ; Perform only the expectation step
            NOHABORT RANMETHOD=3S2
```
The higher number of samples should give a more stable result (although the run time of each iteration will increase significantly). Taking the average OFV of these 5 iterations will give a more accurate estimation of the final OFV.
(Credit: Jon Moss)
Do not compare IMP and FOCE OFVs
SAEM
- For binary data
- SAEM is like a searchlight; it does not need as large a starting density (“umbrella”) as IMP.
- CINTERVAL: ~10–100
http://monolix.lixoft.com/tasks/population-parameter-estimation-using-saem/
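A sketch of a plain SAEM record consistent with these notes (burn-in/iteration counts are illustrative; add `LIKELIHOOD` when the model codes a likelihood, e.g. for binary data):

```
$ESTIMATION METHOD=SAEM INTERACTION NBURN=2000 NITER=1000 ISAMPLE=2
            CTYPE=3 CINTERVAL=10 PRINT=50
            RANMETHOD=3S2P SEED=123456789
```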
Software-specifics
Grok NONMEM
- `.ext` means “extra”
- `.cnv` means “convergence”; contains convergence-testing statistics
- `.phm` means “`.phi` file for mixture models”
- `.ets` file gets “random sampled ETAs” (less prone to shrinkage)
- `.phi` contains individual parameters: φᵢ = μᵢ + ηᵢ, φ = (φ₁, …, φₙ); PHC = Var(φ); for non-MU-referenced parameters, φᵢ = ηᵢ
NONMEM places the lower bound equal to 1/100 (or 1/1000) of the initial value even if it is set to zero in the code. You may rerun with a lower initial value or add `NOTHETABOUNDTEST` to the estimation record. (Credit: Leonid Gibiansky)
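For example (a sketch of an estimation record with the option added):

```
$ESTIMATION METHOD=1 INTERACTION NOTHETABOUNDTEST MAXEVAL=9999
```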
Faster parallel NONMEM:
- `--threads-per-core=1`: Do not hyperthread.
- `FILE07`–`FILE39` should be size 0; in fact, all `FILE*` should be size 0. Therefore, use `-maxlim=3`: Do not use disk space (slow).

```
execute -nmfe_options="[sbatchargs=--threads-per-core=1] -maxlim=3 -parafprint=100"
```
Model Evaluation
QA
- ETA shrinkage (SD) < 20%? (then we can use EBE-based diagnostics; shrinkage_SD = 1 − SD(EBEs)/ω)
- EPS shrinkage (SD) < 20%? (then we can look at IPRED vs DV)
Condition number (CN) – `$COVARIANCE PRINT=E`
- Calculated differently in different software
  - In PsN it is calculated by dividing the largest eigenvalue by the smallest
  - Gabrielsson and Weiner calculate it as log(largest/smallest)
- Different guidelines for ill-conditioning
  - If CN < 10^p, where p = “number of estimable parameters”, then it’s good
  - There are also references that point to CN < 10^6
  - CN < 1000 (seems to relate more to linear models, or PK models with 3 parameters (CL, V, KA))
NRD = Number of Required Digits
- Don’t directly associate OFVs with p-values (Wählby et al. 2001)
- Regard a drop in OFV of ~10 (Byon et al. 2013)
- Box–Cox ETA: dOFV > −10.827 (p < 0.001; 10.827 is the χ² critical value for 1 df)
Bias
Driving individuals
Compare iOFV values in the `.phi` file between runs.
- Sharkplot (ΔOFV (= OFV_full − OFV_reduced) vs number of subjects removed)
- You should be able to remove 5 subjects without losing significance.
Outlying observations
Do a sensitivity analysis for outliers (|CWRES| > 5).
- Re-estimate the model with the outliers removed
  - Parameter estimates change -> Remove the outliers
  - Parameter estimates don’t change -> Keep the outliers
Parameter uncertainty
- RSE% considered precise: < 30% for THETAs, < 50% for ETAs
- SIR (PsN vs NONMEM)
VPCs
The VPC’s central metric is the prediction of data percentiles. If you focus on the difference between e.g. the 5th and 95th percentiles based on the simulated data, you have a prediction interval, as Bill states. If you focus on an individual percentile, but consider the imprecision with which it is derived, often shown as a shaded area, then it is, like other metrics of imprecision, a confidence interval. Confidence interval (CI) and prediction interval (PI) for VPC.
- Mixture prior (slide 31): prior + flat prior, “what if we are wrong”. When in conflict, it up-weights the data and downplays history.
- Two-stage approach: apply a model to each individual’s data
  - Requires much data per individual.
  - OK mean parameters (i.e. typical individual)
  - But IIV is inflated (also different individuals -> different models)
- NONMEM: apply a model to all individuals’ data
Interactive Control of NONMEM runs
Find the directory where NONMEM is running (NM_run1/
in PsN) and manually create an empty file with a different name according to the desired action.
```
touch file.sig
```
Here’s a list:
| Action | Description | Filename |
|---|---|---|
| Print toggle | Monitor estimation progress | `print.sig` |
| Paraprint toggle | Monitor parallel processing traffic | `paraprint.sig` |
| Next | Move on to the next estimation mode or next estimation. NONMEM grinds through a couple more iterations and then terminates nicely, as if the maximum number of function evaluations had been reached. | `next.sig` |
| Stop | End the present run cleanly | `stop.sig` |
| Subject print toggle | | `subject.sig` |
https://www.mail-archive.com/nmusers@globomaxnm.com/msg03990.html
(Credit: Paolo Denti)
Random ETA samples:
When used in NONMEM 7.5, a recommended example setup would be (as shown on page 152 of nm750.pdf):
```
$SIZES ISAMPLEMAX=250
...
$ESTIMATION METHOD=SAEM NBURN=500 NITER=1000 MASSRESET=1 ISAMPLE=2
            PRINT=20 NOPRIOR=1 RANMETHOD=P
$ESTIMATION METHOD=SAEM NBURN=0 NITER=0 MASSRESET=0 ETASAMPLES=1
            ISAMPLE=200 EONLY=1
$ESTIMATION METHOD=IMP NITER=5 MASSRESET=1 ETASAMPLES=0 ISAMPLE=1000
            EONLY=1 PRINT=1 MAPITER=0
```
(Credit: Robert Bauer)
Error messages
EXIT OCCURS DURING SEARCH FOR ETA

```
0PRED EXIT CODE = 1
0INDIVIDUAL NO. 4 ID= 4.00000000000000E+00 (WITHIN-INDIVIDUAL) DATA REC NO. = 2
 THETA=
 1.00E-03 5.00E-01 1.00E+01 1.00E+01 1.00E-01 1.00E+01 1.00E+00
 OCCURS DURING SEARCH FOR ETA AT INITIAL VALUE, ETA=0
 F OR DERIVATIVE RETURNED BY PRED IS INFINITE (INF) OR NOT A NUMBER (NAN)
Elapsed estimation time in seconds: 0.00
```
This is a generic error message; it carries little information except that there are numerical problems. Often this is related to outliers (Section 4.1.2), so it helps to clean the data.
Things to try:
- Change ADVAN (6 to 13, if this is a differential-equation model).
- Change `TOL` in the `$SUBROUTINES` record.
- Change `SIGL` and `NSIG` in `$ESTIMATION`.
- Change initial estimates
  - To something else within a realistic range.
  - To the values of the previous model.
  - If this is a complex model, try to start with something simpler, like linear, and use the output as initial estimates; also look for outliers.
- Increase the initial estimate of RUV.
- Sometimes `MCETA=1000` in `$ESTIMATION` records helps with this message (Section 2.3.1.1).
- Add IIV on RUV (see the sketch after this list).
- Use “disease modifying” instead of “protective” (Credit: Mats Karlsson).
- Time-varying covariate: specify it as a “regressor”.
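For the “Add IIV on RUV” item, a minimal sketch (the THETA/ETA/EPS numbering is illustrative):

```
$ERROR
W = THETA(4)*EXP(ETA(4)) ; individual magnitude of the residual error
Y = F + W*EPS(1)

$SIGMA 1 FIX ; variance carried by W, so EPS(1) is fixed to 1
```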