NONMEM Tips-n-Tricks
log-transform both sides: cognigen.com/nonmem/nm/99apr232002.html
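A minimal sketch of the log-transform-both-sides error block (assumes `DV` has been log-transformed in the dataset):

```
$ERROR
IPRED = LOG(F + 1E-10) ; small constant guards against F = 0
Y     = IPRED + EPS(1) ; additive error on the log scale, roughly proportional on the normal scale
```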
Dataset
- Tab-separated (a tab is less likely to be present in the actual data)
- Start every dataset with a “Comment” column; `IGNORE=@` will then ignore a row if there is a text comment! (See the sketch after this list.)
- The second column should be a `REF` column with each line number.
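A minimal sketch of these two dataset conventions in the control stream (file and column names are illustrative):

```
; data.csv columns: C,REF,ID,TIME,AMT,DV
$DATA  data.csv IGNORE=@ ; skips any row whose first field starts with a letter
$INPUT C=DROP REF ID TIME AMT DV
```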
Sparse vs Rich data
“Richness” of data can mean different things. It can relate to ETAs and RSE/RUV. It can also relate to where on the PK curve samples are taken: if samples are taken in the absorption phase (pre-Cmax), around Cmax, in the elimination phase, and at Ctrough (pre-dose), then the sampling can be considered “rich”.
- PK Studies:
- “Intensive studies” (Phase 1/2)
- Few subjects (homogeneous)
- Single dose
- Frequent (rich) sampling
- “Population studies” (Phase 3/4)
- Many subjects (heterogeneous)
- Multiple dose
- Sparse sampling
Modeling
Coding
- When you use `F`, `CMT` matters (Credit: Maria Kjellsson)
- `ETADER=3`: Get out of a local minimum (Credit: Rikard Nordgren)
- Use `$COVARIANCE MATRIX=R` for consistency between statistical software.
- Use `EXIT(1)` instead of `p=0` / `+1E-15` (Credit: Gunnar Yngman)
- Always code with MU-modeling if possible (see the sketch after this list).
- Always add an ETA, even if you don’t intend to use it; then fix it to 0. This allows for easy estimation of the variability in case you need it, without changing the code.
- Avoid temporary variables in the `$DES` block. The more code you have there, the slower your model will run.
- Proportional RUV is usually only needed if your measurements cover more than 3 orders of magnitude.
- PD biomarkers usually only cover 1–2 orders of magnitude, so only additive RUV is needed.
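A minimal sketch combining the MU-modeling and fixed-ETA tips above (one-compartment-style parameter block; THETA/OMEGA numbering and values are illustrative):

```
$PK
; MU_k must be a linear function of the THETAs (log scale is typical)
MU_1 = LOG(THETA(1))
CL   = EXP(MU_1 + ETA(1))
MU_2 = LOG(THETA(2))
V    = EXP(MU_2 + ETA(2))
MU_3 = LOG(THETA(3))
KA   = EXP(MU_3 + ETA(3))

$OMEGA 0.1   ; IIV on CL
$OMEGA 0.1   ; IIV on V
$OMEGA 0 FIX ; placeholder ETA on KA; unfix if the variability is needed
```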
Time-to-event
- Non-parametric analysis: Log-rank test
- Semi-parametric analysis: Cox PH (does not assume a distribution)
- Parametric: Weibull, Gompertz, etc (fit a distribution)
- Weibull*COV forces proportional hazards (Credit: Andrew Hooker); see the sketch after this list.
- TTE simulations can be done either with a dataset containing all possible event times, or using `MTIME` (Credit: Joakim Nyberg)
- logistic = expit = inverse logit
- logit ≈ probit (scaled)
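A sketch of a parametric Weibull model with a proportional-hazards covariate effect, following the common NONMEM TTE pattern (the covariate `COV`, the THETA numbering, and `TOL` are illustrative):

```
$SUBROUTINES ADVAN13 TOL=9
$MODEL COMP=(HAZARD)
$PK
LAM  = THETA(1)*EXP(ETA(1)) ; Weibull scale
GAM  = THETA(2)             ; Weibull shape
HCOV = EXP(THETA(3)*COV)    ; covariate scales the hazard -> proportional hazards
$DES
DADT(1) = LAM*GAM*(LAM*T + 1E-10)**(GAM - 1)*HCOV ; hazard h(t)
$ERROR
CHZ = A(1)      ; cumulative hazard H(t)
SUR = EXP(-CHZ) ; survival S(t)
HAZ = LAM*GAM*(LAM*TIME + 1E-10)**(GAM - 1)*HCOV
IF (DV.EQ.0) Y = SUR     ; censored: S(t)
IF (DV.EQ.1) Y = SUR*HAZ ; event: S(t)*h(t)
$ESTIMATION METHOD=1 LAPLACE LIKELIHOOD MAXEVAL=9999
```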
Estimation
In general, always start with FOCE, then turn to EM. For example, if LAPLACE causes issues, go to EM. When building a relatively complex PKPD model (e.g. 47 parameters and 11 differential equations), there can be issues using FOCE, so an EM method is warranted. See Bach et al. (2021) for a case study of FOCE vs EM methods.
Classical methods
Classical methods want to be “statistically fair” to all possibilities.
Robustness
- `SADDLE_RESET=1`
- `MCETA` (most important during the first iterations)
- If `MCETA` ends up in a local minimum, increase its value, or use two sequential records:

```
$ESTIMATION METHOD=1 MCETA=1000 MAXEVAL=200
$ESTIMATION METHOD=1 MCETA=1 MAXEVAL=9999
```
Reproducibility
```
RANMETHOD=4P SEED=123456789
```
First-order conditional estimation (FOCE)
Always use the `FAST` option; there are very few cases where `FAST` is suboptimal. The `FAST` option uses MU-modeling to be effective.
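For example (a minimal record; the `MAXEVAL` value is arbitrary):

```
$ESTIMATION METHOD=1 INTERACTION FAST MAXEVAL=9999
```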
EM-methods
Reproducibility
```
RANMETHOD=3S2P SEED=123456789
```
Iterative two-stage (ITS)
Importance sampling (IMP)
- `RANMETHOD=S2` can use fewer ISAMPLEs for IMP
- `METHOD=IMPMAP` is equivalent to `METHOD=IMP MAPITER=1 MAPINTER=1`
- `CINTERVAL=1` is fine for IMP
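Spelling out the equivalence noted above as control-stream records:

```
; these two records request the same algorithm:
$ESTIMATION METHOD=IMPMAP INTERACTION
$ESTIMATION METHOD=IMP INTERACTION MAPITER=1 MAPINTER=1
```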
Stochastic noise
```
$ESTIMATION METHOD=ITS INTERACTION LAPLACE NITER=200 SIG=3 PRINT=1
            SIGL=6 NOHABORT CTYPE=3 NUMERICAL SLOW
$ESTIMATION METHOD=IMPMAP INTERACTION LAPLACE ISAMPLE=1000 NITER=1000
            SIG=3 PRINT=1 SIGL=6 NOHABORT CTYPE=3 IACCEPT=0.4
            MAPITER=0 RANMETHOD=3S2
$COVARIANCE UNCONDITIONAL MATRIX=S TOL=12 SIGL=12 SLOW
```
The iterations for the initial ITS step seem, in this case, to be quite stable with some artefacts:
```
iteration 175 OBJ= 4693.4674554341409
iteration 176 OBJ= 4694.2296104065535
iteration 177 OBJ= 4693.7753507970829
iteration 178 OBJ= 4693.9600270372885
iteration 179 OBJ= 4693.5732455834705
iteration 180 OBJ= 4693.6386423202493
iteration 181 OBJ= 4693.6215390721527
iteration 182 OBJ= 4693.6006496138452
iteration 183 OBJ= 4693.7877620448235
iteration 184 OBJ= 4694.1591757809929
iteration 185 OBJ= 4693.2614956897451
iteration 186 OBJ= 4693.5641640401127
iteration 187 OBJ= 4693.5575289919379
iteration 188 OBJ= 4495.6489907149398
iteration 189 OBJ= 4693.7711764252363
iteration 190 OBJ= 4693.6281175153035
iteration 191 OBJ= 4694.1171774559862
iteration 192 OBJ= 4693.7908707845536
iteration 193 OBJ= 4693.7709264605819
iteration 194 OBJ= 4495.9262902940209
iteration 195 OBJ= 4693.3321354894242
iteration 196 OBJ= 4694.3177205227348
iteration 197 OBJ= 4694.1301486616576
iteration 198 OBJ= 4694.2898587322170
iteration 199 OBJ= 4693.8304358341920
iteration 200 OBJ= 4691.6818293505230
 :
#TERM OPTIMIZATION WAS NOT COMPLETED
```
The IMP step can be less stable:
```
iteration 120 OBJ= 4314.8310660241377 eff.= 446. Smpl.= 1000. Fit.= 0.96389
iteration 121 OBJ= 4326.9079856676717 eff.= 448. Smpl.= 1000. Fit.= 0.96409
iteration 122 OBJ= 4164.6649529423103 eff.= 479. Smpl.= 1000. Fit.= 0.96392
iteration 123 OBJ= 4299.9887619753636 eff.= 432. Smpl.= 1000. Fit.= 0.96395
iteration 124 OBJ= 4303.9571213327054 eff.= 399. Smpl.= 1000. Fit.= 0.96349
iteration 125 OBJ= 4328.9835950930074 eff.= 417. Smpl.= 1000. Fit.= 0.96423
iteration 126 OBJ= 4304.3861595488252 eff.= 550. Smpl.= 1000. Fit.= 0.96392
iteration 127 OBJ= 4291.0862736663648 eff.= 422. Smpl.= 1000. Fit.= 0.96430
iteration 128 OBJ= 4326.2378678645500 eff.= 407. Smpl.= 1000. Fit.= 0.96409
iteration 129 OBJ= 4157.5352046539456 eff.= 406. Smpl.= 1000. Fit.= 0.96404
iteration 130 OBJ= 4332.6894073732456 eff.= 399. Smpl.= 1000. Fit.= 0.96399
iteration 131 OBJ= 4357.5343346793761 eff.= 493. Smpl.= 1000. Fit.= 0.96414
Convergence achieved
iteration 131 OBJ= 4336.1893012015007 eff.= 417. Smpl.= 1000. Fit.= 0.96369
 :
#TERM OPTIMIZATION WAS COMPLETED
```
If `eff.` divided by `Smpl.` is not approximately equal to `IACCEPT`, then tweak something. (In the log above, `eff.`/`Smpl.` ≈ 417/1000 ≈ 0.42, close to `IACCEPT=0.4`.) `Fit.` tells you how Gaussian the profile is. If `Fit.` is approximately 1, you might as well use FOCE. If `Fit.` is more like 0.6, then IMP is a good choice.
It is not unusual to see this large Monte Carlo noise in the OFV estimate. It indicates that the number of samples (`ISAMPLE`) may not be high enough.
When using the IMP method, it is advised to include two sequential `$ESTIMATION` records. The first record performs optimization of the parameter estimates until a minimum is found. The second record then takes those parameter estimates and calculates a more precise estimate of the objective function value: it has a higher `ISAMPLE` to reduce the Monte Carlo noise, and `EONLY=1` (no optimization of population parameters). IMP with `EONLY=1` needs a few iterations to stabilize because of internal processes.
In this case, perform another run with the parameter values set to their final estimates, and with:
```
$ESTIMATION METHOD=IMP INTERACTION LAPLACE
            ISAMPLE=10000 ; Number of samples
            NITER=5       ; Number of iterations
            SIG=3 PRINT=1 SIGL=6
            EONLY=1       ; Perform only the expectation step
            NOHABORT RANMETHOD=3S2
```
The higher number of samples should give a more stable result (although the run time of each iteration will increase significantly). Taking the average OFV of these 5 iterations will give a more accurate estimation of the final OFV.
(Credit: Jon Moss)
Do not compare IMP and FOCE OFVs
SAEM
- For binary data
- SAEM is like a searchlight; it does not need as large a starting density (“umbrella”) as IMP.
- CINTERVAL: ~10–100
http://monolix.lixoft.com/tasks/population-parameter-estimation-using-saem/
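A sketch of a plain SAEM record consistent with these notes (burn-in/iteration counts are illustrative; add `LIKELIHOOD` when the model codes a likelihood, e.g. for binary data):

```
$ESTIMATION METHOD=SAEM INTERACTION NBURN=2000 NITER=1000 ISAMPLE=2
            CTYPE=3 CINTERVAL=10 PRINT=50
            RANMETHOD=3S2P SEED=123456789
```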
Software-specifics
Grok NONMEM
- `.ext` means “extra”
- `.cnv` means “convergence”; contains convergence-testing statistics
- `.phm` means “`.phi` file for mixture models”
- `.ets` file gets “random sampled ETAs” (less prone to shrinkage)
- `.phi` contains individual parameters: φᵢ = μᵢ + ηᵢ, φ = (φ₁, …, φₙ); PHC = Var(φ); for non-MU-referenced parameters, φᵢ = ηᵢ
NONMEM places the lower bound equal to 1/100 (or 1/1000) of the initial value even if it is set to zero in the code. You may rerun with a lower initial value or add `NOTHETABOUNDTEST` to the estimation record. (Credit: Leonid Gibiansky)
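For example (a sketch of an estimation record with the option added):

```
$ESTIMATION METHOD=1 INTERACTION NOTHETABOUNDTEST MAXEVAL=9999
```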
Faster parallel NONMEM:
- `--threads-per-core=1`: Do not hyperthread.
- `FILE07`–`FILE39` should be size 0; in fact, all `FILE*` should be size 0. Therefore, use `-maxlim=3`: Do not use disk space (slow).

```
execute -nmfe_options="[sbatchargs=--threads-per-core=1] -maxlim=3 -parafprint=100"
```
Model Evaluation
QA
- ETA shrinkage (SD) < 20%? (then we can use EBE-based diagnostics; shrinkage_SD = 1 − SD(EBEs)/ω)
- EPS shrinkage (SD) < 20%? (then we can look at IPRED vs DV)
Condition number (CN) – `$COVARIANCE PRINT=E`
- Calculated differently in different software
  - In PsN it is calculated by dividing the largest eigenvalue by the smallest
  - Gabrielsson and Weiner calculate it as log(largest/smallest)
- Different guidelines for ill-conditioning
  - If CN < 10^p, where p = “number of estimable parameters”, then it’s good
  - There are also references that point to CN < 10^6
  - CN < 1000 (seems to relate more to linear models, or PK models with 3 parameters (CL, V, KA))
NRD = Number of Required Digits
- Don’t directly associate OFVs with p-values (Wählby et al. 2001)
- Regard a drop in OFV of ~10 (Byon et al. 2013)
- Box–Cox ETA: dOFV > −10.827 (p < 0.001; 10.827 is the χ² critical value for 1 df)
Bias
Driving individuals
Compare iOFV values in the `.phi` file between runs.
- Sharkplot (ΔOFV (= OFV_full − OFV_reduced) vs number of subjects removed)
- You should be able to remove 5 subjects without losing significance.
Outlying observations
Do a sensitivity analysis for outliers (|CWRES| > 5).
- Re-estimate the model with the outliers removed
  - Parameter estimates change -> Remove the outliers
  - Parameter estimates don’t change -> Keep the outliers
Parameter uncertainty
- RSE% considered precise: < 30% for THETAs, < 50% for ETAs
- SIR (PsN vs NONMEM)
VPCs
The VPC’s central metric is the prediction of data percentiles. If you focus on the difference between e.g. the 5th and 95th percentiles based on the simulated data, you have a prediction interval, as Bill states. If you focus on an individual percentile, but consider the imprecision with which it is derived, often shown as a shaded area, then it is, like other metrics of imprecision, a confidence interval. Confidence interval (CI) and prediction interval (PI) for VPC.
- Mixture prior (slide 31): prior + flat prior, “what if we are wrong”. When in conflict, it up-weights the data and downplays history.
- Two-stage approach: apply a model to each individual’s data
  - Requires much data per individual.
  - OK mean parameters (i.e. typical individual)
  - But IIV is inflated (also different individuals -> different models)
- NONMEM: apply a model to all individuals’ data
Interactive Control of NONMEM runs
Find the directory where NONMEM is running (NM_run1/
in PsN) and manually create an empty file with a different name according to the desired action.
```
touch file.sig
```
Here’s a list:
| Action | Description | Filename |
|---|---|---|
| Print toggle | Monitor estimation progress | `print.sig` |
| Paraprint toggle | Monitor parallel processing traffic | `paraprint.sig` |
| Next | Move on to the next estimation mode or next estimation. NONMEM grinds through a couple more iterations and then terminates nicely, as if the maximum number of function evaluations had been reached. | `next.sig` |
| Stop | End the present run cleanly | `stop.sig` |
| Subject print toggle | | `subject.sig` |
https://www.mail-archive.com/nmusers@globomaxnm.com/msg03990.html
(Credit: Paolo Denti)
Random ETA samples:
When used in NONMEM 7.5, a recommended example setup would be (as shown on page 152 of nm750.pdf):
```
$SIZES ISAMPLEMAX=250
...
$ESTIMATION METHOD=SAEM NBURN=500 NITER=1000 MASSRESET=1 ISAMPLE=2
            PRINT=20 NOPRIOR=1 RANMETHOD=P
$ESTIMATION METHOD=SAEM NBURN=0 NITER=0 MASSRESET=0 ETASAMPLES=1
            ISAMPLE=200 EONLY=1
$ESTIMATION METHOD=IMP NITER=5 MASSRESET=1 ETASAMPLES=0 ISAMPLE=1000
            EONLY=1 PRINT=1 MAPITER=0
```
(Credit: Robert Bauer)
Error messages
EXIT OCCURS DURING SEARCH FOR ETA

```
0PRED EXIT CODE = 1
0INDIVIDUAL NO. 4 ID= 4.00000000000000E+00 (WITHIN-INDIVIDUAL) DATA REC NO. = 2
 THETA=
 1.00E-03 5.00E-01 1.00E+01 1.00E+01 1.00E-01 1.00E+01 1.00E+00
 OCCURS DURING SEARCH FOR ETA AT INITIAL VALUE, ETA=0
 F OR DERIVATIVE RETURNED BY PRED IS INFINITE (INF) OR NOT A NUMBER (NAN)
Elapsed estimation time in seconds: 0.00
```
This is a generic error message; it carries little information except that there are numerical problems. Often this is related to outliers (Section 4.1.2), so it helps to clean the data.
Things to try:
- Change ADVAN (6 to 13, if this is a differential-equation model).
- Change `TOL` in the `$SUBROUTINES` record.
- Change `SIGL` and `NSIG` in `$ESTIMATION`.
- Change initial estimates
  - To something else within a realistic range.
  - To the values of the previous model.
  - If this is a complex model, try to start with something simpler, like linear, and use the output as initial estimates; also look for outliers.
- Increase the initial estimate of RUV.
- Sometimes `MCETA=1000` in `$ESTIMATION` records helps with this message (Section 2.3.1.1).
- Add IIV on RUV (see the sketch after this list).
- Use “disease modifying” instead of “protective” (Credit: Mats Karlsson).
- Time-varying covariate: specify it as a “regressor”.
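For the “Add IIV on RUV” item, a minimal sketch (the THETA/ETA/EPS numbering is illustrative):

```
$ERROR
W = THETA(4)*EXP(ETA(4)) ; individual magnitude of the residual error
Y = F + W*EPS(1)

$SIGMA 1 FIX ; variance carried by W, so EPS(1) is fixed to 1
```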