NONMEM Tips-n-Tricks
Introduction
NONMEM is a computer program. The name stands for “NONlinear Mixed Effects Modeling.”
Data review
Do a thorough exploratory data analysis before starting to model.
- Prefer tab-separated data (tabs are less likely to be present in the actual data).
- Start every dataset with a “Comment” column; `IGNORE=@` will then ignore a row if there is a text comment.
- The second column should be a `REF` column with each line number.
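A minimal sketch of this layout (the file name, column names, and values are illustrative, not from the original):

```
C,REF,ID,TIME,DV
,1,1,0,.
predose artifact - excluded,2,1,0.5,1.2
,3,1,1,3.4
```

With `$DATA data.csv IGNORE=@` and `$INPUT C=DROP REF ID TIME DV`, the header row and any record whose first character is alphabetic (i.e. any record carrying a text comment) are ignored.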
Sparse vs Rich data
The “richness” of data has different meanings.
It can relate to ETAs and RSE/RUV. It can also relate to where on the PK curve samples are taken. If samples are taken in the absorption phase (pre-Cmax), around Cmax, in the elimination phase, and at Ctrough (pre-dose), then it can be considered “rich” sampling.
- PK Studies:
- “Intensive studies” (Phase 1/2)
- Few subjects (homogeneous)
- Single dose
- Frequent (rich) sampling
- “Population studies” (Phase 3/4)
- Many subjects (heterogeneous)
- Multiple dose
- Sparse sampling
Modify the NONMEM dataset inside a model file
Let’s say `TIME` is all -99, and we want to modify it.
$INPUT ID TIME NTIME TAFD DV TYPE MDV EVID
AMT CMT DOSE AGE SEX HT WT BMI BSA
BLOQ COMMENT=DROP
$DATA data.csv IGNORE=@
$SUBROUTINE ADVAN=13 TOL=6
$MODEL
COMP=(CENTRAL,DEFDOSE,DEFOBS)
COMP=PERIPHERAL
$INFN
IF (ICALL == 1) THEN                    ; (1)
  DOWHILE(DATA)                         ; (2)
    IF (NEWIND < 2) START_TIME = TAFD   ; (3)
    TIME = TAFD - START_TIME
  ENDDO
ENDIF
$PK
...
1. `ICALL == 1`: the routine has been called for initialization at the beginning of a NONMEM problem; one such call per problem.
2. `DOWHILE(DATA)`: loop over the entire dataset.
3. Modifying statements go here. NONMEM will treat the modified records as the “original dataset”.
Modeling: Writing the control stream
One hint on control files: never write one from scratch. Instead, find a control file that does something similar to what you want to do, and modify it as necessary.
`$ERROR` has almost nothing to do with errors. It is used to calculate the result, and should really be called `$RESULT` instead.
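For example, a generic sketch (the proportional error model here is illustrative):

```
$ERROR
IPRED = F                 ; the calculated result
Y = IPRED*(1 + EPS(1))    ; proportional residual error
```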
General coding
- When you use `F`, `CMT` matters (Credit: Maria Kjellsson)
- `ETADER=3`: get out of a local minimum (Credit: Rikard Nordgren)
- Use `$COVARIANCE MATRIX=R` for consistency between statistical software.
- Use `EXIT(1)` instead of `p=0` / `+1E-15` (Credit: Gunnar Yngman)
- Always code with MU-modeling if possible.
- Always add an ETA, even if you don’t intend to use it; then fix it to 0. This allows for easy estimation of the variability in case you need it, without changing the code.
- Avoid temporary variables in the `$DES` block. The more code you have there, the slower your model will run.
- Proportional RUV is usually only needed if your measurements cover more than 3 orders of magnitude.
- PD biomarkers usually only cover 1–2 orders of magnitude, so additive RUV alone is usually enough.
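A sketch combining several of the points above (MU-referencing, a placeholder ETA fixed to 0, additive RUV); the parameters and values are illustrative, not from the original:

```
$PK
MU_1 = THETA(1)             ; MU-referencing: typical value of log CL
CL   = EXP(MU_1 + ETA(1))
MU_2 = THETA(2)
V    = EXP(MU_2 + ETA(2))   ; placeholder ETA, fixed to 0 below

$ERROR
IPRED = F
Y = IPRED + EPS(1)          ; additive RUV

$OMEGA 0.1                  ; IIV on CL
$OMEGA 0 FIX                ; unused for now, easy to unfix later
```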
Time-to-event models
- Non-parametric analysis: Log-rank test
- Semi-parametric analysis: Cox PH (does not assume a distribution)
- Parametric: Weibull, Gompertz, etc (fit a distribution)
- Weibull*COV forces proportional hazards (Credit: Andrew Hooker)
- TTE simulations can be done either with a dataset containing all possible event times, or using `MTIME` (Credit: Joakim Nyberg)
- logistic = expit = inverse logit
- logit ≈ probit (scaled)
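As an illustration of a parametric TTE model, a minimal Weibull-hazard sketch (not from the original text; names and the TOL value are illustrative, and the small constant avoids 0 raised to a negative power at T=0):

```
$SUBROUTINE ADVAN=13 TOL=9
$MODEL COMP=(HAZARD)
$PK
LAM = THETA(1)*EXP(ETA(1))   ; scale
GAM = THETA(2)               ; shape
$DES
; Weibull hazard h(t) = LAM*GAM*(LAM*t)**(GAM-1);
; A(1) accumulates the cumulative hazard
DADT(1) = LAM*GAM*(LAM*T + 1E-16)**(GAM-1)
```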
Estimation methods/algorithms
In general, always start with FOCE, then turn to EM. For example, if LAPLACE causes issues, go to EM. When building a relatively complex PKPD model (e.g., 47 parameters and 11 differential equations), there can be issues using FOCE so an EM-method is warranted. See Bach et al. (2021) for a case study of FOCE vs EM-methods.
Classical methods
Classical methods want to be “statistically fair” to all possibilities.
Robustness
- `SADDLE_RESET=1`
- `MCETA` (most important during the first iterations)
- If `MCETA` ends up in a local minimum, increase its value.
Reproducibility
- `RANMETHOD=4P`
- `SEED=123456789`
First-order conditional estimation (FOCE)
Always use the `FAST` option; there are very few cases where `FAST` is suboptimal. The `FAST` option uses MU-modeling to be effective.
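Pulling the FOCE options above together, a hedged example (the `MAXEVAL` and `MCETA` values are illustrative):

```
$ESTIMATION METHOD=COND INTERACTION FAST MAXEVAL=9999
            SADDLE_RESET=1 MCETA=100
            RANMETHOD=4P SEED=123456789
```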
EM-methods
Reproducibility
- `RANMETHOD=3S2P`
- `SEED=123456789`
For the stochastic methods (IMP, IMPMAP, SAEM and BAYES), the distribution of subjects to different nodes for computation depends on the number of processors. This leads to differences in the sequences of random numbers generated during the computations, which can make the results dependent on the number of processors.
Moreover, with the parallel processing option `PARSE_TYPE=4`, the results of the stochastic methods may not be exactly reproducible even when repeated on the same number of processors, because the distribution of subjects to each processor may depend on the computer load. The option `PARSE_TYPE=1` (equal number of subjects at each node) guarantees that the results are independent of the computer load. Still, the results will depend on the number of processors used.
Copy the file `/opt/local64/nonmem/nm_7.4.4_g510/run/mpilinux8.pnm` to `my_own_parafile.pnm` and put it somewhere reasonable. Then change `PARSE_TYPE` to 1:
$GENERAL
; [nodes] is a User defined variable.
NODES=[nodes] PARSE_TYPE=1
When running models, use your own parafile: -parafile=my_own_parafile.pnm
Iterative two-stage (ITS)
Importance sampling (IMP)
- `RANMETHOD=S2`: can use fewer `ISAMPLE`s for IMP
- `METHOD=IMPMAP` is equivalent to `METHOD=IMP MAPITER=1 MAPINTER=1`
- `CINTERVAL=1` is fine for IMP
- If `eff.` divided by `Smpl.` is not approximately equal to `IACCEPT`, then tweak something.
- `Fit.` tells you how Gaussian the profile is. If `Fit.` is approximately 1, you might as well use FOCE; if it is more like 0.6, then IMP is a good choice.
It is not unusual to see large Monte Carlo noise in the OFV estimate; it indicates that the number of samples (`ISAMPLE`) may not be high enough.
When using the IMP method, it is advised to include two sequential `$ESTIMATION` records. The first record performs optimization of the parameter estimates until a global minimum is found. The second record then takes those parameter estimates and calculates a more precise estimate of the objective function value: it has a higher `ISAMPLE` to reduce the Monte Carlo noise, and `EONLY=1` (no optimization of population parameters). `IMP EONLY=1` needs a few iterations to stabilize because of internal processes.
In this case, perform another run with the parameter values set to their final estimates, and with:
$ESTIMATION METHOD=IMP
INTERACTION
LAPLACE
ISAMPLE=10000 ; Number of samples
NITER=5 ; Number of iterations
SIG=3
PRINT=1
SIGL=6
EONLY=1 ; Perform only the expectation step
NOHABORT
RANMETHOD=3S2
The higher number of samples should give a more stable result (although the run time of each iteration will increase significantly). Taking the average OFV of these 5 iterations will give a more accurate estimation of the final OFV.
(Credit: Jon Moss)
Do not compare IMP and FOCE OFVs
SAEM
- For binary data
- SAEM is like a searchlight: it does not need as large a starting density (“umbrella”) as IMP.
- CINTERVAL: ~10–100
http://monolix.lixoft.com/tasks/population-parameter-estimation-using-saem/
Random ETA samples: when used in NONMEM 7.5, a recommended setup is the following (as shown on page 152 of nm750.pdf):
$SIZES ISAMPLEMAX=250
...
$ESTIMATION METHOD=SAEM
NBURN=500
NITER=1000
MASSRESET=1
ISAMPLE=2
PRINT=20
NOPRIOR=1
RANMETHOD=P
$ESTIMATION METHOD=SAEM
NBURN=0
NITER=0
MASSRESET=0
ETASAMPLES=1
ISAMPLE=200
EONLY=1
$ESTIMATION METHOD=IMP
NITER=5
MASSRESET=1
ETASAMPLES=0
ISAMPLE=1000
EONLY=1
PRINT=1
MAPITER=0
(Credit: Robert Bauer)
Running NONMEM
Grok NONMEM
- `.ext` means “extra”
- `.cnv` means “convergence”; contains convergence-testing statistics
- `.phm` means “.phi file for mixture models”
- The `.ets` file gets “random sampled ETAs” (less prone to shrinkage)
- `.phi` contains individual parameters: φi = μi + ηi, φ = (φ1, …, φn); `phc` = Var(φi). For non-MU-referenced parameters, φi = ηi.
NONMEM places the lower bound equal to 1/100 (or 1/1000) of the initial value even if it is set to zero in the code. You may rerun with a lower initial value or add `NOTHETABOUNDTEST` to the estimation record. (Credit: Leonid Gibiansky)
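For example (the THETA bound and initial value are illustrative):

```
$THETA (0, 0.001)   ; lower bound of 0 in the code
$ESTIMATION METHOD=COND INTERACTION NOTHETABOUNDTEST
```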
Faster parallel NONMEM
- `--threads-per-core=1`: do not hyperthread.
- `FILE07`–`FILE39` should be size 0; in fact, all `FILE*` should be size 0. Therefore, use `--maxlim=3`: do not use disk space (slow).
Interactive Control of NONMEM runs
Find the directory where NONMEM is running (`NM_run1/` in PsN) and manually create an empty file, named according to the desired action.
Here’s a list:
| Action | Description | Filename |
|---|---|---|
| Print toggle | Monitor estimation progress | `print.sig` |
| Paraprint toggle | Monitor parallel processing traffic | `paraprint.sig` |
| Next | Move on to the next estimation mode or next estimation. NONMEM grinds through a couple more iterations and then terminates nicely, as if the maximum number of function evaluations had been reached. | `next.sig` |
| Stop | End the present run cleanly | `stop.sig` |
| Subject print toggle | | `subject.sig` |
Interpreting the results, model validation
The “best” model isn’t necessarily the one with the lowest objective function.
QA
- ETA shrinkage (SD scale) < 20%? Then we can use EBE-based diagnostics.
- EPS shrinkage (SD scale) < 20%? Then we can look at IPRED vs DV.
Condition number (CN) – `$COVARIANCE PRINT=E`
- Calculated differently in different software
  - In PsN it is calculated by dividing the largest eigenvalue by the smallest
  - Gabrielsson and Weiner calculate it as log(largest/smallest)
- Different guidelines for ill-conditioning
  - If CN < 10^p, where p = “number of estimable parameters”, then it’s good
  - There are also references that point to CN < 10^6
  - CN < 1000 (seems to relate more to linear models, or PK models with 3 parameters (CL, V, KA))
NRD = Number of Required Digits
- Don’t directly associate OFVs with p-values (Wählby et al. 2001)
- Regard a drop in OFV of ~10 as meaningful (Byon et al. 2013)
- Box–Cox ETA: dOFV > −10.827 (p < 0.001)
Bias
Driving individuals
Compare iOFV values in the `.phi` file between runs.
- Sharkplot: ΔOFV (= OFV_full − OFV_reduced) vs. number of subjects removed
- You should be able to remove 5 subjects without losing significance.
Outlying observations
Do a sensitivity analysis for outliers (|CWRES| > 5).
- Re-estimate the model with the outliers removed
  - Parameter estimates change → remove the outliers
  - Parameter estimates don’t change → keep the outliers
Parameter uncertainty
- RSE% considered precise: < 30% for THETAs, < 50% for ETAs
- SIR (PsN vs NONMEM)
VPCs
The VPC’s central metric is the prediction of data percentiles. If you focus on the difference between e.g. the 5th and 95th percentiles based on the simulated data, you have a prediction interval. If you focus on an individual percentile, but consider the imprecision with which it is derived (often shown as a shaded area), then it is, like other metrics of imprecision, a confidence interval.
- Mixture prior (slide 31): prior + flat prior, “what if we are wrong”. When in conflict, it up-weights the data and downplays history.
- Two-stage approach: apply a model to each individual’s data
  - Requires much data per individual
  - OK mean parameters (i.e. typical individual)
  - But IIV is inflated (also: different individual → different model)
- NONMEM: apply a model to all individuals’ data
Troubleshooting
EXIT OCCURS DURING SEARCH FOR ETA
0PRED EXIT CODE = 1
0INDIVIDUAL NO. 4 ID= 4.00000000000000E+00 (WITHIN-INDIVIDUAL) DATA
REC NO. 2
THETA=
1.00E-03 5.00E-01 1.00E+01 1.00E+01 1.00E-01 1.00E+01 1.00E+00
OCCURS DURING SEARCH FOR ETA AT INITIAL VALUE, ETA=0
F OR DERIVATIVE RETURNED BY PRED IS INFINITE (INF) OR NOT A NUMBER (NAN).
Elapsed estimation time in seconds: 0.00
This is a generic error message; it is not informative except that it indicates some numerical problem. Often this is related to outliers (Section 6.1.2), so it helps to clean the data.
Things to try:
- Change ADVAN (6 to 13, if this is a differential-equation model).
- Change `TOL` in the `$SUBROUTINES` record.
- Change `SIGL` and `NSIG` in `$ESTIMATION`.
- Change initial estimates
  - To something else within a realistic range.
  - To the values of the previous model.
  - If this is a complex model, try to start with something simpler, like a linear model, use its output as initial estimates, and also look for outliers.
  - Increase the initial estimate of RUV.
- Sometimes `MCETA=1000` in the `$ESTIMATION` record helps with this message (Section 4.1.1).
- Add IIV on RUV.
- Use “disease modifying” instead of “protective” (Credit: Mats Karlsson)
- Time-varying covariate: specify it as a “regressor”.