| BUGS ASSISTANCE |
David Spiegelhalter
-
Andrew Thomas
-
Nicky Best
-
Wally Gilks
MRC Biostatistics Unit, Institute of Public Health,
Robinson Way, Cambridge CB2 2SR
Tel: 44-1223-330300 Fax: 44-1223-330388
e-mail: bugs@mrc-bsu.cam.ac.uk ftp: ftp.mrc-bsu.cam.ac.uk
http://www.mrc-bsu.cam.ac.uk/bugs
BUGS 0.6 manual
This Addendum specifies additional features of BUGS 0.6, and should be read in conjunction with the current manual for BUGS 0.5 [Spiegelhalter et al.1996a] .
BUGS 0.6 may be obtained from World Wide Web page http://www.mrc-bsu.cam.ac.uk/bugs, or by anonymous ftp from ftp.mrc-bsu.cam.ac.uk in directory pub/methodology/bugs: login as anonymous and give your full e-mail address as the password. The message and README files will tell you how to obtain the program, examples and documentation.
#!/bin/sh
# runs BUGS interactively
#
case $# in
0) bugs06.sparc 32 bugs;;
*) echo 'bugs6' ;;
esac
rm bugs.buf
``32" refers to the number of bins in the metropolis algorithm (Section 2.2).
``bugs" refers to the header for filenames.
An appropriate path name should be added before ``bugs06.sparc''.
#!/bin/sh
# runs BUGS taking commands from command file
#
case $# in
0) echo 'usage backbugs command_file'
exit 1 ;;
1) bugs06.sparc 32 bugs $1;;
*) echo 'usage backbugs command_file' ;;
esac
rm bugs.buf
An appropriate path name should be added before ``bugs06.sparc''.
So to submit a series of commands from a file job.cmd, use the command backbugs job.cmd.
A command now exists to save people wasting a whole run if a crash occurs. If you type, say, checkpoint(1000), then after every 1000 iterations BUGS will
A general Metropolis-within-Gibbs routine can now be used for non-log-concave sampling. This routine uses a simple histogram- based proposal distribution [Ritter and Tanner, 1992], and therefore any parameter that requires Metropolis sampling must have bounded range, which is best set up using the I( , ) construct on the prior distribution, although this is not necessary if a uniform or beta prior distribution are assumed. This is best adapted to as narrow a range as possible to bound the posterior distribution. The default number of bins in the histogram is 32, but this can be changed within the BUGS script (see Section 1.2).
If a bounded range is not given, an error message will say that BUGS is Unable to choose update method for node.
Improved Metropolis routines are being written for future versions, and will be available in BUGS for Windows.
This problem is described in Spiegelhalter et al. (1996c) [page 4] and comprises a non-linear and non-conjugate model:
This gives a non-log-concave distribution for
. The problem was previously handled by discretizing
, and specifying equal prior probabilities for each discrete value.
The BUGS 0.6 code is shown below.
model dugongs;
const
N = 27; # number of observations
var
x[N],Y[N],mu[N],alpha,beta,gamma,tau,sigma,U1,U2,U3;
data x, Y in "dugongs.dat";
inits in "dugongs.in";
{
for (i in 1:N) {
mu[i] <- alpha - beta*pow(gamma,x[i]);
Y[i] ~ dnorm(mu[i],tau)
}
alpha ~ dnorm(0.0,1.0E-4);
beta ~ dnorm(0.0,1.0E-4);
tau ~ dgamma(1.0E-3,1.0E-3); sigma <- 1.0/sqrt(tau);
gamma ~ dunif(0.5,1.0);
# Transform alpha, beta and gamma to scale used by Carlin and Gelfand
U1 <- log(alpha);
U2 <- log(beta);
U3 <- logit(gamma);
}
We note that
has been given a bounded domain by using a uniform
prior distribution. BUGS 0.6 detects that the Metropolis sampler is required and reports during
compilation: Metropolis method choosen for node gamma.
Analysis
After a 500 iteration burn-in, 1000 iterations took 31 seconds using the default 32 bins for the Metropolis sampler.
(The C & G posterior refers to that in Carlin and Gelfand (1991)).
We note that the Metropolis sampler is considerably faster than using discretisation, and the results are virtually indistinguishable.
This example is described in Spiegelhalter et al. (1996b)[page 30], in which convergence problems are noted. Gelfand et al (1995) and Gelfand et al (1996) discuss the method of hierarchical centering for such models, in which each stochastic variable is, as far as possible, considered as arising from a stochastic mean. In effect, covariates are entered as `high' in the model as possible. They argue this procedure should often improve convergence, and further evidence is provided by Roberts and Sahu (1997).
For Model III in the epilepsy example, rather than having both random effects entering into a single regression for the Poisson mean, we may separate out the random effects to create an additional level on the model. The model is thus given by:
Coefficients and precisions are given independent ``noninformative'' priors. The appropriate graph is shown in Figure 1.
Figure 1:
Graphical model for epil example, using a hierarchically centered
parameterisation
Model specification for epil example with hierarchical centering. The part indicated +++++ is identical to that given in Spiegelhalter et al (1996b) [page 30].
for(j in 1:N) {
for(k in 1:T) {
mu[j,k] <- a0 + alpha.Base * (log.Base4[j]-log.Base4.bar)
+ alpha.Trt * (Trt[j]-Trt.bar)
+ alpha.BT * (BT[j] - BT.bar)
+ alpha.Age * (log.Age[j]-log.Age.bar)
+ alpha.V4 * (V4[k] - V4.bar)
+ b1[j];
y[j,k] ~ dpois(m[j,k]);
log(m[j,k]) <- b[j,k];
b[j,k] ~ dnorm(mu[j,k],tau.b); # subject*visit random effects
}
b1[j] ~ dnorm(0.0,tau.b1); # subject random effects
+++++
Analysis
A burn-in of 3000 iterations was followed by a further 7000 iterations. This took approximately 30 minutes.
We have generally found that hierarchical centering leads to both quicker sampling and earlier convergence.
This example is analysed in Lindstrom and Bates (1990) as an example of a mixed non-linear growth curve model. The data describe the growth of each of five orange trees, with measurements at seven common times:
A logistic growth curve model, with an unknown maximum, is assumed.
We first standardise the covariate
to
in order
to improve convergence and stability of estimates, and to make
the random effects assumptions more reasonable.
Lindstrom and Bates (1990) only take
as a random effect with a Gaussian population distribution.
We shall allow all three growth parameters
to vary between trees; means and precisions are
given independent ``noninformative'' priors.
As mentioned in Section 2.2, we need to specify
a range for each of the
parameters to be sampled using the Metropolis algorithm. It is convenient to first run a fixed-effect model in which five independent growth
curves are fitted to the data, specifying only that the
's are all between
-20 and +20. This gives rise to estimates for
that
suggest generous lower and upper bounds of (4,6) for
's,
(-2,0) for
's and (-3,0) for
's. These
are then placed in the data file as the lower and upper
vectors.
Figure 2:
Graph of the orange tree example.
Model specification for orange example
model otree;
const
n = 7,
K = 5;
var
tauC,mu[3],tau[3], Y[K,n],m[K,n],phi[K,3],theta[K,3],
lower[3],upper[3],sigmaC,sigma[3],x[n],x.bar,x.sd;
data in "otree.dat";
inits in "otree.in";
{
x.bar <- mean(x[]);
x.sd <- sd(x[]);
for (i in 1:K) {
for (j in 1:n) {
Y[i, j] ~ dnorm(m[i, j], tauC)
m[i, j] <- exp(theta[i,1]) /
(1 + exp(theta[i,2] + theta[i, 3] * (x[j]-x.bar)/x.sd));
}
for (k in 1:3) {
theta[i, k] ~ dnorm(mu[k], tau[k])I(lower[k], upper[k])
}
}
tauC ~ dgamma(1.0E-3, 1.0E-3)
sigmaC <- 1 / sqrt(tauC)
for (k in 1:3) {
mu[k] ~ dnorm(0, 1.0E-4)
tau[k] ~ dgamma(1.0E-3, 1.0E-3)
sigma[k] <- 1 / sqrt(tau[k])
}
}
Analysis
A burn-in of 500 iterations followed by a further 1000 iterations took approximately 2.25 minutes.
From the size and standard deviations of the random effects
's,
the assumption of Lindstrom and Bates (1990) that a random
effect is only required for
appears reasonable.
Consider the Gaussian linear mixed model [Laird and Ware, 1982],
where the
are vectors of length
containing the observations on
the
unit, and the
are error vectors of the same
length independently distributed as
: note that all our Normal parameterisations are in terms
of precisions. In this mixed model,
is an
design matrix of covariates and
is a corresponding
vector of fixed effects. In contrast,
is a
design matrix
(q typically less than p), and
is a
vector of
subject-specific random effects. The
model the
subject-specific means,
as well as enabling the model to capture marginal dependence among the
observations on the
unit. The hierarchical specification of this
model is completed by adding the prior distributions
,
,
and
.
We apply this model to
continuous longitudinal data from a clinical trial
originally described by Abrams et al (1994), which
compared the
effectiveness of two antiretroviral drugs (didanosine, ddI, and
zalcitabine, ddC) in 467 persons with advanced HIV infection. The
response variable
for patient i at time j is the
square root of the patient's CD4
count, a seriological measure of immune system health and prognostic factor
for AIDS-related illness and mortality.
The dataset records patient CD4 counts at study
entry and again at 2, 6, 12, and 18 months after entry, though a great many
of these observations are missing for many patients (the sample sizes at the
five time points for the two drug groups are (230, 182, 153, 102, 22) and
(236, 186, 157, 123, 14), respectively).
Following a Bayesian reanalysis of these data
[Carlin, 1996, Carlin and Louis, 1996],
we seek to fit model (1)
where the
row of the patient i's design matrix
takes the form
where
and
. Thus the
three columns of
correspond to individual-level intercept,
slope, and possible change in slope after the two month visit (by which time
the drugs are expected to produce a detectable benefit). We further account
for the effect of two covariates by including them in the fixed effect
design matrix
. These covariates are
, a binary variable
indicating whether patient i received ddI (
) or ddC (
),
and
, a binary variable telling whether the patient was diagnosed as
having AIDS at baseline (
) or not (
). Each of these
covariates is allowed to influence the intercept, slope and change, and hence
so that p = 3q = 9.
We complete our model specification with minimally informative priors,
taking care to ensure that they do not lead to improper posterior
distributions for the variance components
and D. Following
previous work, we set
and
,
which should preserve identifiability while still allowing the random
effects a reasonable amount of freedom. For the prior on
we take a = 1, b=100
(a prior with both mean and
standard deviation equal to
), while for
we set
a prior biased strongly away from 0 only for the baseline intercept,
, and the intercept adjustment for a positive AIDS diagnosis,
. This prior also forces the drug group intercept (i.e., the effect
at baseline)
to be very small, since patients were assigned
to drug group at random.
Figure 3:
Graph of the ddIddc example.
Here is the BUGS code to fit this model, where ind indexes the individual in the study, and i and j index the rows and columns of the design matrices, respectively. By placing NA's in the data file, the W matrix is common to all individuals, but the X matrix is still individual-specific.
model ddIddC;
const
N = 467, # number of patients
s = 5, # number of time points
q = 3, # number of random effects
p = 9; # number of fixed effects
var
X[N,s,p],W[s,q],Y[N,s],alpha[p],beta[N,q],d[N],a[N],Omega[q,q],V[q,q],
Sigma2[q,q],sigma,tau,R[q,q],rho,c[p],Omega.alpha[p,p],
mu[N,s],mu.beta[q];
data d,a in "drugaids.dat", Y in "Y.dat",
W in "W.dat", c in "priormean.dat", Omega.alpha in "priorprec.dat";
inits in "ddIddC.in";
{
for (ind in 1:N) {
for (i in 1:s) {
for (j in 1:q) {
X[ind,i,j] <- W[i,j];
X[ind,i,j+3] <- d[ind]*W[i,j];
X[ind,i,j+6] <- a[ind]*W[i,j];
}
Y[ind,i] ~ dnorm(mu[ind,i],tau);
mu[ind,i] <- inprod(X[ind,i,],alpha[]) + inprod(W[i,],beta[ind,]);
}
beta[ind,] ~ dmnorm(mu.beta[],Omega[,]); # trivariate Normal
}
tau ~ dgamma(1, 100); sigma <- 1.0/sqrt(tau);
Omega[,] ~ dwish(R[,],24); # Wishart prior on precision matrix
R[1,1] <- 96.0; R[1,2] <- 0.0; R[1,3] <- 0.0;
R[2,1] <- 0.0; R[2,2] <- 1.5; R[2,3] <- 0.0;
R[3,1] <- 0.0; R[3,2] <- 0.0; R[3,3] <- 1.5;
V[,] <- inverse(Omega[,])
mu.beta[1] <- 0.0; mu.beta[2] <- 0.0; mu.beta[3] <- 0.0;
# for (j in 1:p){alpha[j] ~ dnorm(c[j],Omega.alpha[j,j]);} # univ normals
alpha[] ~ dmnorm(c[],Omega.alpha[,]); # mv normal -- better convergence!
}
Running this BUGS code for 5000 iterations produces the summaries in Table 1. The results are quite comparable to those given by Carlin and Louis (1996) [pp.280-281]. Interestingly, this original work took several hundred lines of code in Fortran 77, augmented with IMSL subroutine calls for matrix manipulation and random variate generation - a stark contrast with the fewer than 40 lines of BUGS code above.
Table 1: Posterior summaries, ddI/ddC data model
The single line that is commented out in the above BUGS code can
be used to specify the (independence) prior for
componentwise
using dnorm, instead of all at once using dmnorm. While
mathematically equivalent, Table 2 shows that the
univariate specification to be inferior in terms of convergence speed,
since BUGS then updates the
one at a time, instead of
as a vector. Laboring against the cross-correlations within this
vector, overall performance deteriorates.
We remark that the hierarchically centered version of this model recommended for this dataset by Gelfand et al (1995), namely
where
,
, and
,
is not possible within the current version of BUGS.
This is because BUGS cannot calculate the
proper multivariate normal mean and precision matrix when the ``data''
(in the centered version, the
) are not univariate, unless the
data mean is identical to the multivariate normal prior.
BUGS can however accommodate some
simpler, univariate centering forms, as in the revised epil example.
Table 2: Lag 1 sample autocorrelations, algorithms for ddI/ddC data model
Wakefield et al (1994) consider the data in Table 3, which
record the plasma concentration
of the drug Cadralazine
at various time lags
following the administration of a single
dose of 30 mg in 10 cardiac failure patients. Here,
indexes the patient, while
indexes the
observations,
.
These authors suggest a
``one-compartment'' nonlinear pharmacokinetic model wherein the mean
plasma concentration
is given by
Subsequent unpublished work by these same authors suggests this model is best fit on the log scale. That is, we suppose
where
. The mean
structure for the
's thus emerges as
where
and
.
Table 3: Cadralazine concentration data
Following the analysis by Wakefield et al (1994), we assume the
subject-specific random effects
are
i.i.d. from a
distribution, where
. These authors also recommend the usual
conjugate prior specification, namely
,
, and
.
Since the full conditional distributions of the random effects
are neither simple conjugate forms nor guaranteed to be
log-concave, the new Metropolis capability of
BUGS 0.6 is required. This Metropolis routine requires
bounds to be placed on variables using the I(,) construction, so
unfortunately, the model for
cannot be specifed bivariately
as above, since BUGS currently cannot handle multivariate range
restrictions.
However, the model may still be specified in BUGS using the
product formulation of the bivariate normal, namely
where
and
are broad truncation regions to
enable the grid-based Metropolis algorithm, and c is a constant used
to roughly center
the
's (hence reduce correlation between the intercept
and
slope
).
Under this formulation, we replace the normal prior for
and
the Wishart prior for
with gamma priors for
and
and normal priors for
and
.
Figure 4:
Graph of the PK example.
The BUGS code to fit this model follows. As can be seen, we
adopt the tuning constants c=3,
, and
. The latter values comfortably
contain all the posterior mass for the
and
; significantly
more widely dispersed values
(say,
and
) do in fact lead to sharp drops
in the Metropolis acceptance rate, hence reductions in efficiency.
model PK;
const
N = 10, # number of patients
T = 8; # number of time points
var
X[T],Z[N,T],theta[N,2],a[N],b[N],lnu[N,T],tau[N],sigma[N],Y[N,T],
mu.a,tau.a,mub[N],tau.b,int.b,slope.b,mu.b;
data Z in "PKZ.dat", X in "PKX.dat";
inits in "PK.in";
{
for (i in 1:N) {
for (j in 1:T) {
Z[i,j] ~ dnorm(lnu[i,j],tau[i]);
Y[i,j] <- exp(Z[i,j]);
lnu[i,j] <- log(30) - a[i] - exp(b[i]-a[i])*X[j];
} # end of j loop
a[i] ~ dnorm(mu.a,tau.a) I(-5, 10);
b[i] ~ dnorm(mub[i],tau.b) I(-5, 10);
mub[i] <- int.b + slope.b * (a[i] - 3.0); # center the a_i's
tau[i] ~ dgamma(.0001, .0001); sigma[i] <- 1.0/sqrt(tau[i]);
} # end of i loop
mu.a ~ dnorm(0.0, 0.0001);
int.b ~ dnorm(0.0, 0.0001); slope.b ~ dnorm(0.0, 0.0001);
mu.b <- int.b + slope.b * (mu.a - 3.0);
tau.a ~ dgamma(1, 0.04); tau.b ~ dgamma(1, 0.04); # vague Wakefield prior
} # end of PK.bug program
Using a sequence of univariate Metropolis (Gaussian proposals) and
Gibbs steps, Sargent et al (1997) fit
the original Wishart formulation of this model using the priors
recommended by Wakefield et al (1994), namely
,
,
,
,
and
. We attempt a comparable prior in our
formulation by taking
G(0.0001, 0.0001) priors for the
,
N(0, 0.0001) priors for
and
, and
G(1, 0.04) priors for
and
.
Table 4: Posterior summaries and lag 1 sample autocorrelations, PK
data model
Running this BUGS code for 5000
iterations (following a 250 iteration burn-in period) produces the
posterior summaries and lag 1 sample autocorrelations given
in Table 4. Also shown are the results produced by
Sargent et al (1997), to which
the BUGS results are quite comparable, given
the slight differences in model and prior formulation.
(Results for
in the Wishart model are given in the
row of the table; however,
results for
are not shown in the
row since
is a conditional precision, given
the
.)
The relatively large posterior means for
and
(and correspondingly large posterior variances for
,
and
) at first seem counter-intuitive, since these two patients
had the most data available for study. However, their final 2 to 3
observations fit the overall model poorly (with those for patient 8
not even being monotone), explaining this oddity. Finally (and
relatedly), note the predicted values of the final observations for
patients 2 (whose clearance rate is the slowest)
and 7 (whose rate is amongst the fastest). The former has
mean somewhat larger than that suggested by the
posterior predictive distribution under the ``power model'' fit on
the original (unlogged) scale by Wakefield et al (1994) [page 216].
We are grateful to all those people who have proposed ways of improving the BUGS software, and apologise for implementing so few of their suggestions. We are particularly grateful for Brad Carlin for providing the ddIddC and the PK examples.