Performs Bayesian computing using the Differential Evolution Markov Chain algorithm (W. van den Berg & R.W. Payne).
Options
PRINT = string token | What to print (results, monitoring, scatterplot, histogram); default resu, moni, scat, hist |
---|---|
CALCULATION = expression | Calculation(s) of logposterior, involving explanatory or pointer variate; if unset, this is calculated by the procedure specified by the PROCEDURE option |
LOGPOSTERIOR = scalar | Identifier of scalar holding log-posterior within CALCULATION (must be set if CALCULATION is set) |
MULTIPLE = scalar | Number of populations is number of parameters times MULTIPLE; default 3 |
UNIFORMLIMIT = scalar | Uniform random numbers are drawn from (-UNIFORMLIMIT, UNIFORMLIMIT) and added to candidate parameter sets; default 0.00001 |
DATA = identifiers | Data structures used in CALCULATION or by PROCEDURE |
NGENERATIONS = scalar | Maximum number of iterations; default 1000 |
STEP1 = scalar or variate | Generations for which gamma is set to 1; default 0 |
FRACTIONBURNIN = scalar | Fraction of iterations used for burn-in; default 0.5 |
GRVARIANCE = scalar or variate | Variance to generate populations from initial values of the parameters; default 0.1 |
PERCENTAGES = variate | Percentages for which quantiles have to be calculated; default !(2.5, 25, 50, 75, 97.5) |
PROCEDURE = identifier | Identifier of procedure to calculate LOGPOSTERIOR if CALCULATION is unset; default _DEMCLOGPOSTERIOR |
SEED = scalar | Seed for the random numbers; default 0 |
NWINDOWS = scalar | Number of histograms and scatterplots per screen when plotting estimates and logposterior from all iterations |
SDLOGPOSTERIOR = scalar | Saves the s.d. for LOGPOSTERIOR |
QUANTILESLOGPOSTERIOR = variate | Saves quantiles for LOGPOSTERIOR |
RHATLOGPOSTERIOR = scalar | Saves the convergence criterion for LOGPOSTERIOR |
ALLLOGPOSTERIOR = variate | Saves the parameter estimates for LOGPOSTERIOR from all the iterations |
IPOPULATIONS = pointers | Pointer to supply initial populations of the parameters and the corresponding log-posteriors |
FPOPULATIONS = pointers | Pointer to save final populations of the parameters and the corresponding log-posteriors |
Parameters
PARAMETER = scalars | Parameters to estimate |
---|---|
INITIAL = scalars | Initial values of the parameters; must be set unless IPOPULATIONS is set |
SD = scalars | Standard errors of the estimates |
QUANTILES = variates | Saves the quantiles for each parameter |
RHAT = scalars | Convergence criteria |
ALLESTIMATES = variates | Saves the parameter estimates from all the iterations |
Description
DEMC uses the Differential Evolution Markov Chain algorithm of Ter Braak (2006) to do Bayesian computations by Markov chain Monte Carlo. The logarithm of the posterior density for each set of parameters can be calculated either by a list of expressions supplied by the CALCULATION option, or by a (user-defined) procedure whose name is specified by the PROCEDURE option (with default name _DEMCLOGPOSTERIOR). The names of the parameters and their initial values are specified by the PARAMETER and INITIAL parameters, respectively. Data structures containing information that is needed to calculate the log-posterior are supplied by the DATA option. Also, if you are using the CALCULATION option, you must define the identifier of the log-posterior (as used to store the results of the calculations) using the LOGPOSTERIOR option.
The number of populations of parameters to be generated is defined as the number of parameters multiplied by the value supplied by the MULTIPLE option (default 3). The Normal variance used to generate the initial population from the initial values is specified by the GRVARIANCE option. You can set this to a scalar to use the same variance for each parameter, or to a variate to define different variances for the parameters; by default GRVARIANCE=0.1.
The NGENERATIONS option defines the number of generations to form from the populations (default 1000), and the FRACTIONBURNIN option defines the proportion of these that are used for burn-in (default 0.5). The distributions of the parameters are determined only from the generations that are produced after burn-in is complete. The SEED option defines a seed for the random numbers that are used within DEMC. The default value 0 continues from the previous random-number generation or (if none) initializes the seed automatically. Options UNIFORMLIMIT and STEP1, which control how the new populations are formed, are explained in the Method section.
Once the generations are complete, the identifiers defined by PARAMETER are defined as scalars containing the means of the parameters over the populations generated after burn-in. Standard deviations and convergence criteria for the parameters can be saved, in scalars, using the SD and RHAT parameters. If RHAT is greater than 1.1, say, for any parameter, the number of generations should be increased. The QUANTILES parameter allows you to save a variate for each PARAMETER, containing quantiles at the percentages specified by the PERCENTAGES option (by default 2.5, 25, 50, 75, 97.5). To study the parameter distributions in more detail, you can also use the ALLESTIMATES parameter to save variates containing all the values generated after burn-in for each PARAMETER. The LOGPOSTERIOR, SDLOGPOSTERIOR, RHATLOGPOSTERIOR, QUANTILESLOGPOSTERIOR and ALLLOGPOSTERIOR options allow the equivalent information to be saved for the log-posterior.
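As an illustration of how the saved summaries relate to the generated values, the following Python sketch pools the values of one parameter from the generations after burn-in and computes their mean, standard deviation and quantiles. It is only a sketch under stated assumptions: the function names `summarise` and `percentile` are hypothetical, the layout of `draws` (one list of population values per generation) is assumed, and the linear-interpolation quantile rule may differ from the one DEMC uses. The convergence criterion saved by RHAT is not reproduced here.

```python
import statistics


def percentile(sorted_values, pct):
    """Quantile by linear interpolation (an assumed rule, not necessarily DEMC's)."""
    pos = (len(sorted_values) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def summarise(draws, fraction_burnin=0.5, percentages=(2.5, 25, 50, 75, 97.5)):
    """Summarise one parameter; draws[g] holds its values in generation g.

    Hypothetical helper for illustration only; it is not part of Genstat.
    """
    # Discard the burn-in generations, then pool all remaining populations.
    n_burnin = int(len(draws) * fraction_burnin)
    kept = sorted(v for generation in draws[n_burnin:] for v in generation)
    return {
        "mean": statistics.mean(kept),
        "sd": statistics.stdev(kept),
        "quantiles": [percentile(kept, p) for p in percentages],
        "all": kept,
    }


# Four generations of two populations; the first two generations are burn-in.
draws = [[100.0, 100.0], [100.0, 100.0], [1.0, 2.0], [3.0, 4.0]]
summary = summarise(draws, fraction_burnin=0.5, percentages=(25, 50, 75))
print(summary["mean"], summary["quantiles"])
```

In this analogy, `summary["mean"]` plays the role of the value stored in the PARAMETER scalar, while `"sd"`, `"quantiles"` and `"all"` correspond to what SD, QUANTILES and ALLESTIMATES save.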
The final populations and corresponding log-posteriors can be saved, in a pointer, by the FPOPULATIONS option. You can then restart DEMC from the current position, and run some more generations, by using this pointer as the setting of the IPOPULATIONS option. FPOPULATIONS[1...N] each have a number of units equal to the number of parameters d, while FPOPULATIONS[N+1] has a number of units equal to N, where N = MULTIPLE × d. Because these structures differ in length, problems can arise if you try to save FPOPULATIONS[] using procedure EXPORT.
Options: PRINT, CALCULATION, LOGPOSTERIOR, MULTIPLE, UNIFORMLIMIT, DATA, NGENERATIONS, STEP1, FRACTIONBURNIN, GRVARIANCE, PERCENTAGES, PROCEDURE, SEED, NWINDOWS, SDLOGPOSTERIOR, QUANTILESLOGPOSTERIOR, RHATLOGPOSTERIOR, ALLLOGPOSTERIOR, IPOPULATIONS, FPOPULATIONS.
Parameters: PARAMETER, INITIAL, SD, QUANTILES, RHAT, ALLESTIMATES.
Method
DEMC uses the DE-MC algorithm of Ter Braak (2006) to perform Markov chain Monte Carlo (MCMC); see Congdon (2001, 2003), Gelman et al. (2004) or Lee (2003). The DE-MC algorithm combines the genetic algorithm called Differential Evolution (DE) with MCMC. The values of the INITIAL parameter are used to generate n parameter sets, each formed by generating d independent Normal deviates with means INITIAL and variance GRVARIANCE. Here, d is the number of parameters, and n is d multiplied by the value of the MULTIPLE option.
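The construction of the initial populations can be sketched in Python as follows. This is a minimal illustration, not the procedure's own code: the function `initial_populations` and its argument names are hypothetical, and the starting values in the usage lines are invented for the example.

```python
import random


def initial_populations(initial, multiple=3, grvariance=0.1, seed=None):
    """Generate n = multiple * d parameter sets around the initial values.

    Hypothetical helper for illustration only; it is not part of Genstat.
    """
    rng = random.Random(seed)
    d = len(initial)
    # A single number is shared by all parameters; a sequence gives one each.
    if isinstance(grvariance, (int, float)):
        variances = [float(grvariance)] * d
    else:
        variances = list(grvariance)
    n = multiple * d
    # Each set holds d independent Normal deviates with means `initial`.
    return [
        [rng.gauss(mean, var ** 0.5) for mean, var in zip(initial, variances)]
        for _ in range(n)
    ]


# Two parameters with invented starting values, so n = 3 * 2 = 6 sets.
populations = initial_populations([0.0, 1.0], multiple=3, grvariance=0.1, seed=1)
print(len(populations), len(populations[0]))  # prints: 6 2
```

Passing a list such as `grvariance=[0.1, 0.5]` mirrors setting GRVARIANCE to a variate with a different variance for each parameter.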
For each parameter set i (i=1…n), the algorithm selects two other parameter sets at random, and calculates the differences between their parameter values, multiplied by a parameter γ. These scaled differences, together with a random number drawn from the uniform distribution on (-UNIFORMLIMIT, UNIFORMLIMIT), are then added to the parameter values in set i to form a new candidate set of values. The parameter γ generally takes the value 2.38/√(2×d), but the STEP1 option allows you to define generations in which γ takes the value 1 (by default there are none). The candidate set replaces set i if its log-posterior is greater than the log-posterior of set i plus the logarithm of a random number from the uniform distribution on (0,1) (see Ter Braak 2006).
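The update described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used by DEMC: the function `demc_generation` is hypothetical, the populations are assumed to be updated one after another within a generation, and the standard Normal target in the usage lines is chosen only to keep the example short.

```python
import math
import random


def demc_generation(populations, logposts, log_posterior, rng,
                    uniform_limit=1e-5, gamma=None):
    """Update every parameter set once, in place; return the number accepted.

    Hypothetical helper for illustration only; it is not part of Genstat.
    """
    n = len(populations)
    d = len(populations[0])
    if gamma is None:
        gamma = 2.38 / math.sqrt(2 * d)  # usual value; pass 1.0 for a STEP1 generation
    accepted = 0
    for i in range(n):
        # Select two other, distinct parameter sets at random.
        r1, r2 = rng.sample([j for j in range(n) if j != i], 2)
        candidate = [
            populations[i][k]
            + gamma * (populations[r1][k] - populations[r2][k])
            + rng.uniform(-uniform_limit, uniform_limit)
            for k in range(d)
        ]
        cand_lp = log_posterior(candidate)
        # Accept if the candidate log-posterior exceeds the current one plus log(u),
        # with u uniform on (0, 1]; 1 - random() avoids taking log(0).
        if cand_lp > logposts[i] + math.log(1.0 - rng.random()):
            populations[i] = candidate
            logposts[i] = cand_lp
            accepted += 1
    return accepted


def log_posterior(x):
    """Assumed target for the example: independent standard Normals."""
    return -0.5 * sum(v * v for v in x)


rng = random.Random(349472)
d, multiple = 2, 3
populations = [[rng.gauss(0.0, 0.1 ** 0.5) for _ in range(d)]
               for _ in range(multiple * d)]
logposts = [log_posterior(p) for p in populations]
total_accepted = sum(
    demc_generation(populations, logposts, log_posterior, rng) for _ in range(200)
)
print(total_accepted)
```

The comparison is made on the log scale, so the sketch never needs the posterior density itself, only its logarithm, which is what CALCULATION or PROCEDURE supplies.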
References
Congdon, P. (2001). Bayesian Statistical Modelling. Wiley, Chichester, England.
Congdon, P. (2003). Applied Bayesian Modelling. Wiley, Chichester, England.
Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (2004). Bayesian Data Analysis, 2nd Edition. Chapman & Hall, London.
Lee, P.M. (2003). Bayesian Statistics: an Introduction, 3rd Edition. Arnold, London.
Ter Braak, C.J.F. (2006). A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics & Computing, 16, in press.
See also
Procedure: BGXGENSTAT.
Commands for: Bayesian methods.
Example
CAPTION 'DEMC example',!t(\
  'Coagulation time data from Table 11.2 of Gelman, Carlin, Stern & Rubin',\
  '(2004). Bayesian Data Analysis, 2nd Edition, p. 299.'); STYLE=meta,plain
VARIATE [VALUES=62,60,63,59,63,67,71,64,65,66,68,66,\
  71,67,68,68,56,62,60,61,63,64,63,59] Coagulation_time
FACTOR [LABELS=!t(A,B,C,D); VALUES=4(1),6(2,3),8(4)] Diet
VARIATE [VALUES=4(0)] muvar
VCOMPONENTS Diet
REML [PRINT=model,components] Coagulation_time
VKEEP [SIGMA2=sigma2reml] Diet; COMPONENT=compreml; MEAN=mean
CALCULATE muin=MEAN(mean)
EXPRESSION p[1...5]; VALUE=\
  !E(muvar$[1...4] = mu1, mu2, mu3, mu4 ),\
  !E(fit = NEWL(Diet; muvar)),\
  !E(l1 = -12 * logs2 - 0.5 * SUM((Coagulation_time - fit)**2) / EXP(logs2)),\
  !E(l2 = -2 * logtau2 - 0.5 * SUM((muvar - mu)**2) / EXP(logtau2)),\
  !E(lposterior = l1 + l2 + logtau2 / 2 - 14 * LOG(2 * C('pi')))
DEMC [PRINT=results,monitoring,histogram;\
  CALCULATION=p[]; LOGPOSTERIOR=lposterior;\
  DATA=Coagulation_time,Diet; PERCENTAGES=!(25,50,75);\
  NGENERATIONS=1000; SEED=349472; SDLOGPOSTERIOR=sdlposterior;\
  RHATLOGPOSTERIOR=rhlposterior; QUANTILESLOGPOSTERIOR=qu[8]]\
  mu1,mu2,mu3,mu4,mu,logs2,logtau2; INITIAL=#mean,muin,1,1