DEMC procedure

Performs Bayesian computing using the Differential Evolution Markov Chain algorithm (W. van den Berg & R.W. Payne).

Options

`PRINT` = string token	What to print (`results`, `monitoring`, `scatterplot`, `histogram`); default `resu`, `moni`, `scat`, `hist`
`CALCULATION` = expression	Calculation(s) of logposterior, involving explanatory or pointer variate; if unset, this is calculated by the procedure specified by the `PROCEDURE` option
`LOGPOSTERIOR` = scalar	Identifier of scalar holding log-posterior within `CALCULATION` (must be set if `CALCULATION` is set)
`MULTIPLE` = scalar	Number of populations is number of parameters times `MULTIPLE`; default 3
`UNIFORMLIMIT` = scalar	Uniform random numbers are drawn from (`-UNIFORMLIMIT`, `UNIFORMLIMIT`) and added to candidate parameter sets; default 0.00001
`DATA` = identifiers	Data structures used in `CALCULATION` or by `PROCEDURE`
`NGENERATIONS` = scalar	Maximum number of iterations; default 1000
`STEP1` = scalar or variate	Generations for which gamma is set to 1; default 0
`FRACTIONBURNIN` = scalar	Fraction of iterations used for burn-in; default 0.5
`GRVARIANCE` = scalar or variate	Variance to generate populations from initial values of the parameters; default 0.1
`PERCENTAGES` = variate	Percentages for which quantiles are to be calculated; default !(2.5, 25, 50, 75, 97.5)
`PROCEDURE` = identifier	Identifier of procedure to calculate `LOGPOSTERIOR` if `CALCULATION` is unset; default `_DEMCLOGPOSTERIOR`
`SEED` = scalar	Seed for the random numbers; default 0
`NWINDOWS` = scalar	Number of histograms and scatterplots per screen when plotting estimates and logposterior from all iterations
`SDLOGPOSTERIOR` = scalar	Saves the s.d. for `LOGPOSTERIOR`
`QUANTILESLOGPOSTERIOR` = variate	Saves quantiles for `LOGPOSTERIOR`
`RHATLOGPOSTERIOR` = scalar	Saves the convergence criterion for `LOGPOSTERIOR`
`ALLLOGPOSTERIOR` = variate	Saves the values of `LOGPOSTERIOR` from all the iterations
`IPOPULATIONS` = pointers	Pointer to supply initial populations of the parameters and the corresponding log-posteriors
`FPOPULATIONS` = pointers	Pointer to save final populations of the parameters and the corresponding log-posteriors

Parameters

`PARAMETER` = scalars	Parameters to estimate
`INITIAL` = scalars	Initial values of the parameters; must be set unless `IPOPULATIONS` is set
`SD` = scalars	Saves the standard deviations of the parameters
`QUANTILES` = variates	Saves the quantiles for each parameter
`RHAT` = scalars	Convergence criteria
`ALLESTIMATES` = variates	Saves the parameter estimates from all the iterations

Description

DEMC uses the Differential Evolution Markov Chain algorithm of Ter Braak (2006) to do Bayesian computations by Markov chain Monte Carlo. The logarithm of the posterior density for each set of parameters can be calculated either by a list of expressions supplied by the CALCULATION option, or by a (user-defined) procedure whose name is specified by the PROCEDURE option (with default name _DEMCLOGPOSTERIOR). The names of the parameters and their initial values are specified by the PARAMETER and INITIAL parameters, respectively. Data structures containing information that is needed to calculate the log-posterior are supplied by the DATA option. Also, if you are using the CALCULATION option, you must define the identifier of the log-posterior (as used to store the results of the calculations) using the LOGPOSTERIOR option.

The number of populations of parameters to be generated is defined as the number of parameters multiplied by the value supplied by the MULTIPLE option (default 3). The Normal variance used to generate the initial populations from the initial values is specified by the GRVARIANCE option. You can set this to a scalar to use the same variance for each parameter, or to a variate to define different variances for the parameters; by default GRVARIANCE=0.1.
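As a rough illustration of how the initial populations could be formed, the following Python sketch draws MULTIPLE × d parameter sets from independent Normal distributions centred on the initial values. It is not the Genstat implementation; the function name `initial_populations` and the initial values in the usage lines are hypothetical.

```python
import random

def initial_populations(initial, multiple=3, grvariance=0.1, seed=None):
    """Sketch: generate n = multiple * d parameter sets around the initial values."""
    rng = random.Random(seed)
    d = len(initial)
    # GRVARIANCE may be one variance for all parameters or one per parameter
    if isinstance(grvariance, (int, float)):
        variances = [float(grvariance)] * d
    else:
        variances = list(grvariance)
    n = multiple * d
    # random.gauss takes a standard deviation, so take the square root of the variance
    return [[rng.gauss(m, v ** 0.5) for m, v in zip(initial, variances)]
            for _ in range(n)]

# Illustrative initial values only (hypothetical)
pops = initial_populations([61.0, 66.0, 68.0, 61.0], multiple=3, grvariance=0.1, seed=1)
print(len(pops), len(pops[0]))
```

With four parameters and MULTIPLE=3, the sketch produces 12 parameter sets of 4 values each.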

The NGENERATIONS option defines the number of generations to form from the populations (default 1000), and the FRACTIONBURNIN option defines the proportion of these that are used for burn-in (default 0.5). (The distributions of the parameters are determined only from the generations that are produced after burn-in is complete.) The SEED option defines a seed for the random numbers that are used within DEMC. The default value 0 continues from the previous random-number generation or (if none) initializes the seed automatically. Options UNIFORMLIMIT and STEP1, which control how the new populations are formed, are explained in the Method section.

Once the generations are complete, the identifiers defined by PARAMETER are set to scalars containing the means of the parameters over the populations generated after burn-in. Standard deviations and convergence criteria for the parameters can be saved, in scalars, using the SD and RHAT parameters. If RHAT is greater than 1.1, say, for any parameter, the number of generations should be increased. The QUANTILES parameter allows you to save a variate for each PARAMETER, containing quantiles at the percentages specified by the PERCENTAGES option (by default 2.5, 25, 50, 75, 97.5). To study the parameter distributions in more detail, you can also use the ALLESTIMATES parameter to save variates containing all the values generated after burn-in for each PARAMETER. The LOGPOSTERIOR, SDLOGPOSTERIOR, RHATLOGPOSTERIOR, QUANTILESLOGPOSTERIOR and ALLLOGPOSTERIOR options allow the equivalent information to be saved for the log-posterior.
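The summaries described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the convergence criterion is taken to be the potential scale reduction factor of Gelman et al. (2004), with each population treated as a chain; the burn-in length is assumed to be rounded down; and quantiles use a simple nearest-rank rule. The procedure itself may differ in any of these details, and the names `rhat` and `summarise` are hypothetical.

```python
import math
import statistics

def rhat(chains):
    """Potential scale reduction factor for m chains of equal length n."""
    m, n = len(chains), len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    grand = statistics.fmean(means)
    # Between-chain and within-chain variance estimates
    between = n / (m - 1) * sum((mj - grand) ** 2 for mj in means)
    within = statistics.fmean([statistics.variance(c) for c in chains])
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)

def summarise(chains, fractionburnin=0.5, percentages=(2.5, 25, 50, 75, 97.5)):
    """chains[j][g] holds one parameter in population j at generation g."""
    ngen = len(chains[0])
    nburn = int(ngen * fractionburnin)      # assumed rounding rule
    kept = [c[nburn:] for c in chains]      # discard the burn-in generations
    pooled = sorted(v for c in kept for v in c)

    def quantile(p):
        # Nearest-rank quantile (an assumption, chosen for simplicity)
        k = max(1, math.ceil(p / 100 * len(pooled)))
        return pooled[k - 1]

    return {
        "mean": statistics.fmean(pooled),
        "sd": statistics.stdev(pooled),
        "quantiles": [quantile(p) for p in percentages],
        "rhat": rhat(kept),
    }

# Two short made-up chains; the first four generations are treated as burn-in
chains = [[0, 0, 0, 0, 1, 2, 3, 4], [9, 9, 9, 9, 2, 3, 4, 5]]
summary = summarise(chains, fractionburnin=0.5, percentages=(50,))
print(summary)
```

A value of `rhat` close to 1 indicates that the populations have mixed; values well above 1.1 suggest running more generations.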

The final populations and corresponding log-posteriors can be saved, in a pointer, by the FPOPULATIONS option. You can then restart DEMC from the current position, and run some more generations, by using this pointer as the setting of the IPOPULATIONS option. FPOPULATIONS[1...N] each have a number of units equal to the number of parameters d, while FPOPULATIONS[N+1] has a number of units equal to N, where N = MULTIPLE × d. Because these structures differ in length, problems can arise if you try to save FPOPULATIONS[] using procedure EXPORT.

Options: PRINT, CALCULATION, LOGPOSTERIOR, MULTIPLE, UNIFORMLIMIT, DATA, NGENERATIONS, STEP1, FRACTIONBURNIN, GRVARIANCE, PERCENTAGES, PROCEDURE, SEED, NWINDOWS, SDLOGPOSTERIOR, QUANTILESLOGPOSTERIOR, RHATLOGPOSTERIOR, ALLLOGPOSTERIOR, IPOPULATIONS, FPOPULATIONS.

Parameters: PARAMETER, INITIAL, SD, QUANTILES, RHAT, ALLESTIMATES.

Method

DEMC uses the DE-MC algorithm of Ter Braak (2006) to perform Markov chain Monte Carlo (MCMC); see Congdon (2001, 2003), Gelman et al. (2004) or Lee (2003). The DE-MC algorithm combines the genetic algorithm called Differential Evolution (DE) with MCMC. The values of the INITIAL parameter are used to generate n parameter sets, by generating d independent Normal deviates with means INITIAL and variance GRVARIANCE. Here, d is the number of parameters, and n is d multiplied by the value of the MULTIPLE option.

For each parameter set i (i=1…n), the algorithm selects two other parameter sets at random and calculates the differences between their parameter values, multiplied by a parameter γ. A random number taken from the uniform distribution on (-UNIFORMLIMIT, UNIFORMLIMIT) is added to each scaled difference. γ generally takes the value 2.38/√(2×d), but the STEP1 option allows you to define generations in which γ takes the value 1 (by default there are none). The results are then added to the parameter values in set i to form a new candidate set of values. The candidate set replaces set i if its log-posterior is greater than the log-posterior of set i plus the logarithm of a random number from the uniform distribution on (0,1); see Ter Braak (2006).
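The update described above can be sketched in Python as follows. This is a minimal illustration of one generation, not the code of the procedure. The function names, the representation of STEP1 as a collection of generation numbers, and the standard Normal target used in the usage lines are all assumptions made for the example.

```python
import math
import random

def gamma_for(generation, d, step1=()):
    """gamma is 1 in the generations listed in step1, otherwise 2.38/sqrt(2d)."""
    return 1.0 if generation in step1 else 2.38 / math.sqrt(2 * d)

def demc_generation(pops, logpost, log_posterior, gamma,
                    uniformlimit=1e-5, rng=random):
    """Update every parameter set once, in place."""
    n, d = len(pops), len(pops[0])
    for i in range(n):
        # Pick two other sets, different from i and from each other
        others = [j for j in range(n) if j != i]
        r1, r2 = rng.sample(others, 2)
        # Scaled difference plus small uniform noise, added to set i
        candidate = [pops[i][k] + gamma * (pops[r1][k] - pops[r2][k])
                     + rng.uniform(-uniformlimit, uniformlimit)
                     for k in range(d)]
        cand_lp = log_posterior(candidate)
        # 1 - random() lies in (0, 1], so the logarithm is always defined
        log_u = math.log(1.0 - rng.random())
        if cand_lp > logpost[i] + log_u:
            pops[i], logpost[i] = candidate, cand_lp

# Usage with a hypothetical target: independent standard Normal parameters
def log_posterior(theta):
    return -0.5 * sum(t * t for t in theta)

rng = random.Random(42)
d = 2
pops = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(3 * d)]
logpost = [log_posterior(p) for p in pops]
for g in range(1, 201):
    demc_generation(pops, logpost, log_posterior,
                    gamma_for(g, d, step1={10, 20}), rng=rng)
```

Working on the log scale avoids exponentiating the posterior ratio, which is why the acceptance test compares the candidate's log-posterior with the current value plus the logarithm of a uniform random number.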

References

Congdon, P. (2001). Bayesian Statistical Modelling. Wiley, Chichester, England.

Congdon, P. (2003). Applied Bayesian Modelling. Wiley, Chichester, England.

Gelman, A., Carlin, J.B., Stern, H.S. & Rubin, D.B. (2004). Bayesian Data Analysis, 2nd Edition. Chapman & Hall, London.

Lee, P.M. (2003). Bayesian Statistics: an Introduction, 3rd Edition. Arnold, London.

Ter Braak, C.J.F. (2006). A Markov chain Monte Carlo version of the genetic algorithm Differential Evolution: easy Bayesian computing for real parameter spaces. Statistics & Computing, 16, in press.

Example

CAPTION     'DEMC example',!t(\
  'Coagulation time data from Table 11.2 of Gelman, Carlin, Stern & Rubin',\
  '(2004). Bayesian Data Analysis, 2nd Edition, p. 299.'); STYLE=meta,plain
VARIATE     [VALUES=62,60,63,59,63,67,71,64,65,66,68,66,\
                    71,67,68,68,56,62,60,61,63,64,63,59] Coagulation_time
FACTOR      [LABELS=!t(A,B,C,D); VALUES=4(1),6(2,3),8(4)] Diet
VARIATE     [VALUES=4(0)] muvar
VCOMPONENTS Diet
REML        [PRINT=model,components] Coagulation_time
VKEEP       [SIGMA2=sigma2reml] Diet; COMPONENT=compreml; MEAN=mean
CALCULATE   muin=MEAN(mean)
EXPRESSION  p[1...5]; VALUE=\
  !E(muvar$[1...4] = mu1, mu2, mu3, mu4 ),\
  !E(fit = NEWL(Diet; muvar)),\
  !E(l1 = -12 * logs2 - 0.5 * SUM((Coagulation_time - fit)**2) / EXP(logs2)),\
  !E(l2 = -2 * logtau2 - 0.5 * SUM((muvar - mu)**2) / EXP(logtau2)),\
  !E(lposterior = l1 + l2 + logtau2 / 2 - 14 * LOG(2 * C('pi')))
DEMC        [PRINT=results,monitoring,histogram;\
            CALCULATION=p[]; LOGPOSTERIOR=lposterior;\
            DATA=Coagulation_time,Diet; PERCENTAGES=!(25,50,75);\
            NGENERATIONS=1000; SEED=349472; SDLOGPOSTERIOR=sdlposterior;\
            RHATLOGPOSTERIOR=rhlposterior; QUANTILESLOGPOSTERIOR=qu[8]]\
            mu1,mu2,mu3,mu4,mu,logs2,logtau2; INITIAL=#mean,muin,1,1
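For readers less familiar with Genstat expressions, the log-posterior defined by p[1...5] in the example can be written as an ordinary function. The Python sketch below is a direct transcription of those expressions and is offered only as an aid to reading them; the function name `log_posterior` and the zero-based diet indices are choices made for this illustration.

```python
import math

coagulation_time = [62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66,
                    71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59]
# Diets A, B, C, D coded 0..3: 4, 6, 6 and 8 observations respectively
diet = [0] * 4 + [1] * 6 + [2] * 6 + [3] * 8

def log_posterior(mu1, mu2, mu3, mu4, mu, logs2, logtau2):
    muvar = [mu1, mu2, mu3, mu4]               # p[1]: collect the diet means
    fit = [muvar[g] for g in diet]             # p[2]: fitted value per observation
    # p[3]: contribution of the 24 observations (24/2 = 12)
    l1 = (-12 * logs2
          - 0.5 * sum((y - f) ** 2 for y, f in zip(coagulation_time, fit))
          / math.exp(logs2))
    # p[4]: contribution of the 4 diet means (4/2 = 2)
    l2 = (-2 * logtau2
          - 0.5 * sum((m - mu) ** 2 for m in muvar) / math.exp(logtau2))
    # p[5]: combine the terms
    return l1 + l2 + logtau2 / 2 - 14 * math.log(2 * math.pi)

print(log_posterior(61, 66, 68, 61, 64, 1, 1))
```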

Updated on June 20, 2019
