Median chat


Estimation of the overdispersion parameter, c, for the global model is one of the key issues in applying Program MARK to encounter data.  The parametric bootstrap goodness-of-fit procedure was an attempt to develop a general procedure, but was found to be biased for Cormack-Jolly-Seber (CJS) data (White 2002).  Program RELEASE provides useful goodness-of-fit (GOF) tests and estimates of c for the CJS data type, and programs ESTIMATE and BROWNIE provide similar capabilities for dead recovery data.  However, most of the data types in MARK do not have a useful GOF procedure to assess the validity of the global model.  The median chat procedure is an attempt to develop a general approach to the estimation of c.

Likelihood theory leads to the deviance and its associated degrees of freedom as a measure of the GOF of a model, with chat estimated as deviance/df.  Deviance is defined as the difference between -2log Likelihood for the model of interest and the -2log Likelihood of the saturated model.  Asymptotically, the deviance statistic is chi-square distributed.  However, for finite sample sizes, the deviance is not closely enough distributed as chi-square to be generally useful.  The median chat routine is an attempt to correct for the bias of the deviance chat.
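In symbols, the definitions in the preceding paragraph can be written as follows.  The notation (L for the likelihood, the subscripts, and df for the degrees of freedom) is chosen here for illustration and is not taken from MARK's output.

```latex
\[
\text{Deviance}
  = \bigl(-2\ln L_{\text{model}}\bigr) - \bigl(-2\ln L_{\text{saturated}}\bigr),
\qquad
\hat{c} = \frac{\text{Deviance}}{\text{df}}
\]
```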

The median chat approach is to simulate data with a range of c values, obtaining a deviance chat = deviance/df for each of the simulated data sets.  Then, a logistic regression is performed to estimate the value of c at which 1/2 of the simulated deviance/df values are greater than the observed deviance/df, and hence 1/2 of the simulated values are less than the observed deviance/df.  The procedure requires the user to specify the range of c values to simulate (lower and upper bounds, and the total number of points based on these bounds), and the number of replicate simulations to generate for each of the specified c values.  Typically, a small set of c values over a wide range should be used first, to find the approximate range in which the estimate of c will fall; the simulations can then be focused around that likely value.  The logistic regression analysis is performed by MARK as a known fate model.  Output consists of the estimated value of c and a SE derived from the logistic regression analysis, with these estimates provided in a notepad window preceding the known fate output.  In addition, a graph of the observed proportions along with the predicted proportions based on the logistic regression model is provided.  The initial dialog box where the simulation parameters are specified also has a check box to request an Excel spreadsheet to contain the simulated values.  This spreadsheet is useful for additional analyses, if desired.
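The logic of the procedure can be sketched in a few lines of Python.  This is only an illustration under stated assumptions, not MARK's implementation: the function simulate_deviance_chat is a hypothetical stand-in that draws deviance/df as c times a chi-square variate divided by df, whereas MARK simulates encounter data and refits the model.  The function names, the use of c itself as the regression covariate, and all numeric settings are likewise assumptions.

```python
import math
import random


def simulate_deviance_chat(c, df, rng):
    """Hypothetical stand-in for the simulator: one simulated deviance/df.

    Assumes deviance = c * chi-square(df); MARK instead simulates
    encounter data with overdispersion c and refits the model.
    """
    chi_square = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) draw
    return c * chi_square / df


def fit_logistic(xs, ys, iterations=50):
    """Fit logit(p) = b0 + b1*x by Newton-Raphson; returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            eta = max(-30.0, min(30.0, b0 + b1 * x))  # guard against overflow
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            g0 += y - p          # score (gradient) terms
            g1 += (y - p) * x
            h00 += w             # information matrix terms
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    return b0, b1


def median_chat(observed_chat, df, c_lower, c_upper, n_points, n_reps, seed=1):
    """Estimate the c at which half the simulated deviance/df exceed the observed."""
    rng = random.Random(seed)
    xs, ys = [], []
    for i in range(n_points):
        # Evenly spaced c values between the lower and upper bounds.
        c = c_lower + (c_upper - c_lower) * i / (n_points - 1)
        for _ in range(n_reps):
            simulated = simulate_deviance_chat(c, df, rng)
            xs.append(c)
            ys.append(1.0 if simulated > observed_chat else 0.0)
    b0, b1 = fit_logistic(xs, ys)
    # The fitted probability equals 1/2 where b0 + b1*c = 0.
    return -b0 / b1


estimate = median_chat(observed_chat=1.5, df=20, c_lower=1.0, c_upper=2.5,
                       n_points=7, n_reps=200)
print(round(estimate, 3))
```

The response for each simulated data set is whether its deviance/df exceeds the observed value, so the fitted curve crosses 1/2 at the median chat.  Rerunning with bounds narrowed around the first estimate mirrors the advice above to start wide and then focus the simulations.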

The median chat approach appears to work well.  In comparisons with the chat estimated by RELEASE for the CJS data type, the median chat is biased high, by as much as 15% in one case of phi = 0.5 with 5 occasions.  However, the median chat has a much smaller standard deviation for the sampling distribution than the chat estimated by RELEASE.  As a result, the mean squared error (MSE) for the median chat is generally about 1/2 of the MSE for the RELEASE estimator.  Thus, on average, the median chat is closer to truth than the RELEASE chat, even though the median chat is biased high.
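The reason a biased estimator can still have the smaller MSE follows from the standard decomposition of mean squared error, written here for an estimator chat of c (the notation is chosen for illustration):

```latex
\[
\operatorname{MSE}(\hat{c})
  = E\bigl[(\hat{c} - c)^2\bigr]
  = \bigl(E[\hat{c}] - c\bigr)^2 + \operatorname{Var}(\hat{c})
\]
```

A modest positive bias adds only its square to the MSE, so it can be outweighed by a large enough reduction in the variance of the sampling distribution.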

One of the current limitations of the median chat goodness-of-fit procedure is that individual covariates are not allowed.  This is because the real parameters are passed to the simulator to generate the simulated data, a shortcut that avoids having to deal with the multitude of link functions in the true model.