Bootstrap Goodness-of-Fit Procedure
The goodness-of-fit of the global model can be evaluated in three ways: (1) assuming that the deviance for the model is chi-square distributed and computing a goodness-of-fit test from this statistic; (2) using Program RELEASE (for live-recapture data only) to compute the goodness-of-fit tests provided by that program; or (3) using the parametric bootstrap procedure provided in MARK. Note that the parametric bootstrap procedure is different from the bootstrap data procedure.
The first approach is generally not valid because the assumption that the deviance is chi-square distributed is seldom met. I have seen this approach appear reasonable only for very large band-recovery data sets, and never for live-recapture data. Use of Program RELEASE is reasonable, but it usually lacks statistical power to detect lack of fit because of the amount of pooling required to compute chi-square distributed test statistics. For these reasons, the bootstrap procedure was implemented in MARK, available from the Results Browser menu.
With the bootstrap procedure, the parameter estimates of the model being evaluated for goodness of fit are used to generate data, i.e., a parametric bootstrap. The simulated data exactly meet the assumptions of the model: no over-dispersion is included, animals are completely independent, and no model assumptions are violated. Data are simulated based on the number of animals released at each occasion, and a simulated encounter history is constructed for each release.

As an example, consider a live-recapture data set with 3 occasions (2 survival intervals) and an animal first released at time 1. The animal starts with an encounter history of 100, because it was released on occasion 1. Does the animal survive the interval from the release occasion to the next recapture occasion? The probability of survival is phi(1), taken from the estimates obtained with the original data. A uniform random number on the interval (0, 1) is generated and compared to the estimate of phi(1). If the random number is less than or equal to phi(1), the animal is considered to have survived the interval. If the random value is greater than phi(1), the animal has died; the encounter history is then complete, and remains 100.

Suppose instead that the animal survives the first interval. Is it then recaptured on the second occasion? A new random number is generated and compared to the capture probability p(2) from the parameter estimates of the model being tested. If the random value is less than or equal to p(2), the animal is considered captured, and the encounter history becomes 110; if not captured, the encounter history remains 100. Next, whether the animal survives the second survival interval is determined, again by comparing a new random value with phi(2). If the animal dies, the current encounter history is complete, and is either 100 or 110.
If the animal lives, a new random value is used to determine whether the animal is recaptured on occasion 3 with probability p(3). If recaptured, the third position in the encounter history is set to 1; if not, it is left as 0.
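The simulation logic just described can be sketched as follows. This is an illustrative Python sketch only, not MARK's internal routine; the function name and argument layout are my own, with phi indexed by survival interval and p by occasion, as in the example above.

```python
import random

def simulate_history(release_occasion, n_occasions, phi, p):
    """Parametric-bootstrap simulation of one live-recapture encounter
    history.  phi[i] is survival over interval i -> i+1; p[j] is the
    recapture probability at occasion j (p at the release occasion is
    unused, since the animal is seen at release by definition)."""
    history = [0] * n_occasions
    history[release_occasion] = 1            # released, so seen at release
    for occ in range(release_occasion, n_occasions - 1):
        if random.random() > phi[occ]:       # animal dies: history complete
            break
        if random.random() <= p[occ + 1]:    # survived; recaptured here?
            history[occ + 1] = 1
    return ''.join(map(str, history))

# e.g., 3 occasions, animal released at occasion 1 (index 0)
h = simulate_history(0, 3, phi=[0.8, 0.8], p=[1.0, 0.6, 0.6])
```

Repeating this for every release, refitting the model to the simulated histories, and saving the deviance gives one bootstrap replicate.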
Once the encounter history is complete, it is saved for input to the numerical estimation procedure. Once encounter histories have been generated for all the animals released, the numerical estimation procedure is run to compute the deviance and its degrees of freedom. These values, along with c-hat (= deviance / df), are saved to a simulation output file. The entire process is repeated for the number of simulations requested.
When the requested number of simulations is completed, the user can access the bootstrap simulation results database to evaluate the goodness of fit of the model that was simulated. First, the deviances of the simulated data can be ranked (sorted into ascending order), and the relative rank of the deviance from the original data determined. Suppose that the deviance of the original model was 101.01, whereas the largest deviance from 1000 simulations was only 90.90. Then you can conclude that the probability of observing a value as large as 101.01 was less than 1/1000. As another example, suppose the 801st simulated deviance in the sorted deviance file is 100.90, and the 802nd value is 101.50. Then you would conclude that your observed deviance was reasonably likely to be observed, with probability 199/1000 (because 199 of the simulated values exceeded the observed value).
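The ranking computation amounts to the proportion of simulated deviances at or above the observed value. A minimal sketch (the function name is illustrative, and the toy data below are constructed to mirror the 801st/802nd example, not real MARK output):

```python
def bootstrap_p_value(observed_deviance, simulated_deviances):
    # Proportion of simulated deviances >= the observed deviance:
    # the bootstrap goodness-of-fit P-value described above.
    n_exceed = sum(1 for d in simulated_deviances if d >= observed_deviance)
    return n_exceed / len(simulated_deviances)

# Toy data mirroring the example: 801 simulated deviances below 101.01
# and 199 above it, so P = 199/1000.
sims = [90.0] * 801 + [105.0] * 199
p = bootstrap_p_value(101.01, sims)   # -> 0.199
```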
A similar procedure can be used to evaluate the observed c-hat by comparing its rank to the simulated values of c-hat. Typically, conclusions using c-hat and deviance are about the same, but different results may be obtained with sparse data sets where the degrees of freedom associated with the deviance vary a lot across the simulations.
The bootstrap simulations can also be used to estimate the over-dispersion parameter, c. Two approaches are possible: one based on the deviance directly, and one based on c-hat. For the deviance-based approach, the deviance estimate from the original data is divided by the mean of the simulated deviances to compute c-hat for the data. The logic is that the mean of the simulated deviances represents the expected value of the deviance under the null model of no violations of assumptions (i.e., perfect fit of the model to the data). Thus, c-hat = observed deviance divided by expected deviance provides a measure of the amount of over-dispersion in the original data.
The second approach to estimating c-hat for the original data is to divide the observed value of c-hat from the original data by the mean of the simulated values of c-hat from the bootstraps. Again, the mean of the simulated values provides an estimate of the expected value of c-hat under the assumption of perfect fit of the model to the data.
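The two estimators can be sketched as follows (illustrative function names and toy numbers, not part of MARK; the bootstrap mean stands in for the expected value under perfect fit):

```python
from statistics import mean

def chat_from_deviance(observed_deviance, sim_deviances):
    # c-hat = observed deviance / expected deviance under perfect fit,
    # with the mean of the bootstrap deviances estimating the expectation.
    return observed_deviance / mean(sim_deviances)

def chat_from_chat(observed_chat, sim_chats):
    # c-hat = observed c-hat / mean of the simulated c-hat values.
    return observed_chat / mean(sim_chats)

# Toy illustration: observed deviance 101.01, mean simulated deviance 80.0
c1 = chat_from_deviance(101.01, [78.0, 80.0, 82.0])
```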
I'm not sure of the benefits and disadvantages of the two procedures, and normally recommend using the observed deviance divided by the mean of the bootstrap deviances, because this approach does not rely on estimating the number of parameters and so is much faster. Bootstrap Options allows you to specify that you are only interested in the deviance, and not c-hat, from the bootstrap simulations. Generally, results are about the same, but they can differ when the degrees of freedom of the deviance vary a lot across the bootstrap simulations (caused by a small number of releases).
One of the limitations of the bootstrap goodness-of-fit procedure is that individual covariates are not allowed.
White (2002) has demonstrated that the bootstrap goodness-of-fit procedure is biased low for the Cormack-Jolly-Seber data type, with the bias increasing as the number of occasions increases and as the apparent survival rate increases. The median c-hat procedure seems to work better.