Subset Models
This option provides an automated way for the user to create many subset models from a more complex "full" model, which can save time in model construction. The user first builds the full model, which contains all the variables (i.e., columns in the design matrix of interest). Note that the full model need not actually have been run; the saved structure of the full model can be used to construct the subset models. Then, the user specifies which columns of the full model's design matrix are to be used in a nested set of models.
To start the process, in the Results Browser Window, highlight the model with the design matrix that contains all the variables that you want to use in all possible subset model combinations. To make specification of the variables as easy as possible, you should provide informative column labels (Design Matrix Labels) for each column of the design matrix using the Appearance | Label Columns menu choices. Then, select the menu choices Run | Subset of DM Models to create and run all possible models.
The interactive interface that now appears gives you a list of all the columns in the design matrix. Columns with "intercept" in the label are assumed to appear in all models, and so are given the value "Always" to designate this status. Other columns are given the values A, B, ..., J, designating up to 10 variables, each consisting of a single column or of several columns taken jointly. Currently, the limit is 10 variables, which produces 2^10 = 1024 possible models. If you think you need to run more models, I suggest you think harder about the list of variables you are considering.
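Each letter marks one variable that is either in or out of a given model, so k distinct letters give 2^k subset models. A minimal Python sketch of this counting rule, assuming the specification is held as a mapping from column label to "Always", "Never", or a letter; the function `count_subset_models` and the column names `x0` to `x9` are hypothetical and not part of MARK.

```python
# Hypothetical sketch: count the subset models implied by a specification
# that maps each design-matrix column label to "Always", "Never", or A-J.
VALID_LETTERS = set("ABCDEFGHIJ")  # 10 letters, so at most 10 variables


def count_subset_models(spec: dict[str, str]) -> int:
    """Return 2**k, where k is the number of distinct letters used."""
    # "Always" and "Never" columns do not add in/out choices.
    letters = {v for v in spec.values() if v not in ("Always", "Never")}
    unknown = letters - VALID_LETTERS
    if unknown:
        raise ValueError(f"labels outside A-J: {sorted(unknown)}")
    return 2 ** len(letters)


# All 10 letters in use (column names x0..x9 are made up for illustration).
spec = {f"x{i}": letter for i, letter in enumerate("ABCDEFGHIJ")}
spec["intercept"] = "Always"
print(count_subset_models(spec))  # 1024
```

Columns that share a letter count only once, which is why grouping columns reduces the number of models.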
The letters A, B, ..., J identify how columns in the design matrix are related to each other and how they will be combined in the subset models. As an example, suppose you have 6 columns in a known fate model: intercept, autumn, winter, spring, age (young and adult), and gender (male and female). The intercept must occur in every model and so is defined as "Always." The variables age and gender are stand-alone variables, i.e., each defines a category with a single column in the design matrix. However, the 3 columns autumn, winter, and spring together define season; they should never appear singly in a model, but always together. So, they should all be given the same letter.
The default screen would be:
intercept Always
autumn A
winter B
spring C
age D
gender E
You should modify the values to make autumn, winter, and spring the same:
intercept Always
autumn A
winter A
spring A
age D
gender E
It is not necessary to label consecutively (e.g., age = B and gender = C would work equally well); the only requirement is that each variable's label differs from the labels of all the other variables. The above specification will result in 2^3 = 8 possible models:
intercept only
intercept + season
intercept + age
intercept + gender
intercept + season + age
intercept + season + gender
intercept + age + gender
intercept + season + age + gender
Note that the last model is identical to the starting full model. However, this may not be the case for all full models because another option, "Never", can be specified to never include a column in any of the subset models.
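The enumeration above can be illustrated with a short Python sketch. It assumes the specification is a mapping from column label to "Always", "Never", or a letter; `subset_models` is a hypothetical helper that shows the combinatorics, not MARK's internal code. Columns sharing a letter are grouped, and every combination of groups is added to the "Always" columns.

```python
from itertools import combinations


def subset_models(spec: dict[str, str]) -> list[list[str]]:
    """Enumerate every subset model implied by a column specification.

    "Always" columns appear in every model, "Never" columns in none,
    and columns sharing a letter enter or leave a model together.
    """
    always = [col for col, v in spec.items() if v == "Always"]
    groups: dict[str, list[str]] = {}
    for col, v in spec.items():
        if v not in ("Always", "Never"):
            groups.setdefault(v, []).append(col)
    letters = sorted(groups)
    models = []
    # Choose 0, 1, ..., all letters; each choice is one subset model.
    for k in range(len(letters) + 1):
        for chosen in combinations(letters, k):
            cols = always + [col for letter in chosen for col in groups[letter]]
            models.append(cols)
    return models


spec = {"intercept": "Always", "autumn": "A", "winter": "A",
        "spring": "A", "age": "D", "gender": "E"}
# Prints 8 models, from "intercept" alone up to the full model.
for m in subset_models(spec):
    print(" + ".join(m))
```

Here the season group is printed as its 3 columns (autumn + winter + spring), and no model ever contains only part of that group.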
If specification of the column attributes is particularly complex and error-prone, create the list in Excel, copy it to the clipboard, and then paste it into the dialog window.
As another example, in the following screen from a Cormack-Jolly-Seber live recaptures data type, 3 variables are specified for phi and 3 for p. The result is 2^6 = 64 models. Notice that the column labels in the design matrix indicate whether each variable models phi or p.
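[Screenshot: subset model specification screen for a Cormack-Jolly-Seber live recaptures model, with 3 variables for phi and 3 for p]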
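Because each of the 6 variables has its own letter, the 64 models can also be viewed as every phi structure (2^3 = 8) paired with every p structure (2^3 = 8). A small Python sketch of that pairing; the labels `Phi:x1` and so on are made up, since the actual column names in the screen are not given, and intercept columns are omitted because "Always" columns would be added to every model without changing the count.

```python
from itertools import product

# Made-up labels; prefixing each with its parameter mirrors labeling
# columns by whether they model phi or p.
phi_vars = ["Phi:x1", "Phi:x2", "Phi:x3"]
p_vars = ["p:x1", "p:x2", "p:x3"]


def all_subsets(items):
    """Return every subset of items as a tuple, including the empty one."""
    return [tuple(x for x, keep in zip(items, mask) if keep)
            for mask in product([False, True], repeat=len(items))]


# Each model pairs one phi structure with one p structure.
models = [(phi, p) for phi in all_subsets(phi_vars)
          for p in all_subsets(p_vars)]
print(len(all_subsets(phi_vars)), len(all_subsets(p_vars)), len(models))  # 8 8 64
```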
Two additional options can be checked at the bottom of the model specification screen. First, instead of running the models immediately, the model structures can be saved and all of the models run later in batch mode.
The second option is to use the beta estimates from the full model as initial values for the subset models. However, these estimates may be poor starting values, depending on the collinearity among the variables. Note that this option does not appear if the full model has not actually been run (i.e., only its saved structure is used to specify the full model).
A common issue is that the user wants to include a column only when another column is also in the model. An example would be a linear trend (T) and the associated quadratic trend (TT). Suppose that there are also two additional variables, age and gender. One approach to including TT only when T is in the model is to run 2 sets of models. For the first set, only the T variable is used:
intercept Always
age A
gender B
T C
TT Never
Then, a second set of models is constructed that always includes TT with T:
intercept Always
age A
gender B
T C
TT C
Each set will produce 8 models, for a total of 16. However, the 4 models in which neither T nor TT is included will appear in both sets as duplicates:
intercept
intercept + age
intercept + gender
intercept + age + gender
Sorting (ordering) the list of models by model name may help find the duplicates.
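The bookkeeping for the two sets can be checked with a Python sketch. `subset_models` is a hypothetical helper (not part of MARK) that enumerates models as frozensets of column labels, so two models with the same columns compare equal regardless of column order. The sketch also shows that the 12 unique models are exactly what you would get by enumerating T and TT as separate variables and discarding any model with TT but not T.

```python
from itertools import combinations


def subset_models(spec):
    """Enumerate subset models as frozensets of column labels.

    "Always" columns are in every model, "Never" columns in none, and
    columns that share a letter enter or leave a model together.
    """
    always = {c for c, v in spec.items() if v == "Always"}
    groups = {}
    for c, v in spec.items():
        if v not in ("Always", "Never"):
            groups.setdefault(v, set()).add(c)
    letters = sorted(groups)
    return [frozenset(always.union(*(groups[letter] for letter in chosen)))
            for k in range(len(letters) + 1)
            for chosen in combinations(letters, k)]


base = {"intercept": "Always", "age": "A", "gender": "B", "T": "C"}
set1 = subset_models({**base, "TT": "Never"})  # T only
set2 = subset_models({**base, "TT": "C"})      # T and TT together
combined = set1 + set2
unique = set(combined)
print(len(combined), len(unique), len(combined) - len(unique))  # 16 12 4

# Models that appear in both runs (all of them lack T).
duplicates = [m for m in set1 if m in set2]

# Same 12 models: T and TT as separate letters, dropping TT-without-T.
full = subset_models({**base, "TT": "D"})
allowed = {m for m in full if "T" in m or "TT" not in m}
print(unique == allowed)  # True
```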
A second issue is that the user never wants 2 particular variables in the same model. Suppose this is the case for length and weight. Again, a simple solution is to run 2 sets of models, specifying the Never keyword first for length and then for weight. However, the models that include neither length nor weight will appear in both sets, so these duplicates will again have to be removed.
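A Python sketch of the length and weight case, using a hypothetical `subset_models` helper (not part of MARK) and a made-up third variable, age, so that some models contain neither length nor weight. After the duplicates are removed, the two runs give the same model list as enumerating all three variables and discarding any model that contains both length and weight.

```python
from itertools import combinations


def subset_models(spec):
    """Enumerate subset models as frozensets of column labels.

    "Always" columns are in every model, "Never" columns in none, and
    columns that share a letter enter or leave a model together.
    """
    always = {c for c, v in spec.items() if v == "Always"}
    groups = {}
    for c, v in spec.items():
        if v not in ("Always", "Never"):
            groups.setdefault(v, set()).add(c)
    letters = sorted(groups)
    return [frozenset(always.union(*(groups[letter] for letter in chosen)))
            for k in range(len(letters) + 1)
            for chosen in combinations(letters, k)]


# "age" is a made-up extra variable for illustration.
set1 = subset_models({"intercept": "Always", "age": "A",
                      "length": "Never", "weight": "B"})
set2 = subset_models({"intercept": "Always", "age": "A",
                      "length": "B", "weight": "Never"})
combined = set1 + set2
unique = set(combined)

# Same model list obtained by filtering the unrestricted enumeration.
full = subset_models({"intercept": "Always", "age": "A",
                      "length": "B", "weight": "C"})
allowed = {m for m in full if not {"length", "weight"} <= m}
print(len(combined), len(unique), unique == allowed)  # 8 6 True
```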