SCIENCE, MODELS AND INFERENCE

Information-Theoretic Approaches

 

FW-696BV, Spring Semester, 2003

 

I am offering a graduate course in Science, Models and Inference for the Spring Semester, 2003.  A lengthy description of the course appears below.  The course is being offered as FW-696BV for 2 credits.  We will meet Tuesdays and Thursdays from 3 to 5.  The course will end in mid-March, allowing people to begin field projects or spend time on other classes.  The course is best suited to PhD students.  This is not a highly mathematical course; rather, the focus is on philosophy and science.  Mid-term and final examinations seem unlikely, but an occasional quiz will be given to be sure participants are keeping up with the assigned readings.  Grading is pass/fail unless otherwise requested.  My permission is required before a student can register.  Registration is limited to 20 (no “sit-ins”).  The required text is:

 

Burnham, K. P., and D. R. Anderson. 2002.

Model Selection and Multi-Model Inference: A Practical Information-Theoretic Approach. (2nd ed.) Springer-Verlag, NY.

 

Details on the Course

Much of science is concerned with hypotheses and the models that represent them.  This course is about making valid inferences from scientific data in the biological sciences when a meaningful analysis depends on a model.  The approaches explored in this course are most applicable where the problem and analysis are somewhat confirmatory, rather than exploratory.  Simple models (1-3 or 4 parameters) are not the focus of the course.  The level of the material in the course can best be judged by reviewing the text (above), except for Chapter 7.  I have several main objectives in offering this course:

 

1.  Outline a consistent strategy for issues surrounding the analysis of empirical data.  Under this strategy, “data analysis” leading to valid inference is the integrated process of careful, a priori model formulation, model selection, parameter estimation, and measurement of precision (including a variance component due to model selection uncertainty).

 

A philosophy of thoughtful, science-based, a priori modeling is advocated.  Models are viewed as approximations to information in the data: there are no “true models.”  Science and biology play a lead role in this a priori model building and in careful consideration of the science problem.  Chamberlin's (1890) concepts concerning “multiple working hypotheses” will be emphasized without the silly notion of a “null hypothesis” that typically no one believes in.

 

2.  Explain and illustrate methods developed recently at the interface of information theory and mathematical statistics for ranking the candidate models from best to worst and then computing the relative likelihood of each model, given the data.  In particular, I review and explain the use of Akaike's Information Criterion (AIC) in the selection of a model (or small set of good models) for statistical inference.  Here, the focus is on evidence from several sources to support strong inference.

 

The course will build on Kullback-Leibler information (the dominant paradigm in information and coding theory) and likelihood theory (the dominant paradigm in theoretical statistics).  The practical use of information criteria, such as Akaike's, for model selection is relatively recent (the major exception being in time series analysis where AIC has been used for most of the past two decades). 
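
The ranking described in objective 2 can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the model names, log-likelihoods, and parameter counts below are hypothetical, and the function names are mine, not from the text.  It uses the standard definitions AIC = -2 log(L) + 2K, the AIC differences relative to the best model, and the normalized relative likelihoods (Akaike weights).

```python
import math

def aic(log_likelihood, k):
    """AIC = -2 log(L) + 2K, where K counts the estimated parameters."""
    return -2.0 * log_likelihood + 2.0 * k

def akaike_weights(aic_values):
    """Return (deltas, weights) for a list of AIC values."""
    best = min(aic_values)
    deltas = [a - best for a in aic_values]            # difference from the best model
    rel_likelihoods = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_likelihoods)
    weights = [r / total for r in rel_likelihoods]     # normalized to sum to 1
    return deltas, weights

# Hypothetical maximized log-likelihoods and parameter counts (not real data).
candidates = {"M1": (-120.3, 3), "M2": (-118.9, 5), "M3": (-118.7, 7)}

names = list(candidates)
aic_values = [aic(ll, k) for ll, k in candidates.values()]
deltas, weights = akaike_weights(aic_values)

# Print the models ranked from best (smallest AIC) to worst.
for i in sorted(range(len(names)), key=lambda i: aic_values[i]):
    print(f"{names[i]}: AIC={aic_values[i]:.1f}  "
          f"delta={deltas[i]:.1f}  weight={weights[i]:.3f}")
```

In this made-up example the richer models fit slightly better, but the penalty for their extra parameters leaves the simplest model ranked first.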

 

3.  Examine “model selection uncertainty” – inference problems that arise when using the same data for both model selection and the associated parameter estimation and inference.  If model selection uncertainty is ignored, precision is often overestimated, achieved confidence interval coverage is below the nominal level, and predictions are less accurate than expected.

 

4.  Introduce formal theory allowing inference from more than one model; this approach has a number of practical advantages.  Several issues to be covered fall under the heading Multi-Model Inference (e.g., model averaging, confidence sets on models, unconditional variances, and ways to explore the relative importance of predictor variables). 
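
As a rough sketch of what multi-model inference involves, the Python below averages one parameter estimate across several models and computes an unconditional standard error that includes a component for model selection uncertainty.  All numbers are hypothetical, the function names are mine, and the model weights are assumed to be given (for example, Akaike weights that sum to 1).  The standard error uses the estimator given in the required text, the weighted sum of sqrt(var_i + (estimate_i - average)^2).

```python
import math

def model_average(estimates, std_errors, weights):
    """Model-averaged estimate and its unconditional standard error."""
    theta_bar = sum(w * est for w, est in zip(weights, estimates))
    # Each term adds the squared deviation from the average to the
    # within-model sampling variance before taking the square root.
    se = sum(
        w * math.sqrt(s ** 2 + (est - theta_bar) ** 2)
        for w, s, est in zip(weights, std_errors, estimates)
    )
    return theta_bar, se

def relative_importance(weights, models_containing):
    """Sum the weights of the models that include a given predictor."""
    return sum(w for w, has_it in zip(weights, models_containing) if has_it)

# Hypothetical per-model estimates of the same parameter (not real data).
estimates = [0.50, 0.62, 0.58]
std_errors = [0.10, 0.12, 0.15]
weights = [0.60, 0.30, 0.10]   # assumed model weights, summing to 1

theta_bar, unconditional_se = model_average(estimates, std_errors, weights)
print(f"model-averaged estimate = {theta_bar:.3f}")
print(f"unconditional SE        = {unconditional_se:.3f}")

# Relative importance of a predictor that appears in models 1 and 3 only.
print(f"relative importance     = "
      f"{relative_importance(weights, [True, False, True]):.2f}")
```

When the models disagree about the estimate, the unconditional standard error exceeds the weighted average of the within-model standard errors; when they agree exactly, the two coincide.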

 

Modeling is an art as well as a science and is directed toward finding a good approximating model of the empirical data as the basis for statistical inference from those data.  These are all complex issues and the literature is often highly technical and scattered widely throughout books and research journals.  The course will provide an understandable synthesis of some of these issues.

 

The course will be highly interactive, and students should have a challenging data set to work on during the course.  I offered a prototype of this course in 1997 and a full course in 1999 and 2001.  I offered similar material in October 2000 in a graduate course at the University of Zurich in Switzerland.

 

Who Should Take This Course?

I see three groups of graduate students who might benefit from this class:  those interested in science and general science methods, those interested in model-based inference in the biosciences, and those interested in general quantitative methods.

 

Prerequisites

Students should be keenly interested in science, have a good background in least squares methods (e.g., “regression”) or likelihood methods, and have some experience and interest in modeling.  Prior courses in nonlinear regression (e.g., logistic and log-linear models) would be helpful.  Some experience in building statistical models to represent science hypotheses is essential.  Students are required to bring a data set to class to serve as an example.  They should know the subject matter underlying these data in some detail.