|
Poisson regression: Fitting loglinear models using Poisson regression
The loglinear model is the best-known example of Poisson regression. A loglinear model analyzes the association between the variables in a crossclassification: the logarithm of the expected frequencies is a linear combination of the explanatory variables. By adding interaction terms to the model, we can study the association between two or more variables.
As an example, we fit the independence model
$$ E(y_{ij}) = N p_i p_j, \tag{18} $$
where $y_{ij}$ is the frequency in cell $(i, j)$ of the crossclassification, $N$ is the sample size, $p_i$ is the proportion of observations in row category $i$, and $p_j$ is the proportion of observations in column category $j$. In loglinear notation this model is written as
$$ \log E(y_{ij}) = \log N + \log p_i + \log p_j. \tag{19} $$
The independence model usually fits a square table badly because the elements on the diagonal are large compared to the off-diagonal elements. The quasi-independence model provides a better fit to such tables by ignoring the diagonal elements or, equivalently, by expanding the model so that the diagonal elements are fitted perfectly. Independence is then no longer posited for all elements of the table, but only for the off-diagonal elements. Note that this model has one parameter for every element on the diagonal, which gives a perfect fit on the diagonal. Because of the Poisson-multinomial equivalence, this model can be fitted as a Poisson regression model.
To fit a loglinear model, we have to supply the data in the form of a column of frequencies. If the frequencies are a crossclassification of p variables, the datatable has p additional columns, one per variable, that indicate the categories of the crossclassification.
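
The independence model in (18) and (19) can be checked numerically with a short sketch. The Python code below is only an illustration under stated assumptions: the 3x3 table is hypothetical (it is not the migration data), the function names are invented for this example, and the closed-form fit stands in for the Poisson regression dialog. It lays the table out as a column of frequencies with one category column per variable, computes the expected frequencies N p_i p_j, and reports the deviance.

```python
import math

# Hypothetical 3x3 square table (illustrative counts, not the migration data).
table = [
    [50, 10, 5],
    [8, 60, 12],
    [4, 9, 40],
]


def to_long_format(table):
    """One (row, col, freq) record per cell: a frequency column plus one
    category column for each variable of the crossclassification."""
    return [(i + 1, j + 1, y)
            for i, row in enumerate(table)
            for j, y in enumerate(row)]


def independence_fit(records):
    """Expected frequencies N * p_i * p_j under the independence model."""
    n = sum(y for _, _, y in records)
    row_tot, col_tot = {}, {}
    for i, j, y in records:
        row_tot[i] = row_tot.get(i, 0) + y
        col_tot[j] = col_tot.get(j, 0) + y
    return {(i, j): n * (row_tot[i] / n) * (col_tot[j] / n)
            for i, j, _ in records}


def deviance(records, expected):
    """G^2 = 2 * sum(y * log(y / E)). The (y - E) part of the Poisson
    deviance cancels because observed and fitted totals agree; cells
    with y = 0 contribute nothing."""
    return 2.0 * sum(y * math.log(y / expected[(i, j)])
                     for i, j, y in records if y > 0)


records = to_long_format(table)
expected = independence_fit(records)
g2 = deviance(records, expected)
df = (len(table) - 1) * (len(table[0]) - 1)
print(f"deviance = {g2:.1f} on {df} degrees of freedom")
```

Taking the logarithm of any entry of `expected` reproduces the additive form in (19), which is why the same fit can be obtained from a Poisson regression with the row and column variables as factors.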
As an example we use the data on migration between four US regions from Agresti (1990), p. 357. This table lists the frequencies of migrations among four regions: Northeast (1), Midwest (2), South (3), and West (4). The crossclassification has two variables, the region of residence in 1980 and in 1985, so three variables (1980, 1985, and frequency) describe the migration patterns. In the datatable, the first row corresponds to migration pattern (1,1), the second row to migration pattern (1,2), and so on. The majority (95 %) of the observations are on the main diagonal, so relatively few people changed region. The quasi-independence model is fitted as an independence model that is extended with additional variables for the diagonal. For each region we add an indicator variable that equals one in the cell where the row and column category both equal that region and zero elsewhere; these variables are labeled d1, d2, d3, and d4. In the Poisson regression dialog we add the variables for the rows and columns as factors and the additional variables as numeric scores. The type of each variable is set in the datatype menu under the data menu. From the analysis, we conclude that the model fits better than the independence model but still does a poor job of describing the migration patterns. The deviance of the quasi-independence model is 69.5 on 5 degrees of freedom, far above the 5 % critical value of the chi-square distribution with 5 degrees of freedom (11.07), indicating that the model fits badly. |
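
The construction of the diagonal indicators and the quasi-independence fit can be sketched as follows. This is a minimal Python illustration under stated assumptions: the 4x4 counts are hypothetical (they are not the Agresti data), the function name `quasi_independence_fit` is invented here, and iterative proportional fitting on the off-diagonal cells is swapped in for the Poisson regression dialog. It yields the same maximum likelihood fitted values as the Poisson regression with the indicators d1 to d4, since both fit the diagonal exactly and match the off-diagonal row and column totals.

```python
import math

# Hypothetical 4x4 square table with a heavy diagonal (illustrative only).
table = [
    [900, 12, 30, 10],
    [9, 1100, 40, 25],
    [15, 20, 1400, 22],
    [6, 14, 24, 800],
]
k = len(table)

# Datatable layout: row factor, column factor, frequency, then d1..dk,
# where d_m equals 1 only in the diagonal cell (m, m).
rows = []
for i in range(k):
    for j in range(k):
        d = [1 if (i == j and i == m) else 0 for m in range(k)]
        rows.append((i + 1, j + 1, table[i][j], *d))


def quasi_independence_fit(table, tol=1e-10, max_iter=1000):
    """Fitted values a_i * b_j off the diagonal and the observed counts
    on the diagonal, obtained by iterative proportional fitting."""
    k = len(table)
    a, b = [1.0] * k, [1.0] * k
    row_off = [sum(table[i][j] for j in range(k) if j != i) for i in range(k)]
    col_off = [sum(table[i][j] for i in range(k) if i != j) for j in range(k)]
    for _ in range(max_iter):
        # Match off-diagonal row totals, then off-diagonal column totals.
        a_new = [row_off[i] / sum(b[j] for j in range(k) if j != i)
                 for i in range(k)]
        b_new = [col_off[j] / sum(a_new[i] for i in range(k) if i != j)
                 for j in range(k)]
        change = max(abs(x - y) for x, y in zip(a + b, a_new + b_new))
        a, b = a_new, b_new
        if change < tol:
            break
    return [[table[i][j] if i == j else a[i] * b[j] for j in range(k)]
            for i in range(k)]


fitted = quasi_independence_fit(table)
# Diagonal cells are fitted exactly, so only off-diagonal cells contribute.
g2 = 2.0 * sum(table[i][j] * math.log(table[i][j] / fitted[i][j])
               for i in range(k) for j in range(k)
               if i != j and table[i][j] > 0)
df = (k - 1) ** 2 - k  # independence df minus one parameter per diagonal cell
print(f"deviance = {g2:.1f} on {df} degrees of freedom")
```

For a 4x4 table the degrees of freedom are (4 - 1)^2 - 4 = 5, which agrees with the 5 degrees of freedom reported for the migration data.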