Logistic Regression
Many categorical response variables have only two values. For instance, in a medical study a patient either survives or dies, and in a marketing study a consumer either switches brands or stays loyal to one particular brand. In such studies it is of practical or scientific interest to identify the variables that are related to the probability of survival or of switching, and logistic regression can be used to determine which variables are important. The summary statistics of direct interest here are proportions, which are often reported as percentages. Logistic regression models these proportions as a function of explanatory variables, and so answers questions such as which variables determine whether a consumer switches brands or whether a patient survives.

Logistic regression forms a predictor variable that is a linear combination of the explanatory variables. The values of this predictor variable are then transformed into probabilities by a logistic function. Such a function has the shape of an S, as shown in the figure on this page: the horizontal axis shows the values of the predictor variable, and the vertical axis shows the probabilities. A value of -2 on the predictor variable corresponds to a probability of .12, and a value of -1 corresponds to a probability of .27.
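The S-shaped curve can be reproduced with a few lines of Python. This is a minimal sketch using only the standard library; the function name `logistic` is our own choice for illustration, not the API of any particular package. It evaluates the curve at the two predictor values quoted above and at a few others.

```python
import math

def logistic(eta: float) -> float:
    """Map a value of the linear predictor onto a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Evaluate the S-shaped curve at several values of the predictor variable.
for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"predictor {eta:+.0f} -> probability {logistic(eta):.2f}")
```

Rounded to two decimals, the values -2 and -1 give .12 and .27 as stated in the text. Because the curve is symmetric around 0, a predictor value of 0 gives .50.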
Odds ratio
The odds ratio is a measure of effect size in logistic regression. To define it, we first need the odds of an event:
odds = p/(1 - p),
where p is the probability of the event under study. Suppose the event is being cured, and without treatment the probability of being cured is .667 (two thirds), so the odds are .66667/.33333 = 2. Now a new therapy is introduced that cures 80% of patients. How much better is this new treatment? With the new treatment, the odds of being cured are .80/.20 = 4. The odds ratio is the odds in the treatment group divided by the odds in the no-treatment group.
The odds of being cured are two times higher with the new treatment than with
no treatment (4/2), hence the odds ratio is two.
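The same arithmetic as a short Python sketch; the helper name `odds` is hypothetical and chosen only for illustration. The probabilities are the two cure rates from the example.

```python
def odds(p: float) -> float:
    """Odds of an event with probability p: p / (1 - p)."""
    return p / (1.0 - p)

p_control = 2 / 3   # probability of cure without treatment (about .667)
p_treated = 0.80    # probability of cure with the new therapy

odds_control = odds(p_control)   # 2, up to floating-point rounding
odds_treated = odds(p_treated)   # 4, up to floating-point rounding
odds_ratio = odds_treated / odds_control  # treatment odds over no-treatment odds

print(f"odds without treatment: {odds_control:.2f}")
print(f"odds with treatment:    {odds_treated:.2f}")
print(f"odds ratio:             {odds_ratio:.2f}")
```

Dividing in the other direction (no-treatment odds over treatment odds) would give 0.5, which describes the same effect from the opposite side; the text's convention puts the treatment group in the numerator.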
Logit transformation
The logarithm of p/(1-p) is called the logit; it maps probabilities onto the scale of the linear predictor in
logistic regression. Because it is the logarithm of the odds, the logit is also called the log odds.
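The logit and the logistic function are inverses of each other. A small sketch, assuming the natural logarithm (the usual convention in logistic regression); the function names `logit` and `inverse_logit` are illustrative.

```python
import math

def logit(p: float) -> float:
    """Log odds: the natural logarithm of p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(eta: float) -> float:
    """Logistic function: maps a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# The logit spreads probabilities in (0, 1) over the whole real line,
# with p = 0.5 mapped to 0; the inverse recovers the original probability.
for p in (0.12, 0.5, 0.8):
    eta = logit(p)
    print(f"p = {p:.2f}  logit = {eta:+.3f}  back-transformed = {inverse_logit(eta):.2f}")
```

For example, p = .8 has odds 4 and therefore logit ln 4, which is about 1.386.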
Logistic Regression Example
The logistic regression procedure produces three tables: a summary of the analysis, a table of coefficients, and a deviance table.
As an example, we analyze the data of Mendenhall et al. (1989) on the effect of radiotherapy. The first variable is the number of days of radiotherapy received by each of 24 patients, and the second variable records the absence (1) or presence (0) of disease three years after treatment.
A logistic regression with disease as the response and days of therapy as the explanatory variable gives an intercept of 3.81944 and a coefficient of -0.08648 for radiotherapy. Since exp(-0.08648) is about 0.917, each additional day of radiotherapy multiplies the odds of absence of disease by about 0.92, that is, decreases them by about 8%.
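The interpretation of the coefficient can be checked numerically. This sketch uses only the two reported estimates and assumes, as the text states, that the model describes the probability of absence of disease (coded 1). The day counts in the loop are hypothetical values chosen for illustration; the observed range of treatment days is not reported here, so fitted probabilities outside that range would be extrapolations.

```python
import math

# Estimates reported in the text; response coded 1 = absence of disease.
intercept = 3.81944
slope = -0.08648  # change in the log odds per additional day of radiotherapy

# exp(slope) is the factor by which the odds change for one extra day.
odds_multiplier = math.exp(slope)
percent_change = 100.0 * (odds_multiplier - 1.0)
print(f"odds multiplier per day: {odds_multiplier:.3f}")
print(f"percent change per day:  {percent_change:.1f}%")

def prob_absence(days: float) -> float:
    """Fitted probability of absence of disease after `days` of radiotherapy."""
    eta = intercept + slope * days  # linear predictor
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical day counts, for illustration only.
for days in (30, 40, 50):
    print(f"{days} days -> estimated probability of absence {prob_absence(days):.2f}")
```

The multiplier of about 0.917 corresponds to the roughly 8% decrease in the odds per day mentioned above.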
| Summary of Statistical Analysis | |
|---|---|
| Number of observations | 24 |
| Observations with missing values | 0 |
| Response variable | disease |
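The table labeled Parameters gives parameter estimates with asymptotic standard errors, Wald tests and p-values. The coefficient of therapy (-0.08648) is negative: with absence of disease coded as 1, the estimated probability that the disease is absent three years after treatment decreases as the number of days of radiotherapy increases.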
| Parameter | Coefficient | Standard Error | Wald Chi-Square | p-value (Pr > Chi-Square) |
|---|---|---|---|---|
| intercept | 3.81944 | 1.83516 | 4.33163 | 0.03741 |
| therapy | -0.08648 | 0.04322 | 4.00424 | 0.04539 |
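Each Wald statistic in the table is the squared ratio of the coefficient to its standard error, compared with a chi-square distribution on 1 degree of freedom. A standard-library sketch (the function name `wald_test` is our own) uses the identity P(chi-square with 1 df > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w/2)).

```python
import math

def wald_test(coef: float, se: float) -> tuple[float, float]:
    """Wald chi-square statistic (coef / se)^2 and its p-value on 1 df."""
    w = (coef / se) ** 2
    # For 1 df: P(chi-square > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2)).
    return w, math.erfc(math.sqrt(w / 2.0))

estimates = {
    "intercept": (3.81944, 1.83516),
    "therapy": (-0.08648, 0.04322),
}
for name, (coef, se) in estimates.items():
    w, p = wald_test(coef, se)
    print(f"{name:10s} Wald = {w:.3f}  p = {p:.4f}")
```

Because the printed estimates are rounded, the recomputed Wald statistic for therapy differs from the table in the third decimal place; the p-values agree to about four decimals.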
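The deviance table decomposes the deviance into two components: a model component, which measures the reduction in deviance due to therapy, and a residual component.
Because only one variable is used in the model, the deviance (likelihood-ratio) test of the model and the Wald test of the therapy coefficient address the same hypothesis, namely that the coefficient of therapy is zero. The two p-values (0.02824 and 0.04539) are not identical, but both are below .05 and lead to the same conclusion.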
| Source | Deviance | df | p-value |
|---|---|---|---|
| Model | 4.81306 | 1 | 0.02824 |
| Residual | 27.78822 | 22 | 0.18281 |
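The p-values in the deviance table are upper-tail chi-square probabilities, and the two components add up to the deviance of the intercept-only model (32.60128 on 23 df). The sketch below recomputes them with the standard library only; the helper names are hypothetical. For 1 df it reuses the erfc identity, and for an even number of degrees of freedom 2k it uses the exact closed form exp(-x/2) times the sum over i < k of (x/2)^i / i!. A general routine such as SciPy's `scipy.stats.chi2.sf` would handle any df.

```python
import math

def chi2_sf_1df(x: float) -> float:
    """P(chi-square with 1 df > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x), valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for i in range(df // 2):
        if i > 0:
            term *= half / i  # term is now half**i / i!
        total += term
    return math.exp(-half) * total

model_dev, model_df = 4.81306, 1
resid_dev, resid_df = 27.78822, 22

# Deviance of the intercept-only (null) model = model + residual components.
null_dev = model_dev + resid_dev
null_df = model_df + resid_df

print(f"null deviance:     {null_dev:.5f} on {null_df} df")
print(f"model (LR test):   p = {chi2_sf_1df(model_dev):.5f}")
print(f"residual deviance: p = {chi2_sf_even_df(resid_dev, resid_df):.5f}")
```

For ungrouped binary responses such as these, the chi-square approximation for the residual deviance is known to be unreliable, so the residual p-value is better read as a rough indication than as a formal goodness-of-fit test.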