Unlike OLS regression, logistic regression does not try to predict the value of a numeric variable given a set of predictor variables. Instead, the output is the probability that the given input point belongs to a certain class. The central premise of binary logistic regression is the assumption that your “input space” can be separated into two regions (one for each class) by a linear boundary: a straight line when there are two predictors, and a plane or hyperplane when there are more.
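As a minimal sketch of how such a model turns an input point into a probability, the example below computes a linear score and passes it through the logistic (sigmoid) function. The function name `predict_probability` and the coefficient values are hypothetical and chosen only for illustration; they are not fitted to any data.

```python
import math

def predict_probability(x, weights, bias):
    """Return P(class = 1 | x) under a logistic model."""
    # Linear score: under the model, this is the log odds of class 1.
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    # The logistic (sigmoid) function maps the score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for two predictors (illustrative, not estimated).
weights = [2.0, -1.0]
bias = 0.5

# A point whose linear score is exactly 0 lies on the boundary.
on_boundary = predict_probability([0.0, 0.5], weights, bias)

# A point with a positive score falls on the class-1 side of the boundary.
class_one_side = predict_probability([1.0, 0.0], weights, bias)
```

Points with a score of zero get a probability of 0.5, so the set of such points is the linear boundary between the two regions.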
Logistic regression does not make many of the key assumptions of linear regression that are based on ordinary least squares mathematics (e.g., linearity, normality, homoscedasticity, and measurement level).
- Logistic regression does not require a linear relationship between the dependent and independent variables. It can handle all sorts of relationships, primarily because it applies a non-linear log transformation to the predicted odds.
- The independent variables do not need to be multivariate normal (although multivariate normality is known to yield a more stable solution). Also, the residuals (errors) do not need to be normally distributed.
- Homoscedasticity is not required. That is, logistic regression does not need the variances to be equal (homoscedastic) for each level of the independent variables.
- Logistic regression can handle ordinal and nominal data as independent variables. That is, the independent variables do not need to be measured at the interval or ratio level.
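One common way to feed a nominal predictor into the model is dummy coding, where each level except a chosen reference level becomes its own 0/1 column. The sketch below is an illustration under that assumption; the helper `dummy_code` and the example data are hypothetical and do not come from any particular library.

```python
def dummy_code(values, reference):
    """Dummy-code a nominal variable, leaving out the reference level."""
    # Every level other than the reference gets its own indicator column.
    levels = sorted(set(values) - {reference})
    # Each row holds one 0/1 indicator per remaining level.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Hypothetical nominal predictor with three levels.
colors = ["red", "green", "blue", "green"]
levels, encoded = dummy_code(colors, reference="blue")
```

The reference level is represented by a row of zeros, which avoids a redundant column that would be perfectly predictable from the others.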
Some assumptions, however, are retained. When logistic regression is used, keep the following requirements in mind:
- Logistic regression requires large sample sizes. Maximum Likelihood (ML) estimates are less powerful than OLS. Many researchers use a “rule of thumb” when using OLS: You need a bare minimum of 5 cases per independent variable in the analysis. A similar rule says that ML needs at least 10 cases per independent variable, and some researchers recommend at least 30 cases for each parameter to be estimated.
- Obviously, binary logistic regression requires the dependent variable to be binary and ordinal logistic regression requires the dependent variable to be ordinal. Reducing an ordinal or even a continuous variable to the dichotomous level throws away a lot of information. That makes binary logistic regression inferior to ordinal logistic regression in these cases.
- As with all statistical modeling, the model should be fitted correctly. Both “overfitting” and “underfitting” are problems. There are several statistical methods that aid in the selection of variables to include in a model. As with fitting OLS models, you should use theoretically driven modeling.
- Logistic regression requires each observation to be independent. That is, the data points should not come from a dependent-samples design (e.g., pretest-posttest designs, matching designs).
- The predictor variables should have little or no multicollinearity. That is, the independent variables should be independent from each other.
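As a rough first screen for the multicollinearity requirement above, one can compute pairwise Pearson correlations between predictors and flag pairs that are strongly correlated. This is only a sketch: the 0.8 threshold, the function names, and the data are assumptions made for illustration, and pairwise correlation cannot detect collinearity that involves three or more predictors at once.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def flag_collinear_pairs(predictors, threshold=0.8):
    """Return (name, name, r) for every pair with |r| at or above threshold."""
    names = list(predictors)
    flagged = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(predictors[first], predictors[second])
            if abs(r) >= threshold:
                flagged.append((first, second, r))
    return flagged

# Hypothetical predictors: height is recorded twice in different units.
predictors = {
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [33.0, 21.0, 45.0, 28.0, 39.0],
}
flagged = flag_collinear_pairs(predictors)
```

Here the two height columns carry almost the same information, so only that pair is flagged; dropping one of them would be a reasonable remedy.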
Logistic regression assumes a linear relationship between the independent variables and the log odds. In other words, logistic regression does not require the dependent and independent variables to be related linearly (as OLS does), but it does require that the independent variables are linearly related to the log odds. Failure to meet this requirement can greatly diminish the power of hypothesis tests.
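Written out, the linearity requirement refers to the standard form of the binary logistic model. The symbols below are notation assumed for this illustration: p is the probability of the class of interest, x_1 through x_k are the independent variables, and the betas are the coefficients to be estimated.

```latex
\[
\operatorname{logit}(p) = \ln\!\left(\frac{p}{1-p}\right)
  = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\qquad\Longleftrightarrow\qquad
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\]
```

The left-hand form is linear in the predictors, while the equivalent right-hand form shows that the probability itself follows an S-shaped curve rather than a straight line.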
Last Modified: 02/14/2019