Advanced Statistics | Section 2


Advanced Statistical Analysis:

A Primer

Adam J. McKee, Ph.D.


DRAFT - Do Not Distribute

This content is released as a draft version for comment by the scholarly community.  Please do not distribute as is.  


Linear Models and Least Squares

Shakespeare may have concluded that there isn’t much significance in a mere name, but names sure do cause a lot of confusion in statistics.  A big part of the problem is that people in different disciplines worked out, largely on their own, how to analyze particular types of data.  As you’d expect in the ivory tower, the disciplines didn’t communicate very often.  The advent of computer hardware and increasingly powerful software has narrowed the gap.  Another reason for the confusion is that different people use the same terms in different ways.  At least some of this has to do with formality.  In my doctoral program, when someone said “regression” we automatically knew that they were talking about “ordinary least squares multiple linear regression.”  That’s way too big a mouthful to go spouting all the time!  The shorthand works well as long as you are sure to qualify other types of regression when that’s what you mean.

Another point of confusion is the distinction between simple regression (one predictor) and multiple regression (two or more predictors).  Often, textbook authors and kind professors differentiate between the two to provide a gentle introduction to basic concepts using the simplest possible models before adding layers of complexity.  Other texts will talk about multiple regression as if there were no such thing as simple regression, merging the two concepts.  This text, because it is designed to be a sort of gentle introduction, takes the kinder, gentler approach.  Just beware that some authors will merge the two concepts, and we can’t say they are “wrong” for doing so.
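To see why merging the two concepts is defensible, here is a minimal sketch in Python using NumPy.  The data and the helper name fit_ols are made up for illustration and are not part of this text's method; the data contain no noise so that the fitted coefficients are easy to check by eye.  The point is only that the same least-squares routine handles one predictor or several, depending on how many columns are supplied.

```python
import numpy as np

# Made-up, noise-free data (an assumption for illustration only).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y_simple = 1.0 + 2.0 * x1                # depends on one predictor
y_multiple = 1.0 + 2.0 * x1 - 0.5 * x2   # depends on two predictors


def fit_ols(predictors, y):
    """Return least-squares coefficients, intercept first.

    Hypothetical helper: builds a design matrix with a column of ones
    (the intercept) followed by one column per predictor.
    """
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


# "Simple" regression: one predictor column.
b_simple = fit_ols([x1], y_simple)

# "Multiple" regression: the same call with more columns.
b_multiple = fit_ols([x1, x2], y_multiple)

print("simple:  ", np.round(b_simple, 6))
print("multiple:", np.round(b_multiple, 6))
```

Nothing in fit_ols changes between the two calls.  Simple regression is just the special case in which the list of predictors has a single entry.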

Why Focus On Regression?

Regression strikes a happy medium among power, versatility, and usability.  As a robust subset of the General Linear Model, it can provide the same results as a wide array of tests in the t and F families.  Another reason, I must confess, is my personal philosophy.  Statistics students are often dazzled and confused by the wide array of tests, and they try to tell the tests apart by cataloging their differences.  The problem with this is that they (please forgive the tired metaphor) often lose sight of the forest by paying too close attention to individual trees.  Most statistical procedures share a small handful of objectives.  It all boils down to the explanation and prediction of variation in some variable of interest.  It is much easier to tackle the intricacies of multicollinearity and homoscedasticity once you have the big picture firmly in mind.
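The claim that regression reproduces tests in the t and F families can be checked on a small case.  The sketch below, in Python with NumPy, uses two made-up groups of scores (an assumption for illustration, not data from this text).  It computes the classic pooled-variance two-sample t statistic by hand, then regresses the scores on a 0/1 group indicator and computes the t statistic for the slope and the F statistic for the model.

```python
import numpy as np

# Made-up scores for two groups (illustrative assumption).
group_a = np.array([4.0, 5.0, 6.0, 5.0, 7.0])
group_b = np.array([6.0, 8.0, 7.0, 9.0, 8.0])
n_a, n_b = len(group_a), len(group_b)

# --- Classic pooled-variance two-sample t statistic ---
pooled_var = (
    (n_a - 1) * group_a.var(ddof=1) + (n_b - 1) * group_b.var(ddof=1)
) / (n_a + n_b - 2)
t_classic = (group_b.mean() - group_a.mean()) / np.sqrt(
    pooled_var * (1.0 / n_a + 1.0 / n_b)
)

# --- The same comparison as a regression on a 0/1 dummy variable ---
y = np.concatenate([group_a, group_b])
dummy = np.concatenate([np.zeros(n_a), np.ones(n_b)])
X = np.column_stack([np.ones(len(y)), dummy])  # intercept + group indicator

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef
df_resid = len(y) - X.shape[1]

# Standard error of the slope from the residual variance.
sigma2 = (resid @ resid) / df_resid
cov = sigma2 * np.linalg.inv(X.T @ X)
t_regression = coef[1] / np.sqrt(cov[1, 1])

# Overall F statistic for the model (one predictor, so one numerator df).
ss_total = np.sum((y - y.mean()) ** 2)
ss_resid = resid @ resid
f_regression = ((ss_total - ss_resid) / 1) / (ss_resid / df_resid)

print("t (classic):   ", t_classic)
print("t (regression):", t_regression)
print("F (regression):", f_regression, " t squared:", t_regression ** 2)
```

The slope equals the difference between the group means, the two t statistics agree, and the model F equals t squared.  The agreement holds for the equal-variance (pooled) version of the t test; an unequal-variance test such as Welch's would not match this regression exactly.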

File Created: 08/24/2018
Last Modified:  08/24/2018

This work is licensed under an Open Educational Resource-Quality Master Source (OER-QMS) License.


