Outliers

Fundamentals of Social Statistics by Adam J. McKee

An outlier can be visualized as the lone point on a graph that stands distinctly apart from the cluster of other points. It is a score in a distribution that, because of its magnitude or position, looks distinct from the rest. What makes outliers unique is their deviation from what is typical or expected in a given set of data. For instance, in the realm of household incomes, the earnings of a tech mogul like Bill Gates would undoubtedly be an outlier. His income would dramatically exceed the median, placing him far from the typical person’s earnings.

Defining Outliers Statistically

From a rigorous statistical standpoint, outliers are not merely data points that seem odd or out of place upon casual observation. They have a specific definition grounded in the principles of statistics. Under a widely used convention, outliers are scores that lie more than three standard deviations from the dataset’s mean. But what does this mean? The standard deviation is a statistical measure that quantifies the amount of variability, or dispersion, in a set of data.

In simpler terms, it tells us how spread out the numbers in a dataset are around their average value. So, when a data point lies more than three standard deviations from the mean, it stands far apart from the majority of the other data points in that set. This substantial deviation, measured in standard deviation units, marks that data point as an outlier. The criterion is a widely accepted way to identify and label data points that are remarkably different from the rest.
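The three-standard-deviation rule can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `three_sd_outliers`, the choice of the population standard deviation, and the income figures (in thousands of dollars) are all assumptions made for the example.

```python
from statistics import mean, pstdev

def three_sd_outliers(scores, threshold=3.0):
    """Return the scores lying more than `threshold` standard deviations from the mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # population SD; using the sample SD is an equally valid choice
    if sigma == 0:
        return []  # every score is identical, so nothing can stand out
    return [x for x in scores if abs(x - mu) / sigma > threshold]

# Hypothetical household incomes, in thousands of dollars
incomes = [48, 50, 52, 49, 51, 47, 53, 50, 49, 51,
           50, 52, 48, 50, 51, 49, 50, 52, 48, 1000]

print(three_sd_outliers(incomes))  # [1000]
```

In this made-up sample, only the income of 1000 lies more than three standard deviations from the mean. The rule needs a reasonably large sample to work: with ten or fewer scores, no single value can sit more than three population standard deviations from the mean, because the extreme score itself inflates the standard deviation.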

Outliers and “Abnormality”

Outliers in data are closely analogous to the concept of “abnormal” in psychology. Both outliers and what’s considered “abnormal” diverge from what is regarded as typical or expected within a particular context. In statistics, as discussed, outliers are those data points that stray considerably from the mean, especially those that lie more than three standard deviations away. Similarly, in psychology, “abnormal” behavior or mental processes are those that deviate markedly from societal norms or expectations.

“Abnormality” in psychology is a multi-faceted concept, defined and perceived differently depending on cultural, historical, and situational contexts. While statistical rarity, such as an extremely high IQ, can be considered atypical, it doesn’t always align with the psychological definition of abnormality. Many high achievers or prodigies could be deemed statistical outliers but are not necessarily labeled as having abnormal psychology.

Conversely, behaviors, thoughts, or emotions that lead to personal distress, dysfunction, or danger might be considered abnormal even if they’re statistically common. For instance, anxiety disorders are prevalent, but they might still be regarded as abnormal because they cause significant distress or impairment in daily functioning. Here, the term “abnormal” is less about rarity and more about the impact on well-being and functionality.

This interplay between statistics and psychology showcases the nuances in defining what’s considered “outside the norm.” While statistical methods provide a concrete measure to identify outliers, the realm of psychology requires a more holistic approach, integrating societal norms, personal distress, and functional impairment, among other factors, to classify something as abnormal. Understanding this distinction is crucial for professionals in both fields, as what is statistically unusual may not always be psychologically concerning, and vice versa.

Implications of Outliers

The presence of outliers can greatly influence statistical results. For instance, they can dramatically distort the mean, pulling it misleadingly upward or downward. Outliers can also affect the results of regression analyses, making predictions less reliable. Hence, before beginning an analysis, researchers should identify and appropriately address outliers to ensure robust and credible findings.
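The pull that a single outlier exerts on the mean can be seen in a small sketch. The salary figures below are invented for illustration, and the median is shown alongside the mean only as a point of comparison.

```python
from statistics import mean, median

# Hypothetical annual salaries in dollars
typical = [48_000, 50_000, 52_000, 49_000, 51_000]
with_outlier = typical + [1_000_000]

# Without the outlier, the mean and median agree at 50,000
print(mean(typical), median(typical))

# With the outlier, the mean jumps to about 208,333 while the median is 50,500
print(mean(with_outlier), median(with_outlier))
```

One extreme salary roughly quadruples the mean, yet the median barely moves. A mean reported from data like these would describe almost no one in the group.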

Outliers and Their Effect on Skewness

Skewness, in statistical terms, refers to the asymmetry of a distribution around its mean. A distribution is skewed when its scores trail off farther on one side of the center than on the other, forming a longer tail on that side. Outliers play a pivotal role in determining the direction and magnitude of this skew.

Positive Skew

When the outliers predominantly exist on the right side or the higher end of a distribution, the distribution exhibits a positive skew. In such cases, the tail of the distribution stretches out towards the right, indicating that the exceptional scores are higher than the average. For example, in a community where most residents earn around $50,000 annually, but a few earn $1,000,000 or more, the distribution of incomes would be positively skewed due to these high-income outliers.

Negative Skew

Conversely, when the outliers are primarily on the left side or the lower end of a distribution, we observe a negative skew. Here, the tail of the distribution is extended towards the left. This indicates that the atypical scores are lower than the majority of data points. Imagine a scenario where most students in a class score above 85%, but a few score below 40%. The distribution of scores in this scenario would be negatively skewed due to these low-scoring outliers.
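The direction of skew in both examples can be checked numerically. The sketch below uses the moment-based coefficient of skewness (the third central moment divided by the cube of the standard deviation), which is one common measure and an assumption here, since the text does not name a formula. The function name `skewness` and both datasets are hypothetical.

```python
from statistics import mean

def skewness(scores):
    """Moment-based skewness: positive for a right tail, negative for a left tail."""
    n = len(scores)
    mu = mean(scores)
    m2 = sum((x - mu) ** 2 for x in scores) / n  # second central moment (variance)
    m3 = sum((x - mu) ** 3 for x in scores) / n  # third central moment
    if m2 == 0:
        raise ValueError("skewness is undefined when all scores are identical")
    return m3 / m2 ** 1.5

# Hypothetical incomes: most near $50,000, one at $1,000,000
incomes = [48_000, 50_000, 52_000, 49_000, 51_000, 1_000_000]

# Hypothetical exam scores: most above 85%, one below 40%
exam_scores = [90, 88, 92, 95, 87, 91, 89, 35]

print(skewness(incomes) > 0)      # True: high outlier, positive skew
print(skewness(exam_scores) < 0)  # True: low outlier, negative skew
```

The sign of the coefficient follows the side on which the outliers fall, while a perfectly symmetric set of scores gives a value of zero.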

Addressing Outliers

Given the potential influence of outliers, it’s essential for analysts and researchers to decide how to handle them. Some common approaches include transforming the data to minimize the impact of outliers, using statistical techniques robust to outliers, or simply removing them. The choice often depends on the nature of the study and the data at hand.
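The three approaches mentioned above can be compared side by side. The sketch below is illustrative only: the helper names are hypothetical, the log transformation stands in for "transforming the data," the median stands in for "techniques robust to outliers," and removal uses the three-standard-deviation rule with the population standard deviation. The incomes are invented and given in thousands of dollars.

```python
import math
from statistics import mean, median, pstdev

def log_transformed_mean(scores):
    """Average the scores on a log scale, then convert back (the geometric mean)."""
    return math.exp(mean([math.log(x) for x in scores]))

def mean_without_outliers(scores, threshold=3.0):
    """Mean of the scores that lie within `threshold` standard deviations of the mean."""
    mu, sigma = mean(scores), pstdev(scores)
    kept = [x for x in scores if sigma == 0 or abs(x - mu) / sigma <= threshold]
    return mean(kept)

# Hypothetical household incomes, in thousands of dollars
incomes = [48, 50, 52, 49, 51, 47, 53, 50, 49, 51,
           50, 52, 48, 50, 51, 49, 50, 52, 48, 1000]

print("raw mean:          ", mean(incomes))
print("log-transformed:   ", log_transformed_mean(incomes))
print("median (robust):   ", median(incomes))
print("outliers removed:  ", mean_without_outliers(incomes))
```

On these data the raw mean is 97.5, while the median and the mean after removal both come out at 50, and the log-transformed average falls in between. Which summary is appropriate still depends on the study: removing a genuine observation discards information, whereas a transformation or a robust statistic keeps every score in the analysis.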

Importance of Context in Identifying Outliers

While statistical definitions provide a general guideline, identifying outliers isn’t merely a mathematical exercise. Context is crucial. What may be considered an outlier in one context might be expected in another. For instance, a salary of $1 million might be an outlier among school teachers but not among top executives of large corporations.
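The salary example can be made concrete by measuring the same figure against two different reference groups. Everything in this sketch is hypothetical: the salary lists are invented, and the helper `z_score` simply expresses a value's distance from a group's mean in units of that group's population standard deviation.

```python
from statistics import mean, pstdev

def z_score(value, reference):
    """Distance of `value` from the reference group's mean, in standard deviations."""
    return (value - mean(reference)) / pstdev(reference)

# Hypothetical annual salaries in dollars
teacher_salaries = [52_000, 58_000, 61_000, 55_000,
                    64_000, 59_000, 57_000, 62_000]
executive_salaries = [650_000, 900_000, 1_200_000, 800_000,
                      1_500_000, 1_100_000, 950_000, 2_000_000]

salary = 1_000_000

print(abs(z_score(salary, teacher_salaries)) > 3)    # True: far outside the teachers' range
print(abs(z_score(salary, executive_salaries)) > 3)  # False: unremarkable among executives
```

The number itself does not change. Only the comparison group does, and with it the verdict on whether the salary counts as an outlier.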

Summary

Outliers are distinctive data points in a dataset that deviate significantly from the norm. On a graph, they appear as isolated points that stand apart from the general cluster of data points. An example comes from household incomes, where the earnings of tech moguls like Bill Gates would stand out as outliers, given their vast difference from median incomes. Statistically, outliers are usually defined by their distance from the mean, most commonly as scores more than three standard deviations away.

This concept draws parallels with “abnormality” in psychology, which refers to behaviors or patterns that significantly deviate from societal expectations. While some statistical outliers may not be considered “abnormal” in a psychological context, the term “abnormal” in psychology encompasses more than just statistical rarity; it also considers distress and dysfunction.

Outliers can have substantial implications in statistics, affecting analyses and skewing results. Skewness, which describes the asymmetry of data distribution, can be influenced by outliers, leading to either positive or negative skew. Addressing outliers is crucial in data analysis, and the approach varies depending on the context and nature of the study. Recognizing outliers, while guided by statistical measures, often requires contextual understanding to ensure accurate data interpretation.



Last Modified:  09/25/2023
