No single statistical summary point can be more informative than the average. Any time you have a large set of numerical data, it's a convenient way to sum things up. But you have to be careful. The word average can refer to several different things, and if you pick the wrong one, your one-point summary can wind up being unintentionally misleading.
Mean, median, or mode?
The problem is that there are three separate statistical terms that can be said to be the average. These three terms are mean, median, and mode. When presenting summary data it's often important to disclose which of these averages you're using.
The arithmetic mean is what we typically think of when we hear the term average. The mean is calculated by adding all the numbers in a set and dividing their sum by the cardinality of the set. (Cardinality is a special term that mathematicians use to denote the size of a set. In this context, it just means how many numbers are in the group.)
The median is just what it sounds like: the number in the middle of a set of numbers. Be careful, though. You have to sort the numbers from lowest to highest (or vice versa) before you take the one in the middle. If the set has an even count of numbers, there is no single middle one, so the median is the mean of the two middle numbers. Also, don't remove any duplicates before you take the median.
The mode is simply the value that occurs most frequently in a set of numbers.
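As a rough sketch of the three definitions, here they are computed with Python's standard statistics module. The list of values and the summarize helper are made up purely for illustration.

```python
from statistics import mean, median, mode

# Hypothetical values, chosen only to illustrate the definitions.
values = [3, 1, 4, 1, 5]

def summarize(numbers):
    """Return the three common 'averages' of a list of numbers."""
    return {
        "mean": mean(numbers),      # sum of the numbers divided by how many there are
        "median": median(numbers),  # middle value after sorting; duplicates are kept
        "mode": mode(numbers),      # value that occurs most frequently
    }

print(summarize(values))
```

Note that median does the sorting for you, and the repeated 1 is counted twice rather than dropped.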
An Example in Joke Form
Bill Gates walks into a bar. The average income of the people in the room jumps up by 10,000%.
Okay, so what that joke lacks in originality, it makes up for in illustrative clarity (and brevity, thankfully). Say we're in a very upscale bar in Seattle. Lunch rush is over and Jimmy, the bar owner, is waiting for happy hour. There are only eight patrons at the bar. The annual salaries of Jimmy and his patrons are as follows:
$70,000
$85,000
$87,000
$87,000
$91,000
$92,000
$95,000
$98,000
$105,000
In this case, the mean of the salaries is $90,000, the median is $91,000, and the mode is $87,000. When Mr. Gates walks in with his net worth of around $50 billion, the mean skyrockets to just over $5 billion, while the median rises to only $91,500 and the mode stays the same.
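If you'd like to check the arithmetic yourself, here is a minimal sketch using Python's standard statistics module. The variable names are my own, and the $50 billion figure is just the rough number used above.

```python
from statistics import mean, median, mode

# Annual salaries of Jimmy and his eight patrons, from the list above.
salaries = [70_000, 85_000, 87_000, 87_000, 91_000,
            92_000, 95_000, 98_000, 105_000]

# Rough net-worth figure used in this post, treated as one more data point.
gates = 50_000_000_000

with_gates = salaries + [gates]

for label, data in (("before", salaries), ("after", with_gates)):
    print(f"{label}: mean={mean(data):,.0f} "
          f"median={median(data):,.0f} mode={mode(data):,.0f}")
```

With ten people in the room there is no single middle value, so the median becomes the mean of the fifth and sixth salaries, $91,000 and $92,000.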
Some Conclusions
So what does this little modern-day parable tell us? Mainly, it tells us that the first round should be on Bill. Secondly, it tells us that for numerical data that are not spread out evenly, like annual salaries and net worths, the arithmetic mean is not the best way to summarize the data. The median and the mode are probably a better way to represent the middle of the group with this type of data.
How It's Abused
The average is used to mislead all the time, both intentionally and unintentionally. Go back and look at the original nine numbers used in the bar example. If this is your data set, it probably doesn't matter which of the three averages you use, because the numbers are grouped fairly close together. The difference only becomes important when the data isn't grouped tightly or, more often, when some of the data is heavily skewed to one end of the scale.
The average can be used to mislead by carefully selecting which average you report. The New York Times recently reported that the U.S. national average salary increased by nearly 9% in 2005. This sounds very positive unless you realize that salaries are heavily skewed by the top 10% of incomes. If you remove those top salaries from the average, the remaining 90% saw their average salary drop by nearly a percent. Of course, the same New York Times article on the ever-widening salary gap in the U.S. disclosed this fact as well. A less reputable source might have ignored the skewness of the data.
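To see how an overall average can rise while most people's pay falls, here is a toy sketch. The incomes are invented for illustration and are not the figures from the Times article, and the pct_change and bottom_90 helpers are hypothetical names of my own.

```python
from statistics import mean

# Invented incomes for ten earners; NOT the data behind the Times article.
year_1 = [40_000] * 9 + [400_000]
year_2 = [39_600] * 9 + [472_000]

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

def bottom_90(incomes):
    """Drop the top 10% of earners and return the rest."""
    ordered = sorted(incomes)
    keep = len(ordered) * 9 // 10
    return ordered[:keep]

overall = pct_change(mean(year_1), mean(year_2))
rest = pct_change(mean(bottom_90(year_1)), mean(bottom_90(year_2)))

print(f"overall mean:    {overall:+.1f}%")
print(f"bottom 90% mean: {rest:+.1f}%")
```

In this made-up group, one large raise at the top lifts the overall mean by about 9% even though the other nine earners each took a 1% cut.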
When dealing with averages, remember that the goal should be to report the best summary for the set of data. If someone reports an average to you, you should think about how spread out the data might be, and if it might be skewed by a few high or low data points. Ask them if they reported the mean, median, or mode. If you don't take precautions, you could be very easily misled.