Descriptive Statistics

Suppose you gained access to the hundreds, or thousands, of high school grade point averages of all the freshmen at your college or university. What is the most typical score? How similar are the scores? Simply scanning the scores would provide, at best, gross approximations of the answers to these questions. To obtain precise answers, sociologists use descriptive statistics, which include measures of central tendency and measures of variability.

Measures of Central Tendency

A measure of central tendency is a single score that best represents an entire set of scores. The measures of central tendency include the mode, the median, and the mean.

Mode

The mode is the most frequently occurring score in a set of scores. In the frequency distribution of exam scores discussed earlier, the mode is 90. If two scores tie for the highest frequency, the distribution is bimodal. If the data set consists of counts of categories, then the category with the most cases is the mode. For example, in determining the most common academic major at your school, the mode is the major with the most students. The winner of a presidential primary election in which there are several candidates would represent the mode--the person selected by more voters than any other.

The mode can be the best measure of central tendency for practical reasons. Imagine a car dealership given the option of carrying a particular model, but limited to selecting just one color. The dealership owner would be wise to choose the modal color.
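
As a rough illustration, the mode can be found by tallying how often each value occurs and keeping the value or values with the highest tally. The sketch below uses Python's standard library. The list of car colors and the helper name modes are made up for this example and do not come from the text.

```python
from collections import Counter

# Hypothetical sales records for one car model (illustrative data only).
colors_sold = ["silver", "red", "silver", "black", "white", "silver", "black"]

def modes(values):
    """Return every value that ties for the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

modal_colors = modes(colors_sold)
print(modal_colors)            # ['silver']

# Two values tied for the highest frequency: a bimodal set.
print(modes([1, 2, 2, 3, 3]))  # [2, 3]
```

Returning a list rather than a single value keeps ties visible, so a bimodal distribution is not silently reported as having one mode.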

Learning Check #5: A researcher is interested in the effect of family size on self-esteem. To begin this study, 10 students are each asked how many brothers and sisters they have. The responses are as follows: 2, 3, 1, 0, 9, 2, 3, 2, 4, 2. What is the mode for this set of data?



Mean

The mean is the arithmetic average, or simply the average, of a set of scores. You are probably more familiar with it than any other measure of central tendency. You encounter the mean in everyday life whenever you calculate your exam average, batting average, gas mileage average, or a host of other averages.

The mean of a sample is calculated by adding all the scores and dividing by the number of scores.

Exam Scores: 99, 92, 93, 94, 97
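
A minimal sketch of the calculation for the five exam scores listed above, written in Python; the variable names are chosen for this example only.

```python
exam_scores = [99, 92, 93, 94, 97]  # exam scores listed in the text

total = sum(exam_scores)                 # add all the scores: 475
mean_score = total / len(exam_scores)    # divide by the number of scores
print(mean_score)  # 95.0
```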

Learning Check #6: What is the mean number of brothers and sisters listed in Learning Check #5?



Median

The median is the middle score in a distribution of scores that have been ranked in numerical order. If the median is located between two scores, it is assigned the value of the midpoint between them (for example, the median of 23, 34, 55, and 68 would equal 44.5). The median is often the best measure of central tendency for skewed distributions, because it is largely unaffected by extreme scores. Note that in the example below the median (63) is the same in both sets of exam scores, even though the second set contains an extreme score. The mean is quite different (48.4 for Exam A versus 54.6 for Exam B), due to the one extreme score on Exam B.

Exam A: 23, 25, 63, 64, 67

Exam B: 23, 25, 63, 64, 98
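
The comparison can be checked with a short sketch that uses the mean and median functions from Python's standard statistics module. The scores are the Exam A and Exam B values given above; the variable names are assumptions made for the example.

```python
from statistics import mean, median

exam_a = [23, 25, 63, 64, 67]
exam_b = [23, 25, 63, 64, 98]  # same as Exam A except for one extreme score

median_a, median_b = median(exam_a), median(exam_b)
mean_a, mean_b = mean(exam_a), mean(exam_b)

print(median_a, median_b)  # 63 63
print(mean_a, mean_b)      # 48.4 54.6

# With an even number of scores, the median is the midpoint of the middle two.
print(median([23, 34, 55, 68]))  # 44.5
```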

When Disraeli pointed out the ease of lying with statistics, he might have been referring, in particular, to measures of central tendency. Suppose a baseball general manager is negotiating with an agent about a salary for a baseball catcher of average ability. Both might use a measure of central tendency to prove their own points, perhaps based on the salaries of the top seven catchers, as shown in Table B.2. The general manager might claim that a salary of $340,000 (the median) would provide the player with what he deserves, based on an average salary of the other players. The agent might counter that a salary of $900,000 (the mean) would provide the player with what he deserves, based on an average salary of the other players. Note that neither would technically be lying: they would simply be using statistics that favored their position. As Scottish writer Andrew Lang (1844-1912) warned, beware of anyone who “uses statistics as a drunken man uses lampposts--for support rather than for illumination.”

Learning Check #7: What is the median number of brothers and sisters listed in Learning Check #5?



Learning Check #8: Note that the mean number of brothers and sisters is quite a bit different from the median number of brothers and sisters. In this case, which measure of central tendency would be most appropriate to report? Why?



Measures of Variability

Although a measure of central tendency is certainly important, it does not completely represent a distribution by itself. Given a measure of central tendency, you have an idea of where scores tend to fall, but you don’t know to what extent the scores differ from one another. A measure of the amount of dispersion contained within a data set is called a measure of variability. Except when all scores in a data set are identical, all sets of scores vary to some degree. Consider the members of your sociology class. They would vary on a host of measures, including height, weight, and grade point average. Measures of variability include the range, the variance, and the standard deviation.

Range

The range is the difference between the highest and lowest scores in a distribution. The range provides limited information, because distributions in which scores bunch up toward the beginning, middle, or end might all have the same range. Still, the range is useful as a rough estimate of how a score compares with the highest and lowest in a distribution. For example, a student might find it useful to know whether he or she did near the best or the worst on an exam. The range of scores in the distribution of 20 grades in the earlier example in Table B.1 would be the difference between 94 and 80, or 14.
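
The sketch below illustrates why the range tells only part of the story. The two score sets are hypothetical: only the endpoints 80 and 94 are taken from the text, and the scores in between are invented to show different bunching.

```python
def score_range(scores):
    """Difference between the highest and lowest score."""
    return max(scores) - min(scores)

# Hypothetical score sets; only the endpoints 80 and 94 come from the text.
bunched_low = [80, 81, 81, 82, 94]    # most scores near the bottom
bunched_high = [80, 92, 93, 93, 94]   # most scores near the top

print(score_range(bunched_low), score_range(bunched_high))  # 14 14
```

Both sets have a range of 14 even though their scores cluster at opposite ends, which is why the range is usually reported alongside other measures.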

Learning Check #9: A social researcher would like to know how many digits people in different age categories can recall with only one presentation of a list. She creates random lists of digits and presents them to participants. The number of digits recalled by the first 10 participants is as follows: 5, 9, 6, 10, 9, 7, 8, 7, 9, 12. What is the range of this data set?



Variance

A more informative measure of variability is the variance, which represents the variability of scores around their group mean. Unlike the range, the variance takes into account every score in the distribution. Technically, the variance is the average of the squared deviations from the mean.

Suppose you wanted to calculate the variance for the sets of 10-point quiz scores in Quiz A and Quiz B (Table B.3). First, find the group mean. Second, find the deviation of each score from the group mean. Note that deviation scores will be negative for scores that are below the mean. As a check on your calculations, the sum of the deviation scores should equal zero. Third, square the deviation scores. Squaring makes negative deviations positive and gives extreme scores relatively more weight. Fourth, find the sum of the squared deviation scores. Fifth, divide the sum by the number of scores. This yields the variance. Note that the variance for Quiz A is larger than that for Quiz B, indicating the students were more varied in their performances on Quiz A.
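
The five steps can be sketched in Python as follows. Because the quiz scores from Table B.3 are not reproduced here, the example reuses the five exam scores from the section on the mean; the variable names are assumptions made for the illustration.

```python
scores = [99, 92, 93, 94, 97]  # exam scores from the Mean section, not Table B.3

# Step 1: find the group mean.
group_mean = sum(scores) / len(scores)          # 95.0

# Step 2: find the deviation of each score from the mean.
deviations = [x - group_mean for x in scores]   # [4.0, -3.0, -2.0, -1.0, 2.0]

# Check: the deviations should sum to zero (allowing for rounding error).
assert abs(sum(deviations)) < 1e-9

# Step 3: square each deviation.
squared = [d ** 2 for d in deviations]          # [16.0, 9.0, 4.0, 1.0, 4.0]

# Step 4: sum the squared deviations.
sum_of_squares = sum(squared)                   # 34.0

# Step 5: divide by the number of scores.
variance = sum_of_squares / len(scores)
print(variance)  # 6.8
```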

Standard Deviation

The standard deviation, or S, is the square root of the variance. The standard deviation of Quiz A would be

S = 3.19.

The standard deviation of Quiz B would be

S = 1.414.

Why not simply use the variance? One reason is that, unlike the variance, the standard deviation is in the same units as the raw scores. This makes the standard deviation more meaningful. Thus, it would make more sense to discuss the variability of a set of IQ scores in IQ points than in squared IQ points. The standard deviation is used in the calculation of many other statistics.
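
As a hedged illustration, the standard deviation can be obtained by taking the square root of the variance. The sketch below again uses the five exam scores from the section on the mean rather than the quiz scores. It relies on Python's statistics module, where pvariance and pstdev divide by the number of scores, as this text does, while stdev divides by one less than the number of scores, a convention many texts use when estimating from a sample.

```python
import math
from statistics import pstdev, pvariance, stdev

scores = [99, 92, 93, 94, 97]  # exam scores from the Mean section

variance = pvariance(scores)   # divides by N, as in the text
sd = math.sqrt(variance)       # standard deviation S, in exam points
print(round(sd, 3))            # 2.608

# pstdev divides by N; stdev divides by N - 1 and so comes out slightly larger.
print(round(pstdev(scores), 3), round(stdev(scores), 3))  # 2.608 2.915
```

The two conventions give different answers for small data sets, so it is worth noting which one a calculator or software package uses.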

Learning Check #10: The exam scores for two sections of introductory sociology are listed below. Compute the standard deviation for each section. Section #1: 42, 45, 56, 56, 60, 62, 67, 68, 70, 71. Section #2: 57, 57, 57, 70, 75, 77, 79, 83, 83, 92.



Learning Check #11: Suppose that two groups discussed issues related to abortion. Each member of each group rated his or her opinion regarding abortion on a scale of 1 to 10 (1 = Totally against abortion; 5 = Neutral; 10 = Totally in favor of abortion). The mean for Group A was found to be 5, with a standard deviation of 0.02. For Group B the mean was also 5, but the standard deviation was 3.42. Which group would have the more lively debates?


The Normal Curve and Percentiles

As illustrated in Figure B.5, the normal curve is a bell-shaped graph that represents a hypothetical frequency distribution in which the frequency of scores is greatest near the mean and progressively decreases toward the extremes. In essence, the normal curve is a smooth frequency polygon based on an infinite number of scores. The mean, median, and mode of a normal curve are the same. Many variable human characteristics, such as height, weight, and intelligence, approximately follow a normal curve.

One useful characteristic of a normal curve is that certain percentages of scores fall at certain distances (measured in standard deviation units) from its mean. A special statistical table makes it a simple matter to determine the percentage of scores that fall above or below a particular score or between two scores on the curve. For example, about 68 percent of scores fall between plus and minus one standard deviation from the mean; about 95 percent fall between plus and minus two standard deviations from the mean; and about 99.7 percent fall between plus and minus three standard deviations from the mean.
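
In place of a printed table, these percentages can be approximated with the NormalDist class in Python's standard statistics module (available from Python 3.8). This is a minimal sketch; the helper name share_within is an assumption made for the example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of scores within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The figure for three standard deviations comes out nearer 99.7 percent than 99 percent.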

For example, consider an aptitude test with a mean of 100 and a standard deviation of 15. What percentage of people score above 115? Assuming aptitude scores fall on a normal curve, about 34 percent of the scores fall between the mean and one standard deviation (in this case 15 points) above the mean. We also know that for a normal distribution 50 percent of the scores fall above the mean and 50 percent fall below the mean. Thus, about 84 percent (50 percent below the mean and 34 percent between the mean and a score of 115) of the scores fall below 115. If 84 percent fall below 115, then 16 percent (100 percent minus 84 percent) must fall above a score of 115.
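
The same reasoning can be sketched with Python's statistics.NormalDist, using the mean and standard deviation from the aptitude test example. The variable names are assumptions made for the illustration.

```python
from statistics import NormalDist

aptitude = NormalDist(mu=100, sigma=15)  # mean and standard deviation from the example

below_115 = aptitude.cdf(115)   # proportion of scores at or below 115
above_115 = 1 - below_115       # the remainder falls above 115

print(round(100 * below_115), round(100 * above_115))  # 84 16
```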

Learning Check #12: An introductory sociology teacher who has taught for years has developed a comprehensive final exam that is normally distributed with a mean of 200 points and a standard deviation of 25 points. (a) What percentage of the students score above 200 points? (b) What percentage of the students score below 175 points? (c) What percentage of the students score more than 250 points?



Scores along the abscissa (horizontal axis) of the normal curve also represent percentiles--the scores at or below which particular percentages of scores fall. Percentiles are frequently used, as they give us a quick idea of how a score compares with the rest of the data set. If a score is equal to the 10th percentile, then you know that 10 percent of the scores fell at or below that value and 90 percent of the scores were above that value. In the aptitude test example above (mean of 100, standard deviation of 15), a score of 115 would have a percentile rank of 84.
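
Percentile ranks can also be computed directly from a list of observed scores, without assuming a normal curve. The sketch below applies the "at or below" definition given above to the Exam B scores from the section on the median. The function name percentile_rank is an assumption made for the example.

```python
def percentile_rank(scores, value):
    """Percentage of scores that fall at or below the given value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

exam_b = [23, 25, 63, 64, 98]  # Exam B scores from the Median section

print(percentile_rank(exam_b, 63))  # 60.0
print(percentile_rank(exam_b, 98))  # 100.0
```

With only five scores the ranks move in steps of 20 percentage points, so percentile ranks are most informative for larger data sets.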

Learning Check #13: What are the percentile ranks for the three scores listed in Learning Check #12: 200, 175, and 250?



Learning Check #14: Suppose you take your daughter Emily to the doctor’s office for a well-check and find out that she is in the 5th percentile for height and 7th percentile for weight. What do you now know about Emily, as compared with other children her age?





Copyright ©2001 The McGraw-Hill Companies.