Chi-Square with SPSS

 

This page contains information on the chi-square test for goodness of fit and the chi-square test for independence.


The chi-square goodness of fit test and the test for independence are both available in SPSS. Recall that chi-square is useful for analyzing whether a frequency distribution for a categorical or nominal variable is consistent with expectations (a goodness of fit test), or whether two categorical or nominal variables are related or associated with each other (a test for independence). Categorical or nominal variables assign values by virtue of membership in a category. Sex is a nominal variable. It can take on two values, male and female, which are usually coded numerically as 1 or 2. These numerical codes do not give any information about how much of some characteristic the individual possesses; the numbers merely indicate the category to which the individual belongs. Other examples of nominal or categorical variables include hair color, race, diagnosis (e.g., ADHD vs. anxiety vs. depression vs. chemically dependent), and type of treatment (e.g., medication vs. behavior management vs. none). Note that these are the same types of variables that can be used as independent variables in a t-test or ANOVA. In the latter analyses, the researcher is interested in the means of another variable measured on an interval or ratio scale. In chi-square, the interest is in the frequency with which individuals fall into each category or combination of categories.

 

Chi-Square Test for Goodness of Fit

A chi-square test for goodness of fit can be requested by clicking Statistics > Nonparametric Tests > Chi-Square.... This opens a window very similar to those for other tests. Enter the variable to be tested into the Test Variable List box. Then a decision must be made about the expected values against which the observed frequencies are to be tested. The most common choice is "All categories equal." However, it is also possible to enter specific expected values by selecting the other option and entering the expected values in order; the expected frequencies used in computing the chi-square will be proportional to these values. The Options... button provides access to missing value options and descriptive statistics for each variable. To submit the analysis, click the OK button. Results for a goodness of fit chi-square are shown below.

 

NPar Tests

Chi-Square Test

Frequencies

                 Class

             Observed N   Expected N   Residual
First-year            3          3.0         .0
Sophomore             3          3.0         .0
Junior                3          3.0         .0
Senior                3          3.0         .0
Total                12

[The data were taken from the previous ANOVA example.]

[The "residual" is just the difference between the observed and expected frequency.]

 

[Warning: Using the Chi-Square statistic is questionable here because all four cells have expected frequencies less than 5. See your statistics textbook for advice if you are in this situation.]

                  Class
Chi-Square      .000(a)
df                    3
Asymp. Sig.       1.000

a. 4 cells (100.0%) have expected frequencies less than 5. The minimum expected cell frequency is 3.0.


The value under "Chi-Square" is .000 because the observed and expected cell frequencies were all equal. As usual, statistically significant results are indicated by "Asymp. Sig.[nificance]" values below .05. Obviously, this example is NOT statistically significant. In words, these results indicate that the obtained frequencies do not differ significantly from those that would be expected if all cell frequencies were equal in the population.
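
The same goodness of fit test can be cross-checked outside SPSS. Below is a minimal sketch in Python using scipy.stats.chisquare (assuming SciPy is installed; the function and its f_exp argument are part of the standard SciPy API):

```python
from scipy.stats import chisquare

# Observed class frequencies from the example above: 3 students in each
# of the first-year, sophomore, junior, and senior categories.
observed = [3, 3, 3, 3]

# With no f_exp argument, all categories are assumed equally likely,
# matching SPSS's "All categories equal" option.
stat, p = chisquare(observed)
# stat = 0.0 and p = 1.0, matching the SPSS output above.

# Specific expected frequencies can also be supplied (they must sum to
# the same total as the observed counts), matching the option of
# entering expected values in order.
stat2, p2 = chisquare(observed, f_exp=[6, 2, 2, 2])
```

Here stat2 works out to (3-6)²/6 + 3 × (3-2)²/2 = 3.0 on 3 degrees of freedom, illustrating how unequal expected values change the statistic.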


 

 

Chi-Square Test for Independence

The chi-square test for independence is a test of whether two categorical variables are associated with each other. For example, imagine that a survey of 200 individuals has been conducted and that 120 of these people are female and 80 are male. Now assume that the survey includes information about each individual's major in college. To keep the example simple, assume that each person is either a psychology or a biology major. It might be asked whether males and females tend to choose these two majors at about the same rate, or whether one major has a different proportion of each sex than the other. The table below shows the case where males and females are about equally represented in the two majors. In this case college major is independent of sex. Note that the percentage of females in psychology and biology is 59.8 and 60.2, respectively. Another way to characterize these data is to say that sex and major are independent of each other because the proportion of males and females remains about the same for both majors.

 

  Psychology Majors Biology Majors
Females 58 62
Males 39 41

 

The next example shows the same problem with a different result. Here the proportion of males and females depends on the major: females make up 79.6 percent of psychology majors but only 39.2 percent of biology majors. Clearly, the proportion of each sex differs by major. Another way to state this is to say that choice of major is strongly related to sex, assuming that the example represents a statistically significant finding. The strength of this relationship can be represented with a coefficient of association such as the contingency coefficient or phi. These coefficients are similar to the Pearson correlation and are interpreted in roughly the same way.

 

  Psychology Majors Biology Majors
Females 82 38
Males 21 59
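
Both example tables can be checked with a chi-square test for independence outside SPSS. A sketch in Python using scipy.stats.chi2_contingency (SciPy assumed available; correction=False is used so the result matches the uncorrected Pearson chi-square that SPSS reports):

```python
import math
from scipy.stats import chi2_contingency

# Rows: females, males; columns: psychology majors, biology majors.
independent = [[58, 62], [39, 41]]   # sex proportions roughly equal across majors
related     = [[82, 38], [21, 59]]   # sex proportions differ sharply by major

chi2_a, p_a, dof_a, _ = chi2_contingency(independent, correction=False)
chi2_b, p_b, dof_b, _ = chi2_contingency(related, correction=False)

# First table: sex and major are independent, so p_a is large (far above .05).
# Second table: a strong association, so p_b is well below .001.

# Phi, one coefficient of association mentioned above, is
# sqrt(chi-square / N) for a 2 x 2 table.
n = sum(sum(row) for row in related)
phi = math.sqrt(chi2_b / n)
```

For the second table phi comes out around .4, a substantial association, consistent with the large difference in percentages described above.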

 

The method for obtaining a chi-square test for independence is a little tricky. Begin by clicking Statistics > Summarize > Crosstabs.... Transfer the variables to be analyzed to the Row(s) and Column(s) boxes. Then go to the Statistics... button and check the Chi-square box and anything that looks interesting in the Nominal Data box, followed by the Continue button. Next, click the Cells... button and check any needed descriptive information. Percentages are particularly useful for interpreting the data. Finally, click OK and the output will quickly appear.

Sample results are shown below. These data are from the ANOVA example so the number of observations in each cell is only two. This is a problematic situation for chi-square analysis and, should this be encountered in an actual analysis, consulting a textbook is recommended. Furthermore, the results are far from significant because the distribution of sex across class remains constant.

Crosstabs

Case Processing Summary

                              Cases
              Valid           Missing         Total
              N    Percent    N    Percent    N    Percent
SEX * CLASS   16   100.0%     0    .0%        16   100.0%

The "Case Processing Summary" provides some basic information about the analysis. In studies with large numbers of participants, this information can be very useful.

 

Sex * Class Crosstabulation

                                        Class
                           1.00     2.00     3.00     4.00     Total
Sex   1.00  Count             2        2        2        2         8
            % within Sex   25.0%    25.0%    25.0%    25.0%    100.0%
            % within Class 50.0%    50.0%    50.0%    50.0%     50.0%
            % of Total     12.5%    12.5%    12.5%    12.5%     50.0%
      2.00  Count             2        2        2        2         8
            % within Sex   25.0%    25.0%    25.0%    25.0%    100.0%
            % within Class 50.0%    50.0%    50.0%    50.0%     50.0%
            % of Total     12.5%    12.5%    12.5%    12.5%     50.0%
Total       Count             4        4        4        4        16
            % within Sex   25.0%    25.0%    25.0%    25.0%    100.0%
            % within Class 100.0%  100.0%   100.0%   100.0%    100.0%
            % of Total     25.0%    25.0%    25.0%    25.0%    100.0%

Note: The above results can be obtained by requesting all the available percentages in the cross-tabulation. In this simple example, the percentages are not very useful. However, when large numbers of participants are in the design, the percentages help greatly in understanding the pattern of the results. Also, when the analysis is presented in a research report, the percentages within one of the variables will help the reader interpret the results.


                                Chi-Square Tests

                                          Value   df   Asymp. Sig. (2-sided)
Pearson [standard computation]          .000(a)    3   1.000
Likelihood Ratio                        .000       3   1.000
Linear-by-Linear Association            .000       1   1.000
N of Valid Cases                          16

a. 8 cells (100.0%) have expected count less than 5. The minimum expected count is 2.00.

The values for "Sig"  are probabilities. A statistically significant result has a probability of less than .05.
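
For comparison, the 2 x 4 crosstabulation above can be reproduced the same way. A sketch (again assuming SciPy is available):

```python
from scipy.stats import chi2_contingency

# Sex (2 levels) by Class (4 levels): every cell holds 2 observations.
table = [[2, 2, 2, 2],
         [2, 2, 2, 2]]

chi2, p, dof, expected = chi2_contingency(table)
# chi-square = .000 with df = 3 and p = 1.000, matching the Pearson row
# of the SPSS output. Every expected count is 2.0, which is the same
# small-expected-frequency condition flagged in footnote a.
```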



 


 

 

 


Copyright 2000 The McGraw-Hill Companies. All rights reserved. Any use is subject to the Terms of Use and Privacy Policy.
McGraw-Hill Higher Education is one of the many fine businesses of The McGraw-Hill Companies.
