# The chi-square test (χ² test)

## Chi-square test

The chi-square test (χ² test) appears in many studies in which **frequencies** are compared. Whereas the t-test requires at least interval-scaled data, the chi-square test is used for nominally scaled (categorical) variables. The chi-square test then tells us whether the observed frequencies differ significantly from those we would expect.

In this article we will discuss both the **χ² goodness-of-fit test** and the **χ² test of independence**.

χ² statistics are used in many scientific areas, including cohort studies (empirical research), case-control studies (medicine), hedging of options (economics) and option pricing theory (financial mathematics).

The calculation of the χ² value requires relatively simple, but relatively numerous, calculations. For each cell in a chi-square table, the following must be calculated:

$$\frac{(O_i - E_i)^2}{E_i}$$

where O_i is the observed and E_i the expected frequency of the cell. This formula must be evaluated for every cell of the crosstab; the chi-square value is then the sum of all these values:

$$\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}$$

### Requirements

There are, however, a few **requirements** and rules that must be met before the χ² test can be calculated:

- **The expected frequency in each cell must be greater than 5.** If this is not the case, the results of the χ² test become somewhat inaccurate. Some authors are a little more generous and state that no more than a fifth of all expected cell frequencies may be less than 5 for the χ² test to deliver valid results. Some statistical programs, on the other hand, calculate alternatives to the χ² test if this rule is violated even once.
- The χ² test may only be applied to **frequencies**, never to relative values such as percentages.
- The sample must have been drawn **at random**.

If the first condition is not met, the **Fisher-Yates test** (also called the exact χ² test) is used instead.

### Restrictions

- As with all significance tests, the **sample size** matters: the larger the sample, the more likely even small differences are to become significant. A **significant** result therefore says nothing at first about the strength of the effect (association).
- The χ² test only says that there are differences, not the **direction** of the effect. From a significant χ² test we therefore cannot read off whether the observed values are larger or smaller than the expected ones, only that there is a difference.
- The χ² value says nothing about the **strength of the effect**. To calculate the effect size, we need other measures such as Cramér's V or the contingency coefficient C.
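One such effect-size measure can be sketched in a few lines of Python. Cramér's V is defined as the square root of χ² / (N · (min(r, c) - 1)), where r and c are the numbers of rows and columns. The function names and the example table below are my own, chosen purely for illustration:

```python
import math

def chi_square(observed):
    """Return the chi-square statistic and N for a contingency table (list of rows)."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    return chi2, n

def cramers_v(observed):
    """Cramér's V: effect size for an r x c contingency table."""
    chi2, n = chi_square(observed)
    k = min(len(observed), len(observed[0])) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical 2 x 2 table
table = [[10, 20],
         [20, 10]]
print(round(cramers_v(table), 4))  # → 0.3333
```

Cramér's V ranges from 0 (no association) to 1 (perfect association), so unlike the raw χ² value it can be compared across tables of different sizes.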

### Chi-square independence test

| | Column 1 | Column 2 | ... | Column n | Σ |
|---|---|---|---|---|---|
| Row 1 | H_{1,1} | H_{1,2} | ... | H_{1,n} | H_{1,•} |
| Row 2 | H_{2,1} | H_{2,2} | ... | H_{2,n} | H_{2,•} |
| ... | ... | ... | ... | ... | ... |
| Row m | H_{m,1} | H_{m,2} | ... | H_{m,n} | H_{m,•} |
| Σ | H_{•,1} | H_{•,2} | ... | H_{•,n} | H_{•,•} |

The chi-square independence test examines whether a frequency distribution of a nominally scaled variable is stochastically independent of another nominally scaled variable.

In a chi-square independence test, we compare two categorical variables. A simple tool for doing this is a **crosstab** (above). In a crosstab, the values of one variable are written in the columns and the values of the other variable in the rows. The cells (symbolized here with an H) contain the **joint frequencies** of both variables, i.e. the frequencies for which both the criterion from the row and that from the column apply. The sum of all values in a row is written in the last column; in the last row, all values in each column are added up in the same way. The last row of the last column (the cell at the bottom right) contains the sum of all values, H_{•,•} (also written as N). The dots in the subscript symbolically express this summation.

The test statistic of the chi-square test of independence is calculated as

$$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{n} \frac{(H_{i,j} - E_{i,j})^2}{E_{i,j}}, \qquad E_{i,j} = \frac{H_{i,\bullet} \cdot H_{\bullet,j}}{N}$$

with the hypotheses:

- H_{0}: The random variables A and B are stochastically independent of each other
- H_{1}: The random variables A and B are not stochastically independent of each other

The **degrees of freedom** (df) are calculated from the number of categories of the two random variables in an n × m crosstab:

df = (n - 1) · (m - 1)

### Example with explanation

In the crosstab below we see the distribution of income by highest level of educational attainment for 1,000 randomly interviewed people. We want to know whether this distribution corresponds to what we would have expected. In other words: we want to know whether income and educational qualifications are related, i.e. whether the educational qualification has an influence on income. Our **hypotheses** are therefore:

- H_{0}: Income and the highest level of educational attainment are independent
- H_{1}: Income and the highest level of educational attainment are not independent

| income | no qualification | Hauptschule | Realschule | Abitur | Bachelor/Master | doctorate | Σ |
|---|---|---|---|---|---|---|---|
| less than 1,500 euros | 17 | 132 | 103 | 95 | 39 | 4 | 390 |
| 1,500 to 3,000 euros | 5 | 6 | 32 | 95 | 92 | 9 | 239 |
| over 3,000 euros | 1 | 12 | 44 | 121 | 148 | 45 | 371 |
| Σ | 23 | 150 | 179 | 311 | 279 | 58 | 1,000 |

Now that we have the observed frequencies, how do we calculate the expected frequencies from them?

#### Expected cell frequencies

The expected cell frequencies are calculated directly from the observed ones. To do so, we apply the formula from the definition above to each cell. The formula multiplies the relative frequency of the row (H_{i,•} / N) by the relative frequency of the column (H_{•,j} / N). Multiplication corresponds to a logical AND: we have thus calculated the relative frequency of a value that meets both criteria. To convert this relative frequency into an absolute frequency, we multiply by the sample size N. This simplifies to the formula above:

$$E_{i,j} = \frac{H_{i,\bullet}}{N} \cdot \frac{H_{\bullet,j}}{N} \cdot N = \frac{H_{i,\bullet} \cdot H_{\bullet,j}}{N}$$

If we apply this formula to all cells, we get:

| income | no qualification | Hauptschule | Realschule | Abitur | Bachelor/Master | doctorate | Σ |
|---|---|---|---|---|---|---|---|
| less than 1,500 euros | H_{1,•}·H_{•,1}/N | H_{1,•}·H_{•,2}/N | H_{1,•}·H_{•,3}/N | H_{1,•}·H_{•,4}/N | H_{1,•}·H_{•,5}/N | H_{1,•}·H_{•,6}/N | H_{1,•} |
| 1,500 to 3,000 euros | H_{2,•}·H_{•,1}/N | H_{2,•}·H_{•,2}/N | H_{2,•}·H_{•,3}/N | H_{2,•}·H_{•,4}/N | H_{2,•}·H_{•,5}/N | H_{2,•}·H_{•,6}/N | H_{2,•} |
| over 3,000 euros | H_{3,•}·H_{•,1}/N | H_{3,•}·H_{•,2}/N | H_{3,•}·H_{•,3}/N | H_{3,•}·H_{•,4}/N | H_{3,•}·H_{•,5}/N | H_{3,•}·H_{•,6}/N | H_{3,•} |
| Σ | H_{•,1} | H_{•,2} | H_{•,3} | H_{•,4} | H_{•,5} | H_{•,6} | N (H_{•,•}) |

Applied to our example data set, this means that we would have expected the following values:

| income | no qualification | Hauptschule | Realschule | Abitur | Bachelor/Master | doctorate | Σ |
|---|---|---|---|---|---|---|---|
| less than 1,500 euros | 8.97 | 58.5 | 69.81 | 121.29 | 108.81 | 22.62 | 390 |
| 1,500 to 3,000 euros | 5.497 | 35.85 | 42.781 | 74.329 | 66.681 | 13.862 | 239 |
| over 3,000 euros | 8.533 | 55.65 | 66.409 | 115.381 | 103.509 | 21.518 | 371 |
| Σ | 23 | 150 | 179 | 311 | 279 | 58 | 1,000 |
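The expected frequencies above can be reproduced in a few lines; the following is a minimal sketch (variable names are my own):

```python
observed = [
    [17, 132, 103, 95, 39, 4],   # less than 1,500 euros
    [5, 6, 32, 95, 92, 9],       # 1,500 to 3,000 euros
    [1, 12, 44, 121, 148, 45],   # over 3,000 euros
]

row_sums = [sum(row) for row in observed]        # H_{i,.}
col_sums = [sum(col) for col in zip(*observed)]  # H_{.,j}
n = sum(row_sums)                                # N = 1000

# E_{i,j} = H_{i,.} * H_{.,j} / N
expected = [[r * c / n for c in col_sums] for r in row_sums]
print(round(expected[0][0], 2))  # → 8.97
```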

As you can see, the expected frequencies do not have to be whole numbers, even if this might seem to contradict the nature of the question. Since we now have both the observed and the expected frequencies, we can calculate the test statistic using the χ² formula; for our data it comes out to χ² = 319.28402.

Our crosstab has 6 columns and 3 rows, so the χ² distribution has (6 - 1) · (3 - 1) = 10 degrees of freedom. We now want to know how likely it is to obtain a value of 319.28402 or an even more extreme one. Using the cumulative distribution function, we obtain a P value of practically zero.

With a P value of (rounded) zero, we are below our previously defined significance level of 5%. The χ² test is therefore significant; we reject our null hypothesis: in our data, school-leaving qualification and income are not independent of one another.
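The whole calculation can be sketched in plain Python, with no statistics library needed (function and variable names are my own):

```python
def chi_square_independence(observed):
    """Chi-square test of independence for a contingency table.

    Returns the test statistic and the degrees of freedom.
    """
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / n  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

observed = [
    [17, 132, 103, 95, 39, 4],
    [5, 6, 32, 95, 92, 9],
    [1, 12, 44, 121, 148, 45],
]
chi2, df = chi_square_independence(observed)
print(round(chi2, 5), df)  # → 319.28402 10
```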

### Yates's correction

Yates's correction is a (somewhat outdated) correction to the calculation formula, intended to make the data fit the theoretical chi-square distribution better. It was originally developed for 2 × 2 crosstabs. Yates's correction is easy to apply: we subtract 0.5 from the absolute value of the difference in the numerator before squaring it:

$$\chi^2 = \sum_{i} \frac{(|O_i - E_i| - 0.5)^2}{E_i}$$

Yates's correction has practically no effect when the expected cell frequencies are high. When they are low, the test statistic, and with it the statistical significance, is reduced. Although we do not use Yates's correction in our examples and calculations, it could be applied. The decision whether or not to correct according to Yates is therefore up to the statistician.
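A sketch of the corrected statistic for a contingency table; the function name and the 2 × 2 example table are my own:

```python
def chi_square_yates(observed):
    """Chi-square statistic with Yates's continuity correction."""
    row_sums = [sum(row) for row in observed]
    col_sums = [sum(col) for col in zip(*observed)]
    n = sum(row_sums)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_sums[i] * col_sums[j] / n
            # subtract 0.5 from the absolute difference before squaring
            chi2 += (abs(o - e) - 0.5) ** 2 / e
    return chi2

# Hypothetical 2 x 2 table; uncorrected chi-square would be about 6.67,
# the corrected value comes out smaller
print(round(chi_square_yates([[10, 20], [20, 10]]), 4))  # → 5.4
```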

### Chi-square goodness-of-fit test

The chi-square goodness-of-fit test examines how well an observed frequency distribution of a nominal variable corresponds to an expected frequency distribution. Goodness of fit is also called **goodness of adjustment** or simply **adjustment**.

The test statistic χ² is calculated as

$$\chi^2 = \sum_{i=1}^{k} \frac{(B_i - E_i)^2}{E_i}$$

where B_i is the observed and E_i = n · p_i the expected frequency of category i, with the hypotheses:

- H_{0}: The random variable has the specified distribution
- H_{1}: The random variable does not have the specified distribution

**Degrees of freedom:** number of possible categories of the variable minus 1.

### Example with explanation

| monthly net household income | proportion in 2011, in percent | survey |
|---|---|---|
| less than 1,300 euros | 18.8 | 110 |
| 1,300 to 2,600 euros | 32.8 | 176 |
| 2,600 to 3,600 euros | 18.7 | 73 |
| 3,600 to 5,000 euros | 15.6 | 80 |
| more than 5,000 euros | 14.1 | 61 |

The Federal Statistical Office regularly compiles statistics on income levels in Germany. Its figures for 2011 are summarized in the table above. In the year {Y}, another survey was carried out among 500 randomly selected people. We want to know whether the income situation in Germany has changed statistically significantly in the intervening years.

First we have to set up our **hypotheses**. For this question they are:

- H_{0}: The distribution of household net income in {Y} is the same as in 2011
- H_{1}: The distribution of household net income in {Y} is not the same as in 2011

We test at a significance level of 5%.

The concept behind the χ² goodness-of-fit test is to compare the **observed frequencies** with the **expected frequencies**, assuming that the distribution from year {Y} is the same as that from 2011. If the observed frequencies roughly correspond to the expected ones, we do not reject the null hypothesis.

However, in order to answer this question, we first have to answer the following two questions:

- What frequencies can we expect from our sample of 500 people if both distributions are the same?
- How can we decide whether both distributions are the same?

The first question is easy to answer: if both distributions are the same, the observed values should (roughly) agree with the expected ones. In our example, 176 people earn between 1,300 and 2,600 euros per month; we would have expected 32.8% · 500 = 164. We can therefore calculate the expected frequencies using the simple formula E = n · p, where n is the sample size and p is the relative frequency. With this formula we can calculate the expected frequencies for all other income levels.

| income level | observed frequency B | expected frequency E = n · p | difference B - E | squared difference (B - E)² | χ² part (B - E)² / E |
|---|---|---|---|---|---|
| less than 1,300 euros | 110 | 94 | 16 | 256 | 2.72340 |
| 1,300 to 2,600 euros | 176 | 164 | 12 | 144 | 0.87805 |
| 2,600 to 3,600 euros | 73 | 93.5 | -20.5 | 420.25 | 4.49465 |
| 3,600 to 5,000 euros | 80 | 78 | 2 | 4 | 0.05128 |
| more than 5,000 euros | 61 | 70.5 | -9.5 | 90.25 | 1.28014 |
| Σ | 500 | 500 | 0 | | 9.42753 |

In order to assess the goodness of fit of the observed and expected frequencies, we look at the difference between these two values, shown in the fourth column of the table. On its own this value is not very helpful, because its sum is always zero. So we square the difference (fifth column) and divide it by the corresponding expected frequency (sixth column). This value is the χ² subtotal. The sum of all χ² subtotals is the χ² value,

which we use as the **test statistic** for the goodness of fit of the observed and expected frequencies. If the null hypothesis H_{0} is true, the observed and expected frequencies should approximately match, and the test statistic should therefore be a value close to zero.

If the null hypothesis is true, the difference between the observed and expected frequencies should be small, and so should the test variable calculated from it. Conversely, this also means that larger test variables indicate that the null hypothesis is false.
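The table's calculation can be sketched as follows. Variable names are my own; for the top income category I assume a 2011 proportion of 14.1%, consistent with the expected frequency of 70.5 and with proportions summing to 100%:

```python
# 2011 proportions and observed counts from the survey of n = 500 people
proportions = [0.188, 0.328, 0.187, 0.156, 0.141]
observed = [110, 176, 73, 80, 61]
n = 500

chi2 = 0.0
for b, p in zip(observed, proportions):
    e = n * p                 # expected frequency E = n * p
    chi2 += (b - e) ** 2 / e  # chi-square subtotal
print(round(chi2, 5))  # → 9.42753
```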

As we have calculated, the χ² value in our example is 9.42753. Is this value still small enough to be attributed to sampling error alone, or is it large enough that we must reject the null hypothesis? To answer that question, we need to know the **distribution function** of the χ² value.

The underlying distribution function is the chi-square distribution (χ² distribution), which has an additional parameter that changes its shape and thus also the basis of the calculation. This parameter is called the **degrees of freedom** (English: *degrees of freedom*). How a change in the degrees of freedom affects the χ² distribution can be explored in the interactive distribution function below. In our example the variable has five categories (income levels), so we have 4 degrees of freedom. From this we obtain a P value just above 0.05.

Since for goodness of fit we always test on the right-hand side, we have to subtract the value of the distribution function from one. With a **P value** above 0.05, we cannot reject the null hypothesis. (The P value can be calculated with our calculator for the χ² distribution.) We can therefore assume that the observed values agree with the expected values within the statistical framework that we established with the α value of 5%. In concrete terms, this means that the income in our sample from {Y} does not differ statistically significantly from the income distribution of 2011.
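For even degrees of freedom, the right-tail probability of the χ² distribution has a simple closed form, P(X > x) = exp(-x/2) · Σ_{i=0}^{df/2 - 1} (x/2)^i / i!, which lets us check the P value without a statistics library. This is a sketch under that assumption; for odd df one would need the incomplete gamma function:

```python
import math

def chi2_sf(x, df):
    """Survival function P(X > x) of the chi-square distribution, even df only."""
    assert df % 2 == 0, "closed form only valid for even degrees of freedom"
    s = x / 2.0
    # P(X > x) = exp(-s) * sum_{i=0}^{df/2 - 1} s**i / i!
    return math.exp(-s) * sum(s ** i / math.factorial(i) for i in range(df // 2))

p = chi2_sf(9.42753, 4)
print(round(p, 3))  # → 0.051
```

Since 0.051 is (just) above the significance level of 0.05, the null hypothesis is not rejected; the same function applied to the independence example (χ² = 319.28402, df = 10) returns a value of practically zero.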

### Properties of the chi-square distribution

The χ² distribution is the distribution underlying all χ² tests. It is also one of the most widely used distributions in inferential statistics, with numerous applications in many scientific disciplines.

The chi-square distribution has the following characteristics:

- The **total area** enclosed by the χ² curve and the x-axis is 1
- The χ² curve is **defined** on the x-axis from 0 and extends to positive infinity
- It is skewed to the right
- The higher the degrees of freedom, the more similar the χ² curve becomes to the normal distribution
- There are infinitely many χ² curves, whose shape is determined by the degrees of freedom
- The χ² curve is **asymmetrical**; the greater the degrees of freedom, the more symmetrical it becomes

The density of the χ² distribution with k degrees of freedom is

$$f(x; k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \quad x > 0$$

Here Γ(x) is the gamma function. It works like the factorial function, but for all real (and complex) numbers: if the argument x is a natural number, then Γ(x) = (x - 1)!.
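The density can be evaluated directly with the standard library's gamma function; the function name below is my own:

```python
import math

def chi2_pdf(x, df):
    """Density of the chi-square distribution with df degrees of freedom."""
    if x <= 0:
        return 0.0
    k = df / 2.0
    return x ** (k - 1) * math.exp(-x / 2) / (2 ** k * math.gamma(k))

# With df = 2 the density simplifies to exp(-x/2) / 2; at x = 2 this is exp(-1) / 2
print(round(chi2_pdf(2, 2), 5))  # → 0.18394
```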

### Interactive chi-square distribution

### Chi-square calculator

The calculator can be used to calculate the density function and the cumulative distribution function, as well as the quantiles and confidence intervals of the chi-square distribution.

{ChiSquare Calculator}
