Test of Normality “Confidence” Explained
written: 01/03/2007
last modified: 03/22/2024

The test of normality in Cpk and Histogram charts is based on the Chi Squared test for goodness of fit, a standard hypothesis test. The equations are described in Appendix D of the QC-CALC manual.

To summarize and paraphrase the equations: the test of normality starts from the statement "this data comes from a normal distribution" (called the "null hypothesis" or "H sub zero") and challenges the data to disprove it. The data disproves it by deviating too far from the "ideal" normal distribution. That deviation is summarized in a single number represented by the symbol "Chi Squared", hence the name of the test. The deviation is compared to a critical value at a specific risk; if it is greater than the critical value, the null hypothesis is rejected and the data is considered non-normal. The table that says "for a specific risk, the critical value = X" is a standard mathematical table. Published tables list critical values only for selected risks, so intermediate values are calculated by linear interpolation.
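The comparison described above can be sketched in Python. This is a minimal illustration and not the QC-CALC implementation (the actual equations are in the manual). It assumes 10 equal-probability bins and a normal curve fitted from the sample mean and standard deviation, which gives 10 - 1 - 2 = 7 degrees of freedom. The function name, the bin count, and the sample data are all hypothetical. The critical value 14.067 is the standard table entry for 7 degrees of freedom at a 5% risk.

```python
import math
from statistics import NormalDist, fmean, stdev

# Published critical value for 7 degrees of freedom at a 5% risk.
CRITICAL_VALUE_5_PCT_7_DF = 14.067


def chi_squared_statistic(data, n_bins=10):
    """Return the Chi Squared deviation of `data` from a fitted normal curve."""
    fitted = NormalDist(fmean(data), stdev(data))
    # Interior bin edges at equally spaced probability points of the fitted
    # normal, so every bin expects the same number of points.
    edges = [fitted.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    observed = [0] * n_bins
    for x in data:
        # The number of edges below x is the index of the bin x falls in.
        observed[sum(x > edge for edge in edges)] += 1
    expected = len(data) / n_bins
    return sum((count - expected) ** 2 / expected for count in observed)


# Illustrative data (assumed): 200 evenly spaced quantiles of a normal
# distribution and of an exponential distribution.
normal_like = [NormalDist(10, 2).inv_cdf((i + 0.5) / 200) for i in range(200)]
skewed = [-math.log(1 - (i + 0.5) / 200) for i in range(200)]

for name, sample in (("normal_like", normal_like), ("skewed", skewed)):
    chi_sq = chi_squared_statistic(sample)
    verdict = "non-normal" if chi_sq > CRITICAL_VALUE_5_PCT_7_DF else "normal"
    print(f"{name}: Chi Squared = {chi_sq:.2f} -> {verdict}")
```

The null hypothesis is rejected only for the skewed sample, whose deviation is well above the critical value.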

In any hypothesis test, there is always a risk that you reject a null hypothesis that is correct (say “non-normal” when the data actually came from a normal process) or fail to reject a null hypothesis that is incorrect (say “normal” when the data came from a non-normal distribution). The risk of incorrectly saying “non-normal” is called the “Producer’s risk” or Alpha. The risk of incorrectly saying “normal” is called the “Consumer’s risk” or Beta. In general, Alpha can be calculated but Beta cannot, because Beta depends on which non-normal distribution the data actually came from. We only know that Beta increases when you decrease Alpha or decrease the sample size.

Now that we have risk defined, we can talk about confidence. If risk is the chance of being wrong, confidence is the chance of being right. This may sound confusing, but because we are dealing with probability, not deterministic equations, there is always a chance that you are wrong. The confidence and the risk add to 100%. If you say the data is non-normal with a confidence of 85%, you are saying "I think the data is from a non-normal process, but there is a 15% chance that the data is from a normal process". Because Beta cannot be calculated, confidence statements are based on Alpha and are most useful when the test says “non-normal”. If you say the data is normal with a confidence of 85%, what you are saying is that if you called the data non-normal you would have an 85% chance of being wrong.
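As a rough illustration of how a Chi Squared value turns into a "non-normal" confidence, the sketch below looks up the risk by linear interpolation in a small table and subtracts it from 100%. The table holds standard published critical values for 7 degrees of freedom. The choice of 7 degrees of freedom, the function names, and the clamping at the ends of the table are assumptions made for this example and may differ from what QC-CALC does.

```python
# (critical value, risk) pairs from a standard Chi Squared table,
# 7 degrees of freedom (assumed for this example).
TABLE_7_DF = [
    (1.239, 0.99), (2.167, 0.95), (2.833, 0.90), (6.346, 0.50),
    (12.017, 0.10), (14.067, 0.05), (18.475, 0.01),
]


def risk_from_table(chi_sq, table=TABLE_7_DF):
    """Linearly interpolate the risk (Alpha) for a given Chi Squared value."""
    # Outside the table, clamp to the nearest tabulated risk.
    if chi_sq <= table[0][0]:
        return table[0][1]
    if chi_sq >= table[-1][0]:
        return table[-1][1]
    for (x0, r0), (x1, r1) in zip(table, table[1:]):
        if x0 <= chi_sq <= x1:
            return r0 + (r1 - r0) * (chi_sq - x0) / (x1 - x0)


def confidence_non_normal(chi_sq):
    """Confidence (in percent) in the statement 'the data is non-normal'."""
    return 100.0 * (1.0 - risk_from_table(chi_sq))


# A value halfway between the 10% and 5% critical values
# interpolates to a 7.5% risk, i.e. a 92.5% confidence.
print(round(confidence_non_normal(13.042), 1))
```

Because the table is clamped, this sketch never reports a confidence above 99% or below 1%. A longer table would widen that range.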

There are situations where the confidence attached to a “normal” result is relevant, but in most cases the process should be considered normal unless the test reports “non-normal” with a confidence greater than or equal to a predetermined minimum confidence, usually 90%, 95%, or 99%. Other minimum confidences are also valid. The minimum confidence should be determined by comparing the cost of incorrectly saying “normal” to the cost of incorrectly saying “non-normal.”

If your company uses a fixed minimum confidence, you can set the QC-CALC custom reports to use that confidence by changing the “Distribution” line as shown below. This example uses a minimum confidence of 95%. Replace the 95 with your company’s minimum. If you use a minimum confidence, you should not print the calculated confidence.

"Distribution÷"+if((not Is_Normal) and Confidence>=95,"Non-Normal","Normal")
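For readers who want to apply the same rule outside of a QC-CALC report, the logic of the expression above can be written as a small Python function. This is only a sketch: the function and parameter names are hypothetical, and it assumes that Is_Normal is the test's own verdict and that Confidence is expressed in percent.

```python
def distribution_label(is_normal, confidence, minimum_confidence=95.0):
    """Report "Non-Normal" only when the test rejects normality with at
    least the minimum confidence; otherwise report "Normal"."""
    if (not is_normal) and confidence >= minimum_confidence:
        return "Non-Normal"
    return "Normal"


# A "non-normal" verdict at 85% confidence does not meet a 95% minimum,
# so the process is still treated as normal.
print(distribution_label(is_normal=False, confidence=85.0))
print(distribution_label(is_normal=False, confidence=97.0))
```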

As a final warning, the Chi Squared test works best on samples of 500 to 1000 or more data points. If you have a smaller number of points, you should also look at Skewness and Kurtosis in the Histogram chart, and/or a probability plot. These are normality tests that work better with smaller samples. Remember that in all cases a test of normality is a best guess. No test gives an absolute answer!
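As a rough guide to what Skewness and Kurtosis measure, the sketch below computes both from the central moments of a sample. It uses the simple moment definitions, with kurtosis reported as excess over the normal value of 3. QC-CALC may use a different, sample-corrected formula, so treat the function name and the exact numbers as assumptions. Values near zero for both are consistent with a normal distribution.

```python
from statistics import fmean


def skewness_and_excess_kurtosis(data):
    """Return (skewness, excess kurtosis) from the central moments of data."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance (second moment)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Symmetric data has zero skewness; a long right tail gives positive skewness.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_tailed))
```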