Shapiro-Wilk Royston Test – Excel and Google Sheets

Written by

Editorial Team

Reviewed by

Steve Rynearson

Last updated on July 9, 2022

This tutorial will demonstrate how to use the Shapiro-Wilk Royston Test in Excel and Google Sheets.

Shapiro Wilk Royston Test

The Shapiro-Wilk test is a statistical test used to determine whether a dataset can be modeled using the normal distribution, that is, whether the sample can plausibly be treated as drawn from a normally distributed population. It is considered one of the best numerical methods of testing for normality because of its high statistical power.

The original Shapiro-Wilk test, like most significance tests, is affected by the sample size and works best for sample sizes of n=2 to n=50. For larger sample sizes (up to n=2000), an extension of the Shapiro-Wilk test called the Shapiro-Wilk Royston test can be used.

This article examines the Shapiro-Wilk Royston test, the version of the Shapiro-Wilk test used by many popular statistical software packages. To learn more about how to perform the original Shapiro-Wilk test, see the Shapiro-Wilk test article.

How the Shapiro-Wilk Royston Test Works

The Shapiro-Wilk test tests the null hypothesis that the dataset comes from a normally distributed population against the alternative hypothesis that the dataset does not come from a normally distributed population.

The W statistic for a Shapiro-Wilk Royston test is given as follows:

W = (Σ a_i x_(i))² / Σ (x_i – x̄)², with both sums taken over i = 1, …, n,

where x_(i) is the i-th order statistic (i.e. the i-th data value after the dataset is arranged in ascending order),

x̄ is the mean (average) of the dataset,

n is the number of data points in the dataset,

a = (a₁, …, a_n) is the coefficient vector of the weights of the Shapiro-Wilk test representing the best linear estimate of the standard deviation of x_i, assuming normality, which we will approximate using Royston’s algorithm.

The vector a is anti-symmetric, that is, a_(n+1-i) = –a_i for all i, and a_((n+1)/2) = 0 for odd n. Also, aᵀa = 1, so the squared weights sum to 1.

Royston’s Algorithm for the Approximation of a

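Royston’s algorithm for the approximation of a for the Shapiro-Wilk test starts with the fact that the W statistic is asymptotically equivalent to the statistic W′ = (Σ c_i x_(i))² / Σ (x_i – x̄)², where c = m / √(mᵀm), m = (m₁, …, m_n) is the expectation vector of x_(i) with n standard normal random variables, approximated by m_i = Φ⁻¹((i – 0.375) / (n + 0.25)), and Φ is the normal cdf.

Using the values above and setting u = 1/√n, we have the following approximations for a_i:

a_n = c_n + 0.221157u – 0.147981u² – 2.071190u³ + 4.434685u⁴ – 2.706056u⁵

and

a_(n-1) = c_(n-1) + 0.042981u – 0.293762u² – 1.752461u³ + 5.682633u⁴ – 3.582633u⁵

with a_i = m_i / √ϕ for 3 ≤ i ≤ n – 2,

Where: ϕ = (mᵀm – 2m_n² – 2m_(n-1)²) / (1 – 2a_n² – 2a_(n-1)²) for n > 5.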
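The approximation above can be sketched in Python using only the standard library. This is a minimal illustration rather than a reference implementation: the function name royston_weights is hypothetical, the polynomial coefficients are Royston's published values as used in common statistical software (worth verifying against the original paper), and the sketch only covers n > 5.

```python
from math import sqrt
from statistics import NormalDist


def royston_weights(n):
    """Approximate the Shapiro-Wilk weights a_1..a_n (sketch, n > 5 only)."""
    if n <= 5:
        raise ValueError("this sketch only covers n > 5")
    inv = NormalDist().inv_cdf
    # m_i: approximate expected values of standard normal order statistics
    m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mm = sum(v * v for v in m)  # m^T m
    u = 1 / sqrt(n)
    c_n = m[-1] / sqrt(mm)
    c_n1 = m[-2] / sqrt(mm)
    # Polynomial corrections for the two largest weights
    a_n = (c_n + 0.221157 * u - 0.147981 * u**2 - 2.071190 * u**3
           + 4.434685 * u**4 - 2.706056 * u**5)
    a_n1 = (c_n1 + 0.042981 * u - 0.293762 * u**2 - 1.752461 * u**3
            + 5.682633 * u**4 - 3.582633 * u**5)
    # Normalizing constant so that the squared weights sum to 1
    phi = (mm - 2 * m[-1]**2 - 2 * m[-2]**2) / (1 - 2 * a_n**2 - 2 * a_n1**2)
    a = [v / sqrt(phi) for v in m]  # interior weights m_i / sqrt(phi)
    a[-1], a[-2] = a_n, a_n1        # two largest weights
    a[0], a[1] = -a_n, -a_n1        # anti-symmetry: a_(n+1-i) = -a_i
    return a


weights = royston_weights(14)
print([round(v, 4) for v in weights])
```

By construction the returned weights are anti-symmetric and their squares sum to 1, which gives a quick sanity check on any spreadsheet version of the same column.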

The Shapiro-Wilk Royston Test’s Test Statistic

For values of n between 4 and 11, the statistic w = –ln[0.459n – 2.273 – ln(1 – W)] is approximately normally distributed with a mean,

μ = 0.544 – 0.39978n + 0.025054n² – 0.0006714n³, and a standard deviation,

σ = exp(1.3822 – 0.77857n + 0.062767n² – 0.0020322n³).

Similarly, for values of n between 12 and 2000, the statistic, w = ln(1-W), is normally distributed with a mean,

μ = 0.0038915x³– 0.083751x²– 0.31082x – 1.5861, and a standard deviation,

σ = exp(0.0030302x² – 0.082676x – 0.4803), where x = ln n.

Thus, for the Shapiro-Wilk Royston test, the z-statistic is used as the test statistic and is given by

z = (w – μ) / σ

The p-value of the test is the area in the upper (right) tail of the standard normal curve beyond this z-score, that is, p = 1 – Φ(z).
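The normalizing transformation and p-value calculation described in this section can be sketched as follows. The function name royston_p_value is hypothetical, and the n² and n³ coefficients of the small-sample mean (0.025054 and –0.0006714) are Royston's published values as used in common statistical software; treat them as an assumption to verify against the original paper.

```python
from math import exp, log
from statistics import NormalDist


def royston_p_value(W, n):
    """Return (z, p) for a Shapiro-Wilk W statistic (sketch, 4 <= n <= 2000)."""
    if not 4 <= n <= 2000:
        raise ValueError("n must be between 4 and 2000")
    if not 0 < W < 1:
        raise ValueError("W must lie strictly between 0 and 1")
    if n <= 11:
        gamma = 0.459 * n - 2.273
        arg = gamma - log(1 - W)
        if arg <= 0:
            # The small-sample transform is undefined for such an extreme W
            raise ValueError("transform undefined for this W and n")
        w = -log(arg)
        mu = 0.544 - 0.39978 * n + 0.025054 * n**2 - 0.0006714 * n**3
        sigma = exp(1.3822 - 0.77857 * n + 0.062767 * n**2 - 0.0020322 * n**3)
    else:
        x = log(n)
        w = log(1 - W)
        mu = 0.0038915 * x**3 - 0.083751 * x**2 - 0.31082 * x - 1.5861
        sigma = exp(0.0030302 * x**2 - 0.082676 * x - 0.4803)
    z = (w - mu) / sigma
    p = 1 - NormalDist().cdf(z)  # upper-tail probability
    return z, p


# Hypothetical W values for a sample of size 14
print(royston_p_value(0.95, 14))
print(royston_p_value(0.80, 14))
```

A W closer to 1 yields a smaller z and therefore a larger p-value, so the first call should report much weaker evidence against normality than the second.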

If the obtained p-value is less than the chosen significance level (α), the null hypothesis is rejected, and it is concluded that the dataset is not from a normally distributed population. Otherwise, the null hypothesis is not rejected, and it is concluded that there is no statistically significant evidence that the dataset does not come from a normally distributed population.

How to Perform the Shapiro-Wilk Royston Test in Excel

Background: A sample of the heights, in inches, of 14 ten-year-old boys is presented in the table below. Use the Shapiro-Wilk Royston method of testing for normality to test whether the data obtained from the sample can be modeled using a normal distribution.

Shapiro Wilk Royston Test 012

First, select the values in the dataset and Sort the data: Data > Sort (Sort Smallest to Largest) to arrange the values in ascending order as shown below:

Shapiro Wilk Royston Test 013

And the arranged values are as follows:

Shapiro Wilk Royston Test 014

Alternatively, with newer versions of Excel, you can use the SORT Function to sort the data:

=SORT(B2:B15)

Shapiro Wilk Royston Test 015

Next, calculate the denominator of the W statistic, Σ (x_i – x̄)², as shown in the picture below:

Shapiro Wilk Royston Test 017

Complete the rest of the column and then calculate the sum (shown in green background) as shown in the picture below:

Shapiro Wilk Royston Test 018

Thus, the denominator of the W statistic is 189.895.
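As a cross-check on this step, the same sum of squared deviations can be computed outside the spreadsheet. The sketch below uses a hypothetical set of heights (the tutorial's actual data is only shown in the screenshots), so its result will not match 189.895. In Excel, the built-in DEVSQ Function returns the same quantity in a single cell.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations from the mean: the denominator of W."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical heights in inches, NOT the dataset from this tutorial
heights = [50.1, 51.2, 52.0, 52.7, 53.1, 53.6, 54.0,
           54.3, 54.9, 55.4, 56.0, 56.8, 57.5, 59.0]
print(sum_squared_deviations(heights))
```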

Next, obtain the values of m_i using the NORM.S.INV Function with the formula:

m_i = NORM.S.INV((i – 0.375) / (n + 0.25))

The formula and the value of m₁ are shown in the picture below:

Shapiro Wilk Royston Test 021

*Note that for our case, n = 14 because we have 14 data points.

Complete the rest of the column as shown in the picture below:

Shapiro Wilk Royston Test 022
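For comparison with the spreadsheet column, the m_i values for n = 14 can be reproduced with Python's standard library, where NormalDist().inv_cdf plays the role of NORM.S.INV. This is only an illustrative sketch; the variable names are hypothetical.

```python
from statistics import NormalDist

n = 14  # number of data points in the example
inv = NormalDist().inv_cdf  # inverse of the standard normal cdf

# m_i = inverse cdf of (i - 0.375) / (n + 0.25), for i = 1..n
m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

for i, value in enumerate(m, start=1):
    print(i, round(value, 4))
```

The values are symmetric about zero (m_i = –m_(n+1-i)), which is a handy check that the column was filled in correctly.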

Now, since m is a column vector, it follows that

mᵀm = m₁² + m₂² + … + m_n²

Thus, the value of mᵀm can be calculated in Excel using the SUMSQ Function.

Next, calculate the values of c_i using the formula: c_i = m_i / √(mᵀm)

The formula for the value of c₁ is shown in the picture below:

Shapiro Wilk Royston Test 026

Complete the rest of the column as shown in the picture below:

Shapiro Wilk Royston Test 027

Next, use the formulas given above to calculate the values of a_n and a_(n-1) for n = 14. Because of the anti-symmetric property of a_i, a_(n+1-i) = –a_i. That is, a₁ = –a₁₄, a₂ = –a₁₃, etc.

The formula and values of a₁₄ and a₁ are shown in the picture below:

Shapiro Wilk Royston Test 028

Similarly, the formula and values of a₁₃ and a₂ are shown in the picture below:

Shapiro Wilk Royston Test 029

Next, obtain the value of ϕ, noting that, as established above, mᵀm = Σ m_i². The formula and the value of ϕ are shown in the picture below:

Shapiro Wilk Royston Test 032

Then, using the value of ϕ, obtain the values of the rest of the a_i column using the formula: a_i = m_i / √ϕ for 3 ≤ i ≤ n – 2. The picture below shows the formula and value of a₃:

Shapiro Wilk Royston Test 034

Thus, the complete values of the a_i column are shown in the picture below:

Shapiro Wilk Royston Test 035

Next, multiply the a_i values with the corresponding (already arranged) values in the dataset to get the a_i x_(i) column. The calculation and the value for the first data point are shown in the picture below:

Shapiro Wilk Royston Test 036

Complete the rest of the a_i x_(i) column and calculate the sum (shown in green background) as shown in the picture below:

Shapiro Wilk Royston Test 037

The denominator of the W statistic as obtained previously is 189.895, and the numerator is the square of the sum of the a_i x_(i) column. Thus, we have as follows:

Shapiro Wilk Royston Test 038

Therefore, the W statistic is as shown below:

Shapiro Wilk Royston Test 039

*Note that the value of the W statistic will always be between 0 and 1.
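The final division can be illustrated with a small sketch. To keep the example short, it uses a hypothetical three-point sample together with the exact Shapiro-Wilk weights for n = 3, which are (–1/√2, 0, 1/√2), rather than the 14-point dataset and Royston-approximated weights from this tutorial. The function name w_statistic is an assumption for illustration.

```python
from math import sqrt


def w_statistic(values, weights):
    """W = (sum of a_i * x_(i))^2 / sum of (x_i - mean)^2."""
    x = sorted(values)  # order statistics x_(1) <= ... <= x_(n)
    mean = sum(x) / len(x)
    numerator = sum(a * xi for a, xi in zip(weights, x)) ** 2
    denominator = sum((xi - mean) ** 2 for xi in x)
    return numerator / denominator


# Exact weights for n = 3, used here only to keep the example small
weights_n3 = [-sqrt(0.5), 0.0, sqrt(0.5)]
sample = [4, 1, 2]  # hypothetical data; sorted inside the function
print(round(w_statistic(sample, weights_n3), 4))  # 0.9643
```

Because the weights sum to zero and their squares sum to 1, the Cauchy-Schwarz inequality guarantees the result cannot exceed 1, which is why W always lies between 0 and 1.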

Next, obtain the values of the w statistic, μ and σ using the formulas stated previously in this article.
For our case, n = 14, so we use the formula w = ln(1 – W) as shown in the picture below:

Shapiro Wilk Royston Test 040

Calculate the value of μ as shown in the picture below:

Shapiro Wilk Royston Test 041

Also, calculate the value of σ as shown in the picture below:

Shapiro Wilk Royston Test 042

Next, obtain the z-score using the formula stated previously as shown in the picture below:

Shapiro Wilk Royston Test 043

Finally, obtain the p-value using the NORM.S.DIST Function.

Since the p-value of the Shapiro-Wilk Royston test is the upper tail of the standard normal curve, we use the formula p-value = 1 – NORM.S.DIST(z, TRUE) to obtain the p-value as shown in the picture below:

Shapiro Wilk Royston Test 044

The p-value is 0.14697, which is greater than α=0.05, hence, the null hypothesis is not rejected.

Therefore, we conclude that there is not enough evidence that the dataset is not drawn from a normally distributed population. That is, it is reasonable to treat the dataset as normally distributed.

Shapiro-Wilk Royston Test in Google Sheets

The Shapiro-Wilk Royston test can be conducted in Google Sheets in the same way as in Excel, as shown in the picture below.

Shapiro Wilk Royston Test G Sheet