# Shapiro-Wilk Royston Test – Excel and Google Sheets

*This tutorial will demonstrate how to use the Shapiro-Wilk Royston Test in Excel and Google Sheets.*

The **Shapiro-Wilk Test** is a statistical test conducted to determine whether a dataset can be modeled using the normal distribution, and thus, whether a randomly selected subset of the dataset can be said to be normally distributed. The Shapiro-Wilk test is considered one of the best among the numerical methods of testing for normality because of its high statistical power.

The original Shapiro-Wilk test, like most significance tests, is affected by the sample size and works best for sample sizes of n=2 to n=50. For larger sample sizes (up to n=2000), an extension of the Shapiro-Wilk test called the Shapiro-Wilk Royston test can be used.

This article examines the Shapiro-Wilk Royston test, which is the more popular version of the Shapiro-Wilk test and is used by many statistical software packages. To learn more about how to perform the original Shapiro-Wilk test, see the Shapiro-Wilk test article.

**How the Shapiro-Wilk Royston Test Works**

The Shapiro-Wilk test tests the null hypothesis that the dataset comes from a normally distributed population against the alternative hypothesis that the dataset does not come from a normally distributed population.

The $W$ statistic for a Shapiro-Wilk Royston test is given as follows:

$$W = \frac{\left(\sum_{i=1}^{n} a_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $x_{(i)}$ is the $i$th order statistic (i.e. the $i$th data value after the dataset is arranged in ascending order),

$\bar{x}$ is the mean (average) of the dataset,

$n$ is the number of data points in the dataset,

$a = (a_1, \ldots, a_n)$ is the coefficient vector of the weights of the Shapiro-Wilk test representing the best linear estimate of the standard deviation of $x_i$, assuming normality, which we will approximate using Royston’s algorithm.

The vector $a$ is anti-symmetric, that is, $a_{n+1-i} = -a_i$ for all $i$, and $a_{(n+1)/2} = 0$ for odd $n$. Also, $a^T a = \sum_{i=1}^{n} a_i^2 = 1$.

**Royston’s Algorithm for the Approximation of a**

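Royston’s algorithm for the approximation of **a** for the Shapiro-Wilk test starts with the fact that the $W$ statistic is asymptotically equivalent to the statistic

$$W' = \frac{\left(\sum_{i=1}^{n} c_i x_{(i)}\right)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

where $c = m / \sqrt{m^T m}$, $m = (m_1, \ldots, m_n)$ is the (approximate) expectation vector of the order statistics $x_{(i)}$ of $n$ standard normal random variables, $m_i = \Phi^{-1}\left(\frac{i - 0.375}{n + 0.25}\right)$, and $\Phi$ is the normal cdf.

Using the values above and setting $u = 1/\sqrt{n}$, we have the following approximations for $a_i$:

$$a_n = c_n + 0.221157u - 0.147981u^2 - 2.071190u^3 + 4.434685u^4 - 2.706056u^5$$

$$a_{n-1} = c_{n-1} + 0.042981u - 0.293762u^2 - 1.752461u^3 + 5.682633u^4 - 3.582633u^5$$

and

$$a_i = \frac{m_i}{\sqrt{\phi}} \quad \text{for } 2 < i < n-1$$

Where:

$$\phi = \frac{m^T m - 2m_n^2 - 2m_{n-1}^2}{1 - 2a_n^2 - 2a_{n-1}^2}$$

These expressions apply for $n > 5$, which covers the example in this article.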
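The approximation above can be cross-checked outside the spreadsheet. The following is a minimal Python sketch, assuming $n > 5$ and using only the standard library; the function name `royston_weights` is hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def royston_weights(n):
    """Approximate the Shapiro-Wilk weights a_1..a_n (sketch, n > 5 only)."""
    if n <= 5:
        raise ValueError("this sketch only covers n > 5")
    inv = NormalDist().inv_cdf  # standard normal inverse cdf
    # m_i = inverse cdf of (i - 0.375) / (n + 0.25), for i = 1..n
    m = [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mtm = sum(v * v for v in m)  # m^T m
    u = 1 / sqrt(n)
    c_n = m[-1] / sqrt(mtm)
    c_n1 = m[-2] / sqrt(mtm)
    # Polynomial corrections for the two largest weights
    a_n = (c_n + 0.221157 * u - 0.147981 * u**2 - 2.071190 * u**3
           + 4.434685 * u**4 - 2.706056 * u**5)
    a_n1 = (c_n1 + 0.042981 * u - 0.293762 * u**2 - 1.752461 * u**3
            + 5.682633 * u**4 - 3.582633 * u**5)
    phi = (mtm - 2 * m[-1]**2 - 2 * m[-2]**2) / (1 - 2 * a_n**2 - 2 * a_n1**2)
    # Interior weights a_i = m_i / sqrt(phi); the ends are overwritten below
    a = [v / sqrt(phi) for v in m]
    a[-1], a[-2] = a_n, a_n1
    a[0], a[1] = -a_n, -a_n1  # anti-symmetry
    return a


weights = royston_weights(14)
print(weights)
```

With this construction the squared weights sum to 1 up to rounding, which is a convenient sanity check on the spreadsheet's $a_i$ column as well.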

**The Shapiro-Wilk Royston Test’s Test Statistic**

For values of $n$ between 4 and 11, the statistic $w = -\ln[0.459n - 2.273 - \ln(1-W)]$ can be modeled with a normal distribution with a mean

$$\mu = 0.544 - 0.39978n + 0.025054n^2 - 0.0006714n^3$$

and a standard deviation

$$\sigma = \exp(1.3822 - 0.77857n + 0.062767n^2 - 0.0020322n^3)$$

Similarly, for values of $n$ between 12 and 2000, the statistic $w = \ln(1-W)$ is approximately normally distributed with a mean

$$\mu = 0.0038915x^3 - 0.083751x^2 - 0.31082x - 1.5861$$

and a standard deviation

$$\sigma = \exp(0.0030302x^2 - 0.082676x - 0.4803),$$

where $x = \ln n$.

Thus, for the Shapiro-Wilk Royston test, the z-statistic is used as the test statistic and is given by

$$z = \frac{w - \mu}{\sigma}$$

The p-value of the test is the area in the upper (right) tail of the standard normal curve beyond this z-score, that is, $p = 1 - \Phi(z)$.

If the obtained p-value is less than the chosen significance level (α), the null hypothesis is rejected, and it is concluded that the dataset is not from a normally distributed population. Otherwise, the null hypothesis is not rejected, and it is concluded that there is no statistically significant evidence that the dataset does not come from a normally distributed population.
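The transformation from $W$ to a p-value can be sketched in Python as a cross-check on the spreadsheet. This is a minimal sketch assuming $4 \le n \le 2000$ and a $W$ value that has already been computed; the function name `royston_p_value` is hypothetical, and raising an error when the small-sample transform is undefined is a choice made for this sketch only.

```python
from math import exp, log
from statistics import NormalDist


def royston_p_value(W, n):
    """Return (z, p) for a given W statistic and sample size n (sketch)."""
    if not 4 <= n <= 2000:
        raise ValueError("this sketch covers 4 <= n <= 2000")
    if not 0 < W < 1:
        raise ValueError("W must lie strictly between 0 and 1")
    if n <= 11:
        gamma = 0.459 * n - 2.273
        inner = gamma - log(1 - W)
        if inner <= 0:
            # Assumption of this sketch: treat an undefined transform as an error
            raise ValueError("transform undefined for this W and n")
        w = -log(inner)
        mu = 0.544 - 0.39978 * n + 0.025054 * n**2 - 0.0006714 * n**3
        sigma = exp(1.3822 - 0.77857 * n + 0.062767 * n**2 - 0.0020322 * n**3)
    else:
        x = log(n)
        w = log(1 - W)
        mu = 0.0038915 * x**3 - 0.083751 * x**2 - 0.31082 * x - 1.5861
        sigma = exp(0.0030302 * x**2 - 0.082676 * x - 0.4803)
    z = (w - mu) / sigma
    p = 1 - NormalDist().cdf(z)  # upper tail of the standard normal curve
    return z, p


# Hypothetical W values, used only to show the direction of the effect
for W in (0.85, 0.95):
    z, p = royston_p_value(W, 14)
    print(f"W={W}: z={z:.4f}, p={p:.4f}")
```

For a fixed $n$, a larger $W$ gives a smaller $z$ and therefore a larger p-value, which matches the interpretation that values of $W$ close to 1 are consistent with normality.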

**How to Perform the Shapiro-Wilk Royston Test in Excel**

**Background:** A sample of the heights, in inches, of 14 ten-year-old boys is presented in the table below. Use the Shapiro-Wilk Royston method of testing for normality to test whether the data obtained from the sample can be modeled using a normal distribution.

First, select the values in the dataset and Sort the data: **Data > Sort (Sort Smallest to Largest)** to arrange the values in ascending order as shown below:

And the arranged values are as follows:

Alternatively, with newer versions of Excel, you can use the SORT Function to sort the data:

`=SORT(B2:B15)`

Next, calculate the denominator of the $W$ statistic, $\sum_{i=1}^{n} (x_i - \bar{x})^2$, as shown in the picture below:

Complete the rest of the column and then calculate the sum (shown in green background) as shown in the picture below:

Thus, the denominator of the W statistic is 189.895.
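The denominator is simply the sum of squared deviations from the mean, so it is easy to verify independently. The sketch below uses hypothetical height values (not the article's dataset) and a hypothetical helper name, `sum_squared_deviations`. In Excel, the built-in DEVSQ Function returns the same quantity in a single cell.

```python
from statistics import fmean


def sum_squared_deviations(values):
    """Sum of (x_i - mean)^2, the denominator of the W statistic."""
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


# Hypothetical heights in inches, for illustration only
heights = [50.5, 51.0, 52.5, 53.0, 54.5, 55.0, 56.5]
print(sum_squared_deviations(heights))
```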

Next, obtain the values of $m_i$ using the NORM.S.INV Function with the formula:

$$m_i = \text{NORM.S.INV}\left(\frac{i - 0.375}{n + 0.25}\right)$$

The formula and the value of $m_1$ are shown in the picture below:

*Note that for our case, n = 14 because we have 14 data points.*

Complete the rest of the column as shown in the picture below:

Now, since $m$ is a column vector, it follows that $m^T m = \sum_{i=1}^{n} m_i^2$.

Thus, the value of $m^T m$ can be calculated in Excel using the SUMSQ Function.
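As a cross-check on the $m_i$ column and its sum of squares, here is a minimal Python sketch. It assumes that `statistics.NormalDist().inv_cdf` plays the role of NORM.S.INV and that a plain sum of squares plays the role of SUMSQ; the function name `expected_normal_scores` is hypothetical.

```python
from statistics import NormalDist


def expected_normal_scores(n):
    """m_i = inverse normal cdf of (i - 0.375) / (n + 0.25), for i = 1..n."""
    inv = NormalDist().inv_cdf
    return [inv((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


m = expected_normal_scores(14)
mtm = sum(v * v for v in m)  # same quantity SUMSQ returns for the m column
print([round(v, 4) for v in m])
print(mtm)
```

The values are symmetric about zero ($m_1 = -m_{14}$, $m_2 = -m_{13}$, and so on), so a column that does not mirror itself points to a slip in the formula.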

Thus, calculate the values of $c_i$ using the formula:

$$c_i = \frac{m_i}{\sqrt{m^T m}}$$

The formula for the value of $c_1$ is shown in the picture below:

Complete the rest of the column as shown in the picture below:

Next, use the formulas given above to calculate the values of $a_n$ and $a_{n-1}$ for $n = 14$. Because of the anti-symmetric property of $a_i$, $a_{n+1-i} = -a_i$. That is, $a_1 = -a_{14}$, $a_2 = -a_{13}$, etc.

The formula and values of $a_{14}$ and $a_1$ are shown in the picture below:

Similarly, the formula and values of $a_{13}$ and $a_2$ are shown in the picture below:

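Next, obtain the value of $\phi = \dfrac{m^T m - 2m_n^2 - 2m_{n-1}^2}{1 - 2a_n^2 - 2a_{n-1}^2}$, noting that, as established above, $m^T m = \sum_{i=1}^{n} m_i^2$. The formula and the value of $\phi$ are shown in the picture below:

Then, using the value of $\phi$, obtain the values of the rest of the $a_i$ column using the formula $a_i = m_i / \sqrt{\phi}$ for $2 < i < n-1$ (that is, $i = 3, \ldots, 12$ in our case). The picture below shows the formula and value of $a_3$: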

Thus, the complete values of the $a_i$ column are shown in the picture below:

Next, multiply the $a_i$ values with the corresponding (already arranged) values in the dataset to get the $a_i x_{(i)}$ column. The calculation and the value for the first data point are shown in the picture below:

Complete the rest of the $a_i x_{(i)}$ column and calculate the sum (shown in green background) as shown in the picture below:

The denominator of the $W$ statistic as obtained previously is 189.895, and the numerator is the square of the sum of the $a_i x_{(i)}$ column. Thus, we have as follows:

Therefore, the W statistic is as shown below:

*Note that the value of the W statistic will always be between 0 and 1.*
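The final ratio can be sketched in Python once the weights are available. The helper `w_statistic` below is hypothetical and takes the weights as an input rather than recomputing them. To keep the example checkable by hand, it uses $n = 3$, where the exact weights are $(-1/\sqrt{2},\ 0,\ 1/\sqrt{2})$, together with made-up data values.

```python
from math import sqrt


def w_statistic(weights, sorted_values):
    """W = (sum of a_i * x_(i))^2 / sum of (x_i - mean)^2."""
    n = len(sorted_values)
    mean = sum(sorted_values) / n
    numerator = sum(a * x for a, x in zip(weights, sorted_values)) ** 2
    denominator = sum((x - mean) ** 2 for x in sorted_values)
    return numerator / denominator


# Exact weights for n = 3; data values are made up for illustration
a3 = [-1 / sqrt(2), 0.0, 1 / sqrt(2)]
print(w_statistic(a3, [1.0, 2.0, 3.0]))  # evenly spaced values
print(w_statistic(a3, [1.0, 2.0, 6.0]))  # one value far from the others
```

Evenly spaced values give a $W$ of essentially 1, while pulling one value away from the others lowers $W$, which illustrates why small values of $W$ count as evidence against normality.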

Next, obtain the values of the w statistic, μ and σ using the formulas stated previously in this article.

For our case, n=14, so we use the formula w=ln(1-W) as shown in the picture below:

Calculate the value of μ as shown in the picture below:

Also, calculate the value of σ as shown in the picture below:

Next, obtain the z-score using the formula stated previously as shown in the picture below:

Finally, obtain the p-value using the NORM.S.DIST Function.

Since the p-value of the Shapiro-Wilk Royston test is the upper tail of the standard normal curve, we used the formula p-value = 1 – NORM.S.DIST(z, TRUE) to obtain the p-value as shown in the picture below:

The p-value is 0.14697, which is greater than α=0.05, hence, the null hypothesis is not rejected.

Therefore, we conclude that there is not enough evidence that the dataset is not drawn from a normally distributed population. That is, we can assume that the dataset is normally distributed.

# Shapiro-Wilk Royston Test in Google Sheets

The Shapiro-Wilk Royston test can be conducted in Google Sheets in the same way as in Excel, as shown in the picture below.