# Wilcoxon signed-rank test

The Wilcoxon signed-rank test is a non-parametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample, in order to assess whether their population mean ranks differ (i.e. it is a paired difference test). It can be used as an alternative to the paired Student's t-test (also known as the "t-test for matched pairs" or "t-test for dependent samples") when the distribution of the difference between the two samples' means cannot be assumed to be normal.[1] It can also be used to determine whether two dependent samples were selected from populations having the same distribution.

## History

The test is named for Frank Wilcoxon (1892–1965) who, in a single paper, proposed both it and the rank-sum test for two independent samples (Wilcoxon, 1945).[2] The test was popularized by Sidney Siegel (1956) in his influential textbook on non-parametric statistics.[3] Siegel used the symbol T for a value related to, but not the same as, ${\displaystyle W}$. In consequence, the test is sometimes referred to as the Wilcoxon T test, and the test statistic is reported as a value of T.

## Assumptions

1. Data are paired and come from the same population.
2. Each pair is chosen randomly and independently.
3. The data are measured on at least an interval scale when, as is usual, within-pair differences are calculated to perform the test (though it does suffice that within-pair comparisons are on an ordinal scale).

## Test procedure

Let ${\displaystyle N}$ be the sample size, i.e., the number of pairs. Thus, there are a total of 2N data points. For pairs ${\displaystyle i=1,...,N}$, let ${\displaystyle x_{1,i}}$ and ${\displaystyle x_{2,i}}$ denote the measurements.

${\displaystyle H_{0}}$: the difference between the pairs follows a symmetric distribution around zero.

${\displaystyle H_{1}}$: the difference between the pairs does not follow a symmetric distribution around zero.

1. For ${\displaystyle i=1,...,N}$, calculate ${\displaystyle |x_{2,i}-x_{1,i}|}$ and ${\displaystyle \operatorname {sgn}(x_{2,i}-x_{1,i})}$, where ${\displaystyle \operatorname {sgn} }$ is the sign function.
2. Exclude pairs with ${\displaystyle |x_{2,i}-x_{1,i}|=0}$. Let ${\displaystyle N_{r}}$ be the reduced sample size.
3. Order the remaining ${\displaystyle N_{r}}$ pairs from smallest absolute difference to largest absolute difference, ${\displaystyle |x_{2,i}-x_{1,i}|}$.
4. Rank the pairs, starting with the pair with the smallest non-zero absolute difference as 1. Ties receive a rank equal to the average of the ranks they span. Let ${\displaystyle R_{i}}$ denote the rank.
5. Calculate the test statistic ${\displaystyle W}$
${\displaystyle W=\sum _{i=1}^{N_{r}}[\operatorname {sgn}(x_{2,i}-x_{1,i})\cdot R_{i}]}$, the sum of the signed ranks.
6. Under the null hypothesis, ${\displaystyle W}$ follows a specific distribution with no simple expression. This distribution has an expected value of 0 and a variance of ${\displaystyle {\frac {N_{r}(N_{r}+1)(2N_{r}+1)}{6}}}$.
${\displaystyle W}$ can be compared to a critical value from a reference table.[4]
The two-sided test consists of rejecting ${\displaystyle H_{0}}$ if ${\displaystyle |W|>W_{critical,N_{r}}}$.
7. As ${\displaystyle N_{r}}$ increases, the sampling distribution of ${\displaystyle W}$ converges to a normal distribution. Thus, for ${\displaystyle N_{r}\geq 20}$, a z-score can be calculated as ${\displaystyle z={\frac {W}{\sigma _{W}}}}$, where ${\displaystyle \sigma _{W}={\sqrt {\frac {N_{r}(N_{r}+1)(2N_{r}+1)}{6}}}}$.
To perform a two-sided test, reject ${\displaystyle H_{0}}$ if ${\displaystyle |z|>z_{critical}}$.
Alternatively, one-sided tests can be performed with either the exact or the approximate distribution. p-values can also be calculated.
8. For ${\displaystyle N_{r}<20}$ the exact distribution needs to be used.
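
The steps above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a reference implementation; the function names `average_ranks`, `signed_rank_statistic` and `z_score` are hypothetical and chosen here for readability. The data are the ten pairs from the Example section.

```python
import math

def average_ranks(values):
    """Rank values from 1; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks

def signed_rank_statistic(x1, x2):
    """Return (W, N_r): the sum of signed ranks and the reduced sample size."""
    diffs = [b - a for a, b in zip(x1, x2) if b != a]   # step 2: drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])       # steps 3-4: rank by |difference|
    w = sum(r if d > 0 else -r for d, r in zip(diffs, ranks))  # step 5
    return w, len(diffs)

def z_score(w, n_r):
    """Normal approximation of step 7 (the text recommends it only for N_r >= 20)."""
    sigma_w = math.sqrt(n_r * (n_r + 1) * (2 * n_r + 1) / 6)
    return w / sigma_w

# Pairs from the Example section.
x1 = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]
x2 = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
w, n_r = signed_rank_statistic(x1, x2)
print(w, n_r)
```

Since the reduced sample size here is below 20, `z_score` is shown only to illustrate the formula; for these data the exact distribution would be used instead.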

### Example

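| ${\displaystyle i}$ | ${\displaystyle x_{2,i}}$ | ${\displaystyle x_{1,i}}$ | ${\displaystyle \operatorname {sgn} }$ of ${\displaystyle x_{2,i}-x_{1,i}}$ | ${\displaystyle {\text{abs}}}$ of ${\displaystyle x_{2,i}-x_{1,i}}$ |
|---|---|---|---|---|
| 1 | 125 | 110 | 1 | 15 |
| 2 | 115 | 122 | −1 | 7 |
| 3 | 130 | 125 | 1 | 5 |
| 4 | 140 | 120 | 1 | 20 |
| 5 | 140 | 140 |  | 0 |
| 6 | 115 | 124 | −1 | 9 |
| 7 | 140 | 123 | 1 | 17 |
| 8 | 125 | 137 | −1 | 12 |
| 9 | 140 | 135 | 1 | 5 |
| 10 | 135 | 145 | −1 | 10 |

Ordered by absolute difference:

| ${\displaystyle i}$ | ${\displaystyle x_{2,i}}$ | ${\displaystyle x_{1,i}}$ | ${\displaystyle \operatorname {sgn} }$ of ${\displaystyle x_{2,i}-x_{1,i}}$ | ${\displaystyle {\text{abs}}}$ of ${\displaystyle x_{2,i}-x_{1,i}}$ | ${\displaystyle R_{i}}$ | ${\displaystyle \operatorname {sgn} \cdot R_{i}}$ |
|---|---|---|---|---|---|---|
| 5 | 140 | 140 |  | 0 |  |  |
| 3 | 130 | 125 | 1 | 5 | 1.5 | 1.5 |
| 9 | 140 | 135 | 1 | 5 | 1.5 | 1.5 |
| 2 | 115 | 122 | −1 | 7 | 3 | −3 |
| 6 | 115 | 124 | −1 | 9 | 4 | −4 |
| 10 | 135 | 145 | −1 | 10 | 5 | −5 |
| 8 | 125 | 137 | −1 | 12 | 6 | −6 |
| 1 | 125 | 110 | 1 | 15 | 7 | 7 |
| 7 | 140 | 123 | 1 | 17 | 8 | 8 |
| 4 | 140 | 120 | 1 | 20 | 9 | 9 |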

${\displaystyle \operatorname {sgn} }$ is the sign function, ${\displaystyle {\text{abs}}}$ is the absolute value, and ${\displaystyle R_{i}}$ is the rank. Notice that pairs 3 and 9 are tied in absolute value. They would be ranked 1 and 2, so each gets the average of those ranks, 1.5.

${\displaystyle W=1.5+1.5-3-4-5-6+7+8+9=9}$
${\displaystyle |W|<W_{crit(\alpha =0.05,\ 9{\text{, two-sided}})}}$
${\displaystyle \therefore {\text{failed to reject }}H_{0}}$ that the two medians are the same.
The ${\displaystyle p}$-value for this result is ${\displaystyle 0.6113}$
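
For a sample this small, the null distribution of ${\displaystyle W}$ can be enumerated directly: under ${\displaystyle H_{0}}$ each rank is equally likely to carry a positive or a negative sign, so all ${\displaystyle 2^{N_{r}}}$ sign assignments are equally probable. The sketch below assumes this conditional (permutation) view and uses a hypothetical function name, `exact_two_sided_p`. The source does not say how the p-value quoted above was obtained, so the enumeration need not reproduce that figure exactly, in particular because of the tied ranks.

```python
from itertools import product

def exact_two_sided_p(ranks, w_observed):
    """P(|W| >= |w_observed|) when every rank gets sign +1 or -1 with probability 1/2."""
    threshold = abs(w_observed)
    hits = 0
    total = 0
    # Enumerate all 2**len(ranks) equally likely sign assignments.
    for signs in product((1, -1), repeat=len(ranks)):
        w = sum(s * r for s, r in zip(signs, ranks))
        total += 1
        if abs(w) >= threshold:
            hits += 1
    return hits / total

# Ranks from the ordered table of the example (the zero difference is excluded).
example_ranks = [1.5, 1.5, 3, 4, 5, 6, 7, 8, 9]
p_example = exact_two_sided_p(example_ranks, 9)
print(p_example)
```

The enumeration grows as ${\displaystyle 2^{N_{r}}}$, which is why it is only practical for small samples; larger samples rely on tables or the normal approximation.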

### Historical T statistic

In historical sources a different statistic, denoted by Siegel as the T statistic, was used. The T statistic is the smaller of the two sums of ranks of given sign; in the example, therefore, T would equal 3+4+5+6=18. Low values of T are required for significance. T is easier to calculate by hand than W and the test is equivalent to the two-sided test described above; however, the distribution of the statistic under ${\displaystyle H_{0}}$ has to be adjusted.

${\displaystyle T>T_{crit(\alpha =0.05,\ 9{\text{, two-sided}})}=5}$
${\displaystyle \therefore {\text{failed to reject }}H_{0}}$ that the two medians are the same.
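
The link between T and ${\displaystyle W}$ can be made explicit. Write ${\displaystyle T^{+}}$ and ${\displaystyle T^{-}}$ for the sums of the ranks attached to positive and negative differences (notation introduced here for illustration, not taken from the cited sources), and ${\displaystyle S=N_{r}(N_{r}+1)/2}$ for the total rank sum. Then:

${\displaystyle T^{+}+T^{-}=S,\qquad T^{+}-T^{-}=W}$

${\displaystyle T=\min(T^{+},T^{-})={\frac {S-|W|}{2}}}$

In the example, ${\displaystyle S=45}$ and ${\displaystyle W=9}$, so ${\displaystyle T=(45-9)/2=18}$, which agrees with the sum 3+4+5+6 above. Small values of T therefore correspond to large values of ${\displaystyle |W|}$.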

Note: Critical T values (${\displaystyle T_{crit}}$) by values of ${\displaystyle N_{r}}$ can be found in appendices of statistics textbooks, for example in Table B-3 of Nonparametric Statistics: A Step-by-Step Approach, 2nd Edition by Dale I. Foreman and Gregory W. Corder (http://www.oreilly.com/library/view/nonparametric-statistics-a/9781118840429/bapp02.xhtml).

## Limitation

As demonstrated in the example, pairs whose within-pair difference is zero are discarded. This is of particular concern if the samples are taken from a discrete distribution, where zero differences are common. In these scenarios, the modification to the Wilcoxon test proposed by Pratt (1959) provides an alternative that incorporates the zero differences.[5][6] This modification is more robust for data on an ordinal scale.[6]
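
As a rough sketch of how Pratt's modification changes the ranking step: the zero differences take part in the ranking, and their ranks are then dropped before the signed ranks are summed. The code below illustrates only this ranking step, with hypothetical function names; the null distribution of the resulting statistic differs from that of the unmodified test and is not covered here.

```python
def average_ranks(values):
    """Rank values from 1; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks

def pratt_signed_rank_sum(x1, x2):
    """Sum of signed ranks when zero differences are ranked, then their ranks dropped."""
    diffs = [b - a for a, b in zip(x1, x2)]
    ranks = average_ranks([abs(d) for d in diffs])  # zeros are included in the ranking
    # Zero differences contribute nothing to the sum, but they shift the other ranks up.
    return sum(r if d > 0 else -r for d, r in zip(diffs, ranks) if d != 0)

# Pairs from the Example section; pair 5 has a zero difference.
x1 = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]
x2 = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
w_pratt = pratt_signed_rank_sum(x1, x2)
print(w_pratt)
```

With the example data the single zero difference takes rank 1, so every other rank rises by one and the signed-rank sum becomes 10 instead of 9.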

## Effect size

To compute an effect size for the signed-rank test, one can use the rank-biserial correlation.

If the test statistic W is reported, the rank correlation r is equal to the test statistic W divided by the total rank sum S, or r = W/S.[7] Using the above example, the test statistic is W = 9. The reduced sample size of 9 has a total rank sum of S = (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9) = 45. Hence, the rank correlation is 9/45, so r = 0.20.

If the test statistic T is reported, an equivalent way to compute the rank correlation is with the difference in proportion between the two rank sums, which is the Kerby (2014) simple difference formula.[7] To continue with the current example, the sample size is 9, so the total rank sum is 45. T is the smaller of the two rank sums, so T is 3 + 4 + 5 + 6 = 18. From this information alone, the remaining rank sum can be computed, because it is the total sum S minus T, or in this case 45 - 18 = 27. Next, the two rank-sum proportions are 27/45 = 60% and 18/45 = 40%. Finally, the rank correlation is the difference between the two proportions (.60 minus .40), hence r = .20.
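
Both routes to the rank correlation can be checked numerically. The following is a small sketch with hypothetical function names; it assumes, as in the text, that T is the smaller of the two rank sums, so the second function returns the magnitude of r.

```python
def rank_biserial_from_w(w, n_r):
    """r = W / S, where S = N_r (N_r + 1) / 2 is the total rank sum."""
    total = n_r * (n_r + 1) / 2
    return w / total

def rank_biserial_from_t(t, n_r):
    """Simple difference formula: difference between the two rank-sum proportions."""
    total = n_r * (n_r + 1) / 2
    larger_sum = total - t  # the remaining rank sum, S - T
    return larger_sum / total - t / total

r_from_w = rank_biserial_from_w(9, 9)    # example: W = 9, N_r = 9
r_from_t = rank_biserial_from_t(18, 9)   # example: T = 18, N_r = 9
print(round(r_from_w, 2), round(r_from_t, 2))
```

Up to floating-point rounding, both calls give 0.20 for the example, matching the hand calculation.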

## Software implementations

• R includes an implementation of the test as wilcox.test(x,y, paired=TRUE), where x and y are vectors of equal length.[8]
• ALGLIB includes an implementation of the Wilcoxon signed-rank test in C++, C#, Delphi, Visual Basic, etc.
• GNU Octave implements various one-tailed and two-tailed versions of the test in the wilcoxon_test function.
• SciPy includes an implementation of the Wilcoxon signed-rank test in Python.
• Accord.NET includes an implementation of the Wilcoxon signed-rank test in C# for .NET applications.
• MATLAB implements this test as signrank. The call [p,h] = signrank(x,y) also returns a logical value indicating the test decision: h = 1 indicates a rejection of the null hypothesis, and h = 0 indicates a failure to reject the null hypothesis at the 5% significance level.
• The Julia HypothesisTests package includes the Wilcoxon signed-rank test as "pvalue(SignedRankTest(x, y))".

## See also

• Mann–Whitney–Wilcoxon test (the variant for two independent samples)
• Sign test (like the Wilcoxon test, but without the assumption of a symmetric distribution of the differences around the median, and without using the magnitude of the difference)

## References

1. "Paired t–test - Handbook of Biological Statistics". www.biostathandbook.com. Retrieved 2019-11-18.
2. Wilcoxon, Frank (Dec 1945). "Individual comparisons by ranking methods" (PDF). Biometrics Bulletin. 1 (6): 80–83. doi:10.2307/3001968. hdl:10338.dmlcz/135688. JSTOR 3001968.
3. Siegel, Sidney (1956). Non-parametric statistics for the behavioral sciences. New York: McGraw-Hill. pp. 75–83.
4. Lowry, Richard. "Concepts & Applications of Inferential Statistics". Retrieved 5 November 2018.
5. Pratt, J (1959). "Remarks on zeros and ties in the Wilcoxon signed rank procedures". Journal of the American Statistical Association. 54 (287): 655–667. doi:10.1080/01621459.1959.10501526.
6. Derrick, B; White, P (2017). "Comparing Two Samples from an Individual Likert Question". International Journal of Mathematics and Statistics. 18 (3): 1–13.
7. Kerby, Dave S. (2014). "The simple difference formula: An approach to teaching nonparametric correlation". Comprehensive Psychology. 3: 11.IT.3.1. doi:10.2466/11.IT.3.1.
8. Dalgaard, Peter (2008). Introductory Statistics with R. Springer Science & Business Media. pp. 99–100. ISBN 978-0-387-79053-4.