Mann-Whitney U
For comparison data

Using Mann-Whitney U
Non-parametric, i.e. no assumptions are made about the data fitting a normal distribution
Is used to compare the medians of two sets of data
It measures the overlap between the two data sets
You must have between 6 and 20 replicates of data
The data sets can have unequal numbers of replicates

Example of normally distributed data
[Figure: Frequency of silver birch tree height at Rushey Plain. Histogram; x-axis: tree height in classes from 10 to 14 up to 60 to 64; y-axis: frequency, 0 to 12.]

When to use the Mann-Whitney U-test
[Figure: Frequency of silver birch and beech trees at Broom Hill. Histogram with two series, Silver Birch and Beech; x-axis: tree height in classes from 10 to 14 up to 60 to 64; y-axis: frequency, 0 to 12.]
Curve not normally distributed, i.e. non-parametric
Compares overlap between two data sets

The Equation
U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1

Where:
U1 = Mann-Whitney U for data set 1
n1 = Sample size of data set 1
R1 = Sum of the ranks of data set 1
U2 = Mann-Whitney U for data set 2
n2 = Sample size of data set 2
R2 = Sum of the ranks of data set 2

Method

1. Establish the null hypothesis H0 (this is always the negative form, i.e. there is no significant difference between the two data sets) and the alternative hypothesis (H1).
H0 - There is no significant difference between the variable at Site 1 and Site 2
H1 - There is a significant difference between the variable at Site 1 and Site 2

2. Copy your data into the table below as variable x and variable y and label the data sets

Rank 1 (R1):
Data Set 1, Beech Hill (m):      23    21    23    20    24    25    22
Data Set 2, Rushey Plain (m):    16    18    19    17    20    21
Rank 2 (R2):

3. Treat both sets of data as one data set and rank them in increasing order (the lowest data value gets the lowest rank)
Start from the lowest and put the numbers in order: 16, 17, 18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25
The lowest data value gets a rank of 1
When data values are the same, they must have the same rank. Take the ranks you would normally assign (5 and 6), add them together (11) and divide the total equally between the data values (5.5)
The same thing is done for all data values that are the same

Data value:      16   17   18   19   20    20    21    21    22   23     23     24   25
Normal rank:      1    2    3    4    5     6     7     8     9   10     11     12   13
Assigned rank:    1    2    3    4    5.5   5.5   7.5   7.5   9   10.5   10.5   12   13

The assigned ranks can then be put into the table

Rank 1 (R1):                     10.5  7.5   10.5  5.5   12    13    9
Data Set 1, Beech Hill (m):      23    21    23    20    24    25    22
Data Set 2, Rushey Plain (m):    16    18    19    17    20    21
Rank 2 (R2):                     1     3     4     2     5.5   7.5

4. Sum the ranks for each set of data (R)
R1 = 10.5 + 7.5 + 10.5 + 5.5 + 12 + 13 + 9 = 68
R2 = 1 + 3 + 4 + 2 + 5.5 + 7.5 = 23

5. Count the number of samples in each data set (n)
n1 = 7
n2 = 6

6. Calculate the values for U1 and U2 using the equations
U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
It is a good idea to break each equation down into three bite-size chunks, which then give you a very easy three-figure sum

U1 = n1 x n2 + ½ n2 (n2 + 1) - R2
U1 = (7 x 6) + 3 (6 + 1) - 23
U1 = (7 x 6) + 3 (7) - 23
U1 = 42 + 21 - 23 = 40

U2 = n1 x n2 + ½ n1 (n1 + 1) - R1
U2 = (7 x 6) + 3.5 (7 + 1) - 68
U2 = (7 x 6) + 3.5 (8) - 68
U2 = 42 + 28 - 68 = 2

As a check, U1 + U2 should equal n1 x n2: 40 + 2 = 42 = 7 x 6

7. Compare the smallest U value against the table of critical values
The smallest U value is U2 = 2
If the U value is less than (or equal to) the critical value then there is a significant difference between the data sets and the null hypothesis can be rejected

Critical values for the Mann-Whitney U test (at the 0.05 significance level, or 95% confidence level, i.e. we are 95% confident our result was not due to chance)
We use the values of n1 and n2 to find our critical value
A dash means no critical value is listed for that pair of sample sizes

        Value of n2
n1    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
 1    -   -   -   -   -   -   -   -   -   -   -   -   -   -   -
 2    -   -   -   -   -   -   -   0   0   0   0   1   1   1   1
 3    -   -   -   -   0   1   1   2   2   3   3   4   4   5   5
 4    -   -   -   0   1   2   3   4   4   5   6   7   8   9  10
 5    -   -   0   1   2   3   5   6   7   8   9  11  12  13  14
 6    -   -   1   2   3   5   6   8  10  11  13  14  16  17  19
 7    -   -   1   3   5   6   8  10  12  14  16  18  20  22  24
 8    -   0   2   4   6   8  10  13  15  17  19  22  24  26  29
 9    -   0   2   4   7  10  12  15  17  20  23  26  28  31  34
10    -   0   3   5   8  11  14  17  20  23  26  29  33  36  39
11    -   0   3   6   9  13  16  19  23  26  30  33  37  40  44
12    -   1   4   7  11  14  18  22  26  29  33  37  41  45  49
13    -   1   4   8  12  16  20  24  28  33  37  41  45  50  54
14    -   1   5   9  13  17  22  26  31  36  40  45  50  55  59
15    -   1   5  10  14  19  24  29  34  39  44  49  54  59  64

Is there a significant difference?
Is 2 (our smallest U value) smaller or larger than 6 (our critical value from the Mann-Whitney table for n1 = 7 and n2 = 6)?
Smaller
The smallest U value is less than the critical value; therefore the null hypothesis is rejected
The alternative hypothesis can be accepted: there is a significant difference between the tree heights of Beech Hill and Rushey Plain

Use the following data to calculate U values independently
Abundance of Gammarus pulex in pools and riffles of an Exmoor stream

Rank 1 (R1):
Velocity (cm.s-1), Pools:                12    12    5     6     14    9     20    8
Velocity (cm.s-1), Riffles:              54    55    61    56    31    47    68    54
Rank 2 (R2):

Rank 1 (R1):
Abundance of Gammarus pulex, Pools:      27    39    43    2     0     1     3     9
Abundance of Gammarus pulex, Riffles:    85    16    80    18    3     5     63    150
Rank 2 (R2):

Key questions
Is there a significant relationship?
Which data value/s would you consider to be anomalous and why?
What graph would you use to present this data?
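
The worked example for Beech Hill and Rushey Plain can be checked with a short script. This is a minimal sketch in plain Python, not part of the original method sheet: the function names average_ranks and mann_whitney_u are hypothetical, and the critical value of 6 is simply copied from the table for n1 = 7 and n2 = 6. It follows the same steps as the method: rank the combined data with ties sharing the mean rank, sum the ranks for each data set, then apply the two equations.

```python
def average_ranks(values):
    """Map each value to its rank in increasing order; ties share the mean rank."""
    ordered = sorted(values)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j past every value tied with ordered[i].
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # The tied values occupy positions i+1 .. j (1-based); share their mean.
        ranks[ordered[i]] = (i + 1 + j) / 2
        i = j
    return ranks


def mann_whitney_u(set1, set2):
    """Return (R1, R2, U1, U2) using the equations given in this document."""
    ranks = average_ranks(list(set1) + list(set2))
    r1 = sum(ranks[v] for v in set1)
    r2 = sum(ranks[v] for v in set2)
    n1, n2 = len(set1), len(set2)
    u1 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
    u2 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return r1, r2, u1, u2


# Tree heights (m) from the worked example.
beech_hill = [23, 21, 23, 20, 24, 25, 22]
rushey_plain = [16, 18, 19, 17, 20, 21]

r1, r2, u1, u2 = mann_whitney_u(beech_hill, rushey_plain)
smallest_u = min(u1, u2)

# Assumption: critical value read by hand from the 0.05 table (n1 = 7, n2 = 6).
critical_value = 6
significant = smallest_u <= critical_value

print("R1 =", r1, "R2 =", r2)
print("U1 =", u1, "U2 =", u2)
print("Reject H0:", significant)
```

The same two functions can be reused for the Gammarus pulex exercise by swapping in the pools and riffles data and reading the matching critical value from the table.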