The Spearman's Rank Correlation Coefficient is a moderately complex tool
(Excel recommended) used to determine whether two sets of data are
correlated and to measure the strength of that correlation.
Spearman's Rank Correlation Coefficient formula: Rs = 1 - (6 × Σd²) / (N³ - N), where d is the difference between the two ranks of each pair of data and N is the number of ranks (i.e. the number of pairs of data).
The Spearman's Rank Correlation Coefficient is used to determine the strength of a correlation between two sets of data. A scatter graph can already suggest whether there is a strong or weak, negative or positive correlation (see below), but the Spearman's Rank Correlation Coefficient allows us to quantify that correlation (if there is one).
Note that the Pearson Product-Moment Correlation Coefficient is more sophisticated than the Spearman's Rank test: it uses the actual measured values of the data rather than their relative rankings, so it generally gives more accurate results. For the Pearson Product-Moment test to be valid, however, the data MUST come from a normally distributed population. If unsure, use the Spearman's Rank Correlation test below.
Limitations:
- The data must follow a single consistent trend, always rising or always falling, i.e. a monotonic relationship (draw a scatter graph with the line of best fit to check)
- The two sets of data must be independent of each other (e.g. HDI and life expectancy do NOT work because HDI is calculated using life expectancy)
- There should be between 10 and 30 pairs of data
- Note that a strong correlation does not necessarily mean cause and effect. E.g. the % of households owning a camera and the % of people dying of lung cancer in the United States both increased between 1950 and 2000, but one clearly does NOT cause the other!
IMPORTANT: you must rank BOTH sets of data from highest to lowest (i.e. the highest value gets rank 1, the 2nd highest gets rank 2, etc.). If 2 or more lines are tied, you must give them all the same rank number, equal to the average of the ranks they occupy: r = (t + 2b + 1)/2, where t is the number of tied lines and b is the last rank used immediately before the tied lines.
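For readers who prefer a script to a spreadsheet, the ranking rule above can be sketched in Python. This is only an illustration: the function name rank_high_to_low and the sample scores are hypothetical, and the tie handling simply applies r = (t + 2b + 1)/2 to each group of equal values.

```python
def rank_high_to_low(values):
    """Rank values so the highest gets rank 1; tied values share the average rank."""
    # Positions of the values, sorted from the largest value to the smallest.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend 'end' to cover every value tied with the one at 'pos'.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        t = end - pos + 1              # number of tied lines
        b = pos                        # last rank used before the tied lines
        shared = (t + 2 * b + 1) / 2   # tie formula from the text
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


scores = [90, 70, 85, 70, 60]  # hypothetical data; the two 70s are tied
print(rank_high_to_low(scores))  # [1.0, 3.5, 2.0, 3.5, 5.0]
```

The two tied values occupy ranks 3 and 4, so each receives 3.5, and the next line resumes at rank 5.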
1. Verify that you can run this test (check your data against the limitations listed above).
2. Calculate the Spearman's Rank Correlation Coefficient (Rs):
Use Excel to rank each set of data and calculate Rs using the formula found at the top of this page (see podcast below).
IMPORTANT 1: you must rank BOTH sets of data from highest to lowest (i.e. the highest value gets rank 1, the 2nd highest gets rank 2, etc.).
IMPORTANT 2: if 2 or more lines are tied, you must give them all the same rank number, which must be the average of the tied ranks. Example: if two lines are tied for rank #5 and rank #6, give them both the rank (5+6)/2 = rank #5.5. The ranking then resumes with the following line at rank #7, since ranks #5 and #6 have technically been "used".
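As an alternative to Excel, the calculation can be sketched in a few lines of Python. The sketch assumes the usual form of the formula, Rs = 1 - (6 × Σd²) / (N³ - N), and takes ranks that have already been assigned; the function name spearman_rs and the two rank lists are hypothetical.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's Rs from two equally long lists of ranks."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both lists must contain the same number of ranks")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two pairs of data are needed")
    # d is the difference between the two ranks of each pair.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


# Hypothetical ranks for 7 pairs of data; the first list contains
# the tie from the example above (two lines sharing rank 5.5).
ranks_a = [1, 2, 3, 4, 5.5, 5.5, 7]
ranks_b = [2, 1, 3, 5, 4, 7, 6]
print(round(spearman_rs(ranks_a, ranks_b), 3))  # 0.848
```

Here Σd² = 8.5 and N = 7, so Rs = 1 - 51/336, which is about 0.848.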
3. Interpret the result:
- Note: Rs is always between -1 and +1
- If Rs is close to 0, the correlation is weak.
- If Rs is close to ± 1, there is a strong correlation, as follows:
- If Rs is close to -1, there is a strong negative correlation (one factor increases when the other one decreases)
- If Rs is close to +1, there is a strong positive correlation (both factors increase or decrease at the same time).
4. Verify if the result is meaningful
There is a possibility that the correlation you have observed is not meaningful, but just happened by chance. In other words, if you had taken a different sample, you might have found completely different results. If the correlation is truly meaningful (i.e. it doesn't happen by chance), you will find similar results no matter which sample you take. By default, the minimum "level of certainty" or "confidence level" required is 95% (i.e. there should be at least a 95% chance that the correlation is NOT a coincidence). This is the same as saying, in reverse, that the "significance level" must be no more than 100% - 95% = 5% (i.e. there should be no more than a 5% chance that the correlation was a coincidence).
To determine if the correlation is meaningful, plot your Rs value against the degrees of freedom of your sample (df = number of pairs of data - 2) on the diagram below:
- If the result you find is BELOW the "5%" line, the confidence level is too low: you cannot reliably say that the correlation you have observed is meaningful, and it may well have happened by luck. If you took a different sample of data, you might obtain different results.
- If the result you find is ABOVE the "5%" line (the closer to the upper right-hand corner of the diagram, the better!), the correlation is significant with less than a 5% margin of error (i.e. a level of certainty of 95% or more).
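If the diagram is not at hand, the same 5% check can be approximated by simulation. The sketch below swaps in a different technique, a two-sided permutation test: it shuffles one set of ranks many times and counts how often chance alone produces an Rs at least as far from 0 as the observed one. The function names, the number of trials and the sample ranks are all assumptions made for illustration, and the result is an estimate that will not match the diagram exactly.

```python
import random


def spearman_rs(ranks_x, ranks_y):
    """Spearman's Rs from two equally long lists of ranks."""
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - (6 * sum_d2) / (n ** 3 - n)


def permutation_p_value(ranks_x, ranks_y, trials=10000, seed=0):
    """Estimate the chance of seeing |Rs| this large if the pairing were random."""
    rng = random.Random(seed)          # fixed seed so the estimate is repeatable
    observed = abs(spearman_rs(ranks_x, ranks_y))
    shuffled = list(ranks_y)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)          # break any real link between the pairs
        if abs(spearman_rs(ranks_x, shuffled)) >= observed - 1e-12:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never 0.
    return (hits + 1) / (trials + 1)


# Hypothetical ranks for 10 pairs of data (Rs is about 0.297).
ranks_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ranks_b = [5, 1, 8, 3, 10, 2, 7, 4, 9, 6]
p = permutation_p_value(ranks_a, ranks_b)
print("significant at 5%" if p <= 0.05 else "not significant at 5%")
```

A p value of 0.05 or less plays the same role as a point lying above the "5%" line.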
5. Conclude while pointing out the limitations of the test:
- If your sample does not meet the required confidence level (95% or more), you cannot draw a conclusion: either there is no correlation at all, or the correlation you have measured is just the result of luck, not of an actual underlying trend.
- If Rs is close to +/- 1 AND your sample meets the required confidence level (95% or more), you can conclude that there is a strong and reliable correlation. HOWEVER, a strong correlation does NOT necessarily mean cause and effect! To establish that, you would need more research and more information (which this test does NOT provide).