For those of you who don't know, Jon Scott operates the best site imaginable for historical information about Kentucky basketball at Bigbluehistory.net. There is no site or person that is a better resource for the ins and outs of UK basketball history, and it is the one-stop shop for all you need to know about the Cats past and present. He is also a person who loves numbers, and he has given KSR this post about the RPI and how it is misused by fans. A good read for statheads everywhere:
Every season the college basketball world works itself into a lather obsessing over the RPI and how it might affect their team. It’s not surprising, given that nearly every media outlet and numerous sportswriters around the country constantly mention the RPI when discussing teams, even if it’s not relevant to their point (which is most of the time). This has only gotten worse as media outlets have learned that including their own version of the RPI on their websites is a good way to attract fans who don’t know any better, and a cottage industry has been built up around it.
Given the constant emphasis on the RPI and its supposed importance to the NCAA committee, it’s useful to step back and look at just how well the RPI actually mirrors the NCAA tournament field. To accomplish this, I performed a simple r-squared analysis of how well the actual field correlated with what the RPI would predict. [For those who slept through high school math, an r-squared analysis fits a straight line to the data and measures how close the actual data are to that line, returning a value between 0 and 1.0. An r-squared of 1.0 is a perfect fit.]
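For readers who want to see the arithmetic, here is a minimal sketch of an r-squared calculation for a straight-line fit. The function name `r_squared` and the two rank lists are made up for illustration; the post does not spell out exactly how the field and the ratings were turned into numbers, so treat this as one plausible setup rather than the method used for the results here.

```python
def r_squared(xs, ys):
    """r-squared for a least-squares straight-line fit of ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope and intercept of the best-fit line.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Compare the scatter around the line with the total scatter in ys.
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: where a rating ranks eight teams vs. where the
# committee's seeding implies they rank. Not real tournament data.
rating_rank = [1, 2, 3, 4, 5, 6, 7, 8]
seed_rank = [1, 3, 2, 4, 6, 5, 8, 7]

print(round(r_squared(rating_rank, seed_rank), 3))
```

The closer the printed value is to 1.0, the more closely the rating's ordering tracks the committee's.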
In addition to the RPI, other polls and models were also considered. The results are below, both for the top-six seeds (to look at the highly ranked teams) and for the top-12 seeds (to look at the entire at-large field).
| Poll or Rating | Top 6 Seeds | Top 12 Seeds |
|---|---|---|
| ESPN/USA Today Poll | 0.789 | 0.764 |
What this shows is that in comparison to other models, the RPI has a relatively poor correlation to the tournament field, with only Pomeroy’s rating worse. Others, such as the Associated Press poll and the Sagarin and Massey ratings, were superior. Beyond that, an average of all the polls and ratings was also taken, and this showed a superior correlation [in fact, analysis of past seasons has shown that the average is usually better than any single rating; this was the first year I’ve seen where the average was just barely beaten (by Massey’s correlation with the at-large field)].
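As a rough illustration of what "an average of all the polls and ratings" can look like in practice, the sketch below averages each team's rank across several systems to get a composite. The system names, team names, and ranks are all hypothetical, and averaging ranks (rather than raw rating values) is an assumption made for simplicity.

```python
from statistics import mean

# Hypothetical ranks (1 = best) that three systems give the same four teams.
ranks_by_system = {
    "System A": {"Team W": 1, "Team X": 2, "Team Y": 4, "Team Z": 3},
    "System B": {"Team W": 2, "Team X": 1, "Team Y": 3, "Team Z": 4},
    "System C": {"Team W": 1, "Team X": 3, "Team Y": 2, "Team Z": 4},
}


def average_ranks(ranks_by_system):
    """Return each team's mean rank across every system."""
    teams = next(iter(ranks_by_system.values())).keys()
    return {
        team: mean(system[team] for system in ranks_by_system.values())
        for team in teams
    }


composite = average_ranks(ranks_by_system)

# List teams from best to worst composite rank.
for team, avg in sorted(composite.items(), key=lambda item: item[1]):
    print(f"{team}: {avg:.2f}")
```

One plausible reason a composite tends to do well is that the quirks of any one system get diluted when several are blended together.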
The RPI has improved. I did this type of analysis a few times about ten years ago, and back then the r-squared of the RPI was around 0.5 to 0.6, which is extremely poor. The improvement in correlation is no doubt due to the changes the NCAA made to the formula to make it better resemble reality, such as giving additional weight to away games, although margin of victory is still taboo in the NCAA’s eyes and is not used.
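To make the away-game weighting concrete, here is a small sketch based on the commonly published form of the RPI: 25% a team's own winning percentage, 50% its opponents' winning percentage, and 25% its opponents' opponents' winning percentage, with home wins counted as 0.6, road wins as 1.4, and the reverse for losses. Those weights come from widely circulated descriptions of the formula, not from this post, and the function names and sample schedule are made up for illustration.

```python
# Assumed weights from commonly published descriptions of the RPI.
WIN_WEIGHT = {"home": 0.6, "away": 1.4, "neutral": 1.0}
LOSS_WEIGHT = {"home": 1.4, "away": 0.6, "neutral": 1.0}


def weighted_win_pct(games):
    """games is a list of (site, won) pairs, e.g. ("away", True)."""
    wins = sum(WIN_WEIGHT[site] for site, won in games if won)
    losses = sum(LOSS_WEIGHT[site] for site, won in games if not won)
    return wins / (wins + losses)


def rpi(wp, owp, oowp):
    """Combine the three winning percentages with 25/50/25 weights."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp


# Hypothetical 3-1 team: two home wins, one road win, one road loss.
games = [("home", True), ("home", True), ("away", True), ("away", False)]
wp = weighted_win_pct(games)

print(f"weighted winning percentage: {wp:.4f}")
print(f"RPI with 0.500 opponents: {rpi(wp, 0.5, 0.5):.4f}")
```

Note that nothing in this calculation looks at the score: a one-point win and a thirty-point win count exactly the same.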
The bottom line is that the RPI, which at one time was terrible, is now merely a relatively poor indicator of which teams make the tournament and where they’re seeded. Maybe this will be kept in mind next season when the pundits start revving up the RPI hype once again?
For additional details and information on how the above was calculated and other findings, please check the following link.