Friday, December 26, 2008

Predicting the HOF Vote

For a few years I've been working on a predictive model for HOF voting. Not a method for determining (and advocating for) the players I think *ought to be* in the Hall of Fame, but a way to predict how many votes a player will actually receive, based on a study of actual voting patterns.

I haven't written about it or published any predictions for a couple of reasons. The biggest one is that I don't think many people are interested in it. There's a big market for predicting player statistics, thanks to the folks who play fantasy baseball. But nobody really pays close attention to the Hall of Fame balloting. If anything, they just take note of the new inductees each winter. Even if I could be 100% accurate in my predictions, so what?

Second, the model still needs some work. My approach has been to come up with a model, then go back and run it against past seasons to see how well it would have predicted the ballot totals for those years. I keep running through these regressions and refining the model, learning from the anomalies.

We've got a new set of results coming out in a couple of weeks, so I wanted to throw this out there, both to get my predictions on the record and to spark some interest from the handful of people who think about such things.

Some of the rules for the Hall of Fame selection process have changed over the years, but the basic process has remained essentially the same since the first election in 1936. To appear on the ballot, a player must have played at least ten seasons in the major leagues and must have been retired for at least five seasons. The voters -- active baseball writers with at least ten years of service -- can vote for up to ten players each year, and any player named on at least 75% of the ballots is elected to the Hall of Fame. If a player falls short of that threshold but receives at least 5% of the vote, his name carries over to the next year. If a player isn't elected after 15 years on the ballot, he's dropped from the process, although he can later be considered by the Veterans Committee.
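To make the year-to-year rules concrete, here's a minimal Python sketch of how a single ballot result plays out. The function name `ballot_outcome` is my own, for illustration only. It applies just the three thresholds described above (75% of ballots cast to be elected, 5% to stay on the ballot, 15 years maximum) and ignores the earlier variations in the rules.

```python
def ballot_outcome(votes: int, ballots_cast: int, year_on_ballot: int) -> str:
    """Classify one year's result for a candidate (illustrative sketch).

    Applies only the thresholds described in this post: 75% to be elected,
    5% to stay on the ballot, and a 15-year limit.
    """
    # Integer cross-multiplication avoids floating-point edge cases at the cutoffs.
    if votes * 4 >= ballots_cast * 3:      # at least 75% of ballots cast
        return "elected"
    if votes * 20 < ballots_cast:          # under 5%: off the ballot
        return "dropped (under 5%)"
    if year_on_ballot >= 15:               # final year of eligibility used up
        return "dropped (15 years on ballot)"
    return "carried over"


# Hypothetical ballot counts, not real results.
for votes, year in [(375, 1), (374, 15), (24, 1), (25, 3)]:
    print(votes, year, ballot_outcome(votes, 500, year))
```

With 500 ballots cast, 375 votes is exactly 75% and elects the player, while 374 votes in a 15th year leaves only the Veterans Committee route.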

The results of the voting process have been published each year, giving us a wealth of data to study. I've been primarily concerned with asking three questions.

1) How likely is it that a player will be voted in this year?
2) How likely is it that this player will ever be voted in?
3) What percentage of the vote is a first-year candidate likely to get, given his playing record?

Question #3 is the toughest one to study. Subjective factors have a huge influence, and it's almost impossible to measure what sort of impact those will have on voters. Last year was a perfect example, with slugger Mark McGwire appearing on the ballot for the first time. My methodology predicted he'd get around 50% of the votes, but he ended up with 24%. This was due largely to the allegations of steroid use, and his disastrous appearance before a congressional committee investigating the subject. While we're aware of these subjective influences, it's very difficult to measure them.

Questions #1 and #2 are much more straightforward, and while it's not an exact science, the voting totals do reveal some patterns.
  • Players who get at least 20% of the vote in their first year on the ballot have an 80% chance of eventually being voted in. (Another 7% are later inducted by the Veterans Committee.)
  • Only one player got more than 35% in his first year of eligibility and didn't eventually make it into the Hall of Fame: Steve Garvey.
  • Players who ever reach 25% of the vote have roughly a 60% chance of eventually getting in, either through a future ballot or through the Veterans Committee.
  • Only two players have gotten over 40% on a ballot and not eventually gotten in: Ron Santo and Tony Oliva.

There are 23 players on this year's ballot, and I feel fairly safe in predicting that only two players will reach the 75% needed to get in: Rickey Henderson, in his first year on the ballot, and Jim Rice, in his final year of eligibility. Here are my predictions for the percentage of votes each player will receive. Players in their first year of eligibility are marked with an asterisk.

95 Rickey Henderson *
77 Jim Rice
62 Andre Dawson
60 Bert Blyleven
46 Lee Smith
41 Jack Morris
38 Tommy John
29 Tim Raines
27 Mark McGwire
19 Alan Trammell
15 Don Mattingly
14 Dave Parker
12 Mark Grace *
11 Dale Murphy
9 David Cone *
7 Harold Baines
5 Mo Vaughn *
4 Matt Williams *
3 Jesse Orosco *
3 Greg Vaughn *
2 Ron Gant *
2 Jay Bell *
1 Dan Plesac *

Other than Henderson, there aren't any decent first-year candidates. Frankly, I'm puzzled why some of them made it through the screening process, but I suppose I'd rather have more players make it through than fewer. Let the voters have their say. Here are a few comments about specific candidates.

There's no precedent for someone coming as close as Rice did last year and not making it. Nineteen players have received between 70 and 75 percent on a ballot. Sixteen of them made it the next year. The other three (Nellie Fox, Jim Bunning, Orlando Cepeda) were in their last year of eligibility and had to wait for the Veterans Committee to put them in.

The voting pattern for Andre Dawson suggests that he's probably a year away from going in, maybe two. Blyleven has five years of eligibility left, and while he's within striking distance (61.9% last year), he's entering the steepest part of the climb. Bunning was at 65.7% in his tenth year on the ballot and didn't make it. Neither did Gil Hodges, who was at 59.5%.

Lee Smith seems to have reached a plateau in the mid-40s, which is the range in which a lot of the candidates who fall short seem to stall. Ron Santo and Roger Maris are a couple of prominent examples. Smith does have one thing going for him: the voters seem to have broken the bottleneck on closers, inducting Dennis Eckersley in 2004, Bruce Sutter in 2006, and Rich Gossage in 2008. I'd put Smith's chances somewhere north of 50-50 at this point.

Tommy John is in his last year of eligibility and doesn't have a shot. Jack Morris isn't gaining much traction, and I think he is heading down a very similar path. Raines got 24% of the vote in his first year, but he has a lot of supporters in the sabermetric community, and his is the sort of candidacy that could gain a lot of momentum. If his supporters keep making an impassioned case for him, I think he'll get in during his sixth year on the ballot, with the class of 2013.

McGwire's case is perhaps the most interesting. I'll be curious to see how many voters have softened their stance towards him after a year to put his candidacy into perspective. I'd wager there's a 2/3 chance his vote total moves by no more than 3 percentage points in either direction, and a 1/3 chance he jumps up 10-15 points.
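Here's a quick way to see what that wager implies for an expected vote total. This is a rough consistency check, not the prediction model itself, and it rests on two assumptions of mine: the "within 3 points" branch averages out to no change, and the "jumps 10-15 points" branch averages its 12.5-point midpoint. The 24% starting point is last year's total quoted earlier; the helper name `expected_change` is just for illustration.

```python
from fractions import Fraction


def expected_change(scenarios):
    """Probability-weighted average of the point change across scenarios."""
    return sum(p * delta for p, delta in scenarios)


# (probability, assumed average change in percentage points)
scenarios = [
    (Fraction(2, 3), Fraction(0)),      # moves within +/-3 points; assumed to average 0
    (Fraction(1, 3), Fraction(25, 2)),  # jumps 10-15 points; assumed midpoint of 12.5
]

last_year_pct = 24  # McGwire's share on last year's ballot, as quoted earlier
change = expected_change(scenarios)
print(round(float(change), 2))                  # 4.17
print(round(last_year_pct + float(change), 2))  # 28.17
```

Under those assumptions the expected gain is a little over 4 points, which puts him around 28%, in the same neighborhood as the 27% in the table.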

Results will be announced on January 12.