Infovis Breakdown: Predicting Major League Baseball 2012

by Jon Follett

Earlier this week, we released our Predicting Major League Baseball 2012 interactive information visualization with our picks for the playoffs this year. The visualization made its debut in the inaugural post on our recently launched channel on BostInno.

After the heartbreak of the Red Sox collapse last year, at Involution Studios we felt that, in order to maintain our sanity this season, we should hedge our bets by applying some data crunching and information design to the problem of winning baseball games. Our theory was that if, in baseball, money can be used as a proxy for talent, and talent a proxy for wins, then payroll and past win history could be a strong indicator of future success. So we constructed a lightweight Moneyball algorithm to determine if the Sox would be, once again, putting the Fenway faithful through the emotional wringer.

Predicting Major League Baseball 2012

We created this information visualization to better understand the relationship between wins and payroll in Major League Baseball.

We crunched the numbers on win and payroll data for the past six years of Major League Baseball, and came up with a few postseason predictions based on our calculations. If you’re an avid baseball fan, you probably won’t find much controversy in our picks for this year’s postseason teams. We’re picking the New York Yankees as the AL East division winners, and the Red Sox as one of the two AL wild card teams. Check out the rest of our predictions for MLB 2012 here.

The 2012 Red Sox Season

In the infovis, you can view team data on wins and payroll for every year from 2006 through 2011, as well as a prediction for 2012.

You can play around with our MLB Wins vs. Payroll interactive information visualization yourself and see if it sparks any additional insights. In the interactive visualization, you can drill down into each team to see their wins for each year, their cost per win, and how their payroll varies each year from the team average payroll.
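As a rough illustration of the two per-team figures mentioned above, here is a minimal sketch in plain JavaScript. The function names and sample numbers are hypothetical, and it assumes that "team average payroll" means a team's own average across the years shown; the actual visualization may compute these differently.

```javascript
// Hypothetical helpers; names and units (millions of dollars) are assumptions.

// Cost per win: payroll for a season divided by wins in that season.
function costPerWin(payroll, wins) {
  return payroll / wins;
}

// Difference between each season's payroll and the team's own average payroll.
// Assumes "average" means the mean across the seasons passed in.
function payrollVsAverage(payrolls) {
  const average = payrolls.reduce((sum, p) => sum + p, 0) / payrolls.length;
  return payrolls.map((p) => p - average);
}

// Made-up numbers, for illustration only.
const samplePayrolls = [100, 120, 140];
console.log(costPerWin(90, 90));
console.log(payrollVsAverage(samplePayrolls));
```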

Information Design
From an information design perspective, the challenge was to show the big picture (payroll vs. wins for all 30 Major League Baseball teams over the past six years) while still letting the user view data in the context of individual team performance. We accomplished this by creating two views, so that users could easily compare team performances in a single year or switch to the detailed view for a particular team. For the individual team view, we adapted a bubble graph to accommodate payroll and wins for multiple years. Each bubble is sized in proportion to a team's number of wins, while color gradients provide a comparative reference for wins.

Red Sox Team Analysis

We adapted a bubble graph to accommodate payroll and wins for multiple years.

To render the visualization, we settled on the D3 library, which, among other great features, allows you to bind data to DOM elements. Once the data has been bound, working with D3 is as easy as picking the attributes you wish to use as visual keys and setting them accordingly. Another huge plus for D3 is its ability to create Scalable Vector Graphics (SVG) content that the user can easily manipulate, for an engaging interactive experience. While HTML itself is capable of drawing rectangles, lines and maybe even circles, depending on your knowledge of CSS3, SVG can create curves, triangles, and almost any shape you can imagine.

The Raphael JavaScript library also does a good job of seamless SVG integration. However, we did not choose Raphael, for two reasons. First, because the library is not as data-driven, it would have forced us to hard-code data elements into the SVG itself. Second, while Raphael gives additional control over the objects being rendered, it is convoluted compared to the clear-cut D3 library.

To arrive at the calculations behind our predictions for Major League Baseball 2012, we created what we'll call our Moneyball Ratio for each team: the change in payroll divided by the change in wins, based on the team's six-year averages. Since we knew the 2012 payroll for each team, it was a simple matter to use this ratio to calculate projected wins from this year's payroll change. You can view our spreadsheet here, with data compiled from Major League Baseball, ESPN, and others.
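The calculation described above might look something like the following sketch in plain JavaScript. This is one possible reading, not our exact spreadsheet formula: it assumes "six-year averages" means the average year-over-year change in payroll and in wins, and that projected wins are last season's wins plus the payroll change divided by the ratio. All function names and sample numbers are hypothetical.

```javascript
// Hypothetical sketch; the interpretation of the ratio is an assumption.

// Average year-over-year change across a series of at least two yearly values.
function averageChange(values) {
  let total = 0;
  for (let i = 1; i < values.length; i++) {
    total += values[i] - values[i - 1];
  }
  return total / (values.length - 1);
}

// Average change in payroll divided by average change in wins.
// Returns null when wins are flat on average, since the ratio is undefined.
function moneyballRatio(payrolls, wins) {
  const winChange = averageChange(wins);
  if (winChange === 0) return null;
  return averageChange(payrolls) / winChange;
}

// Projects next season's wins from the known payroll for that season.
// Falls back to last season's wins when the ratio is unusable.
function projectWins(payrolls, wins, nextPayroll) {
  const ratio = moneyballRatio(payrolls, wins);
  const lastWins = wins[wins.length - 1];
  if (ratio === null || ratio === 0) return lastWins;
  const payrollChange = nextPayroll - payrolls[payrolls.length - 1];
  return lastWins + payrollChange / ratio;
}

// Made-up payrolls (in millions) and win totals for six seasons.
const examplePayrolls = [100, 110, 120, 130, 140, 150];
const exampleWins = [80, 82, 84, 86, 88, 90];
console.log(projectWins(examplePayrolls, exampleWins, 160));
```

A ratio built this way is sensitive to teams whose wins barely changed over the period, which is why the sketch guards against a zero denominator.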

We promise to follow up with some analysis at the end of the season to see if our predictions for the Red Sox and Major League Baseball were spot on, or way off. Let the baseball season begin!
