Through an odd series of personal connections, I was recently approached with the possibility of interviewing Sky Andrecheck, former blogger and current member of the analytics department of the Cleveland Indians. Sky graciously agreed and responded via e-mail to a series of questions and follow-ups about his work, the Indians' use of quantitative data, and a few other bits of miscellanea. This is part one of that interview. Part two will come out later this week. Together, they offer an interesting look into the inner workings of the Tribe's decision-making process.
First off, what is your technical position within the organization, and what are your primary responsibilities?
My technical title is Baseball Analyst. We have an analytics department, consisting of Keith Woolner, Jason Pare, and myself, and our job is basically to quantitatively analyze all aspects of the game for our baseball operations team. As you might imagine, it involves a lot of building of systems and algorithms to analyze players, create forecasting models, and things of that nature. The job itself can range from broad topics, like trying to figure out which players are going to be good 10 years from now, down to analyzing the break and movement on the pitches thrown by last night’s starting pitcher. I would say about 80% of the job is building these types of systems and coming up with general models and ways to measure players and performance, while the other 20% is carrying out specific analysis requests from Chris Antonetti and Mike Chernoff regarding the impact of players we might be acquiring or moves we may be thinking about making.
The Indians are generally regarded as a sabermetric-savvy organization. A recent piece at the website FanGraphs.com suggested the Indians were an organization that incorporated sabermetrics into their scouting, statistical analysis, and business operations. Do you think this assessment of the organization is fair?
I think that’s fair. Obviously we do employ three full-time analysts, which implies a certain amount of commitment to quantitative analysis and sabermetrics. I would say our organization is data driven and process driven as a whole. The people at the top try to gather as much information as we can on players before making decisions. That includes not only the stuff that we do as analysts but also opinions from our expert scouts, medical information, and information on a guy’s make-up. Then we try to quantify and catalogue that information so that we can later look back at how our process is working and how we should be weighing each piece of information.
At the recent SABR meetings in Arizona, both Chris Antonetti and Mark Shapiro were featured as speakers. Shapiro said that the statistical analysis of defense was somewhere around the equivalent of using batting average to measure offense. At the risk of offending your boss, do you think this statement is accurate?
I think it’s pretty well established that defensive metrics are rougher than offensive and pitching metrics at this point. One of the tough things is that it’s hard to catalogue opportunities with fielders. For pitchers and batters it’s easy. Each at-bat is an opportunity to either get a hit from the batter’s perspective or make an out from the pitcher’s perspective. With fielders it’s tough because it’s hard to tell if a player even had a chance to record an out. So calculating runs saved can be tough. We can try to be clever about it but that can only carry you so far. I might say fielding metrics are more like RBI – the number of opportunities is unclear and there are a lot of factors outside of a player’s control that can influence the result.
If so, how accurate do you think the statistical analysis of defense can be given the limitations of sample size (i.e. players only receive so many defensive chances of a given kind during a given window of their career), interaction effects (i.e. how players position themselves or respond on defense differently given the presence of other specific defenders and/or pitchers) and the limits to the data available to us?
I think there are limitations with the data that’s currently out there. Like I said, it’s just inherently tough to measure. If we gave a guy 10,000 balls with a randomly selected set of pitchers and hitters we’d have a pretty good measure of a guy’s ability. But we just don’t have that. It’s going to take some new types of data before we can move much further beyond where we are now from a pure statistical perspective.
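(An aside from me on the sample-size point: the rough sketch below shows why 10,000 chances would be so much more informative than a season's worth. It treats every fielding chance as an independent coin flip with the same probability of becoming an out, which is exactly the simplification Sky warns real data does not allow. The 70% out rate and the 400-chance figure are numbers I made up for illustration; they do not come from Sky or the Indians.)

```python
import math


def out_rate_standard_error(out_rate: float, chances: int) -> float:
    """Standard error of an observed out rate under a simple binomial model.

    Assumes each chance is independent and equally difficult, which real
    fielding chances are not.
    """
    return math.sqrt(out_rate * (1.0 - out_rate) / chances)


# Hypothetical numbers for illustration only.
ASSUMED_OUT_RATE = 0.70

for chances in (400, 10_000):
    se = out_rate_standard_error(ASSUMED_OUT_RATE, chances)
    print(f"{chances:>6} chances: observed out rate is good to about +/- {se:.4f}")
```

Under those assumptions, going from 400 to 10,000 chances shrinks the uncertainty by a factor of five, because the standard error falls with the square root of the number of chances.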
You say the organization tries to incorporate quantitative analyses and process-based decision making throughout. How much of the work that you and the other baseball analysts do finds its way to coaches, and ultimately to the players themselves?
I would say Manny Acta and the coaching staff are open to looking at statistical analysis. At the end of the day they are trying to win ballgames, so they’re open to anything they think can help the team achieve that goal. As far as the players go, our analytical staff doesn’t really have much, if any, contact with them. Any findings that might affect them would be filtered down through the player development staff, coordinators, and coaching staff.
Do the Indians incorporate statistical analysis into their assessment and/or prediction of injury risk in their players, particularly pitchers?
We try to quantify everything as best we can. A lot of it doesn’t necessarily fall under hard statistical analysis, but we do try to notice general trends in what types of injuries tend to occur with which types of players, and then if they do get injured, how likely it is that they can bounce back or remain healthy. And of course a lot of that analysis is informed by the medical opinions and hypotheses of our training staff.
One of the big developments within sabermetrics over the past several years is the availability of data that tracks specific action on the field, such as PITCHf/x, and possibly HITf/x and FIELDf/x. How much do the Indians use these data?
We use them. I hesitate to say how much or in what way we use them, but we definitely use them. In general, the F/X data is a great tool for looking beyond the basic stats of hits, outs, home runs, etc., and actually seeing how hard players are hitting the ball or what kind of movement a pitcher’s slider has. That said, there are limitations in that type of data as well and we try to blend all of our sources of information together when evaluating guys.
Part two of the interview will be posted later this week.