
UZR

Ultimate Zone Rating

From baseball’s earliest days, managers, players, reporters and fans have all tried to answer a seemingly simple question: What value does a player contribute to his team defensively? For the first 100 or so years, we relied on nothing more than fielding percentage – (balls played – errors)/(balls played). We all understand that this isn’t a good measure of value. A 50-year-old man who plays third base, fields only the slowest of ground balls, and makes a lazy but accurate lollipop throw to first has a fielding percentage of 1.000. Ryan Zimmerman, who dives for balls in the camera well and makes throws from the left field tarp, will have a less-than-perfect fielding percentage. Still, every team in baseball would be happy to have Zimmerman at third.

History

Bill James tried to improve on fielding percentage when he created Range Factor: (putouts + assists)/(games played). This was a better stat than fielding percentage, but it still left room for improvement. In 2003, Mitchel Lichtman (MGL) introduced a new stat at Baseball Think Factory called Ultimate Zone Rating (UZR). MGL tried to account for external factors such as variance in pitching, differences between ballparks, and luck. He also converted from a stat based on games played to one based on innings played. In theory, UZR measures how well a player converts batted balls into outs. It is a positional stat: you can’t compare the UZR of a right fielder to the UZR of a shortstop (the old apples and oranges thing).

How Is UZR Computed?

The field is segmented into 78 zones – 64 of which are used in UZR calculations. Every play is entered into a huge database with items such as the zone number where the ball landed, type of hit (Ground Ball, Fly Ball, Line Drive, Pop Up), etc. Here’s a chart of the zone:


To adjust for ball park effects, outfield foul balls are ignored. Also, infield line drives, which are more the result of positioning than skill, are ignored, as are infield pop-flies. Pitchers and catchers are not included in UZR.

After every play is entered, we start with the math. Algorithms are run for every zone to determine the number of balls hit, the type of hit, how often the ball was fielded for an out, how often each position fielded it, and so on. For example, consider a zone between shortstop and third base. For simplicity’s sake, say 250 balls landed in the zone: 50 fell for hits, 150 were fielded by the third baseman for outs, and 50 were fielded by the shortstop for outs. The MLB expected average is computed for that zone and stored in an expectancy matrix. Then each player is compared, at his position, against the matrix. Our 50-year-old man, who records 10 outs in 250 chances in that zone, is compared against the third baseman’s expectancy of 150 outs and receives a -140 for that zone. Zimmerman, who might record 190 outs in that zone, gets a +40. These computations are made for each zone of responsibility on a positional basis (the apples and oranges thing again) to create each player’s UZR. The UZR/150 you see on FanGraphs also adjusts for the handedness of the pitcher and batter, the game state (the number of outs and the runners’ positions determine where a throw is made), double plays turned, batted ball speed, and errors made. The 150 in UZR/150 means that FanGraphs has normalized the UZR calculations so that all players are compared over a 150-game season (more on this later).
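To make the zone arithmetic concrete, here is a minimal sketch of the single-zone comparison described above. It is not MGL's actual implementation: the function names and the dictionary layout are invented for illustration, the numbers are the made-up ones from the example, and the sketch stops at outs above or below average without any of the further adjustments (handedness, game state, batted ball speed and so on).

```python
# Hypothetical league-wide tallies for one zone between short and third,
# using the made-up numbers from the example (not real data).
BALLS_IN_ZONE = 250
LEAGUE_OUTS_BY_POSITION = {"3B": 150, "SS": 50}


def expected_outs(position, chances):
    """Outs a league-average fielder at `position` would record in `chances` balls."""
    return LEAGUE_OUTS_BY_POSITION[position] * chances / BALLS_IN_ZONE


def outs_vs_average(position, outs_made, chances):
    """Positive means more outs than the league expectancy, negative means fewer."""
    return outs_made - expected_outs(position, chances)


# The 50-year-old third baseman: 10 outs in 250 chances.
print(outs_vs_average("3B", 10, 250))
# Zimmerman: 190 outs in 250 chances.
print(outs_vs_average("3B", 190, 250))
```

The two calls reproduce the -140 and +40 from the example. A full rating would repeat this for every zone a position is responsible for and add up the results.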

How Reliable is UZR?

Over time, a player’s UZR regresses to his likely ability. The question is, how much time does it take (or in the math world – how big a sample do you need) for UZR to be accurate? Sample size is always a factor in statistics. If I flip a penny 10 times and get 8 heads and 2 tails, I can’t project 1 million flips as 800,000 heads and 200,000 tails, because my sample size (number of flips) isn’t big enough. As mathematicians say, given a big enough sample, the accumulation of individual events will regress to the mean. That’s a fancy way of saying that if I flip a penny 1 million times, I’m very likely to have nearly 500,000 heads and 500,000 tails. Sample size is the real problem with UZR. Depending on which mathematician you listen to, one season of defensive data is equivalent to 50-75 games’ worth of plate appearances. That’s not nearly enough data for reliable statistical analysis.
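The penny example can be put into numbers with the standard error of a proportion, sqrt(p(1-p)/n). The sketch below is only an illustration of the sample size point, assuming a fair coin; the function name is made up, and "about two standard errors" is the usual rough rule of thumb, which is crude for a sample as small as 10 flips.

```python
import math


def heads_share_std_error(flips, p_heads=0.5):
    """Standard error of the observed share of heads after `flips` tosses."""
    return math.sqrt(p_heads * (1 - p_heads) / flips)


for flips in (10, 100, 10_000, 1_000_000):
    spread = 2 * heads_share_std_error(flips)  # roughly two standard errors
    print(f"{flips:>9,} flips: share of heads usually within +/- {spread:.4f} of 0.5")
```

With 10 flips the plausible range is so wide that 8 heads is unremarkable; with a million flips the share of heads is pinned very close to one half.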

How Many Games Do We Need?

Tom Tango, co-author of The Book, believes that 200 Plate Appearances (PA) tell us about as much as 400 Balls in Play (BIP). He has also found that different defensive positions receive a different number of chances in a game. His research shows that SS and 2B get on average 5 BIP per 9 innings, 3B and CF get 4 BIP per 9 innings, and LF, RF, and 1B get 3 BIP per 9 innings. Think about that. If Adam Dunn plays 150 games at 1B for the Nats this year, he will see only 450 BIP, or the equivalent of 225 PA. We would never judge a player’s offensive abilities on 225 PA. We shouldn’t judge a player’s defensive abilities on 450 BIP. In reality, our defensive sample doesn’t reach critical mass until roughly 3 seasons of data have been entered.
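The arithmetic above is easy to reproduce for every position. This sketch uses only the figures quoted from Tango (BIP per 9 innings by position, and 400 BIP for every 200 PA); the names are invented for illustration, and it assumes every game is a full 9 innings at one position.

```python
# Average balls in play per 9 innings, by position, as quoted above.
BIP_PER_NINE = {"SS": 5, "2B": 5, "3B": 4, "CF": 4, "LF": 3, "RF": 3, "1B": 3}

# 400 BIP are treated as equivalent to 200 PA, i.e. 2 BIP per PA.
BIP_PER_PA = 400 / 200


def season_bip(position, games):
    """Balls in play a fielder sees, assuming a full 9 innings in every game."""
    return BIP_PER_NINE[position] * games


def pa_equivalent(bip):
    """Plate appearances carrying roughly the same weight as `bip` balls in play."""
    return bip / BIP_PER_PA


for position in BIP_PER_NINE:
    bip = season_bip(position, 150)
    print(f"{position}: {bip} BIP in 150 games, about {pa_equivalent(bip):.0f} PA")
```

For a first baseman this gives the 450 BIP and 225 PA from the Dunn example; even a shortstop only reaches 750 BIP, or about 375 PA, over 150 games.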

UZR/150

We talked about how FanGraphs normalizes UZR data to a 150-game season. Now that we know we need 3 full seasons (or 450 games) of defensive stats for a reliable sample, we can see how unreliable this stat really is. If a player has played a half season (75 games) at a position, that is only one-sixth of the data we need for reliable analysis. Extrapolating these 75 games to 150 is no different from extrapolating 10 coin flips to a million. We can create a fancy formula to come up with a number, but there isn’t enough data to make the number meaningful.
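A small sketch of the half-season problem follows. The straight-line scaling is an assumption made for illustration only and is not claimed to be the formula FanGraphs uses; the -5 rating is a made-up input, and the function names are hypothetical.

```python
GAMES_PER_SEASON = 150
SEASONS_NEEDED = 3  # the rough threshold given above


def sample_fraction(games_played):
    """Share of the roughly 450-game sample that `games_played` represents."""
    return games_played / (GAMES_PER_SEASON * SEASONS_NEEDED)


def naive_per_150(uzr, games_played):
    """Straight-line scaling to 150 games (an illustrative assumption)."""
    return uzr * GAMES_PER_SEASON / games_played


half_season_uzr = -5.0  # made-up rating over 75 games
print(naive_per_150(half_season_uzr, 75))  # the rating simply doubles
print(f"{sample_fraction(75):.3f}")        # about one-sixth of the needed sample
```

Scaling doubles the number but adds no information: the rating still rests on 75 games of data.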

What To Do?

Much like we use the slash stats (AVG/OBP/SLG) in tandem to get a more complete look at a player’s offensive abilities, there are multiple stats that try to measure a player’s defensive ability. In addition to fielding percentage, zone rating and UZR, John Dewan devised a stat called Plus/Minus. Plus/Minus breaks the field into zones, and is very similar to UZR. One of the biggest improvements is applied to the 1B position. UZR does not account for a 1B holding a runner. So, teams whose pitchers have a higher number of base runners have 1B susceptible to a lower UZR. Plus/Minus corrects that omission by adding “runner on 1st” as one of the game state adjustments. Of course, we still need 3 years of stats for Plus/Minus to achieve the desired level of confidence.

The bottom line is this – none of these stats paint a perfect picture of a player’s true defensive value. UZR and Plus/Minus are better than fielding percentage. Maybe we should start a new defensive slash stat called Fielding Percentage/UZR/Plus-Minus?

Epilogue

When I posted this on FederalBaseball.com, I received two responses back from MGL. Since this is his baby, we’ll let him have the last word. Here are his comments.

“UZR does not account for a 1B holding a runner. So, teams whose pitchers have a higher number of base runners have 1B susceptible to a lower UZR.”
Sure it does. UZR adjusts for outs and baserunners so a runner on first and no one on second (where the runner is held by the first baseman) is treated separately.
MGL
by mgl on Mar 29, 2010 5:27 AM EDT
Two more things in response to a comment above: One, first base is the only position where we don’t see an aging decline from the get go. Two, there is not a symmetrical confidence interval around a sample UZR like -7 to -27, because a skill like defense is typically normally distributed, or at least part of a normal curve. A traditional confidence interval (which is symmetrical) assumes that all values above and below the midpoint are equally likely. If that were true, we would not regress the sample number (-17 for Dunn) towards zero (or some mean which represents the population the player comes from).
MGL
by mgl on Mar 29, 2010 5:33 AM EDT
