Correlated Run Contribution

JP Hochbaum

Well-known member
Joined:
May 22, 2012
Posts:
2,012
Liked Posts:
1,282
I believe I have settled on a name for the correlation stat I want to use for an at-bat instance. It comes from finding how the league's batting average, on-base percentage, and slugging percentage correlate with earned runs allowed per nine innings (ERA). In the previous post I showed how weakly batting average correlates with ERA; here I am going to cover how on-base and slugging percentages correlate with ERA.

View attachment 2105

In the above graph it is hard to determine how much on-base percentage correlates with earned runs, but that is why Excel has the handy correlation coefficient. For the stat that I created, I found the correlation to be .83, which is much higher than what batting average showed, and that is not surprising in the least.
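For anyone who wants to reproduce this kind of number outside of Excel, here is a minimal sketch of the Pearson correlation coefficient, which is the quantity Excel's CORREL function returns. The function name `pearson_r` and the five league-season values are made up for illustration; they are not the data behind the .83 figure.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations (covariance numerator).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each series.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# HYPOTHETICAL league-season values, for illustration only (not real data).
league_obp = [0.310, 0.320, 0.325, 0.335, 0.340]
league_era = [3.60, 3.90, 4.00, 4.40, 4.30]

r = pearson_r(league_obp, league_era)
print(round(r, 2))
```

With real data, each list would hold one value per season, league OBP in one and league ERA in the other.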

Now let’s take a look at slugging percentage:

View attachment 2104

Slugging percentage, when graphed against ERA, looks like an almost exact correlation, and it nearly is, with a correlation coefficient of .94. If I were to graph OPS it would show a correlation of .97, which is statistically significant. So what to do with all these correlations?

I ended up taking a player's batting average, on-base percentage, and slugging percentage and multiplying each by its correlation coefficient, thus reducing each to its true effect on creating a run. Then I added the three weighted values up to get what I call the correlated run contribution (CRC).
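The calculation described above can be sketched in a few lines. This is only a sketch under stated assumptions: the .83 and .94 weights are the OBP and slugging correlations quoted in the post, while the .62 weight for batting average is inferred from the ".62" figure mentioned later in the post rather than stated outright. The function name and the .300/.400/.500 slash line are hypothetical.

```python
# Weights are the correlation coefficients with ERA quoted in the post.
# ASSUMPTION: .62 is taken to be the batting-average coefficient.
WEIGHTS = {"avg": 0.62, "obp": 0.83, "slg": 0.94}

def correlated_run_contribution(avg, obp, slg):
    """Weight each rate stat by its correlation coefficient and sum."""
    return WEIGHTS["avg"] * avg + WEIGHTS["obp"] * obp + WEIGHTS["slg"] * slg

# HYPOTHETICAL hitter with a .300/.400/.500 slash line.
example = correlated_run_contribution(0.300, 0.400, 0.500)
print(f"{example:.3f}")  # prints 0.988
```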

So in the National League, the 2014 leaders looked like this:

Andrew McCutchen 1.04
Giancarlo Stanton 1.03
Anthony Rizzo* .99
Justin Morneau* .97
Buster Posey .96
Yasiel Puig .95
Matt Kemp .94
Josh Harrison .943
Jayson Werth .935
Jonathan Lucroy .933
If you were to rank the top ten hitters by OPS, a few guys would shift around: Puig would be ahead of Posey and Morneau, and Freddie Freeman would have knocked Lucroy out of the top ten. So what is the difference? The slight advantage a hitter has in batting average. If a hitter had a higher batting average but a lower OPS, there were times when the .62 weight on batting average made a large enough difference to be worth more than the other hitter's edge in getting on base.

This is the kind of result I had intended to see when creating this stat. Although OPS has an incredibly high correlation with runs being created, I thought it left out the anomalous hitters who make a lot of contact and thus have higher batting averages. So in some cases a hitter who hits for a higher average but draws fewer walks can indeed contribute more to a run scored than a guy who hits for a lower average but walks more. Of course, they would have to be very close to each other in OPS for the contact hitter to jump ahead. Thus if you are a GM with two similar-OPS hitters in free agency, you would probably want the guy with the higher CRC if you needed a 3-5 hitter, and the guy with the higher OPS if you were looking for a 1-2 hitter.
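To make the contact-hitter case concrete, here is a small sketch comparing two invented hitters whose OPS values are close. Both slash lines are hypothetical, and the weights are the coefficients from the post (with .62 assumed to be the batting-average coefficient). The point is only to show how the ranking can flip between OPS and CRC.

```python
# ASSUMPTION: weights are the post's correlation coefficients (.62 for AVG is inferred).
AVG_W, OBP_W, SLG_W = 0.62, 0.83, 0.94

def ops(avg, obp, slg):
    """On-base plus slugging; batting average is not used directly."""
    return obp + slg

def crc(avg, obp, slg):
    """Correlated run contribution: weighted sum of the three rate stats."""
    return AVG_W * avg + OBP_W * obp + SLG_W * slg

# HYPOTHETICAL slash lines (AVG, OBP, SLG), not real players.
contact_hitter = (0.310, 0.350, 0.480)  # higher average, fewer walks
walk_hitter = (0.260, 0.370, 0.470)     # lower average, more walks

for name, line in [("contact", contact_hitter), ("walks", walk_hitter)]:
    print(f"{name}: OPS {ops(*line):.3f}, CRC {crc(*line):.3f}")
```

Here the walk-heavy hitter has the higher OPS (.840 against .830), but the contact hitter comes out ahead on CRC because batting average is counted on its own in addition to its share of OBP and slugging.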

https://sportsstatsandscience.wordpress.com/2015/04/24/correlated-run-contribution/
 

WrigleyvilleTimes

Paul Sigrist
Joined:
Apr 7, 2015
Posts:
16
Liked Posts:
7
Location:
Chicago, IL
JP,

This is very interesting research. As a fellow sabermetrician, a few questions for you:

- Within the article you cited a "coefficient". Not to doubt your statistical abilities, but rather for clarity/reassurance: you meant the coefficient of determination (r-squared), not the correlation coefficient (r). Correct?
- How large was your sample size to determine the coefficient?
- What years does your sample encompass? (e.g. 2004 - 2014)
- What minimums did you set for a player to qualify for your sample? (e.g. ABs, GP, etc.)

Assuming the sample size is valid from a size and scope standpoint, this is quite a significant correlation you've found! Very impressive! Is sabermetrics something you're interested in pursuing a bit more aggressively, or is it a hobby for you? Truly, great work here!
 

JP Hochbaum

Well-known member
Joined:
May 22, 2012
Posts:
2,012
Liked Posts:
1,282
The sample for the correlations covers 1871 to 2014 :)
 

Zvbxrpl

Well-known member
Joined:
Oct 3, 2014
Posts:
2,306
Liked Posts:
2,353
:fap::fap::fap::fap::fap:

ZOMGZ Sabremetrics :cum::cum::cum:
 

beardown28

That's What She Said
Donator
Joined:
Apr 18, 2010
Posts:
1,584
Liked Posts:
509
Location:
Scranton, PA
This stuff confuses the fuck out of me. I feel like I'm in stats again.
 

JP Hochbaum

Well-known member
Joined:
May 22, 2012
Posts:
2,012
Liked Posts:
1,282
In plain terms, it is how much a player contributes to a run scored. So the higher the number, the more he contributes to a run.
 
