Making Meaningful Correlations

Something that I’ve always found difficult as a basketball coach is figuring out which players to put into the game and when to put them in.  I may put a player in because I know they can put the ball in the basket more often than not.  Or I’ll put a player in because I know they will rebound the ball when my team desperately needs rebounds.  Coaches value different aspects of the game, and those values help decide which players give the team the best chance to win.  Which players do you want possessing the ball in order to win the game?  Which parts of the game do you value most as a coach?  Would you rather have scorers on the court who may be lackadaisical on defense?  Or do you value taking care of the ball the most?  Obviously, a coach can’t always control which players have the ball in their hands and make the most decisions during a game, but you certainly can encourage certain players to have the ball in their hands.

Keeping this idea in mind, I wanted to look at how the number of possessions per game a player gets correlates with different aspects of the game.  For example, you would expect a player who gets a lot of possessions per game to score more points per game than a player who gets fewer possessions.  So, logic tells us that the player who possesses the ball the most should score the most points.  If as a coach you value scoring, then you would want the player who scores the most points to have the most possessions.

Now I will get a bit mathematical, so I apologize if you start to drift off to sleep.  When looking at the statistical correlation between Poss/Game and Points/Game within a given team, there should be an extremely strong positive correlation.  To measure the strength of the correlation we use Pearson’s r, which measures the linear correlation between 2 variables.  r always takes a value between +1 and −1 inclusive, where 1 is total positive correlation, 0 is no linear correlation, and −1 is total negative correlation.  In a perfect basketball world, this means that the r value should be as close to 1 as possible.  As a basketball analyst, I read an r value close to 1 as a sign that the right players are in the game, possessing the ball, and taking the right shots.  If the r value is smaller, I see this as a red flag that a change or improvement needs to be made to which players are possessing the ball.  It means that there may be some players who are scoring more points than a teammate who possesses the ball more than they do.

Hopefully some of you still haven’t drifted off to sleep yet!  In short, this particular correlation is looking at the metric Points Per Possession (PPP) in a different way.  The table below shows the correlation between Poss/Game and Points/Game for each team thus far this season:

[Table: correlation (r) between Poss/Game and Pts/Game, by team]

It’s no surprise that all teams have a strong positive correlation, so not much can really be derived from this data.
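For anyone who wants to see what goes into an r value, here is a minimal sketch of Pearson’s r in plain Python.  The function name `pearson_r` and the roster numbers are made up for illustration; they are not real WBBL data, and this is not necessarily how the values in the table were calculated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r for two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator: sum of the products of each pair's deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator pieces: sums of squared deviations for each variable.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical roster: made-up numbers, NOT real WBBL data.
poss_per_game = [14.0, 11.5, 9.0, 7.5, 5.0, 3.0]
pts_per_game = [15.2, 10.1, 9.4, 5.0, 4.2, 1.5]

r = pearson_r(poss_per_game, pts_per_game)
print(f"r = {r:.3f}")
```

With numbers like these, where points rise almost in step with possessions, r lands close to 1, which is the pattern every team in the table shows.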

Applying this same logic to a different variable can be extremely helpful to a coach.  It comes into play when a coach wants to play the players who have shown they are efficient in an aspect of the game that the coach values.  So, I analyzed the correlation between a player’s Poss/Game and that player’s Turnover Percentage (%TO).  %TO is the percentage of a player’s possessions on which she turns the ball over.  I calculated this correlation for each team in the WBBL, and the resulting r values are shown in the table below:

[Table: correlation (r) between Poss/Game and %TO, by team]

Now how to interpret these values…  The closer the value is to -1, the more it means that the players who have more possessions commit a lower percentage of turnovers than the players who don’t have as many possessions.  It’s actually easier to think of it visually.  Let’s take Leicester’s data and plot it on a scatter plot.

[Figure: scatter plot of Leicester players, Poss/Game (x-axis) vs. %TO (y-axis)]

I want to look at Leicester because their r value shows a fairly strong negative correlation (r = -0.7926).  Each blue dot represents a player on Leicester.  The x-axis is the Poss/Game each player averages and the y-axis is that player’s %TO.  As you can tell, if you were to draw a best fit line through the dots, it would have a negative slope, which tells us that the players who have more possessions turn the ball over a lower percentage of the time than the players who get fewer.  In other words, the players who are getting more possessions are taking better care of the ball.  It can also tell you that the players with fewer than 7 possessions a game need to work on committing fewer turnovers!
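To make the “best fit line” idea concrete, here is a small sketch that fits a least-squares line in plain Python.  The helper name `fit_line` and the data points are invented for illustration; they are not Leicester’s actual figures.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up (Poss/Game, %TO) values, NOT Leicester's real data.
poss_per_game = [12.0, 10.0, 8.5, 6.0, 4.5, 3.0]
to_pct = [12.0, 15.0, 14.0, 22.0, 20.0, 28.0]

slope, intercept = fit_line(poss_per_game, to_pct)
print(f"%TO ~ {slope:.2f} * Poss/Game + {intercept:.2f}")
```

A negative slope here says the same thing as a negative r: as Poss/Game goes up, %TO tends to come down.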

However, when a team’s data reveals no correlation (an r value very close to zero), this doesn’t tell us much.  It just means that there is no linear relationship between the 2 variables.  The red flag is a stronger positive correlation.  That means the players possessing the ball most of the game are turning the ball over at a higher percentage than the players who don’t possess the ball a lot.  This tells the coach that they’ll need to reevaluate which players they choose to play if they want the team to take better care of the ball.

Likewise, this analysis can be done for just about any part of the game (rebounding, steals, fouls, assists, etc.).  Then, to take it a step further, we can look into which of these stats more heavily predict success.  This can be done using statistical regression, which I hope to utilise once the data becomes more robust.  I must also mention (again) that all the data I’ve gathered comes from an extremely small sample, since each team has only played 3-5 games.  This can easily skew the results.  Also noteworthy is that the r values don’t imply causation; they merely tell us the strength of the relationship between 2 variables.  If we had previous season data available, it would be exciting to look at how players and teams have evolved.

I hope this wasn’t a snooze fest for you guys.  I just get so into the numbers behind it all!  I’ll now wrap it up by saying I can’t wait until more data is available as the season carries on, so that more meaningful analysis can be performed!!!  Until next time…

Happy Holidays! 🙂

