Predicting Player Performance in European Basketball using Machine Learning
Future Performances with Data from the Past
High roster turnover has become standard in European basketball. It is now very common for a high percentage of the players on your favorite team to be exchanged during seasons, very often for new players from different leagues. Fans then often wonder whether this player from the Bulgarian League they have never heard of, who is supposed to be their team’s lead guard, is any good. This raises a natural question: can we predict how good a player will be in the next season, considering the unique circumstances of European basketball?
Goal of the Project
The scope of this model is not to predict a player’s long-term career, the potential of a young player, or anything like that. I want to project how a player will perform in the next season, taking into consideration where he played the season before and where he will play in the next.
Setting up the Model
For this purpose I scraped box-score season data from proballers.com for 30 different European leagues, going back to 2009. I calculated various advanced statistics for each player (Offensive Rating, Offensive Role, Passer Rating, Estimated Plus-Minus and more) and selected all players who played at least two consecutive seasons, numbering over 40,000. I selected Estimated Plus-Minus (EPM) as my target, as it is a solid one-number metric for player quality. To give you a sense of scale: a player with an EPM over 3 is in the MVP conversation in his respective league, while a player with an EPM below -1 should not be starting.
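A minimal sketch of how such consecutive-season pairs could be built. The rows, player names and league names below are invented for illustration, and the sketch assumes one row per player and season; it is not the actual scraping pipeline.

```python
from collections import defaultdict

# Hypothetical season rows: (player, season, league, epm). All values are made up.
rows = [
    ("Player A", 2020, "League X", 1.2),
    ("Player A", 2021, "League Y", 0.8),
    ("Player A", 2023, "League Y", 1.5),  # 2022 is missing, so 2021 -> 2023 is no pair
    ("Player B", 2021, "League X", -0.4),
    ("Player B", 2022, "League X", 0.1),
]

def consecutive_pairs(rows):
    """Return (current_row, next_row) tuples for back-to-back seasons."""
    by_player = defaultdict(dict)
    for row in rows:
        player, season = row[0], row[1]
        by_player[player][season] = row  # assumes one row per player and season
    pairs = []
    for seasons in by_player.values():
        for season in sorted(seasons):
            if season + 1 in seasons:
                pairs.append((seasons[season], seasons[season + 1]))
    return pairs

pairs = consecutive_pairs(rows)
for current, following in pairs:
    # The current season supplies the inputs, the following season the target EPM.
    print(current[0], current[1], "->", following[1], "target EPM:", following[3])
```

Each pair then becomes one example: the first season provides the inputs and the second season’s EPM is the value to predict.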
I trained my Machine Learning (ML) model using data from 2022 and earlier and applied this model to predict player performances in 2023 and 2024 to avoid data leakage.
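A season-based split like this could look as follows. This is only a sketch: the field names are hypothetical, and I am assuming that the cutoff is applied to the season being predicted.

```python
# Hypothetical examples, each labeled with the season being predicted.
examples = [
    {"player": "Player A", "target_season": 2021, "actual_epm": 0.8},
    {"player": "Player B", "target_season": 2022, "actual_epm": 0.1},
    {"player": "Player A", "target_season": 2023, "actual_epm": 1.5},
    {"player": "Player C", "target_season": 2024, "actual_epm": -0.6},
]

LAST_TRAINING_SEASON = 2022  # cutoff year taken from the text

def split_by_season(examples, last_training_season):
    """Split on time instead of randomly, so no held-out season reaches training."""
    train = [e for e in examples if e["target_season"] <= last_training_season]
    test = [e for e in examples if e["target_season"] > last_training_season]
    return train, test

train, test = split_by_season(examples, LAST_TRAINING_SEASON)
print(len(train), "training examples,", len(test), "held-out examples")
```

A random split would mix seasons, so the model could see a player’s 2023 numbers while being tested on the same period. Splitting on the year avoids that.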
We give the model detailed information about each player’s current season and situation. For this purpose I calculated relative strength ratings of all European leagues in an earlier post. It seems straightforward to suspect that a player coming from the Liga ACB and going to the Second French Division should see a certain uptick in quality and production due to the difference in opponent quality.
Additionally, we give the model the assumed role of the player. This is a rather big assumption, but for most players these roles don’t really change much season to season. The Offensive Role here is to be understood as a sliding scale from 1 to 5 indicating involvement on offense with 1.0 being a player shouldering the heaviest offensive load on the team (think TJ Shorts for example) and 5.0 the lowest (your local backup center). So we project a player’s production based on the involvement we think he will have on offense.
Results
Good news first: this somehow works really well. We have an error of around ±1, and an R² score of 0.70 when comparing the predicted EPM values in the testing data to the real player performances.
This means the ML model can explain around 70% of the variation in player performance in the next season. The remaining 30% likely includes things my data cannot capture, e.g. coaching decisions, team chemistry, nagging injuries or off-court things like adjusting to living in a foreign country.
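For readers who want to see how these two numbers are defined, here is a small sketch. The EPM values are invented, and using the mean absolute error as "the error" is my assumption, since the text does not name the error metric.

```python
def r_squared(actual, predicted):
    """Share of the variance in `actual` that the predictions explain."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def mean_absolute_error(actual, predicted):
    """Average size of the miss, in EPM points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Invented EPM values for five players, not real results.
actual = [2.0, -1.0, 0.5, 3.0, -0.5]
predicted = [1.5, -0.5, 0.0, 2.0, 0.0]

print("R²:", round(r_squared(actual, predicted), 2))
print("MAE:", round(mean_absolute_error(actual, predicted), 2))
```

An R² of 1.0 would mean every prediction is exact, while 0.0 would mean the model does no better than always guessing the average EPM.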
The features most relevant to the prediction are the following:
Offensive Role in the upcoming season
EPM in the current season
Current level the player is playing at
Level the player will play at
Current Offensive Role
Age of the player
Current Minutes Per Game
Current Team Quality of the player’s new team
By far the most influential input for the model is the Offensive Role the player will have in the next season. This makes immediate sense: the best players have the ball more often and take on bigger roles on offense. If we tell the model that the player will not be very involved on offense, it concludes that the player is not as good, and the prediction will reflect this. This is illustrated below with real EPM values. You can easily see that a player’s EPM depends heavily on his offensive role.
But the player’s quality in the current season is of course also important. If we tell the model the player will again be very involved on offense next season, but he wasn’t very good in that role this season, the prediction will again reflect this. The next two most influential parameters were indeed the competition levels of the leagues, which makes me happy, as it means all the work that went into those ratings was indeed necessary.
Also of importance is the player’s age, and here I want to go on a small tangent to show the relation between age and performance in Europe. I was as surprised as you likely are that there seems to be no drop-off due to age here. But I suspect this is a prime example of survivorship bias: European teams don’t really give out long contracts, so older players are either good enough to stay relevant on a good team, or they move to lower leagues or retire. At least this is my interpretation. We still see a nice relation between age and performance for young players, whose growing experience leads to better performances as they get closer to their primes.
Usefulness
What I would like to look at now is where I trust my model.
This is the predicted data for 2023 and 2024 plotted against the actual EPM data. What I now believe my model can do is predict very good and very bad players.
Of the players my model projected at an EPM over 2 in the last two years, only one produced a negative season, and most went on to have very productive seasons.
On the other hand, when my model says a player will be bad next season, it also tends to be right.
The blind spot, in my opinion, is when my model predicts that a player will be average. With an error of ±1, there is quite a large difference between a +1 and a -1 player, and my model cannot really separate the two. But this shouldn’t worry us too much, as I would use the model to project the best players for each season, and I am not really interested in who will be the most average.
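A check of this kind can be sketched in a few lines: pick every prediction above a threshold and see how many of those players fell below some actual EPM. The pairs below are invented, and the helper name is hypothetical.

```python
def share_below(pairs, predicted_min, actual_below):
    """Among predictions >= predicted_min, the share whose actual EPM was < actual_below."""
    selected = [actual for predicted, actual in pairs if predicted >= predicted_min]
    if not selected:
        return None  # nobody was predicted that high
    return sum(1 for actual in selected if actual < actual_below) / len(selected)

# Invented (predicted EPM, actual EPM) pairs, not the real test data.
pairs = [
    (2.4, 3.1),
    (2.1, 1.8),
    (2.6, 2.2),
    (2.0, -0.3),  # a predicted star who had a negative season
    (0.4, 1.0),
    (-1.5, -2.0),
]

misses = share_below(pairs, predicted_min=2.0, actual_below=0.0)
print("Share of predicted stars with a negative season:", misses)
```

The same idea works in the other direction by flipping the comparisons, which covers the players the model expects to be bad.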
When examining the predicted values from my model, I found that it tends to be quite conservative. It never projected a player to reach EPM values over 3.5, despite that happening quite regularly. I worked around that by looking at the percentile values for each Offensive Role: if my model predicts a player to reach 2.0 EPM in Offensive Role 2, this player is actually in the top 5% of all player predictions in the database.
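One way to compute such a "Top X%" label is sketched below. I am assuming here that each prediction is ranked only against other predictions for the same Offensive Role; the list of predictions is invented.

```python
from bisect import bisect_left

def top_share(prediction, same_role_predictions):
    """Fraction of same-role predictions that are at least as high as this one."""
    ordered = sorted(same_role_predictions)
    below = bisect_left(ordered, prediction)  # count of strictly lower predictions
    return (len(ordered) - below) / len(ordered)

# Invented model outputs for 20 players in Offensive Role 2.
role_2_predictions = [
    -0.9, -0.7, -0.5, -0.4, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3,
    0.4, 0.5, 0.6, 0.8, 0.9, 1.0, 1.2, 1.4, 1.6, 2.0,
]

share = top_share(2.0, role_2_predictions)
print(f"A predicted 2.0 EPM is in the top {share:.0%} of this role")
```

Ranking against predictions instead of actual values sidesteps the conservative scale: it no longer matters that the model never outputs 3.5, only where a player lands relative to everyone else.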
Concrete Examples
Now with the technical stuff out of the way, let’s get into some predictions for the 2023 and 2024 seasons to show some different cases. I want to reiterate that, despite these seasons already being over, my model had no knowledge of them, as it was not trained on these seasons. Below are all players my model projected to reach over 2.0 EPM, with some notable players highlighted.
Michael Weathers, 2024, Actual: 2.9, Predicted: 2.2 (Top 10%)
Michael Weathers (already known from my BBL Team of the Season) was an all-around amazing guard playing in the Austrian Bundesliga: efficient on offense, amazing on defense. But he was moving from the not-so-good Austrian league to the decent German Bundesliga and was tasked with saving a bad Heidelberg team. And he did it! His offensive brilliance dipped a little, but he was still an All-League defender and led Heidelberg to the semifinals, where they even won a game against Bayern München in the SAP Garden.
Bastien Vautier, 2023, Actual: 2.6, Predicted: 2.5 (Top 5%)
Bastien Vautier is a center who moved from the second to the first division in France and immediately became one of the best players in the country. He is very strong on offense, with some weaknesses on defense. He posted a strong EPM of 3.5 in the Second Division, and my model predicted he would maintain his production in the First Division.
Ethan Happ, 2023, Actual: 3.3, Predicted: 2.8 (Top 2%)
Ethan Happ is a very good all-round big man who played a great season for a sub-par Rio Breogán team and earned a move to the Eurocup team Gran Canaria. My model predicted he would have another great season, and he delivered, turning in an even better 2023 season.
Dylan Van Eyck, 2023, YoY improvement: 4.9, Predicted: 3.1
Another fun use case is Dylan Van Eyck, who moved down from the decent Spanish Second Division to the not-so-good Danish First Division. He was not a good player in his 2022 season, posting an EPM of -1.1. However, my model projected him to play much better in a weaker league with an increased role on offense, and in 2023 his EPM shot up to 3.8, even higher than my conservative model predicted.
Closing Arguments
I would like to point out that this model is in no way able to replace real scouting, as there are a lot of things that my database, consisting of simple box-score stats and the evaluations and metrics derived from them, does not capture. However, its recommendations might be solid. If a team other than Heidelberg had approached Michael Weathers while he was playing in the Austrian League, they might have gotten the star guard instead.
In an upcoming post, when the rosters for the next season are set, I will deploy my model on interesting players to see how they might perform next year and to project who might already be an early MVP candidate in his domestic league. Some candidates here will be Brae Ivey in Würzburg, T.J. Shorts who went to Panathinaikos, Tyson Carter to Red Star Belgrade, and whoever else you want to see.