Excel LADZ

Excel LADZ - Building an NBA Team-Based Model

Added 2024-11-24 10:02:11 +0000 UTC

G’day, lads! While the ‘Excel LADZ - 2024-25 NBA Model’ is calibrating to each player’s role and influence on their respective teams for the new season, I decided to build a “Team-Based NBA Model”. Instead of determining a team’s strength by adding up individual player contributions, this model evaluates each team’s strength based on their performance so far this season.

The simplicity of a model like this is both a positive and a negative. By not accounting for detailed player stats or injury data, the model becomes significantly more efficient to run. For instance, while the NBA Model generates 1,000 simulations in 5 seconds, this team-based model can produce 2,500 simulations in just 1 second. The increase in simulations reduces the variance and leads to more stable and reliable probability estimates.

However, the model is not able to account for significant lineup changes. For example, Anthony Davis being unavailable for the Lakers should significantly impact their chances of winning. As the model doesn’t rely on player data, it has no ability to notice this absence. On the other hand, the original NBA Model has the ability to replace AD with another Center, making the simulations more realistic and the results more “predictive”.

In this article, I’ll run through the mathematical reasoning behind the model and important concepts. I hope that it proves interesting for everybody! If you’d like to join our community on Discord and download both the Player-Based and Team-Based NBA Model, click the links below!

2024-25 Player-Based NBA Model: https://www.patreon.com/posts/excel-ladz-2024-114792748

2024-25 Team-Based NBA Model: https://www.patreon.com/posts/excel-ladz-2024-116432682

Excel LADZ Instagram (Follow to Win 1 of 5 Free Patreon Memberships in December!): https://www.instagram.com/excel_ladz/

1. Importing the Data

I love to get basketball data from the website ‘Basketball Reference’. In this model, I’ve established a connection to Basketball Reference through a Power Query. This is so that each team’s stats can be pulled in automatically rather than the user having to copy & paste data into the spreadsheet every day. Here’s a link to the page I used: https://www.basketball-reference.com/leagues/NBA_2025.html

Unfortunately, the process of building a Power Query is different for a lot of Excel users. It depends on your version of Excel and your computer. I use Excel on a Mac, so including that process in the video would be a waste of time for most users. In case it helps, here’s how I did it: https://www.youtube.com/watch?v=jNbL3j1EJxs&t=4s

One of my favourite features of Excel is successfully setting up a Power Query. It makes Excel models so much more efficient. I recommend everybody watch a Power Query tutorial on YouTube as it is such a useful skill to have. Even better is the Power Query Editor…

2. Compiling the Data

Depending on how the data from the Power Query appears in your spreadsheet (it is usually able to be presented much more nicely on Windows), the next step is to compile the data for analysis. In the video, I’ve used a series of VLOOKUP formulas to compile the data.

3. Creating ATT & DEF Ratings

In this model, I’ll use the Expected Score formula: x_score = ATT Rating Opposition’s DEF Rating Competition Average.

As there are three methods of scoring (3 pts, 2 pts & ft’s), I compiled 6 columns of ratings below.

Each Rating represents what each team scores or concedes relative to the league average. For example, Cleveland score 1.11 2 Pointers for every 1 scored by the average team. On the other hand, they concede 0.96 2 Pointers for every 1 conceded by the league average team. Both these ratings are considered “good”; every team aims for a high ATT Rating and a low DEF Rating.

To calculate the ATT Ratings, I divided the number of 2P’s, 3P’s and FT’s each team scores per game by the average number of 2P’s, 3P’s and FT’s scored per game. It is the same process for the DEF Ratings, except I am dividing the number of 2P’s, 3P’s and FT’s conceded.

3. Simulating a Single Game

There’s no special formula to find each team’s probability of winning a game, covering the spread or the total points hitting. Instead, a game needs to be simulated. After being simulated many times, we can determine probabilities for each market using experimental probability. This means the probability of an event occurring is simply the number of times it happens divided by the total number of trials.

Firstly though, we need to simulate a single game. In this model, I’ll consider each team’s simulated score as the sum of their simulated 2P’s, 3P’s and FT’s. I’ll use the Binomial Distribution and the BINOM.INV function to generate realistic outcomes for each scoring category.

Using the ATT & DEF Ratings, a team’s x_2pts = 2PT ATT Opposition’s 2PT DEF League Average 2P Scored Per Game * Home/Away Advantage (2.5%).

After every expected value is calculated, each team’s xPoints can be found: x_2pts 2 + x_3pts 3 + x_fts. While comparing each team’s xPoints will tell you the expected winner of the contest (and by how much), it doesn’t give any insight into each team’s chances. For that, each scoring method must be simulated.

To simulate the points from 2 Pointers for each team, I’ll use the formula: BINOM.INV (10000, x_2pts / 10000, RAND()). This formula uses the binomial inverse function, which helps us simulate outcomes based on probabilities.

When the number of trials (n) is very large, and the probability of success (p) for each trial is very small, the binomial distribution closely approximates the Poisson distribution. Even though the BINOM.INV function is simulating outcomes using a binomial process, under these specific conditions, its output behaves like a Poisson-distributed random variable. This method ensures that the simulated results are realistic.

After doing this for the 3 Pointers and Free Throws, the Regular Time Simulated Score for each team is equal to sim_2pts2 + sim_3pts3 = sim_fts. If team’s are tied after regular time, overtime periods of 5 minutes a piece are played until a winner can be decided. This involves using the same formula, but multiplying the probability of success by 5/48 in the BINOM.INV function.

The final result should look like this below. This is a single simulated game, which didn’t need to go into overtime.

4. Simulating a single game, 2500 times.

To be able to analyse the data, we need to run simulations a large number of times to learn the true distribution of the results. In this model, I chose to run 2500 simulations.

I have set that up using a data table, and then using a series of SUMPRODUCT functions in the “Dashboard” worksheet, the Moneyline, Spread and Under/Over probabilities can be found.

To update the data, after having brought it in using a Power Query, go into the ‘Data’ tab and select the “Refresh All” icon. This will update the model’s connection with the internet, bringing in the latest stats for the model to run off.

Let me know if you have any questions in the comments below! Thanks for reading!