SamuKata
Excel LADZ
Excel LADZ

patreon


Excel LADZ UFC Model | 72% Accuracy

G’day lads,

In this article I’ll run through how I built the Excel LADZ UFC Model using a process called Logistic Regression. If you have any questions, be sure to comment on the post! The Excel file is attached to this post.

Collecting Data

I collected the UFC Fight Data from the following link: https://www.kaggle.com/code/nisdotexe/ufc-fighter-data-data-wrangling-final-project/input?select=preprocessed_data.csv

I selected the latest 1,500 rows from the ‘data.csv’ dataset. All I grabbed from this dataset was the two fighters and the winner. I then grabbed the Red and Blue Fighter’s 8 different UFC Stats from the ‘raw_fighter_details.csv’, and added them into the table.

As mentioned in the video, the Red Fighter recorded is usually the higher-ranked fighter leading into the fight. As a result, often they can be considered the fight favourite. They will be the focus in our model - the equation we will develop will find the Win Probability for the Red Fighter; the Blue Fighter’s Win % will be a by-product.

Transforming the Stats

Firstly, I’ve replaced the ‘Winner’ column with a new Winner column. In Logistic Regression, the outcome variable (probability of Red Fighter winning) needs to be numerical. By transforming the categorical outcome into 1’s and 0’s, the model can better understand and predict the relationship between the “predictor” variables (fighter stats) and the binary outcome.

So, the IF function built in Column A grants a value of 1 if the Red Fighter wins, and 0 if otherwise.

Next, the 8 different stats recorded for each fighter is directly from the UFC Stats Website and are very useful in assessing a fighter’s strength. However, they gain even more predictive power when the figures are compared to their opponent’s. To a certain degree, this accounts for tactics within a UFC Fight - fighters with a very significant advantage in one skill will most likely look to win in that way. 

As a result, 16 columns of data (8 for each fighter) were reduced to 6. 4 of the new statistics were relatively simple to calculate: Red Fighter’s Metric - Blue Fighter’s Metric. These stats were:

However, the difference in the fighter’s striking and takedown accuracy were calculated by considering the opponent’s defence.

For example, Strickland’s Expected Striking Accuracy can be calculated by multiplying his own Striking Accuracy by 1 - Du Plessis’ Strike Defence %. This scales Strickland’s Striking Accuracy up if Du Plessis has a weak defence, and vice versa.

The calculation is then performed for Du Plessis (the Blue Fighter), and the difference between the two fighters (Red Fighter - Blue Fighter) is the final figure. The same concept is applied to the Expected Takedown Accuracy Difference.

Training the Regression Model

Now that we’ve collected all relevant data, we can process the data using the Regression tool in Excel. Regression analysis processes the X Range, our “predictors”, and finds a relationship with the Y-Range (what we are predicting - in this case, whether the Red Fighter wins).

Instead of taking all 1,500 rows of data into consideration, I’ve only done the Regression analysis on rows 380:1504. That is, I’m going to train the model on 75% (1,125 rows) of the data and leave 25% of unseen data in order to test the accuracy of the model.

Using the newly produced coefficients and the intercept in the ‘Regression’ worksheet, a Score for each fighter can be calculated with a linear equation. Then, using the Logistic Regression Function, a value between 0 and 1 is returned. This represents the fighter’s Win Probability according to our first draft of the regression equation.

A useful website for understanding this process: https://aws.amazon.com/what-is/logistic-regression/#:~:text=Logistic%20regression%20is%20a%20data,outcomes%2C%20like%20yes%20or%20no.

Then, a Likelihood column has been added. This is important, and measures how well the model predicted the outcome of the fight. A high value indicates the favourite was selected by the model and won. On the other hand, a low value indicates the model favoured a fighter who ended up losing.

After converting these likelihoods to log values, the sum of Rows 380 - 1504 is computed. By maximising this summed value, the likelihoods, on average, will become higher. This means that the model will be more able to predict the winning fighter in the ‘Win Probability’ column.

The most efficient way to do this in Excel is using the Solver tool. By maximising the Likelihood Function, Solver has changed the coefficients for the logistic regression model that best fit the data. This process helps in obtaining the best-fitting logistic regression model for predicting the outcome variable based on the predictor variables.

Testing the Model

We can test the accuracy of the fitted logistic regression model, by finding the number of times the likelihood of the unseen data (Rows 5 to 379) is above 0.5. Put simply, the number of times the Predicted Favourite won the Fight.

This returns a percentage of 72% - an excellent value for a simple model in Excel.

Implementing the Model

Now that we’ve come up with the optimal coefficients for our linear equation, we can grab each fighter’s UFC Stats and use the equation to come up with the Red Fighter’s ‘Score’. Then, using the EXP function (exponential function), this is converted to a Win % for the Red Fighter.

The Blue Fighter’s Win % is obtained by subtracting Red’s Win % from 100%. Below is an example.

Excel LADZ UFC Model | 72% Accuracy

Comments

After you use Solver to adjust the coefficients based on the log likelihood, is there a way to find the P-values for the new coefficient values?

Reed Blazer

Any similar model for tennis or other 1 v 1 sport ?

Sten Hash

Hello Sten, welcome 🔥 I trained and tested the data using fights for both men and women, so it should work for both 👍 You can check out the fights the model was trained on in the "Rating Test" worksheet. Let me know if you have any more questions lad 😃

Excel LADZ

Hi. It is mixed model for man and woman ?

Sten Hash

Understood. Thank you for responding.

Jorge Nava

Hi lad! This model was built using a process called logistic regression. As a result, the formula to come up with the Win % was derived using historical data that doesn't need to be updated. However, the fighter stats determine the Win %'s, so they need to be entered in appropriately. For example, if you were modelling Sean O'Malley's chance in a fight, you would manually enter his current UFC stats into the model. They can be found here: http://ufcstats.com/fighter-details/b50a426a33da0012

Excel LADZ

how often is the data updated?

Jorge Nava


More Creators