G’day lads,
In this article I’ll run through how I built the Excel LADZ UFC Model using a process called Logistic Regression. If you have any questions, be sure to comment on the post! The Excel file is attached to this post.
Collecting Data
I collected the UFC Fight Data from the following link: https://www.kaggle.com/code/nisdotexe/ufc-fighter-data-data-wrangling-final-project/input?select=preprocessed_data.csv
I selected the latest 1,500 rows from the ‘data.csv’ dataset. All I grabbed from this dataset was the two fighters and the winner. I then grabbed the Red and Blue Fighter’s 8 different UFC Stats from the ‘raw_fighter_details.csv’, and added them into the table.
As mentioned in the video, the Red Fighter recorded is usually the higher-ranked fighter leading into the fight. As a result, often they can be considered the fight favourite. They will be the focus in our model - the equation we will develop will find the Win Probability for the Red Fighter; the Blue Fighter’s Win % will be a by-product.
Transforming the Stats
Firstly, I’ve replaced the ‘Winner’ column with a new Winner column. In Logistic Regression, the outcome variable (probability of Red Fighter winning) needs to be numerical. By transforming the categorical outcome into 1’s and 0’s, the model can better understand and predict the relationship between the “predictor” variables (fighter stats) and the binary outcome.
So, the IF function built in Column A grants a value of 1 if the Red Fighter wins, and 0 if otherwise.
Next, the 8 different stats recorded for each fighter is directly from the UFC Stats Website and are very useful in assessing a fighter’s strength. However, they gain even more predictive power when the figures are compared to their opponent’s. To a certain degree, this accounts for tactics within a UFC Fight - fighters with a very significant advantage in one skill will most likely look to win in that way.
As a result, 16 columns of data (8 for each fighter) were reduced to 6. 4 of the new statistics were relatively simple to calculate: Red Fighter’s Metric - Blue Fighter’s Metric. These stats were:
However, the difference in the fighter’s striking and takedown accuracy were calculated by considering the opponent’s defence.
For example, Strickland’s Expected Striking Accuracy can be calculated by multiplying his own Striking Accuracy by 1 - Du Plessis’ Strike Defence %. This scales Strickland’s Striking Accuracy up if Du Plessis has a weak defence, and vice versa.
The calculation is then performed for Du Plessis (the Blue Fighter), and the difference between the two fighters (Red Fighter - Blue Fighter) is the final figure. The same concept is applied to the Expected Takedown Accuracy Difference.
Training the Regression Model
Now that we’ve collected all relevant data, we can process the data using the Regression tool in Excel. Regression analysis processes the X Range, our “predictors”, and finds a relationship with the Y-Range (what we are predicting - in this case, whether the Red Fighter wins).
Instead of taking all 1,500 rows of data into consideration, I’ve only done the Regression analysis on rows 380:1504. That is, I’m going to train the model on 75% (1,125 rows) of the data and leave 25% of unseen data in order to test the accuracy of the model.
Using the newly produced coefficients and the intercept in the ‘Regression’ worksheet, a Score for each fighter can be calculated with a linear equation. Then, using the Logistic Regression Function, a value between 0 and 1 is returned. This represents the fighter’s Win Probability according to our first draft of the regression equation.
A useful website for understanding this process: https://aws.amazon.com/what-is/logistic-regression/#:~:text=Logistic%20regression%20is%20a%20data,outcomes%2C%20like%20yes%20or%20no.
Then, a Likelihood column has been added. This is important, and measures how well the model predicted the outcome of the fight. A high value indicates the favourite was selected by the model and won. On the other hand, a low value indicates the model favoured a fighter who ended up losing.
After converting these likelihoods to log values, the sum of Rows 380 - 1504 is computed. By maximising this summed value, the likelihoods, on average, will become higher. This means that the model will be more able to predict the winning fighter in the ‘Win Probability’ column.
The most efficient way to do this in Excel is using the Solver tool. By maximising the Likelihood Function, Solver has changed the coefficients for the logistic regression model that best fit the data. This process helps in obtaining the best-fitting logistic regression model for predicting the outcome variable based on the predictor variables.
Testing the Model
We can test the accuracy of the fitted logistic regression model, by finding the number of times the likelihood of the unseen data (Rows 5 to 379) is above 0.5. Put simply, the number of times the Predicted Favourite won the Fight.
This returns a percentage of 72% - an excellent value for a simple model in Excel.
Implementing the Model
Now that we’ve come up with the optimal coefficients for our linear equation, we can grab each fighter’s UFC Stats and use the equation to come up with the Red Fighter’s ‘Score’. Then, using the EXP function (exponential function), this is converted to a Win % for the Red Fighter.
The Blue Fighter’s Win % is obtained by subtracting Red’s Win % from 100%. Below is an example.
Reed Blazer
2024-10-01 17:09:34 +0000 UTCSten Hash
2024-08-30 22:09:48 +0000 UTCExcel LADZ
2024-08-14 11:09:58 +0000 UTCSten Hash
2024-08-13 10:48:01 +0000 UTCJorge Nava
2024-04-12 14:34:54 +0000 UTCExcel LADZ
2024-04-12 13:17:27 +0000 UTCJorge Nava
2024-04-11 21:21:00 +0000 UTC