Part 1 - Prologue and Background

League of Legends is a highly popular multiplayer game where two teams of five face off in a match. Each team member has a role, and players can choose from over 140 characters (known as champions). The game even has an esports scene with prize competitions.

In this project I am going to use the Riot API to scrape data on ranked matches from their database. I will gather each match's team compositions, positions, and the winning team, and store it all in a SQLite database. Only Diamond-rank matches will be used, to limit noise caused by lack of player skill (as seen in lower ranks).
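A minimal sketch of the storage step, assuming a table with one row per player per match. The table name, column names, and sample rows here are illustrative, not the project's actual schema; the real collection loop would call the Riot match-v5 endpoints with an API key before inserting.

```python
import sqlite3

# In-memory database for the sketch; the project would use a file, e.g. "matches.db".
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE IF NOT EXISTS ranked_matches (
        match_id TEXT,      -- Riot match identifier
        champion TEXT,      -- champion picked by this player
        position TEXT,      -- Top, Jungle, Mid, ADC, or Support
        team     TEXT,      -- blue or red
        win      INTEGER    -- 1 if this player's team won
    )
""")

# Example rows shaped like what the match-v5 endpoint would yield.
sample = [
    ("NA1_0000000001", "Garen",  "Top", "blue", 1),
    ("NA1_0000000001", "Darius", "Top", "red",  0),
]
conn.executemany("INSERT INTO ranked_matches VALUES (?, ?, ?, ?, ?)", sample)
conn.commit()
```

Each full match contributes ten such rows (five players per team), which is why the cleaning step later has to fold them back into one row per match.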

Part 2 - Introduction

The data was collected with various Python scripts, producing over 10,000 entries. All of it has been stored in a SQLite database that this notebook reads from.

I am going to use machine learning to build a model that predicts a match's outcome from both teams' compositions. Ideally we want a model that predicts the outcome more than 70% of the time. However, such a result could also indicate that the game is not balanced, if an outcome can be determined by champion selection alone.

Part 3 - Cleaning and Organizing

The data has been gathered, and now we need to clean and organize it. Each entry is marked by a Match ID, and each row represents a single player's champion, lane, and team. We need to group rows with the same Match ID into a single row so that each row represents one match.
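The grouping step can be done with a pandas pivot, turning ten player rows into one match row with a column per (team, position) pair. The toy frame and column names below are illustrative, not the project's exact ones.

```python
import pandas as pd

# Toy frame shaped like the raw table: one row per player, ten per match.
raw = pd.DataFrame({
    "match_id": ["M1"] * 10,
    "team":     ["blue"] * 5 + ["red"] * 5,
    "position": ["Top", "Jungle", "Mid", "ADC", "Support"] * 2,
    "champion": ["Garen", "LeeSin", "Ahri", "Jinx", "Thresh",
                 "Darius", "Vi", "Zed", "Caitlyn", "Leona"],
})

# Pivot so each match becomes one row with a column per (team, position).
wide = raw.pivot_table(index="match_id",
                       columns=["team", "position"],
                       values="champion",
                       aggfunc="first")

# Flatten the MultiIndex columns into names like "blue_Top", "red_Support".
wide.columns = [f"{team}_{pos}" for team, pos in wide.columns]
```

After this, `wide` has one row per Match ID and ten champion columns, which is the shape the rest of the analysis works with.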

The positions in League of Legends are as follows: Top, Mid, Jungle, ADC, and Support. Each name in a column is the champion selected for that role.

We have 2004 matches to analyze in our data set.

Part 4 - Exploration

Now that the data has been organized, we can begin exploring the data before attempting to make our algorithm.

First, let's see how many unique champions appear in each lane.
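A quick way to run this check, assuming the one-row-per-match frame from the cleaning step (the small frame here is a stand-in for the real 2004-match data):

```python
import pandas as pd

# Stand-in for the cleaned match table: one champion column per team/role.
wide = pd.DataFrame({
    "blue_Top": ["Garen", "Darius", "Garen"],
    "blue_Mid": ["Ahri", "Zed", "Ahri"],
})

# nunique() counts distinct champions in each lane column.
uniques = wide.nunique()
```

On the real dataset this same call is what reveals roughly 90 distinct champions per role.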

Each role has roughly 90 unique champions. This will be problematic for our algorithm, since each champion name must be converted into a unique identifier number for the model to read.

This phenomenon is known as "The Curse of Dimensionality": as the number of unique dimensions increases (unique champions per lane), the number of data points needed for good performance grows exponentially. As a result, our model may not perform well on new data.

To explore this, we will go ahead and organize this original data into a training and test set. We can compare the algorithms made using this data.
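The split can be done with scikit-learn's `train_test_split`; the 80/20 ratio and random seed below are illustrative choices, not necessarily the ones used in the project.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in match table: champion columns plus the blue-team win flag.
df = pd.DataFrame({
    "blue_Top": [f"Champ{i % 3}" for i in range(10)],
    "blue_win": [i % 2 for i in range(10)],
})

# Hold out 20% of matches so the model is scored on data it never saw.
train, test = train_test_split(df, test_size=0.2, random_state=42)
```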

As we explore the data, we run into another issue. In League of Legends, one team may pick a champion based on the other team's pick. This is known in the game as "Counter Picking": ignoring player skill, one champion may outplay another based on its kit.

We will need to change our dataframe once more to account for this. This time we will focus on the winning outcome of blue team as the predictor variable. Our algorithm will predict whether blue team wins based on a match composition.

In preparation for constructing the machine learning algorithm, we will generate a cleaned dataset that reduces the number of unique entries in each lane, hence decreasing dimensionality.

Through exploration, we notice that many champions are selected only once or twice per role, compared to others picked over 20 times. We will label champions picked 15 times or fewer as "Uncommon". The code counts an entry each time a champion is selected in a role, then relabels it as "Uncommon" if it does not exceed 15 picks in the match list.
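A sketch of that relabeling, applied per lane column; the helper name and the toy counts are illustrative, but the 15-pick threshold matches the rule described above.

```python
import pandas as pd

def collapse_uncommon(col: pd.Series, threshold: int = 15) -> pd.Series:
    """Replace champions picked `threshold` times or fewer with 'Uncommon'."""
    counts = col.value_counts()
    rare = counts[counts <= threshold].index
    return col.where(~col.isin(rare), "Uncommon")

# Toy lane column: Garen picked 16 times (kept), Sion twice (collapsed).
lane = pd.Series(["Garen"] * 16 + ["Sion"] * 2)
cleaned = collapse_uncommon(lane)
```

Collapsing the long tail this way is what shrinks each lane's vocabulary, at the cost of the model no longer distinguishing between rare picks.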

As we can see from the output below, we reduced the number of unique champions by as much as 60%.

Part 5 - Algorithm Construction

Since we are working with categorical variables as both input and predictor, we will use a confusion matrix to present our results.

We will output the accuracy for the training set (what the model learns from) and the validation set (how the model performs on unseen data).

Key:

One-hot encoding is a technique that takes categorical data (like the names of each champion) and encodes each category as its own numeric variable. This allows our algorithm to read the data points and use them for predictions.
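In pandas this is a one-liner with `get_dummies`; the two-match frame below is a toy stand-in for the cleaned dataset.

```python
import pandas as pd

# Two toy matches with one champion column per team's Top lane.
comps = pd.DataFrame({
    "blue_Top": ["Garen", "Darius"],
    "red_Top":  ["Darius", "Garen"],
})

# Each (column, champion) pair becomes its own 0/1 indicator column,
# e.g. "blue_Top_Garen".
encoded = pd.get_dummies(comps)
```

Note how the column count multiplies: every unique champion in every lane becomes a separate feature, which is exactly why the dimensionality reduction above matters.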

Output of Algorithms

We will use a Random Forest classifier and an XGBoost classifier as our algorithms.
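A minimal sketch of the fit-and-score loop, shown with the Random Forest; `xgboost.XGBClassifier` exposes the same `fit`/`predict` interface. The random features here are a stand-in for the one-hot champion columns, so the accuracy printed by the real notebook will differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))  # stand-in for one-hot champion columns
y = rng.integers(0, 2, size=100)        # 1 = blue team won

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

pred = model.predict(X)
acc = accuracy_score(y, pred)           # training accuracy
cm = confusion_matrix(y, pred)          # rows = true class, columns = predicted
```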

As we can see, both models give similar results of about 50% accuracy on the validation set.

Does this mean our model can only predict accurately 50% of the time? Or do we still have issues with dimensionality?

We will attempt to improve our models in a few ways. First, we will implement K-Fold Cross-Validation, which helps ensure our model makes use of as much of the training data as possible.
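With scikit-learn this amounts to wrapping the model in `cross_val_score`; the 5-fold setup and seed below are illustrative choices, and the random arrays again stand in for the encoded match data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(100, 10))  # stand-in for one-hot champion columns
y = rng.integers(0, 2, size=100)        # 1 = blue team won

# Each of the 5 folds serves once as validation while the rest train.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)
```

Averaging `scores` gives a more stable accuracy estimate than a single train/validation split, since every match gets used for both training and validation across folds.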

As we can see, both models still give similar results of about 50% accuracy on the validation set.

Perhaps we can get better insight if we implement counterpicking as discussed earlier.

Implementing Counterpicking

In order to simulate counterpicking, we will need to build a new table of which champions counter which, and then use it to mark whether each matchup carries an advantage or disadvantage due to counterpicking.
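One simple encoding, sketched here with a hypothetical counter table (the dictionary entries, helper name, and +1/0/-1 scheme are illustrative assumptions, not the project's actual data or code):

```python
import pandas as pd

# Hypothetical counter table: champion -> set of champions it is favored against.
counters = {
    "Darius": {"Garen"},
    "Zed": {"Ahri"},
}

def lane_advantage(blue: str, red: str) -> int:
    """+1 if blue counters red, -1 if red counters blue, 0 otherwise."""
    if red in counters.get(blue, set()):
        return 1
    if blue in counters.get(red, set()):
        return -1
    return 0

# Add one advantage feature per lane matchup to the match table.
df = pd.DataFrame({
    "blue_Top": ["Garen", "Darius"],
    "red_Top":  ["Darius", "Garen"],
})
df["top_advantage"] = [
    lane_advantage(b, r) for b, r in zip(df["blue_Top"], df["red_Top"])
]
```

This turns each lane matchup into a single ordinal feature, which is far cheaper for the model than learning every champion-versus-champion pairing from the one-hot columns alone.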

We will gather counterpick data from This Counterpicking Website. Bear in mind this was accessed on 04/01/2022, and counters may have changed since then as champions are updated or added.

We will also apply the "Uncommon" champion labels to this dataframe.

Now to apply one-hot encoding.

Implement K-Fold Cross-Validation, create the model, and print out the results.

Conclusion

Despite this work, we still get an accuracy of only ~50%.

This could mean one of two things. The first is that our model requires more adjustment. Under the curse of dimensionality, our model may need many more data points to make accurate predictions. Despite reducing the number of unique entries with the "Uncommon" champion cleanup, we still have roughly 20-40 unique entries per role. Improving this model may require skills I do not have at this time.

The other possibility is that the model genuinely tops out at 50% accuracy. This would support the idea that the game is balanced as it is. Indeed, if a match outcome could be predicted from the champion composition alone, it would not be a very balanced game. In reality, League of Legends match outcomes vary by player skill, how players "build" their characters, how communicative they are, and how they play the game (safe or aggressive). While match composition matters, the game's developers may want it to matter only part of the time, with the rest dictated by the players. A match wouldn't be fun if you could reliably tell who would win from the composition!

This project helped me explore Riot's API and generating databases with SQLite. It also gave practice in manipulating datasets, exploring the Curse of Dimensionality, and building classification models on categorical predictors.