Tensorflow Dota Predictor
Google’s new TensorFlow looks set to be the neural net library of the future, so I wanted to do a simple project to get to grips with it.
Predicting Dota matches is a fairly straightforward problem as far as neural nets go. Dota is a multiplayer online battle arena game where two teams (Radiant and Dire) of 5 players try to destroy each other's base. Each player can choose to play from a pool of 111 different characters (heroes). Each character has a unique set of abilities and a role to play in the game. This role is generally damage dealer, support, or something in between. The theory is that a successful team should be composed of characters whose roles and abilities work well together, and against the enemy team. The hope is that a neural net would be able to pick up on these successful combinations and predict the winner based on the characters picked by the players.
The problem definition, then, is fairly straightforward. The input is a binary table of the characters chosen by each team and the output is the winner of the match, the Radiant team or the Dire team. However, it's clear that there will be a large irreducible error, since individual player skill generally trumps character choices.
Previous Work
Kevin Conley and Daniel Perry of Stanford University worked on a paper doing almost the same thing. Their paper focused on recommending characters for players to pick against an enemy line-up. They train a logistic regression model to predict the winner of each match and use this to build a simple recommendation engine. Their best model gets to 69.8% validation accuracy. This seems like a solid result, it's well above the 50% baseline, and it supports the idea of large irreducible error mentioned before. But let's try to replicate it anyway.
Note: When the authors gathered their dataset, there were 108 characters to choose from. Now there are 111, but I will continue using the outdated dataset for the sake of consistency.
Something odd is going on in the paper: they report 69.8% accuracy using logistic regression, but I have been unable to replicate it unless I restrict the dataset to the first 20000 entries. This might suggest some error in the first 20000 entries of the dataset. I compared logistic regression to my usual first choice for binary datasets, the multinomial Naive Bayes classifier. MultinomialNB seems to do much better over the whole dataset, and is roughly in the ballpark of the authors' original model, so no harm done I guess.
When developing our neural net model, it will be useful to use MultinomialNB as a baseline.
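A minimal sketch of what that baseline comparison might look like, assuming the match data has already been loaded into a binary pick matrix X (one column per hero per side) and a winner label vector y; those names, and the 5-fold split, are my own choices here rather than anything from the original notebook:

```python
# Hedged sketch: cross-validated baselines for the win predictor.
# X: (n_matches, 2 * 108) binary pick matrix, y: match winner labels (assumed names).
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

logreg_scores = cross_val_score(LogisticRegression(), X, y, cv=5)
nb_scores = cross_val_score(MultinomialNB(), X, y, cv=5)
print("Logistic regression: %.3f" % logreg_scores.mean())
print("MultinomialNB:       %.3f" % nb_scores.mean())
```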
Tensorflow Model
First split our dataset into training, validation and test sets
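Something like the following, again assuming the hypothetical X and y from above; the 80/10/10 proportions are a guess, not necessarily the split used in the original notebook:

```python
# Hold out 20% of the data, then split that half-and-half into validation and test sets.
from sklearn.model_selection import train_test_split

X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.2, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)
```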
Initialise the TensorFlow session. InteractiveSession works much better for iPython Notebooks
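A one-liner, assuming the TensorFlow 1.x API throughout these sketches:

```python
import tensorflow as tf

# InteractiveSession installs itself as the default session,
# so ops can be run with .eval() / .run() directly in the notebook.
sess = tf.InteractiveSession()
```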
Now the placeholders. These are essentially all the information that you might want to pass into your graph. The reason for splitting up the x variable will be explained just a little later.
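Roughly like this; the names x_radiant, x_dire and y_ are my own labels for the split-up inputs and the one-hot winner, not necessarily those used in the original notebook:

```python
N_HEROES = 108  # heroes available when the dataset was gathered

# One binary pick vector per team, plus the one-hot match outcome.
x_radiant = tf.placeholder(tf.float32, shape=[None, N_HEROES], name='x_radiant')
x_dire = tf.placeholder(tf.float32, shape=[None, N_HEROES], name='x_dire')
y_ = tf.placeholder(tf.float32, shape=[None, 2], name='y_')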
The following function creates a fully connected layer with the matching weights/biases
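A sketch of such a function; returning the weights and biases alongside the activation is my way of making them reusable for the weight sharing described next, which may differ from the original implementation:

```python
def fc_layer(inputs, in_dim, out_dim, name, weights=None, biases=None, act=tf.nn.relu):
    """Fully connected layer. Pass in existing weights/biases to share parameters
    between two inputs; act=None returns the raw pre-activation (logits)."""
    with tf.name_scope(name):
        if weights is None:
            weights = tf.Variable(tf.truncated_normal([in_dim, out_dim], stddev=0.1))
        if biases is None:
            biases = tf.Variable(tf.constant(0.1, shape=[out_dim]))
        pre = tf.matmul(inputs, weights) + biases
        return (pre if act is None else act(pre)), weights, biases
```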
Now I’ll explain the network architecture.
Since there are two teams in Dota and we intuitively want the network to first learn about the composition of each team, then pit the teams against each other, it makes sense to split the network in two at the bottom: one neural net to learn about the Radiant team, another to learn about the Dire team, and combine them later to predict the winner.
But a good team on the Radiant side should still be a good team on the Dire side. It makes sense that if our neural net learns that a certain combination of characters is good on one side, it should transfer that knowledge to the other side. How do we do this? We make both sides use the same weights! Weight sharing solves this neatly. It allows the neural net to concentrate on learning what makes any one side effective at the lower layers and leaves how to combine that information to the higher layers. It's a nice hierarchical structure, which is exactly what neural nets are good at.
We're going to project the output to 2 columns, which I'll explain below.
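A minimal sketch of that architecture using the fc_layer helper above; the hidden width, the number of layers and the scope names are my guesses rather than the original notebook's actual values:

```python
HIDDEN = 64  # hidden layer width (placeholder value)

# Shared-weight towers: build the Radiant layer first, then reuse its
# weights and biases for the Dire side.
radiant_h, w1, b1 = fc_layer(x_radiant, N_HEROES, HIDDEN, 'team_layer')
dire_h, _, _ = fc_layer(x_dire, N_HEROES, HIDDEN, 'team_layer', weights=w1, biases=b1)

# Combine the two team representations and project to two output columns,
# one per possible winner.
combined = tf.concat([radiant_h, dire_h], axis=1)
hidden2, w2, b2 = fc_layer(combined, 2 * HIDDEN, HIDDEN, 'combine_layer')
logits, w3, b3 = fc_layer(hidden2, HIDDEN, 2, 'output_layer', act=None)
```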
And since we’re using TensorFlow, we get a graph representation of our net for free!
Now, after propagating through the network, we need to analyse our result. As mentioned, the output is projected to two columns instead of just one. We'll one-hot encode the winning team so that one column represents a win for the Radiant and the other a win for the Dire. I got better performance from the network doing softmax + cross-entropy on the two columns rather than sigmoid + binary cross-entropy on one column. I believe this might be because the network gets two points of information this way rather than one.
To improve generalisation, I regularise the network by adding the sum of the l2 norms of all the weights and biases to the loss value. I found this helped a lot. Mean loss is used as a reporting metric to compare training and validation loss.
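One way to express this, continuing from the names defined above; note that tf.nn.l2_loss is TensorFlow's usual regulariser (half the squared l2 norm rather than the norm itself), and the regularisation scale is a placeholder value:

```python
# Softmax + cross-entropy over the two output columns.
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits)

# l2 regularisation over all weights and biases, folded into the training loss.
l2 = tf.add_n([tf.nn.l2_loss(v) for v in [w1, b1, w2, b2, w3, b3]])
loss = tf.reduce_sum(cross_entropy) + 1e-4 * l2

# Mean loss as a reporting metric for comparing training and validation.
mean_loss = tf.reduce_mean(cross_entropy)
```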
For training, I used the Adam Optimizer. The best introductory resource I've found for choosing the right optimizer is by Sebastian Ruder here. As he suggests, using an optimizer that implements adaptive learning rates for each parameter is usually advisable given sparse data. Since some heroes are picked much less frequently than others, the Adam optimizer is a good choice here.
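The training op is then a single line; the 1e-3 learning rate is simply Adam's default, not necessarily what the original notebook used:

```python
# Adam adapts a per-parameter learning rate, which suits the sparse,
# unevenly distributed hero picks.
train_step = tf.train.AdamOptimizer(1e-3).minimize(loss)
```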
Finally, for accuracy prediction, we pair off the predictions with the ground truth values and check if they're equal.
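For example:

```python
# Compare the predicted column (argmax of the logits) with the true one-hot column.
correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
```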
TensorFlow allows us to generate some nice visualisations in TensorBoard using summary objects
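Something along these lines, assuming the TensorFlow 1.x summary API (older releases used tf.scalar_summary and friends); the log directories are placeholder paths:

```python
tf.summary.scalar('mean_loss', mean_loss)
tf.summary.scalar('accuracy', accuracy)
merged = tf.summary.merge_all()

# Separate writers so training and validation curves show up as separate runs.
train_writer = tf.summary.FileWriter('logs/train', sess.graph)
val_writer = tf.summary.FileWriter('logs/val')
```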
Now that we've finished setting up our model, initialize all the variables we created
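In TensorFlow 1.x this is:

```python
sess.run(tf.global_variables_initializer())
```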
We'll create a helper function to build the various data feeds we need
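A sketch of such a helper, assuming the first N_HEROES columns of the pick matrix are the Radiant picks and the rest the Dire picks, and that the labels have already been one-hot encoded into two columns:

```python
def make_feed(X_batch, y_batch):
    """Split the pick matrix into its Radiant and Dire halves and build a feed dict."""
    return {x_radiant: X_batch[:, :N_HEROES],
            x_dire: X_batch[:, N_HEROES:],
            y_: y_batch}
```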
And a helper function to generate the mini-batches for our dataset
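A minimal version of the batch generator, plus the kind of training loop that would sit on top of it; the batch size, epoch count and logging cadence are my assumptions, and y_train / y_val are assumed to be one-hot encoded by this point:

```python
import numpy as np

def minibatches(X, y, batch_size=128):
    """Yield shuffled mini-batches for one pass over the data."""
    idx = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Train for a fixed number of epochs, logging validation summaries each epoch.
for epoch in range(100):
    for X_batch, y_batch in minibatches(X_train, y_train):
        sess.run(train_step, feed_dict=make_feed(X_batch, y_batch))
    summary, val_loss = sess.run([merged, mean_loss],
                                 feed_dict=make_feed(X_val, y_val))
    val_writer.add_summary(summary, epoch)
    print("epoch %d, validation mean loss %.4f" % (epoch, val_loss))
```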
After 100 or so epochs, it looks like the network has more or less converged. We also get some more pretty graphs for free from TensorBoard. Although it's kind of annoying that there isn't currently any way to put the training and validation loss plots on the same graph.
Finally, only after we’ve convinced ourselves that our model is pretty much finalised, do we get to peek at the test score.
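With an InteractiveSession this is a one-off evaluation, again assuming one-hot y_test:

```python
test_accuracy = accuracy.eval(feed_dict=make_feed(X_test, y_test))
print("Test accuracy: %.4f" % test_accuracy)
```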
Neat, 72.21% test accuracy! Earlier, the MultinomialNB model got to about 71.5% accuracy. Before drawing the conclusion that our model is definitely better though, I think there are two things worth noting.
The first is that there is a good chance we got kind of lucky with our test data and that it was relatively easy to predict. It's quite unusual to get a higher test score than validation score. This could be rectified by doing some proper cross-validation, i.e. repeatedly choosing different training, validation and test sets and seeing how the model performs. But this is a bit of a chore when you have to worry about long training times.
The second thing worth noting is the vast difference in the complexity of building each model. It is far from insignificant that the MultinomialNB model could be built and cross-validated in one line of code. It's clear which one would be easier to maintain and debug. The added complexity of the neural net also brings the relative unexplainability of each decision. The naive Bayes model can be analysed using some Bayesian statistics, but analysing neural nets and understanding why they arrive at the answers they do is still an active area of research.
Because of the points above, after all this, I think it's fair to declare the one-line MultinomialNB model the winner. Either way, I learned a lot while writing this up, which was always my primary goal.

