MMORPG’s business model
Nowadays, MMORPGs are so popular that their market was worth over a billion dollars in 2019 and is expected to keep growing over the following years. They count over 10 million monthly active players and, as expected, this huge player base is one of the keys to their success. In fact, these games belong to the Free-To-Play (FTP) category, which means that everyone can play them for free, but players can still make micro-transactions to obtain additional content such as new skins for their avatar, new items or pieces of equipment, exclusive maps, or in-game virtual coins. These micro-transactions usually range in price from one dollar to hundreds of dollars and are often the only source of revenue of FTP games, since advertisements are usually absent because they would severely harm the user experience. Despite the huge number of players, only a small fraction of them regularly make these optional purchases, yet it is still enough to keep these games very profitable and successful. That is why having many players is not enough: it is also important to keep them in the game longer, building engagement to the point of addiction, and to convince them to make in-game purchases so that the company behind the game can keep running.

Importance of predicting players’ departure
Considering the company’s reliance on these in-game purchases, the cessation of play by part of its users has a direct negative impact on its revenue. The business strategist Fred Reichheld stated that, within the financial services industry, “a 5% increase in customer retention produces more than a 25% increase in profit”. That is why it is very important to predict when players are about to leave the game (churn prediction) and immediately take actions to dissuade them from doing so. In this post, we use the term churn to refer to a player leaving the game indefinitely and ceasing to be a customer. Players may intend to leave a game for many different reasons, such as lack of content, monotonous game-play, or game imbalance (too easy or excessively hard). Sometimes, a player may just leave because they realized it was a waste of time or because they had more important things to do, such as important exams or work. Research conducted by SuperData found that gamers tend to abandon games in groups, with 34% of churned players indicating that they had left a game simply because their friends stopped playing. Player departure is therefore a signal of low user satisfaction: if we can predict it and find the underlying problem, we may have a chance to stop players from leaving and make further improvements to keep the game interesting.
Below, we list the most common reasons for players leaving a game, in order of frequency:
- They had more important things to do, such as obligatory military service or school entrance exams.
- They became bored with it.
- Their friends left.
- They realized that playing MMORPGs was a waste of time after all.
- Their accounts were hacked.
- They turned to other newer games.
- They had no more money to spend on entertainment.
Predicting players’ departure
In this post, we analyze data from 37,354 avatars of the game World of Warcraft, sampled every 10 minutes during the year 2008 for a total of 10,826,734 monitored sessions, in order to predict whether a player is about to churn (ATC) or not. Detecting ATC players is preferable to detecting already-churned players because it is easier to encourage a player who is about to churn to keep playing than to convince a churned player to come back to the game. The dataset is the same one described in the player’s behavior prediction task discussed in a previous post and can be downloaded at the following link. We are going to make these predictions using two different models: SVM and LSTM.

Data preprocessing
The idea is to split each player’s data into a separate dataset and keep one entry for each week of the year (from week 1 to week 52) by grouping the sessions with the same ID that fall in the same week. Therefore, the attribute timestamp has been transformed so that it indicates the corresponding week rather than the date and time. Next, we dropped all the original attributes except timestamp and ID and replaced them with new features calculated as a function of the original columns:
- evolution: the level of the avatar at the beginning of the week minus the level reached at the end of the same week.
- lvl_avg: the “average level”, calculated as the mean of the avatar’s level at the beginning of the week and the level reached at the end of the same week.
- time_hours: the number of hours the player played during the week. Note that playtimes of less than 15 minutes have been smoothed to 0. For example, a player who is about to leave the game might play fewer hours than the daily average of normal players and could play less frequently because they are less interested in the game.
The next two attributes also require the player’s history (previous rows) to be computed. Since both are based on the feature time_hours, if a player played for less than 15 minutes in a week, that week is treated as if the player did not play at all. A code sketch of the whole feature-extraction step follows the list.
- current_absence: the number of weeks since the player last played.
- weeks_present_ratio: the number of weeks the player has been active divided by the total number of weeks since they first registered to the game.
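The following is a minimal sketch of how these weekly features could be computed with pandas. The raw column names (ID, timestamp, level) and the file name are assumptions for illustration, not necessarily the ones used in the repository.

```python
import pandas as pd

# Hypothetical raw session data: one row per 10-minute sample, with columns "ID", "timestamp", "level".
sessions = pd.read_csv("wow_sessions.csv", parse_dates=["timestamp"])  # file name is illustrative
sessions["week"] = sessions["timestamp"].dt.isocalendar().week.astype(int)

weekly = (
    sessions.groupby(["ID", "week"])
    .agg(
        lvl_start=("level", "first"),
        lvl_end=("level", "last"),
        n_samples=("timestamp", "size"),  # each sample corresponds to ~10 minutes of play
    )
    .reset_index()
)

weekly["evolution"] = weekly["lvl_start"] - weekly["lvl_end"]      # as defined above
weekly["lvl_avg"] = (weekly["lvl_start"] + weekly["lvl_end"]) / 2
weekly["time_hours"] = weekly["n_samples"] * 10 / 60
weekly.loc[weekly["time_hours"] < 0.25, "time_hours"] = 0          # smooth playtimes shorter than 15 minutes

# History-based features. For simplicity this assumes every week between a player's first
# and last appearance is present as a row; in practice, absent weeks would first be filled
# in with time_hours = 0 (e.g. by reindexing each player over weeks 1-52).
def add_history_features(group):
    group = group.sort_values("week")
    played = group["time_hours"] > 0
    weeks_since_start = group["week"] - group["week"].iloc[0] + 1
    group["weeks_present_ratio"] = played.cumsum() / weeks_since_start
    last_played_week = group["week"].where(played).ffill()
    group["current_absence"] = (group["week"] - last_played_week).fillna(0).astype(int)
    return group

weekly = weekly.groupby("ID", group_keys=False).apply(add_history_features)
```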
Since ours is a supervised approach, we also need to assign the appropriate label to each week of each player. For simplicity, we refer to the weeks in which the player was active as ING (in-game), the weeks in which they were about to churn as ATC, and the weeks during which they had churned as CHR. Next, we define a churn window C such that, if there is no data for a player during a period of at least C weeks, that player is considered to have churned during that time, and the corresponding weeks are labeled as CHR. Note that this also means that a player can churn for a period of time and then return. Moreover, the C weeks before a player churned are labeled as ATC, and the remaining weeks are labeled as ING. It is worth noting that ATC sequences might be shorter than C weeks: this can happen if, for example, an ATC period is surrounded by CHR periods and does not contain at least C consecutive weeks of activity. A code sketch of this labeling rule follows the example table below.
week | EVO | LVL_AVG | TM_H | STATUS | CA | WPR |
---|---|---|---|---|---|---|
35 | 0 | 70 | 21.667 | ING | 0 | 1 |
36 | 0 | 70 | 16.167 | ING | 0 | 1 |
37 | 0 | 70 | 3.667 | ING | 0 | 1 |
38 | 0 | 70 | 11.833 | ATC | 0 | 1 |
39 | 0 | 70 | 19.5 | ATC | 0 | 1 |
40 | 0 | 70 | 12.167 | ATC | 0 | 1 |
41 | 0 | 70 | 4.667 | ATC | 0 | 1 |
42 | 0 | 70 | 0 | CHR | 1 | 0.976 |
43 | 0 | 70 | 0 | CHR | 2 | 0.953 |
44 | 0 | 70 | 0 | CHR | 3 | 0.932 |
(An example of a player’s history over a range of 10 weeks; the transitions between the ING, ATC, and CHR periods can be clearly seen. EVO = evolution, LVL_AVG = average level, TM_H = time_hours, CA = current_absence, WPR = weeks_present_ratio.)
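Below is a minimal sketch of this labeling rule applied to the weekly frame built above; the helper name and the exact handling of edge cases are illustrative.

```python
def label_weeks(group, churn_window=4):
    """Label each week of one player's history as ING, ATC, or CHR (churn window C = churn_window)."""
    group = group.sort_values("week").copy()
    inactive = (group["time_hours"] == 0).to_numpy()
    labels = ["ING"] * len(group)

    i = 0
    while i < len(inactive):
        if inactive[i]:
            # Find the end of this run of consecutive inactive weeks.
            j = i
            while j < len(inactive) and inactive[j]:
                j += 1
            if j - i >= churn_window:
                # A run of at least C inactive weeks: the player churned during these weeks.
                for k in range(i, j):
                    labels[k] = "CHR"
                # The up-to-C weeks preceding the churned period are "about to churn".
                for k in range(max(0, i - churn_window), i):
                    if labels[k] == "ING":
                        labels[k] = "ATC"
            i = j
        else:
            i += 1

    group["status"] = labels
    return group

weekly = weekly.groupby("ID", group_keys=False).apply(label_weeks, churn_window=4)
```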
Secondly, we extract sequences of L consecutive weeks from each player’s history, where the label of each sequence is the label of its last week; for example, a sequence composed of ING, ING, ATC, ATC weeks would be labeled ATC. We discard the CHR-labeled sequences, since they are not useful for our goal. Then, we drop as many ING sequences as needed to obtain an equal number of ING and ATC sequences, standardize the features, shuffle the data, and split it into training and test sets with a test set size of 0.2, as sketched below.
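A minimal sketch of the sequence extraction and dataset preparation, assuming the labeled weekly frame from the previous steps; the feature list and the random seeds are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

FEATURES = ["evolution", "lvl_avg", "time_hours", "current_absence", "weeks_present_ratio"]

def extract_sequences(weekly, seq_len):
    """Return sequences of shape (n, seq_len, n_features) and binary labels (ATC = 1, ING = 0)."""
    X, y = [], []
    for _, group in weekly.groupby("ID"):
        group = group.sort_values("week")
        values = group[FEATURES].to_numpy()
        labels = group["status"].to_numpy()
        for end in range(seq_len - 1, len(group)):
            label = labels[end]                      # the sequence takes the label of its last week
            if label == "CHR":
                continue                             # churned sequences are discarded
            X.append(values[end - seq_len + 1:end + 1])
            y.append(1 if label == "ATC" else 0)
    return np.array(X), np.array(y)

X, y = extract_sequences(weekly, seq_len=7)          # e.g. L = 7

# Undersample ING sequences so both classes have the same number of examples, then shuffle.
rng = np.random.default_rng(0)
atc_idx = np.where(y == 1)[0]
ing_idx = rng.choice(np.where(y == 0)[0], size=len(atc_idx), replace=False)
idx = rng.permutation(np.concatenate([atc_idx, ing_idx]))
X, y = X[idx], y[idx]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Standardize each feature (statistics computed on the training set only).
scaler = StandardScaler().fit(X_train.reshape(-1, X_train.shape[-1]))
X_train = scaler.transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)
```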
Important: the code can be found on the following repository: https://github.com/davide97l/WoW-dataset-analysis
Classification
In our first attempt, we use an SVM classifier with C=4 and L=1. L=1 means that the classifier takes as input only a single week, and C=4 means that the 4 weeks before a player churned have been labeled as ATC. We will call this classifier SVM_4_1.
In a second attempt, we use an SVM classifier again, but this time with parameters C=4 and L=7. In this way, we use not only the current week but also the history of the 6 previous weeks for our prediction. This model will be called SVM_4_7; a sketch of its setup is shown below.
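A minimal sketch of the SVM setup with scikit-learn, where each sequence of L weeks is flattened into a single feature vector before being fed to the classifier; the kernel and hyperparameters are assumptions, since the post does not specify them.

```python
from sklearn.svm import SVC
from sklearn.metrics import classification_report

# SVM_4_7: churn window C=4 (used during labeling), sequence length L=7.
# Each (L, n_features) sequence is flattened into one vector, since SVMs expect flat inputs.
svm = SVC(kernel="rbf")
svm.fit(X_train.reshape(len(X_train), -1), y_train)

y_pred = svm.predict(X_test.reshape(len(X_test), -1))
print(classification_report(y_test, y_pred, target_names=["ING", "ATC"]))
```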
Next, we use an LSTM deep recurrent model with recurrent layers of 50 neurons and a final single-neuron output layer with sigmoid activation. We train it for 30 epochs with a batch size of 64, binary crossentropy as the loss function, and Adam as the optimizer. In addition, we set C=4 and L=10 and call this network LSTM_4_10.
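A minimal Keras sketch of such a network; the number of stacked recurrent layers (two here) is an assumption, since the post only mentions “recurrent layers of 50 neurons”.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

seq_len, n_features = X_train.shape[1], X_train.shape[2]   # L = 10 for LSTM_4_10

model = Sequential([
    LSTM(50, return_sequences=True, input_shape=(seq_len, n_features)),
    LSTM(50),
    Dense(1, activation="sigmoid"),   # single output neuron for the binary ING/ATC decision
])
model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=30, batch_size=64, validation_data=(X_test, y_test))
```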
Finally, we train the SVM_4_7 and LSTM_4_10 classifiers again, but this time setting the parameter C=1, which means that only the week immediately before a player churned is labeled as ATC. We call these last two classifiers SVM_1_7 and LSTM_1_10, respectively.
Prediction results
The following table shows the classification results for the prediction of the active weeks (ING sequences). Note that the accuracy is computed over all labels, so it is the same for both ING and ATC sequences.
classifier | acc | prc | rec | f1 |
---|---|---|---|---|
SVM_4_1 | 0.765 | 0.819 | 0.738 | 0.777 |
SVM_4_7 | 0.773 | 0.842 | 0.740 | 0.788 |
SVM_1_7 | 0.838 | 0.886 | 0.804 | 0.843 |
LSTM_4_10 | 0.783 | 0.855 | 0.749 | 0.798 |
LSTM_1_10 | 0.850 | 0.880 | 0.828 | 0.853 |
Next, we show the classification results for the prediction of the about-to-churn weeks (ATC sequences); a short sketch of how these per-class metrics can be computed follows the table.
classifier | acc | prc | rec | f1 |
---|---|---|---|---|
SVM_4_1 | 0.765 | 0.710 | 0.797 | 0.751 |
SVM_4_7 | 0.773 | 0.704 | 0.817 | 0.757 |
SVM_1_7 | 0.838 | 0.793 | 0.878 | 0.833 |
LSTM_4_10 | 0.783 | 0.710 | 0.829 | 0.765 |
LSTM_1_10 | 0.850 | 0.821 | 0.875 | 0.847 |
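The per-class metrics in the two tables above can be obtained by treating each class in turn as the positive one; a small sketch, reusing the hypothetical y_test and y_pred from the SVM example:

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

acc = accuracy_score(y_test, y_pred)   # identical in both tables, since it covers all labels
for positive, name in [(0, "ING"), (1, "ATC")]:
    prc, rec, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, pos_label=positive, average="binary"
    )
    print(f"{name}: acc={acc:.3f} prc={prc:.3f} rec={rec:.3f} f1={f1:.3f}")
```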
We can conclude by saying that predicting players’ departure is quite a challenging task, and the best we can do is estimate the number of players who are about to churn and take measures to counter the problem. We can also see that LSTM, a network specialized in temporal sequences, produces slightly better results than SVM, and that the predictions with C=1 outperform those with C=4. Of course, a dataset collecting more statistics about players’ activities, such as quests completed or enemies killed, would surely have had a positive impact on the overall result. Moreover, we know that each player plays according to a specific behavior, as we analyzed in a previous post. Therefore, one possible improvement would be to divide the players into four datasets, one for each behavior, and make predictions separately, taking advantage of their characteristic features. For example, if an explorer, a player who enjoys exploring as many maps as possible, suddenly decreased the number of maps they visit weekly, it might be a signal that the player is getting tired of the game and is about to churn.