We analyzed 35,000 Steam video game reviews to learn what players think about video games and to gauge wider player sentiment towards certain genres.
Steam reviews can make or break a video game publisher: notorious titles like Battlefield 2042 and ATLAS attracted a large number of negative reviews due to botched releases or overly aggressive microtransactions.
With such a large active community of players, Steam offers a wealth of data generated from its reviews and recommendations.
Using natural language processing techniques, we extracted insights from this data: we can identify which reviews are most similar to each other and check whether similar reviews relate to the same games.
By analyzing the data within positive video game reviews, we found that:
- There were 378 reviews that were clustered together which referenced co-operative video games.
- The most positive and popular recommendations, found in two clusters of 351 and 323 reviews, specifically praised the gameplay.
- 129 positive reviews were made praising video games as being fun to play.
Whilst the above positive reviews offered those takeaways, the really interesting data insights came from analyzing the negative reviews.
We took a look at the most common causes for negative reviews in video games and discovered the following:
- 204 negative reviews were made in relation to bad gameplay or poor gameplay quality.
- In 193 reviews, users complained about being banned by Steam.
- 173 negative reviews were made with reference to “mods”. In some games, users can add their own content by installing modifications; these reviews were related to mods causing bugs that made the game crash or become unstable.
- 159 users complained about experiencing lag and latency issues within multiplayer video games.
- 153 negative reviews blamed games for crashes and glitches that disrupted play or rendered the games unplayable.
- Two different clusters, containing 147 and 142 negative reviews respectively, voiced complaints about hackers, a common problem in online multiplayer games.
Technical Write-up
Using the all-MiniLM-L6-v2 encoder, we converted 35,000 reviews of different games into 384-dimensional vectors. We then applied a clustering algorithm (K-means) with a total of 240 clusters to find which reviews were most similar to each other.
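The encode-then-cluster step can be sketched roughly as follows. This is a minimal illustration, not the code behind the demo: `embed_reviews` assumes the `sentence-transformers` package is installed, and `kmeans` is a plain standard-library version of Lloyd's algorithm written here for readability (a library implementation would normally be used for 35,000 vectors and 240 clusters). The toy 2-D vectors are made-up stand-ins for real embeddings.

```python
import math
import random


def embed_reviews(texts):
    """Encode review texts with all-MiniLM-L6-v2 (needs `sentence-transformers`)."""
    from sentence_transformers import SentenceTransformer  # imported lazily

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(texts).tolist()


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per input vector."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy 2-D vectors standing in for review embeddings (illustrative only).
toy_vectors = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0],
]
toy_labels = kmeans(toy_vectors, k=2)
```

On real data, the call would look like `kmeans(embed_reviews(review_texts), k=240)`, where `review_texts` is a hypothetical list of review strings.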
Every review came with additional features, such as the name of the game reviewed, the total number of hours the player spent on the game, and the kind of review. We used this extra data to get the most out of the Relevance AI clustering application and to display additional information for each cluster.
Each cluster shows its representative reviews as well as the total and average number of hours that the players in the cluster spent on those games.
The third column lists the games that were reviewed in that particular cluster.
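The per-cluster figures described above (total hours, average hours, and the list of reviewed games) amount to a simple group-by. The sketch below shows one way to compute them; the record keys `cluster`, `game`, and `hours` and the sample rows are hypothetical and do not reflect the actual dataset schema or the Relevance AI application's internals.

```python
from collections import defaultdict


def summarize_clusters(records):
    """Aggregate hours played and reviewed games for each cluster label."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["cluster"]].append(record)

    summary = {}
    for label, rows in grouped.items():
        total_hours = sum(row["hours"] for row in rows)
        summary[label] = {
            "reviews": len(rows),
            "total_hours": total_hours,
            "average_hours": total_hours / len(rows),
            # Unique game names, sorted for stable display.
            "games": sorted({row["game"] for row in rows}),
        }
    return summary


# Hypothetical rows: one dict per review, already labeled by the clustering step.
sample_records = [
    {"cluster": 0, "game": "Game A", "hours": 10.0},
    {"cluster": 0, "game": "Game B", "hours": 30.0},
    {"cluster": 1, "game": "Game A", "hours": 5.0},
]
cluster_summary = summarize_clusters(sample_records)
```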
The Steam Reviews Dataset was downloaded from Kaggle and contains a total of 380,000 reviews scraped from the Steam marketplace.
To clean the dataset, I kept each user’s play stats and reviews for every game played, then selected the reviews for encoding.
Because the dataset was quite big, I used only a random sample of 35,000 reviews, drawn at random to ensure sufficient variance in the distribution of reviewed games.
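A uniform random sample like the one described can be drawn along these lines. This is only a sketch: the function name `sample_reviews`, the fixed seed, and the integer stand-ins for review rows are illustrative assumptions, not the original notebook's code.

```python
import random


def sample_reviews(reviews, sample_size, seed=42):
    """Draw a reproducible uniform random sample without replacement."""
    if sample_size >= len(reviews):
        return list(reviews)  # nothing to cut down
    rng = random.Random(seed)  # fixed seed so the sample can be reproduced
    return rng.sample(reviews, sample_size)


# Integers stand in for the 380,000 review rows of the full dataset.
all_reviews = list(range(380_000))
subset = sample_reviews(all_reviews, 35_000)
```

Sampling uniformly keeps each game's share of the subset close to its share of the full dataset, so heavily reviewed games still dominate.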
Demo application link:
View the Clustering results here – https://cloud.relevance.ai/dataset/steam_reviews_35k_zeroshot/deploy/cluster/517b37aee3ebc3920275/bHcwVEgzOEJGd0pIb0s3SUZRYWw6RzIzU042cFpUU2loM0NUOUU5aFcwQQ/QDUKH38BQZpB7BTDVb_v/us-east-1/?sort=sum_Not Recommended&order=desc
Colab link to create the app here – https://drive.google.com/file/d/11EAQN_xYhIBmjU0ItAs5hxvW0AlqVe92/view?usp=sharing
Cleaned dataset source available here – https://github.com/RelevanceAI/michelangiolo_experiments_repo/tree/main/220119_automatic_tweet_scraper/apps/steam_reviews_35k_zeroshot