Photo: Mark Gibbs
So the UK has just given itself a national headache. Whether you think the Brexit was the right decision or a dangerous and unmitigated screw-up (as I do), the consequences of the referendum will be non-trivial and will take years to play out. But the mechanics of the UK exiting the European Union aside, the question of how people now feel about the Brexit is interesting. Are they awash in jubilation or has buyer’s remorse set in? An intriguing post by MonkeyLearn attempts to answer this question by analyzing tweets and, as a bonus, provides tools that you might well find useful for similar exercises.
First, let me explain what MonkeyLearn is: The service defines itself as a “[highly] scalable Machine Learning API to automate text classification.” To use MonkeyLearn you assemble your text data, train and test a machine learning model with that data, then, using a custom API for your model, have your application code interact with the API to perform analysis and classification of new data.
You can also provide your data to MonkeyLearn by pasting it into their Web interface or uploading CSV files or Excel spreadsheets.
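To make the programmatic route concrete, here is a minimal sketch of how application code might package new text for a hosted classifier. The endpoint URL, the `Token` authorization scheme, the model ID and the `data` payload key are all placeholders I've assumed for illustration, not MonkeyLearn's documented API, so check the service's own reference before relying on any of them. The request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with the endpoint, key and model ID
# given in the service's own API documentation.
API_URL = "https://api.example.com/classifiers/{model_id}/classify/"
API_KEY = "YOUR_API_KEY"


def build_classify_request(model_id, texts):
    """Build (but do not send) a JSON POST request carrying texts to classify."""
    body = json.dumps({"data": list(texts)}).encode("utf-8")
    return urllib.request.Request(
        API_URL.format(model_id=model_id),
        data=body,
        headers={
            "Authorization": "Token " + API_KEY,
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_classify_request("my_model", ["Brexit: what happens now?"])
print(request.get_method(), request.full_url)
```

Passing the finished request to `urllib.request.urlopen` would actually send it; the response format depends entirely on the service.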
The beauty of MonkeyLearn’s service is that you don’t need to know much at all about the mechanics of machine learning, although there’s still some technology to master to get the best out of the service. Interestingly, you don’t even need to have training data available to create classifiers, as MonkeyLearn has more than 100 pre-built classifiers for functions such as classifying retail products from their descriptions, sentiment analysis of English tweets and product reviews, and keyword and entity extraction.
MonkeyLearn's post describes the approach as follows:
First, we used a Python library called tweepy to connect to the Twitter stream and get more than 450,000 tweets that used the hashtag #Brexit.
Afterwards, we filtered these tweets by language using our language classifier and kept only those that were in English (around 250,000 tweets). Then, we analyzed these tweets using MonkeyLearn with some public, pre-trained and ready-to-use machine learning models. We performed sentiment analysis on these tweets to understand if people were talking positively, negatively or neutrally about the Brexit.
Finally, we wanted to go a step deeper and better understand the different points of view, so we performed keyword extraction on the tweets of the different sentiments we analyzed to know the words or phrases people were using to get a better picture and more context.
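The three steps just described (language filter, sentiment labeling, keyword extraction per sentiment) can be sketched roughly as below. This is my own outline of the pipeline's shape, not MonkeyLearn's code: the `analyze` function and the three toy stand-in functions are hypothetical, and in a real run each stand-in would be replaced by a call to an actual model.

```python
from collections import Counter, defaultdict


def analyze(tweets, detect_language, classify_sentiment, extract_keywords):
    """Filter to English, label sentiment, then count keywords per sentiment.

    The three callables are stand-ins for whatever models you use; each
    takes a tweet's text and returns a language code, a sentiment label,
    or a list of keywords respectively.
    """
    english = [text for text in tweets if detect_language(text) == "en"]

    by_sentiment = defaultdict(list)
    for text in english:
        by_sentiment[classify_sentiment(text)].append(text)

    keywords = {
        label: Counter(kw for text in texts for kw in extract_keywords(text))
        for label, texts in by_sentiment.items()
    }
    return dict(by_sentiment), keywords


# Toy stand-ins (deliberately crude) so the sketch runs without any service.
def toy_language(text):
    return "fr" if "le " in text.lower() else "en"


def toy_sentiment(text):
    lowered = text.lower()
    if "great" in lowered:
        return "positive"
    if "disaster" in lowered:
        return "negative"
    return "neutral"


def toy_keywords(text):
    words = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in words if len(word) > 6]


sample = [
    "Brexit is a great result for democracy!",
    "Brexit is a disaster for the economy.",
    "Le Brexit, quelle surprise",
    "Markets react to #Brexit",
]
groups, keywords = analyze(sample, toy_language, toy_sentiment, toy_keywords)
print({label: len(texts) for label, texts in groups.items()})
print({label: dict(counts) for label, counts in keywords.items()})
```

Keeping the models as pluggable callables means the same skeleton works whether the classification happens locally or through a remote API.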
It's important to note that the tweets collected were a random sample expressing the sentiments of the Twitter universe rather than just those of people in the UK, but the results were interesting all the same. MonkeyLearn found that, from a final sample of 133,605 tweets, 47% were classified as positive and (natch) 53% as negative, a split that is numerically very close to the actual UK referendum result of roughly 48% for Remain and 52% for Leave. If you wanted to measure UK sentiment specifically, you'd have to restrict the analysis to tweets with attached geolocation data.
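As a rough illustration of that last point, the sketch below keeps only tweets whose attached place is in the UK and then computes the positive/negative split. It assumes each tweet is a dict shaped like Twitter's tweet JSON, where an optional `place` object carries a `country_code`; the `sentiment` key, the helper names and the sample data are my own inventions, and dropping neutral tweets before computing percentages is likewise an assumption rather than something MonkeyLearn's post confirms.

```python
def is_uk_tweet(tweet):
    """True if the tweet's attached place is tagged with country code GB."""
    place = tweet.get("place") or {}
    return place.get("country_code") == "GB"


def sentiment_shares(labels):
    """Percentage of positive vs. negative labels, ignoring all other labels."""
    labels = list(labels)
    positive = labels.count("positive")
    negative = labels.count("negative")
    total = positive + negative
    if total == 0:
        return {"positive": 0.0, "negative": 0.0}
    return {
        "positive": 100.0 * positive / total,
        "negative": 100.0 * negative / total,
    }


# Invented sample data: "sentiment" is a label added by an earlier step.
tweets = [
    {"place": {"country_code": "GB"}, "sentiment": "negative"},
    {"place": {"country_code": "GB"}, "sentiment": "positive"},
    {"place": {"country_code": "US"}, "sentiment": "positive"},
    {"place": None, "sentiment": "negative"},  # no geolocation attached
    {"place": {"country_code": "GB"}, "sentiment": "neutral"},
    {"place": {"country_code": "GB"}, "sentiment": "negative"},
]

uk_tweets = [tweet for tweet in tweets if is_uk_tweet(tweet)]
shares = sentiment_shares(tweet["sentiment"] for tweet in uk_tweets)
print(len(uk_tweets), shares)
```

Bear in mind that only a small fraction of tweets carry place data at all, so a filter like this shrinks the sample considerably.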