Throughout Euro 2016, I ran a Python script capturing every tweet that mentioned the tournament hashtag “#Euro2016”, or any team hashflag such as “#ENG” or “#ISL”.
The script captured over 13 million tweets, which I’m making available to the data visualisation and analytics communities to play with. If you want to try messing about with a big text dataset, download and go wild!
The dataset includes every game from Saturday 11th June up to the final and is provided raw, with no filters imposed. It is missing France’s opener, played on the evening of the 10th, due to a technical hiccup. There are also very occasional short gaps where the Twitter streaming API connection was interrupted.
Download the file here. It’s a gzipped text file: rows end with LF and columns are semicolon-delimited. I’ve pasted a Python 3.x code snippet below that will get you started loading the data if you need it.
import gzip

# Open in binary mode; each line is decoded to text below.
with gzip.open('Euro2016_Tweets.txt.gz', 'r') as f:
    for line in f:
        line_in = line.decode().split(';')  # Split string to list on semicolon
        line_in = [x.rstrip() for x in line_in]  # Strip whitespace from the right of each element
        # Now push the line to a database table or whatever else you want to do with it.
The columns are:
created_at, text, source, screen_name, link, follower_count, user_created_at, geo_enabled, timestamp_ms, geo_link, geo_type, place_name, country_code, country, point_type, latitude, longitude, tweet_time, YEAR, MONTH, DAY, tweet_date, tweet_date_time_BST
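If you’d rather work with named fields than list positions, here’s a rough sketch that maps each row onto the column names above and counts tweets per country code. It assumes the fields appear in the file in the order listed and that the file is UTF-8. The helper names (parse_line, count_by_country, top_countries) are my own illustrations, not part of the dataset. Rows that don’t split into exactly 23 fields (for example, if a tweet’s text contains a stray semicolon) are simply skipped here, which may or may not be what you want.

```python
import gzip
from collections import Counter

# Column names as listed above; assumed to match the file's field order.
COLUMNS = [
    "created_at", "text", "source", "screen_name", "link", "follower_count",
    "user_created_at", "geo_enabled", "timestamp_ms", "geo_link", "geo_type",
    "place_name", "country_code", "country", "point_type", "latitude",
    "longitude", "tweet_time", "YEAR", "MONTH", "DAY", "tweet_date",
    "tweet_date_time_BST",
]


def parse_line(line):
    """Return a dict for one text row, or None if the field count is unexpected."""
    fields = [x.rstrip() for x in line.split(";")]
    if len(fields) != len(COLUMNS):
        return None  # malformed row (hypothetical handling: skip it)
    return dict(zip(COLUMNS, fields))


def count_by_country(lines):
    """Count rows per country_code, ignoring malformed rows and empty codes."""
    counts = Counter()
    for line in lines:
        row = parse_line(line)
        if row is not None and row["country_code"]:
            counts[row["country_code"]] += 1
    return counts


def top_countries(path="Euro2016_Tweets.txt.gz", n=10):
    """Read the gzipped file in text mode and return the n most common country codes."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return count_by_country(f).most_common(n)
```

With the file downloaded next to the script, print(top_countries()) should give you a quick first look at where the geotagged tweets came from.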
Please do tweet me if you build anything! I’d love to see it.