Twitter has a lively (and occasionally argumentative) football analytics community, encompassing a wide variety of amateur analysts, professionals, coaches, journalists, scouts and pundits. Today’s post is a new version of my visualisation of the community and the connected individuals within it.
James Yorke dubbed the first version of this project the “Ego Viz”, which I rather like. Now it’s back, refreshed with up-to-date data and a few new tricks to help you find people to follow in football analytics.
Starting from here, I used Python to grab a list of followers and following for each of over 200 Twitter users. The Twitter list I’m using as a seed has been populated manually over the past few years and for a while you could also get onto it simply by tweeting the hashtag #footballanalysis. Unfortunately, that led to a load of porn bots being added, so I had to turn off the automatic add feature. Who knew porn bots liked Expected Goals so much?
If you’re not on the original list, don’t worry. As long as you’re followed by at least five people who are, you’re in the data set I used to draw the vis.
I do limit the number of followers to the first 10,000 to keep the data manageable.
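For anyone curious how those two rules might look in code, here is a minimal sketch. It is not the script used for this post: the function names and the `seed_following` structure (each seed account mapped to the accounts it follows) are assumptions for illustration, and only the thresholds of five and 10,000 come from the text above.

```python
from collections import Counter

MIN_SEED_FOLLOWERS = 5   # "followed by at least five people" on the seed list
MAX_FOLLOWERS = 10_000   # only the first 10,000 followers are kept


def select_accounts(seed_following):
    """Return the seed accounts plus anyone followed by enough of them.

    seed_following is a hypothetical dict mapping each seed account to the
    accounts it follows.
    """
    counts = Counter()
    for follower, followed in seed_following.items():
        # set() so a repeated entry is only counted once per seed account
        for account in set(followed):
            if account != follower:
                counts[account] += 1
    extra = {a for a, n in counts.items() if n >= MIN_SEED_FOLLOWERS}
    return set(seed_following) | extra


def cap_followers(followers, limit=MAX_FOLLOWERS):
    """Keep only the first `limit` followers of an account."""
    return list(followers)[:limit]
```

Counting by seed account rather than by raw rows means a duplicated entry in the downloaded data cannot push someone over the threshold.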
After running the Python script for a few days, I ended up with a list of 938,000 unique follower / following relationships. Then it was time to break out the fantastic open-source software ‘Gephi’ and draw this.
(click the image for a bigger version)
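If you want to try the hand-off to Gephi yourself, a rough sketch follows. Gephi's spreadsheet importer reads an edge table with `Source` and `Target` columns; everything else here (the `write_edge_csv` name, the example accounts) is made up for illustration.

```python
import csv
import io


def write_edge_csv(edges, fileobj):
    """Write unique (follower, followed) pairs as a Source/Target edge table."""
    unique_edges = sorted(set(edges))  # drop duplicate relationships
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    writer.writerows(unique_edges)
    return len(unique_edges)


# Example with made-up accounts; pass an open file instead of StringIO
# to produce something Gephi can import.
buffer = io.StringIO()
edge_count = write_edge_csv(
    [("alice", "bob"), ("alice", "bob"), ("bob", "carol")], buffer
)
```

Each row means "Source follows Target", so the edges should be imported as directed.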
Gephi groups together Twitter accounts that have closer follower / following relationships, so if a group of users all follow each other, they’ll end up close together.
We can also use Gephi to colour communities within the network. The software identifies five sub-communities, which give the nodes their different colours.
The segmentation algorithm is automatic, but we can work out afterwards what it seems to have found. I think we’ve got…
Blue: Public analysts
I won’t say ‘fanalyst’ (because I hate that word), or ‘amateur’, because some of these people are professional football statisticians, but they all publish public work.
Orange: (mostly) Americans
‘International’ might be more accurate, but most of the large accounts here are USA-based.
Pink: The mainstream
Dark Green: Professional analysts
Light green: Coaching
People who tweet more about coaching and tactics than statistical analysis.
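A note on how colouring like this can be judged. This is a minimal sketch, assuming the groups came from Gephi's modularity-based community detection, which searches for a partition with a high modularity score. The function below does not find communities; it only scores a partition you hand it, treating the graph as undirected and unweighted. The name `modularity` and the `community_of` mapping are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities of (internal_edges / m) - (degree_sum / 2m)^2
    """
    # Collapse direction and duplicates; ignore self-loops.
    undirected = {frozenset((a, b)) for a, b in edges if a != b}
    m = len(undirected)
    if m == 0:
        return 0.0

    degree = defaultdict(int)
    internal = defaultdict(int)
    for edge in undirected:
        a, b = tuple(edge)
        degree[a] += 1
        degree[b] += 1
        if community_of[a] == community_of[b]:
            internal[community_of[a]] += 1

    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community_of[node]] += d

    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )
```

A higher score means more follows stay inside the groups than you would expect by chance, which is why tightly knit clusters end up sharing a colour.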
The size of each user’s node (circle) is determined by the number of people within the community who follow them. If it were simply the total number of Twitter followers, @barackobama and @kanyewest would dominate pretty much any Twitter community you chose to analyse!
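The sizing rule is easy to sketch. The version below counts, for each account in the data set, how many other accounts in the data set follow it, and ignores everyone outside. The function name and the `(follower, followed)` pair format are assumptions for illustration.

```python
def in_network_followers(edges, members):
    """Count followers inside the community for each member.

    edges is an iterable of (follower, followed) pairs; follows from or to
    accounts outside `members` are ignored.
    """
    members = set(members)
    followers = {m: set() for m in members}
    for follower, followed in edges:
        if follower in members and followed in members and follower != followed:
            followers[followed].add(follower)
    return {m: len(f) for m, f in followers.items()}
```

An account with millions of followers but only two inside the community gets a count of two, which is what keeps the celebrities from swamping the picture.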
Would you like an interactive version to play with? I’m sure you would. Click here and give it a minute to load – it’s quite a large page.
(click the image for interactive)
What else can we do with this data? Well, how about a browser to help you find new accounts to follow? Gephi provides scores for ‘authority’ and ‘eigenvector centrality’, which are measures of a user’s importance to the network. Being followed by a lot of people gives you a higher score, and being followed by accounts that are also well followed boosts your score even more. It’s similar to the way that Google ranks web pages – lots of high-quality inbound links get you a better rank.
Having higher quality followers is more important to these rankings than having lots of followers (although both matter). Also keep in mind the original Twitter list that was used as a seed. I built the list, so it naturally has elements of my bias, although I do try to add a wide variety of accounts.
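To make the idea concrete, here is a rough sketch of eigenvector centrality by power iteration: every account starts with the same score, then repeatedly takes the sum of its followers' scores, rescaled so the top account sits at 1.0. Gephi's own implementation may differ in its details; the function name, the fixed iteration count and the example accounts are all assumptions.

```python
def eigenvector_centrality(edges, iterations=100):
    """Approximate eigenvector centrality for (follower, followed) pairs."""
    followers_of = {}
    nodes = set()
    for follower, followed in set(edges):
        if follower == followed:
            continue  # ignore self-follows
        nodes.update((follower, followed))
        followers_of.setdefault(followed, set()).add(follower)

    scores = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # An account's new score is the sum of its followers' current scores.
        new = {n: sum(scores[f] for f in followers_of.get(n, ())) for n in nodes}
        top = max(new.values(), default=0.0)
        if top == 0.0:
            break  # nothing left to propagate; keep the last non-zero scores
        scores = {n: value / top for n, value in new.items()}
    return scores


# Made-up example: p and q each have exactly one follower, but p's follower
# is the well-followed hub while q's follower is an ordinary account.
example_edges = [
    ("f1", "hub"), ("f2", "hub"), ("f3", "hub"), ("p", "hub"), ("q", "hub"),
    ("hub", "f1"), ("hub", "f2"), ("hub", "f3"),
    ("hub", "p"), ("f1", "q"),
]
example_scores = eigenvector_centrality(example_edges)
```

In the example, `p` ends up with a higher score than `q` even though both have a single follower, because `p` is followed by the hub.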
In short… don’t moan, it’s just for fun!
Here’s what Eigenvector Centrality looks like if we use it to shade the network.
I’ve been asked in the past whether these visualisations can be built for other communities. Absolutely they can!
If you would like to talk about creating one for your own company, conference or interest area, please contact me. Images can be customised, use any colour palette you like and be exported at massive sizes for printing onto posters.