26 Jan

What are these knots of string doing all over your blog, Katy?

For the last 2 weeks, I’ve been posting a lot of images on my blog, but I haven’t taken the time to explain what they are, why I’m making them, or how.

Azerbaijan, one of the countries that I study, is experiencing some turmoil right now. Briefly, a soldier was killed in the military as a result of hazing. There was a coverup, but it was found out and photos of the soldier’s body came out on social media. As a result, a demonstration was organized, mainly on Facebook, to protest this sort of thing occurring in the military. (It is not uncommon.)

On January 12, the protest occurred.

At first I was involved in social media as I normally am – retweeting, sharing stories, etc. because I have a general interest in democracy and technology in this country. But then I had an idea to analyze the tweets. This seemed especially important to me because there was a bit of a battle occurring on the main event hashtag #protestbaku between pro-government and more democratically-inclined social media users.

So, thanks to Marc Smith, the first social network analysis of the #protestbaku hashtag was created with the NodeXL program about 3 hours after the protest started.

What is a social network analysis? Via Wikipedia:

Social network analysis (SNA) is the methodical analysis of social networks. Social network analysis views social relationships in terms of network theory, consisting of nodes (representing individual actors within the network) and ties (which represent relationships between the individuals, such as friendship, kinship, organizational position, sexual relationships, etc.) These networks are often depicted in a social network diagram, where nodes are represented as points and ties are represented as lines.

And NodeXL is a free tool that works with Microsoft Excel to create interactive network visualizations. It is fairly easy to use once you get used to it.

So with this program, you can see who follows whom on Twitter, who replies to each other, etc. And then it shows this all visually.
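For readers who think better in code than in pictures, here is a minimal sketch of the "nodes and ties" idea. It is not how NodeXL works internally; it only assumes that a tie exists whenever one account mentions, replies to, or retweets another, and the sample tweets and account names are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical sample tweets as (author, text) pairs. Not real data.
tweets = [
    ("alice", "@bob are you at the protest? #protestbaku"),
    ("bob", "@alice yes, photos soon #protestbaku"),
    ("carol", "RT @alice: are you at the protest? #protestbaku"),
    ("dave", "Watching from abroad #protestbaku"),
]

MENTION = re.compile(r"@(\w+)")

def build_ties(tweets):
    """Return a Counter mapping (source, target) ties to how often they occur."""
    ties = Counter()
    for author, text in tweets:
        source = author.lower()
        for mentioned in MENTION.findall(text):
            target = mentioned.lower()
            if target != source:  # ignore self-mentions
                ties[(source, target)] += 1
    return ties

def in_degree(ties):
    """Count how many distinct accounts point at each node."""
    degree = Counter()
    for _source, target in ties:
        degree[target] += 1
    return degree

ties = build_ties(tweets)
print(ties)
print(in_degree(ties))
```

Each account is a node and each (source, target) pair is a tie. An account like "dave" that mentions nobody and is mentioned by nobody ends up with no ties at all, which is what an isolated dot on the map looks like.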

After the first January 12 protest, I did a new analysis of all the tweets, then again after the weekend was over. A week later I started rerunning the analysis repeatedly.

The pro-government social media users started a counter-hashtag to shame a journalist. I noticed that there were a lot of strange Twitter accounts associated with that hashtag. Analysis of that is here and here. I’m not going to leap to any conclusions, but please read for yourself.

Then on January 23, a riot began in the regional city of Ismayilli. There was much tweeting about it, mainly from people not on the ground. Once again I analyzed the tweets. I also made a graphic of the changing dynamics of the hashtag.

Then on January 26, another protest was organized in Baku and once again the #protestbaku hashtag was used. I kept all the analysis on one page here.


So why am I doing this?

  • I am a social scientist. I like seeing patterns in things and I believe that this sort of modeling can add to understanding.
  • I have the resources (time, computing power, skills) to do this.
  • I like making analyses accessible to people that don’t have the skills that I do.
  • I believe that information (to some extent) should be free. Moreover, I imagine that people in power have tools to understand networks like this and giving this information to everyone is more egalitarian.
  • This information (social media data) is already out there in the world, just not organized in this way.
  • I believe in freedom of expression. I am deeply sad that there is little freedom of expression in Azerbaijan.
  • If this analysis can be a tool for those supporting freedom of expression, that gives me a great deal of joy. I hope that it is not also being used as a tool of suppression, but that is the price one pays for transparency and openness.
  • It is possible that at some point I will write up some of this in the form of an academic article.
  • I’ve received a lot of positive feedback from those involved in these events that this analysis has been useful to them. It isn’t often that this sort of thing can have an immediate application, so this is really cool.

In the meantime, I am happy to answer any questions about this.


18 Jan

#khadijautan update

My original post is here.

It has been 2 days since I ran my first analysis, so here’s an update.

The groups are now MUCH clearer to me. As you can see, there are 3 groups: groups 1 and 3 are anti-Khadija tweeters and group 2 is anti-anti-Khadija tweeters.


Link to full analysis

150 people used the #khadijautan hashtag with 2527 total tweets.

So, now to the other issue… what was going on with all of these repeated tweets? Out of 1245 tweets, VERY FEW were original.

In Excel, I sorted all the tweets alphabetically. I then also used conditional formatting in Excel to turn duplicate tweets red. (Also of note, if a URL shortener was used, the tweets don’t look like duplicates because they have different URL shortenings – but I hand-coded those.) (You can easily download the Excel file here and look at it yourself.)
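The same check can be scripted. Below is a minimal sketch, not the Excel workflow itself: it assumes the tweets are available as (account, text) pairs and it swaps every URL for a placeholder so that differently shortened links no longer hide a duplicate. That shortcut is cruder than hand-coding, because two tweets linking to genuinely different pages would also be treated as the same text. The sample rows are made up.

```python
import re
from collections import defaultdict

URL = re.compile(r"https?://\S+")

def normalize(text):
    """Replace every URL with a placeholder, collapse whitespace, and lowercase."""
    text = URL.sub("<url>", text)
    return " ".join(text.split()).lower()

def find_duplicates(tweets):
    """Group accounts by normalized text; keep only texts posted more than once."""
    groups = defaultdict(list)
    for account, text in tweets:
        groups[normalize(text)].append(account)
    return {text: accounts for text, accounts in groups.items() if len(accounts) > 1}

# Hypothetical rows. Not taken from the real file.
tweets = [
    ("user_a", "Campaign succeeded! http://t.co/abc123 #KhadijaUtan"),
    ("user_b", "Campaign succeeded! http://t.co/xyz789 #KhadijaUtan"),
    ("user_c", "I disagree with this hashtag #KhadijaUtan"),
]

duplicates = find_duplicates(tweets)
for text, accounts in duplicates.items():
    print(len(accounts), "posts:", text, "by", accounts)
```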

Yes, the same accounts tended to be the ones that were writing the same tweets. There are many examples of this. In fact, the majority of the hashtag was this kind of tweet.

I’m not drawing any conclusions, but I wanted to point out that a lot of the same people are posting the exact same tweet and that is strange. These accounts seem to be “real” in that these people have since tweeted other things unrelated to the #khadijautan hashtag.

But then I saw a strange pattern – these repeated tweets were all posted a few minutes apart.


If I were a gambler, I’d say that either one person was logging into multiple Twitter accounts or some sort of program was used.
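One way to check the "a few minutes apart" pattern without eyeballing screenshots is to measure the gap between consecutive postings of the same text. This is only a sketch: the (account, text, timestamp) layout, the timestamp format, and the sample rows are all assumptions for illustration, not the layout of the real file.

```python
from collections import defaultdict
from datetime import datetime

def gaps_in_minutes(posts):
    """For each text, return the minutes between consecutive postings, in time order."""
    by_text = defaultdict(list)
    for _account, text, stamp in posts:
        by_text[text].append(datetime.strptime(stamp, "%Y-%m-%d %H:%M"))
    gaps = {}
    for text, times in by_text.items():
        times.sort()
        gaps[text] = [
            (later - earlier).total_seconds() / 60
            for earlier, later in zip(times, times[1:])
        ]
    return gaps

# Hypothetical rows with made-up timestamps.
posts = [
    ("user_a", "same message", "2000-01-01 10:00"),
    ("user_b", "same message", "2000-01-01 10:03"),
    ("user_c", "same message", "2000-01-01 10:07"),
    ("user_d", "a one-off tweet", "2000-01-01 12:30"),
]

gaps = gaps_in_minutes(posts)
for text, minutes in gaps.items():
    print(text, minutes)
```

Short, regular gaps across different accounts posting the same text are consistent with either explanation above; the numbers alone cannot tell one person from a program.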

As always, I am happy to answer questions about this.


Here’s an example: the same two tweets posted back to back by several Twitter accounts.



16 Jan

#khadijautan – something is strange here

In reaction to #protestbaku, a number of Azerbaijani tweeters, especially those associated with the pro-government youth organization, began a Twitter campaign called #khadijautan. This translates to “shame on you Khadija.” Who is Khadija? She is a journalist with Radio Free Europe, known for her investigations of government corruption. (More on Khadija here.) What was so shameful? She said, “there is a need for mothers in this country who don’t bargain over their son’s dead bodies […].” More on this on Arzu Geybulla’s blog.

And those using the hashtag think that they succeeded: “With 22K people engaged, apx 200K impressions #KhadijaUtan campaign succeeded. #azerbaijan #protestBaku.”

When I was doing analysis of #protestbaku, I saw that a lot of the Twitter accounts using #khadijautan didn’t have a photo associated with them. That is sort of odd, right? Most people put a picture on their Twitter account.

[Here is a tl;dr: “Turns out… “successful” #khadijautan hashtag campaign was mainly executed by a cyber-zombie army of tweeters that had 1. No profile photo 2. No followers 3. Didn’t send any tweets before this campaign 4. They wrote the same message over and over again. Read this article only if you have basic knowledge of how twitter works and statistics.”]

Only 126 people used this hashtag but they tweeted using it 2198 times (that includes 557 retweets), so it was fairly easy to do analysis on this.

I looked a little closer at my social network analysis map and saw that those Tweeters without photos also tended to not have a lot of friends on Twitter. That’s also a little odd.

So I took a closer look. Link to the full report here.

To understand the following, let’s have a little refresher of high school statistics:

Average or mean = the sum of the values divided by the number of values
Standard deviation = how much variation or “dispersion” exists from the average (mean, or expected value). A low standard deviation indicates that the data points tend to be very close to the mean; a high standard deviation indicates that the data points are spread out over a large range of values.
Mode = the value that appears most often in a set of data
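To see how these three numbers behave together, here is a small sketch using Python's built-in statistics module. The follower counts are invented for illustration and are not the hashtag data. It uses the population standard deviation (pstdev); whether the reported figures were computed as population or sample values is not something this sketch assumes to know.

```python
from statistics import mean, mode, pstdev

# Hypothetical follower counts: most accounts are tiny, two are large.
follower_counts = [7, 7, 7, 12, 20, 35, 900, 1500]

average = mean(follower_counts)    # sum of the values / number of values
most_common = mode(follower_counts)  # value that appears most often
spread = pstdev(follower_counts)   # population standard deviation

print("mean:", average)
print("mode:", most_common)
print("standard deviation:", round(spread, 1))
```

A couple of large accounts pull the mean far above the mode and push the standard deviation above the mean itself. That is the shape to look for: when the mean is much bigger than the mode, a few big accounts are hiding a crowd of small ones.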

Here’s the distribution of followers and following for the people on this hashtag. You can see that the vast majority of those tweeting with #khadijautan don’t have very many other people that they follow or that follow them. The average number of people that #khadijautan Tweeters follow is 194, the mode is 122, standard deviation of 242. This means that even though some people follow a lot of people, most don’t.

The number of followers for these #khadijautan Tweeters was on average 371, the mode was 7, and the standard deviation was 562. Again, while some of these people have a lot of followers, most don’t.


And it looked like the users of the #khadijautan hashtag didn’t Tweet a lot.


The number of tweets for these people has an average of 5168, a mode of 45, and a standard deviation of 11715. Again, a lot of people that don’t tweet a lot were on this hashtag.

Then I sorted the Tweeters by the date that they joined Twitter. 14 of them joined Twitter in the last few days. That isn’t that many.
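Sorting by join date and counting the newest accounts can also be done in a few lines. This is a sketch under assumptions: the (name, join date) layout, the 7-day window, and the sample accounts and dates are all hypothetical.

```python
from datetime import date, timedelta

def joined_recently(accounts, reference_day, window_days):
    """Return accounts created within window_days before reference_day, newest first."""
    cutoff = reference_day - timedelta(days=window_days)
    recent = [
        (name, joined)
        for name, joined in accounts
        if cutoff <= joined <= reference_day
    ]
    return sorted(recent, key=lambda pair: pair[1], reverse=True)

# Hypothetical accounts with made-up join dates.
accounts = [
    ("old_account", date(1999, 6, 1)),
    ("new_a", date(2000, 1, 14)),
    ("new_b", date(2000, 1, 15)),
    ("older_account", date(1999, 12, 1)),
]

recent = joined_recently(accounts, reference_day=date(2000, 1, 16), window_days=7)
print(len(recent), "of", len(accounts), "accounts joined in the last 7 days")
```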

Here’s the distribution of when these people joined Twitter. As you can see, a lot of them joined recently.


This is unlike most hashtag analyses. It is odd.

But let’s look at the groups – this is essential to understanding what is going on.


Group 2, for example, are mostly people fighting AGAINST this hashtag (full disclosure, this includes myself).

Group 3 includes individuals that are regular Tweeters from the pro-government opposition group.

Groups 4 and 5 seem really strange to me. I’m not sure what’s going on there. They look like tweet aggregators.

So let’s talk about Group 1 then. The top tweeters are all in the middle, but look at all the accounts that don’t have profile pictures (the blue circles). (This is also the case for Group 3, but not as heavily.) There were 41 people in Group 1 and 34 people in Group 3. That isn’t a lot. They all follow each other. Not many people saw their hashtag.



Okay, the content of the Tweets. What were people saying on the #khadijautan hashtag?

(I’m going to summarize this, but you can download the whole file here if you want to look at it yourself.)

What ended up surprising me is that a lot of Tweets from these “no profile photo” accounts were basically the same statement over and over again. Not retweets, per se, but just the same statement.

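For example, this: “Стремящаяся вести свои политические игры, пользуясь смертью невинного солдата #Khadijautan #aztwi” (in Russian, roughly: “Seeking to play her own political games by using the death of an innocent soldier”) was said 27 times by 21 different “no profile photo” accounts. This seems really strange to me.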

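Or this tweet: “X.İsmayıl bazarlıq statusunun Samirə Qubadovaya aid olmadığını dedi. #KhadijaUtan kampaniyası məqsədinə çatmışdır! http://t.co/xCEM83kz” (in Azerbaijani, roughly: “X. İsmayıl said that the bargaining status did not refer to Samirə Qubadova. The #KhadijaUtan campaign has achieved its goal!”) was said 18 times by many of the same people that were tweeting repeatedly in other cases too AND don’t have profile photos.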

This strange behavior happened a LOT.

I think that it is fair to say that there is some sort of robot set up to do these tweets.

I welcome questions on this and encourage people to open the file and look for themselves.