

So far in the brief history of web analytics, people have depended on page views, visits, visitors and conversion – and altogether too many variations on the same. Not that there’s anything wrong with those metrics, but imagining that we have a complete view of a site from those numbers is like deciding to do brain surgery based on blood pressure, temperature and eye color. Most people who produce or consume web analytics are happy to forget that a social medium is driven far more by the way people interact with each other than by the way they interact with computers… and all media are social. It is easier still to forget that the only really new thing about today’s social media is that some of the social interaction has moved on-line. But hey, it is easier to measure people’s interactions with computers than with each other.
I think of my methods of analyzing social media in three categories:
Traffic analysis includes today’s typical web analytics and a bit more. Beyond the basics, it means measuring hidden events by analyzing the visible patterns they cause. This is like figuring out how big a rock tossed into a pond is by measuring the waves it produces. A simple social example is to look for correlations between how many people participate in a discussion and who is participating. If you find that certain people’s presence correlates with higher activity, you might infer that they are provoking greater participation. Or the cause might run the other way – certain people only enter the fray when things get hot.
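To make the presence-versus-activity idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names, the post counts, and the `activity_lift` helper are invented, and the measure (average posts in discussions a person joined minus the average in those they skipped) is just one crude stand-in for a proper correlation.

```python
from statistics import mean

# Hypothetical data: each discussion is (set of participants, number of posts).
discussions = [
    ({"ann", "bob", "cat"}, 40),
    ({"ann", "dan"}, 35),
    ({"bob", "dan"}, 10),
    ({"cat", "dan"}, 12),
    ({"ann", "bob"}, 45),
]

def activity_lift(discussions):
    """Per person: mean posts in discussions with them minus mean posts without them."""
    people = set().union(*(members for members, _ in discussions))
    lift = {}
    for person in people:
        with_p = [posts for members, posts in discussions if person in members]
        without_p = [posts for members, posts in discussions if person not in members]
        # Only comparable if the person was present in some discussions and absent from others.
        if with_p and without_p:
            lift[person] = mean(with_p) - mean(without_p)
    return lift

lift = activity_lift(discussions)
for person, value in sorted(lift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{person}: {value:+.1f}")
```

A large positive number only says that a person and busy discussions tend to show up together. As noted above, it cannot tell you whether that person heats things up or merely arrives once things are hot.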
Social network analysis assumes, reasonably, that groups of people interacting on-line organize themselves into roles such as opinion leaders, connectors, lurkers and so forth, and seeks to identify who’s who in the network. Law enforcement and intelligence agencies have used these techniques to solve problems such as figuring out whose telephones to tap. If there are 200 people in a crime syndicate, but resources only allow you to tap 10 phones, how do you choose which 10? One answer is to find the 10 people who, as a group, talk to the greatest number of the 200. That way, odds are you’ll hear a bit of what everybody is talking about. What’s more, those 10 are likely to be highly influential.
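The phone-tapping question can be sketched as a coverage problem: pick a small group whose contacts, taken together, reach as many members as possible. The Python below is only an illustration under assumptions. The tiny `contacts` network and the `choose_taps` function are hypothetical, and the greedy rule (repeatedly take whoever adds the most people not yet reached) is a common heuristic that is not guaranteed to find the best possible group. Nothing here describes what any agency actually does.

```python
# Hypothetical network: who each person talks to.
contacts = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a", "e"},
    "e": {"d", "f", "g"},
    "f": {"e"},
    "g": {"e"},
}

def choose_taps(contacts, budget):
    """Greedily pick up to `budget` people whose conversations reach the most members."""
    chosen = []
    covered = set()
    for _ in range(budget):
        candidates = [p for p in contacts if p not in chosen]
        if not candidates:
            break
        # A tap on p hears p plus everyone p talks to; count only newly reached people.
        best = max(candidates, key=lambda p: len((contacts[p] | {p}) - covered))
        gain = (contacts[best] | {best}) - covered
        if not gain:
            break  # nobody left to reach, so stop early
        chosen.append(best)
        covered |= gain
    return chosen, covered

chosen, covered = choose_taps(contacts, budget=2)
print("tap:", chosen)
print("reached:", sorted(covered))
```

In this toy network two taps are enough to hear from all seven members, because the two people chosen sit at the center of separate clusters.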
Linguistic/text analysis aims to figure out what people are talking about and their attitudes (sentiment) toward those topics. This is the hardest kind of analysis because language is complex and ambiguous and computers are extremely stupid when it comes to language. However, just looking at how language changes over time – which words or phrases are growing in popularity – can reveal quite a bit.
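Watching which words are gaining ground needs no real language understanding, which is why it is a reasonable place to start. The sketch below is a rough illustration with invented sample posts: the helpers `word_shares` and `rising_words` are hypothetical names, and the approach simply compares each word's share of all words in an earlier batch of text against its share in a later one.

```python
import re
from collections import Counter

def word_shares(texts):
    """Return each word's share of all words across the given texts."""
    words = [w for text in texts for w in re.findall(r"[a-z']+", text.lower())]
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}

def rising_words(earlier, later, top=3):
    """Words whose share of the conversation grew most from `earlier` to `later`."""
    before, after = word_shares(earlier), word_shares(later)
    growth = {word: share - before.get(word, 0.0) for word, share in after.items()}
    return sorted(growth, key=growth.get, reverse=True)[:top]

# Hypothetical posts from two periods.
earlier = ["The price is fine", "The service is fine"]
later = ["The outage is bad", "The outage again", "Outage outage"]

top_rising = rising_words(earlier, later, top=1)
print(top_rising)
```

This says nothing about sentiment or meaning. It only flags that a word has started showing up more often, which is a cue for a person to go and read what is actually being said.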
I don’t use these three approaches in isolation. The language part is so challenging that I have relied on traffic and social network analysis to narrow down the interesting people to a handful, which makes linguistic and text analysis computationally simpler. That is really no different from the telephone tapping example, except that law enforcement generally still relies on humans to figure out what is really going on. Computers can recognize words and phrases increasingly well, but even at Fort Meade, I’m fairly sure they still need people to truly make sense of language.
Tags: linguistic analysis, social networks, text analysis, traffic analysis







