For a long time, I’ve thought that retweeting was the most interesting thing about Twitter – and not just explicit retweeting, but also implicit retweeting (people posting the same URL around the same time, which may or may not really be an intentional retweet). I’ve thought of them as similar to links in hypertext and, like others, I created a site that analyzes relationships among people by looking at their retweeting patterns.
This may sound odd, but making a user action easier for everyone is not always a good idea. An overly simple explanation for this is that the less effort it takes to do something, the less significant the action becomes. That doesn’t mean that everything should be hard; it means there’s an optimal level of difficulty v. reward in social behavior. I’d rather not see Twitter encouraging a particular kind of social connection until the structures it supports are better understood. Has anybody really shown the value of retweeting in creating strong social networks? If so, is it clear that the API would tend to further strengthen them? I fear that the API is motivated by a more naive assumption – people are doing this anyway, so let’s make it easier. While that assumption is fine for things like soap, it isn’t right for social behavior.
I’ve been doing social media analytics for a long time. One of the things I’m always trying to measure is how much energy went into a particular user behavior or action. For example, a message that contains more original words took more energy than a shorter one. A message that quotes more than one person takes more energy than one that quotes just one person. A message that contains a URL probably took more energy than one that doesn’t. If the URL is unique in the medium, it probably took more energy to create than a URL that already existed.
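A minimal Python sketch of the kind of "energy" estimate described above might look like this. The weights, the `energy_score` name and the rule that each distinct @screen name counts as one quoted person are all assumptions for illustration, not a description of any production scoring.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@(\w+)")

# Hypothetical weights; tune against real data before trusting them.
WORD_W, PERSON_W, URL_W, NEW_URL_W = 1.0, 2.0, 3.0, 5.0

def energy_score(text, seen_urls):
    """Rough estimate of the effort that went into one message."""
    urls = URL_RE.findall(text)
    no_urls = URL_RE.sub(" ", text)
    # Assumption: each distinct @screen name counts as one quoted person.
    people = {name.lower() for name in MENTION_RE.findall(no_urls)}
    # Original words: whatever is left after removing URLs, mentions and "RT".
    words = [w for w in MENTION_RE.sub(" ", no_urls).split() if w.upper() != "RT"]
    # URLs never seen before in the medium earn an extra bonus.
    new_urls = [u for u in urls if u not in seen_urls]
    return (WORD_W * len(words) + PERSON_W * len(people)
            + URL_W * len(urls) + NEW_URL_W * len(new_urls))
```

For example, `energy_score("RT @alice great read http://a.com/x", set())` scores higher than the same call when `http://a.com/x` is already in `seen_urls`, reflecting the extra effort of surfacing a new URL.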
If the effect of the retweet API is to make retweeting so simple that the act of retweeting loses much of its significance, that’s a net loss. More people might retweet, but fewer of them will be deeply engaged. Social systems should never have the goal of getting everyone to the same level of engagement. It is human nature for some to be opinion leaders, but they don’t easily emerge when “playing the game” is made easy for everyone. Unfortunately, the idea of getting as many people as possible to be as active as possible is a deeply ingrained habit in the media industry. But any successful community manager or analyst will tell you that it is far more important to pay attention to and nurture the “core community” that exists in any social network.
The sweet spot for ease of retweeting lies somewhere between it being so hard that only the most committed users do it (and the current manual method is far better than that) and being so easy that everybody essentially “votes” on everything, which would be bad. Even though that sounds like democracy, it is really demarchy. Seen any successful demarchies? I didn’t think so.
Twitter may well be in the sweet spot already, and I worry that the API is going to drive it away from there. I suspect that Twitter and those who analyze it haven’t had enough time to really figure out how it will fit into the social networking ecosystem in the long run, so any decisions about this are premature. I’d rather see them continue to make the social network easier to analyze, not just for the sake of analytics, but because the results of analytics are getting fed back into the network, which makes the network smarter and smarter.
Tags: API, opinion leaders, retweet, social networks, twitter
“At the end of the day, Twitter is a prototype.” That’s a comment on Dave Winer’s blog by Chuck Shotton, who created one of the first web servers, long before most people had even heard of the Internet. Chuck’s main point is that Twitter is a good idea, but it should be implemented as a distributed system, not a centralized one.
Dead on, Chuck. I’m not in any way faulting Twitter by agreeing with Chuck. There are good reasons that they are succeeding where others have failed at microblogging. It is good that they are demonstrating the broad appeal and usefulness of this kind of communication. The problem, as Chuck nailed it, is that they are centralized. Compare this to blogging, which was designed from the start to be decentralized. There are dozens of blogging platforms that you can run locally, on a rented host or at a site dedicated to hosting blogs. Choices, choices, choices. But if you want to tweet, there’s only one way to do it – Twitter.
One reason Twitter succeeded where others failed is that it has a good API and is extremely open when it comes to sharing data. The default, unlike most other social media companies, is that all of your data is open to everyone, except for direct messages. That’s fairly radical and perhaps more than anything else, has inspired developers to create many, many Twitter applications.
I caught the bug myself, attracted by the volume of data that is easily available. I threw together TwURLed News, not with the idea of building a company around it, but because I wanted to see how well something like it would work. It wasn’t very hard to build: the back end requires a BSD machine worth maybe $1,000, and the front end runs on a very low-cost hosting provider. Amazing.
Still, I can’t believe this is the future of microblogging. Instead of running applications that use the Twitter API on our desktops, it seems much more likely that we will end up running something like the Twitter API ourselves, which talks peer-to-peer instead of client-server.
Consider how Twitter and Google have opposing information flow. The Google model is that people publish information on web servers, then Google’s robots gather the data. To access Google, you use a standard web client. In the Twitter world, nothing gets published until and unless it is pushed to Twitter’s servers and a lot of the people who read Twitter-published information do so using custom clients. I guess you can rationalize this by arguing that Twitter is getting its users to do all the work that Google’s robots would otherwise do, but that’s a terrible idea. As Chuck pointed out, it doesn’t scale.
Consider also how different Twitter’s data flow is from blogging. When you post a blog entry, you’re usually also publishing it as an RSS feed. Outfits like Technorati (and Google, of course) send robots out to read those feeds and make them available via the web or newsreaders. People call Twitter microblogging, but instead of encouraging people to tweet locally and make the tweetstream available to anybody who wants to retrieve it from your site, as with RSS, Twitter says no, you have to send your tweets to Twitter and then they become available to the public. The pain of that centralization is already hurting Twitter, as developers complain about being unable to get even a single user’s entire tweet history, about being unable to search more than a few weeks’ data and other limitations.
So, here’s a thought. How about if every Twitter application developer throws off the yoke of centralization and adds local (or hosted, via XML-RPC) RSS publishing as an option? This is relatively simple for desktop apps – it could use the same mechanisms as RSS. It could actually be an RSS feed tagged as a tweetstream, so that anything that reads it will know that no entry exceeds 140 characters and can expect hashtags, “@” screen names, etc. Phone apps could use a proxy to do the same while continuing to publish the tweetstream on Twitter.
Imagine the services that could bloom if everybody’s tweetstream were available without having to rely exclusively on Twitter and its limited resources. In no time at all, we’d see comprehensive indexing and other value-added services.
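As a rough illustration, a desktop client could emit a tweetstream as an ordinary RSS 2.0 feed with Python’s standard library. The `tweetstream_rss` function and the convention of marking the channel with a `<category>tweetstream</category>` element are hypothetical; no such tagging standard is implied.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime

MAX_LEN = 140  # tweetstream entries never exceed this length

def tweetstream_rss(screen_name, site_url, tweets):
    """Build an RSS 2.0 feed from (tweet_id, text, aware datetime) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{screen_name} tweetstream"
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Tweets by {screen_name}"
    # Hypothetical marker telling readers this feed is a tweetstream.
    ET.SubElement(channel, "category").text = "tweetstream"
    for tweet_id, text, when in tweets:
        if len(text) > MAX_LEN:
            raise ValueError(f"tweet {tweet_id} exceeds {MAX_LEN} characters")
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "description").text = text
        ET.SubElement(item, "pubDate").text = format_datetime(when)  # RFC 822 date
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = f"{site_url}#{tweet_id}"
    return ET.tostring(rss, encoding="unicode")
```

Any standard feed reader could consume the output; a tweetstream-aware indexer could additionally rely on the 140-character guarantee.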
So, why not? I’m not suggesting anyone abandon Twitter, I’m just saying that microblogging will take off much faster if Twitter developers realize that they don’t have to depend only on Twitter to publish their tweets.
Tags: API, future, RSS, twitter
The person at the next desk to the right (my wife) just pointed out that I haven’t updated my blog in quite a while. Among other things, I’ve been, uh, struggling a bit with Twitter social graph analysis. The new social graph API calls make it easy to get friend and follower relationships, but they return Twitter IDs, not screen names. Getting screen names requires a lot of API calls and parsing of JSON or XML… and so far, that’s pretty slow. I seem to be able to get a few hundred names per minute, which might sound like a lot, but the Twitter social graph is huge. We have Twitter’s openness to blame.
The fact that for the most part, anybody can follow anybody, along with all the auto-followbacks, makes for a very densely connected graph. For example, I follow somewhere over 300 people. They follow several hundred thousand. The graph that shows the follower relationships of me and all the people I follow has about 1.4 million edges. That’s a lot to manipulate. I’ve experimented with removing the people who follow more than 500 or more than 1,000 people, which reduces the graph size considerably, but it is still challenging to analyze on a desktop system.
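The pruning step described above can be sketched in a few lines of Python. The edge format `(follower, followed)`, the function name and the fallback of counting "following" within the edge list itself are assumptions; Twitter’s own per-user following count would be more accurate when available.

```python
from collections import Counter

def prune_heavy_followers(edges, max_following=1000, following_counts=None):
    """Remove every account that follows more than max_following people,
    along with all edges touching it. edges: (follower, followed) pairs."""
    if following_counts is None:
        # Fallback assumption: count follows within this edge list only.
        following_counts = Counter(src for src, _ in edges)
    heavy = {user for user, n in following_counts.items() if n > max_following}
    return [(s, d) for s, d in edges if s not in heavy and d not in heavy]
```

Running it with `max_following=500` and then `max_following=1000` gives the two reduced graphs mentioned above.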
I think my next blog post will focus on the social graph, why it matters and what direction Twitter might take it. Feel free to ping me with your thoughts; I’m eager to hear them.
I should mention that even as I’m doing all this, I’m still looking for the right “real” job. I’m focusing mainly on product management related to social media and analytics.
Tags: social graph, social network, twitter
I have added a second vertical slice to TwURLed News – web analytics. It is also available on Twitter at @TwURLedNewsWA. That’s the good news.
The bad news is that everything was dead for about 12 hours because my hosting company, Bluehost, shut it down for consuming too many CPU cycles. The culprit was a WordPress plugin that generates XML sitemaps. It was generating an updated sitemap for every post, with a fairly expensive MySQL query each time. No more. The plugin is now set to permit only manual updates and I’ll trigger that every few hours, not at every posting. That should also make the site more responsive.
Live and learn.
Tags: sitemaps, twitter, web analytics, Wordpress
I have struggled with language to describe the people whose web page citations appear on Tweetsnet. I started with a simple illustration about influence, the idea that people whose followers have more followers are potentially more influential, that influence is at least a second-order phenomenon. But that doesn’t really describe what I’m doing. I also experimented with the word “perceptive,” thinking that people who regularly are among the first to cite web pages that become popular may not be influential, they might just be good at seeing where things are headed. But that’s not the whole story.
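The second-order idea of influence can be written down directly: score each user by the total follower count of their followers. This Python sketch assumes a hypothetical `followers` mapping from each user to the set of users who follow them; it illustrates the notion, not Tweetsnet’s scoring.

```python
def second_order_influence(followers):
    """followers maps each user to the set of users who follow them.
    Score = sum of the follower counts of a user's followers."""
    return {
        user: sum(len(followers.get(fan, ())) for fan in fans)
        for user, fans in followers.items()
    }
```

With `{"a": {"b", "c"}, "b": {"c"}, "c": set(), "d": {"a"}}`, user `d` has one follower but scores 2, while `a` has two followers but scores 1, which is exactly the distinction a plain follower count misses.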
I think I finally found the right phrase when I updated Tweetsnet’s “About” page to say that it looks for people who are “in tune” with what becomes popular. I see Twitter as a platform where people constantly organize themselves into choruses, amplifying the most pleasing melodies, generating and discovering harmonious ideas. As with flocking behavior, these choruses have no single leader, but unlike a flock (as far as I know), some people are clearly more “in tune” than others. Those are the people Tweetsnet seeks to identify – those who most frequently cite web pages that become popular.
I suspect there is a great opportunity in reporting what the choruses, known and discovered, are singing about. In other words, monitoring the buzz in sections of an ecosystem of interacting, overlapping shared-interest communities. This is where I want to take Tweetsnet, generating verticals, starting with known popular subject areas such as social media. I’m sure there’s a lot of thinking and experimentation to be done about the ways we could define the intertwined borders of such choruses. One thing I’m sure about – we need to change the way we tend to think about redundancy.
Our left brains tend to think that duplicated effort is inherently wasteful, but the fact is that we are creatures of community. Here’s the most important idea to take away from the “chorus” metaphor: when a bunch of people act similarly in social media (e.g., post the same URL), it is not redundant; it usually adds value. That is deeply contrary to the one-to-many 20th century idea of information distribution, in which achieving stardom, not harmony, was the goal. We still have room for stars, but some of them will be choruses.
Here are some thoughts on features that contribute to Twitter’s “choral value.”
- Retweeting has very high choral value, as it strengthens the “melody” – people’s deliberate arrangement of information into tweets – and the “harmony” – the commonality among Twitter users that goes beyond simply posting the same information.
- UI design that makes retweeting easy is good, as long as it doesn’t encourage people to spew everything.
- Excessive tweeting and retweeting becomes noise – witness the efforts I’ve had to make to remove aggregators from Tweetsnet. The greatest value is added by the “jazz” tweeters, who have a melody and know how to harmonize, but aren’t afraid to improvise. In other words, have a focus, but don’t be a robot about it.
- Anything that shows how much human energy and thought went into a tweet adds value. Anything that makes it easy to tweet will eventually diminish the value. This is why the 140-character limit has added value – even headlines are often longer, forcing people to think about how to squeeze information.
- Hashtags reduce Twitter’s choral value: as “solos,” they do more to discourage than encourage retweeting. If a tweet is already tagged, I think people tend to assume there’s no need to retweet because interested people should be monitoring the hashtag.
- Following somebody only matters if you take action; the main visible action is retweeting. Blogging about something you found on Twitter would add “choral” value if there were an easy way to discover it.
- Twitter’s APIs make it fairly easy to track user, URL and word usage, which is good data not just for Twitter’s basic features, but for discovering things we didn’t know to look for. It’s great that everything is open by default, unlike most other social networking platforms.
What’s the “choral value” you see in Twitter? What could the company do to further encourage it?
P.S. I’m going to change Tweetsnet’s name to TwURLedNews.
Tags: tweetsnet, twitter
I suppose it is a cliche to say that many useful things have been created unexpectedly, even accidentally. Here in Silicon Valley, that principle often becomes a problem, as highly creative people see a thousand products or services in their creations, but fail to focus enough to create a viable business. I know that disease well because I have to fight it constantly. Right now, however, with Tweetsnet, I’m still in the brainstorming and experimentation phase, when the point is to explore the possibilities. If it gives rise to a business of some sort, that’ll be just fine, but that’s not the point yet.
The bit of unexpected goodness I’ve noticed in Tweetsnet over the last few days is in the tagging. The tags and the tag cloud achieve one of my goals – self-organization – even though I didn’t really plan on it. If I had stopped to think about it, I guess I would have realized it would happen. It all started when I realized that since I’m fetching page titles from popular Twittered URLs, I could also extract any keywords found on those pages. I had to hack a Python WordPress XML-RPC library to support tags, but that was no big deal.
Once those tags were working, I realized that I could treat Twitter hashtags as a special case of tagging. In the Tweetsnet database, tags are identified by source – HTML meta keywords or hashtags. On the Tweetsnet pages, they all look the same.
When that was working, I found myself staring at the “phrases” that I’m capturing from Twitter. Those are two-word phrases extracted via some very simple rules – end of sentence detection, a stopwords list, hashtags and user names excluded and so forth. I noticed that when the same word showed up in more than one of those phrases, it often would be an appropriate tag. And I noticed that existing tag words often showed up in the phrases, so those get added no matter how frequently they occur. Any word that shows up in at least three of the phrases is also added as a tag, although I’m not storing those in the database, since they are sometimes a bit odd.
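Here is a minimal Python sketch of the phrase and tag rules just described. The stopword list, the regex-based sentence splitting, the choice to let stopwords, hashtags, @names and URLs break a phrase, and the function names are all assumptions standing in for the "very simple rules" mentioned above.

```python
import re
from collections import Counter

STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "this", "it"}
SENTENCE_END = re.compile(r"[.!?]+\s+")

def two_word_phrases(text):
    """Adjacent word pairs within a sentence, skipping stopwords, #tags, @names, URLs."""
    phrases = []
    for sentence in SENTENCE_END.split(text):
        prev = None
        for tok in sentence.split():
            word = tok.strip(".,;:!?\"'()").lower()
            keep = (bool(word) and word not in STOPWORDS
                    and not tok.startswith(("#", "@")) and not word.startswith("http"))
            if keep and prev:
                phrases.append((prev, word))
            prev = word if keep else None  # an excluded token breaks the phrase
    return phrases

def derive_tags(phrases, existing_tags, min_phrases=3):
    """Known tag words found in any phrase, plus words in at least min_phrases phrases."""
    word_counts = Counter(w for phrase in set(phrases) for w in set(phrase))
    tags = {w for w in existing_tags if w in word_counts}
    tags |= {w for w, n in word_counts.items() if n >= min_phrases}
    return tags
```

In practice the phrases would be pooled from every tweet citing the same URL before `derive_tags` is called.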
The result is a set of tags and a tag cloud that do a pretty good job of finding articles related to a particular topic. For example, when an article about the rumored GDrive showed up, it was tagged “gdrive,” which I clicked and found two more articles. Cool. That’s why I recently increased the size of the Tweetsnet tag cloud widget.
As you may have noticed, I have added links to sites that are doing things similar to Tweetsnet. One of those, Twitscoop, offers a tag cloud widget, which gave me the idea that perhaps Tweetsnet should do the same. Soon, I hope. That would be in keeping with my idea that one of the secrets to success is to notice when you’ve invented something useful, then package it well.
I would be remiss if I didn’t point out that all this would not have happened if I wasn’t using WordPress as my platform. Although it gets in the way sometimes, the features that come for free, including all the third-party themes and widgets, are terrific. Ditto for Python and all the libraries people write for it.
Tags: self-organizing, tweetsnet, twitter
I knew this day was coming. Sometime in the last few hours, Tweetsnet acquired more Twitter followers than my personal Twitter account. I have 232 and Tweetsnet is now at 256 – and climbing faster than I am.
I’m happy that there’s increasing evidence that Tweetsnet is useful. On the other hand, what a strange world this is, in which I can create an automated information source that seems, by one metric, to be more popular than I am. It seems impersonal and perhaps just plain silly… until I consider that we are creating a world in which increasingly intelligent robots will interact not just with us, but with each other, which will make them (a) stupider, because they will have to deal with rapidly increasing amounts of data and (b) smarter, because we will figure out how to make them take advantage of all that data.
If you’ve been following Tweetsnet or this blog for the last few days, you know that my No. 1 strategic problem (as opposed to various little bugs) is the fact that aggregators – other robots – tend to score quite high in the rankings. An idealistic part of me wants every Twitter account to self-identify as robot or human… but I know that there’s no hope of compliance with anything like that. I’m actually more intrigued by the notion that value will arise from writing code that guesses whether or not a user is a robot. Web analytics has the same problem because some web robots and spiders masquerade as ordinary web browsers. I spent a lot of time on this problem at LiveWorld, where some of our customers were not too eager to pay for robot page views at the same rate as human page views.
The cool thing about the challenge of distinguishing bots from humans is that we’re essentially collaborating and competing on Turing tests. People are designing bots to gain influence in the Internet’s social networks, in competition with people who want to filter them out. As long as bots are dumber than people (and they will be for a long time), this competition will persist and it will drive collaborations that make software smarter. When we reach the singularity, it will stop mattering… or perhaps it will completely flip, so that the people who were trying to decrease the influence of stupid bots will focus on decreasing the influence of those stupid humans. Or perhaps it will be a happy collaboration.
Tweetsnet gained its first bunch of followers by following everybody who cited a URL that made it into the feed. A lot of those people automatically followed it in return. The recent big spike appears to be driven by the fact that a few Twitter users are now retweeting Tweetsnet items. That’s a kindness, really, because there’s no reason for them to do so. They could retweet one of the original tweets.
I imagine that one reason they give Tweetsnet the credit, so to speak, is that Tweetsnet doesn’t try to drive traffic to itself. When it posts a tweet, the links in that tweet point directly to the original site, not back to the posting on Tweetsnet. I get annoyed by tweets that point me to somebody’s site that does nothing more (for me) than provide a link to the site the tweet was really about.
Meanwhile, today’s project is to keep other peoples’ robots out of the Tweetsnet scoring – because they are stupid. The robots, I mean, the robots.
Tags: robots, Turing test, tweetsnet, twitter
I’m exploring the Twitter data I’ve gathered over the last few weeks, which was collected to uncover patterns of URL citations; I believe URL citation is one of the service’s most powerful uses. As I have written, I’m looking at Twitter as a massively parallel self-organizing point-of-view system. In other words, my premise is that by posting URLs to Twitter, people are saying that they found a web page to be interesting and valuable.
Today, I’m looking at “centrality,” a typical social network metric. I am interested in degree centrality, which counts how many connections a person has and so shows who the key players are. I’m considering two people to be connected if they cited the same URL in the same time frame, regardless of whether or not one was an explicit retweet of the other. Later, I’ll probably weight the connections with explicit retweet and other data. For now, I want to see if follower count, a far simpler metric than centrality, would work just as well. Here is a log-log scatterplot of degree centrality v. follower count.

Follower count v. degree centrality
The data points are scattered all over the place, which means that follower count does not correlate with the connections revealed by citing the same URLs. I’m not surprised, given all the games people play to get followers and the robots and such that have little or no human thought behind them.
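The co-citation connection rule above translates into a short Python sketch. Interpreting "the same time frame" as "within `window` seconds of each other" is an assumption, as are the input format and the function name; the pairwise loop is quadratic per URL, which is fine for a sketch but not for a very popular URL.

```python
from collections import defaultdict
from itertools import combinations

def cocitation_degree(citations, window=24 * 3600):
    """citations: iterable of (user, url, unix_time).
    Two users are connected if they cited the same URL within `window` seconds.
    Returns {user: degree}, i.e. the number of distinct neighbors."""
    citations = list(citations)  # iterated twice below
    by_url = defaultdict(list)
    for user, url, t in citations:
        by_url[url].append((t, user))
    neighbors = defaultdict(set)
    for cites in by_url.values():
        for (t1, u1), (t2, u2) in combinations(cites, 2):
            if u1 != u2 and abs(t1 - t2) <= window:
                neighbors[u1].add(u2)
                neighbors[u2].add(u1)
    return {user: len(neighbors[user]) for user, _, _ in citations}
```

Weighting explicit retweets more heavily, as suggested above, would mean storing a weight per neighbor instead of a plain set.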
As a reality check, let’s look at a similar plot that compares follower count to user mentions. I would expect that people who have a lot of followers will be mentioned (in the form of @screen name, in a reply, retweet or any other context) more often. Here’s the graph.

Followers v. mentions
Bear in mind that my data gatherer is biased toward people who cite a lot of URLs, so when I count mentions, those are mentions by people who tend to cite a lot of URLs in their posts. As you can see, although there are many outliers, there is an obvious trend upward and to the right, which indicates a positive correlation – people with a lot of followers indeed do tend to be mentioned a lot. The upper left area is almost empty because it is hard to get any mentions when you don’t have any followers. On the other hand, you can have lots of followers and few mentions, which is why there are more points toward the lower right.
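To put a number on the trend in a log-log plot like this one, a minimal Python sketch can count @mentions and compute a Pearson correlation on log-transformed values. The `+1` offset (so that zero counts stay usable), the mention regex and the function names are assumptions; a rank correlation such as Spearman’s would be a reasonable alternative for data this noisy.

```python
import math
import re
from collections import Counter

MENTION = re.compile(r"(?<!\w)@(\w+)")  # skip e-mail addresses like bob@example.com

def count_mentions(tweets):
    """Case-insensitive count of @screen names across tweet texts."""
    return Counter(m.lower() for text in tweets for m in MENTION.findall(text))

def log_log_pearson(xs, ys):
    """Pearson r of log10(x + 1) vs log10(y + 1): near 1 for a clean power law."""
    lx = [math.log10(x + 1) for x in xs]
    ly = [math.log10(y + 1) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    vx = sum((a - mx) ** 2 for a in lx)
    vy = sum((b - my) ** 2 for b in ly)
    return cov / math.sqrt(vx * vy)
```

Feeding it each user’s follower count and mention count gives a single figure to compare across plots, such as this one and the centrality plot above.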
Outliers are often interesting and I find myself wondering who is getting a lot of mentions even though they have very few followers. The dot closest to the upper left corner is MsTweet, who is a “customer service evangelist for Mr.Tweet” and therefore doesn’t follow much of anyone, but gets mentioned a lot. In the upper right border area, with lots of followers and mentions, are Shorty Awards, Chris Brogan, Guy Kawasaki, and ReTweetTrends (in the center of the top, not following nearly as many as the others). The lower right corner outliers are people who are heavily followed, but rarely mentioned by people who cite URLs. They include Kevin Rose, Jason Calacanis, Veronica and iJustine. I’m surprised, actually, that these folks’ huge followings apparently either aren’t mentioning them often or aren’t often citing URLs. Let’s reality-check that with Twitter search.
I’ll search on each of their user names, then repeat the search with their name and “http,” which will give a rough comparison of all mentions v. mentions with URLs in them. Twitter’s search doesn’t give a result count, so it’s pretty hard to tell. All I can go by is the frequency of recent tweets. Let’s compare it to somebody who is mentioned a lot – Chris Brogan. He is definitely getting a lot more frequent mentions in conjunction with URLs, so at first glance, the data seems believable.
Perhaps this indicates that the people with big followings yet few mentions have a different kind of influence. People like Chris and Guy seem to be leading others to look outside of Twitter, while Kevin, Jason, Veronica and Justine have some other, perhaps more Twitter-centric influence. Is it safe to say that the latter group is more engaged with Twitter for its own sake?
It seems that some of the popular Twitterers are leading their followers mostly into Twitter navel-gazing, while others are leading people beyond what Twitter itself has to offer. I find myself wondering how this might change as Twitter matures… and wondering if perhaps the navel-gazers are newer to Twitter and will get bored faster. I’m gathering more of the user information now, so I should be able to compare the average number of days they have been using it. In any event, from a business standpoint, I think I know which kind of leader I’d be more interested in.
Tags: social network analysis, twitter
Time to step back and consider what I’m doing with Twitter code and data.
Background: For the last two weeks, I have been writing code to find interesting URLs being cited in Twitter posts, or tweets.* I now have a database of about 100,000 Twitter users (I will not call them/us Tweeple!) who have cited 40,000 URLs and more than 200,000 two-word phrases that accompanied those URLs. The URLs have been mentioned 230,000 times (about 5.75 times per URL) and the phrases have been mentioned 330,000 times (1.6 times per phrase). I have gathered all of this data via the Twitter APIs within their constraint of making no more than 100 requests per hour. The primary public output of this work has been the Hot Twitter Cites list.
This morning, I’m going to try to take off my engineer hat, put on my product manager coat and consider what problem this could help solve and how to package it to meet that need. In other words, I’m going to try and extract some focus from my brainstorming. I’ll start by describing the data a bit.
Here’s a graph that shows how many people cited each URL for a one-day period. A few URLs are cited many times, but the vast majority only pick up a handful of cites – this graph shows a very long tail.

Users per citation
The pattern of citations per user has more depth. In other words, this also has a long tail, but a fatter, uh, body. This is good because it means that there are a lot of people citing URLs. More people means more points of view.

Citations per user
I’d be happier to see a greater variety of URLs being cited, but I’m not going to argue with the data… and I would expect (and hope) that the variety of cited URLs will rise as Twitter attracts a more diverse user base.
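Both long-tail distributions can be tallied with Python’s `collections.Counter`. This sketch assumes citations arrive as `(user, url)` pairs and counts each user/URL pair once; the function name is hypothetical.

```python
from collections import Counter

def tail_distributions(citations):
    """citations: (user, url) pairs. Returns two histograms, each {count: how many}:
    users-per-URL and URLs-per-user."""
    pairs = set(citations)  # a user citing the same URL twice counts once
    users_per_url = Counter(url for _, url in pairs)
    urls_per_user = Counter(user for user, _ in pairs)
    return Counter(users_per_url.values()), Counter(urls_per_user.values())
```

A long tail shows up as a large count at 1 that falls off quickly; plotting each histogram on log-log axes makes the "fatter body" of the per-user distribution easy to see.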
I’m generating a score for each user, based on how early they cite a URL that becomes popular. The URLs listed in the hot cites page are chosen partly because they were cited by people who tended to cite popular URLs in the past. I want to be sure that this isn’t redundant to how many people follow them. If it is, then there’s no point in doing all these calculations, I could just watch for the URLs cited by the people with the most followers. Here is a log-log scatterplot of my scoring v. follower counts.

Score v. follower count (log-log)
This is good. If the two data sets had a linear or power law relationship, the dots in the scatterplot would be clustered around a line. They are obviously not, which means that whatever I’m calculating, it is substantially different from ranking based on how many followers the citing user has. I’d like to see a comparison between my score and each user’s follows/followers (a/k/a friends/followers) ratio, but I’ve just started gathering the “follows” (friends) numbers.
Still, I’m not surprised. Follower relationships on Twitter do not imply significant connections between people, for several reasons:
- Many popular Twitter “users” are not people at all. They are aggregators, robots that spew, I mean stream, headlines.
- Twitter celebrities (I really will not say Twitterarati!) have far too many followers to have a significant relationship with most of them.
- Some people follow others on Twitter simply to induce them to follow back, creating the appearance of popularity.
(This topic itself is fairly popular, as demonstrated by the fact that The 10 Users You’ll Meet on Twitter was cited by 120 people in the last few days, which puts it in the top 2 percent of URLs cited.)
More to come as I have time.
* In a strange coincidence, around the same time I started this, I added to my office a clock that tweets a bird call for each hour. My wife made me take out the tweeter batteries. Mute birdies are staring at me.
Tags: twitter
I thought I’d see which Twitter users are scoring the highest in terms of posting URLs that become popular. My code gives them points based on how early they posted and how popular the URL becomes. I suppose it should not have surprised me to find that most of the high scoring users are not real people, but aggregators that feed tons of URLs.
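One way to combine "how early" and "how popular" is sketched below in Python. The formula (log2 of a URL’s distinct citers divided by the citer’s rank), the `min_cites` threshold and the function name are hypothetical stand-ins, not the scoring the code above actually uses.

```python
import math
from collections import defaultdict

def early_citer_scores(citations, min_cites=5):
    """citations: (user, url, unix_time). Hypothetical scoring: every URL cited by
    at least min_cites distinct users awards log2(cites) / rank to each citer,
    where rank 1 is the first user to cite it, rank 2 the second, and so on."""
    first_cite = {}
    for user, url, t in citations:
        key = (url, user)
        if key not in first_cite or t < first_cite[key]:
            first_cite[key] = t  # keep each user's earliest citation of a URL
    by_url = defaultdict(list)
    for (url, user), t in first_cite.items():
        by_url[url].append((t, user))
    scores = defaultdict(float)
    for cites in by_url.values():
        if len(cites) < min_cites:
            continue  # URL never became popular enough to count
        popularity = math.log2(len(cites))
        for rank, (_, user) in enumerate(sorted(cites), start=1):
            scores[user] += popularity / rank
    return dict(scores)
```

Because an aggregator posts everything early, it racks up points under any formula like this, which is exactly the problem described next.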
Who is it that says that web analytics data is always messy? Whoever it is, right you are! Since a fundamental goal of the work I’m doing is to uncover interesting points of view, I need to downgrade sources that aren’t behaving as though they really have a point of view (or at least an intelligent one). I can tell instantly that I’m almost certainly looking at an automated system when I see that the “user” in question follows zero or very few people. That’s grounds for immediately downgrading. I’m not sure if I want to downgrade based on the volume of postings. Certainly beyond a believable number… and perhaps if every single post contains a URL.
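The downgrading criteria above could be expressed as a score multiplier. This Python sketch uses hypothetical thresholds (`min_following=5`, `max_posts_per_day=50`) and halving penalties; only the three signals themselves (following few people, implausible volume, a URL in every post) come from the text.

```python
def bot_penalty(following, posts, url_posts, days_active,
                min_following=5, max_posts_per_day=50):
    """Multiplier in [0, 1] for a user's score. Thresholds are hypothetical."""
    if following < min_following:
        return 0.0  # follows zero or very few people: almost certainly automated
    factor = 1.0
    if days_active > 0 and posts / days_active > max_posts_per_day:
        factor *= 0.5  # beyond a believable human posting volume
    if posts > 0 and url_posts == posts:
        factor *= 0.5  # every single post contains a URL
    return factor
```

A user’s adjusted score would then be the raw score times `bot_penalty(...)`, so a feed that follows nobody drops out entirely while a borderline account is merely damped.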
Here are the top 25 sources from the last week or so, based on the criteria I described above.
- Net2 (878)
- techupdates (706)
- OriginalSignal (587)
- radi8 (565)
- Dakshinamurti (542)
- GaryTheGeek (453)
- techupdate (449)
- haripakorss (436)
- readmashcrunch (392)
- twittfeed (379)
- TwitLinksRSS (359)
- top_post (342)
- tclauss (329)
- TechFeed (303)
- tc2tw (300)
- vcsangels (295)
- dlbrown06 (287)
- davidsim (279)
- mashable (272)
- ReTweetTrends (268)
- balduaashish (268)
- wiredgnome (264)
- julieti (259)
- TechRSS (248)
- davekresta_rss (246)
Tags: social network analysis, twitter