Social media analytics for decision-making

30 Jan 09 Tweetsnet tags: Surprisingly useful

I suppose it is a cliche to say that many useful things have been created unexpectedly, even accidentally.  Here in Silicon Valley, that principle often becomes a problem, as highly creative people see a thousand products or services in their creations, but fail to focus enough to create a viable business.  I know that disease well because I have to fight it constantly.  Right now, however, with Tweetsnet, I’m still in the brainstorming and experimentation phase, when the point is to explore the possibilities.  If it gives rise to a business of some sort, that’ll be just fine, but that’s not the point yet.

The bit of unexpected goodness I’ve noticed in Tweetsnet over the last few days is in the tagging.  The tags and the tag cloud achieve one of my goals – self-organization – even though I didn’t really plan on it.  If I had stopped to think about it, I guess I would have realized it would happen.  It all started when I realized that since I’m fetching page titles from popular Twittered URLs, I could also extract any keywords found on those pages.  I had to hack a Python WordPress XML-RPC library to support tags, but that was no big deal.

Once those tags were working, I realized that I could treat Twitter hashtags as a special case of tagging.  In the Tweetsnet database, tags are identified by source – HTML meta keywords or hashtags.  On the Tweetsnet pages, they all look the same.
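
A minimal Python sketch of the two tag sources described above, using only the standard library: keywords from a page's `<meta name="keywords">` tag, and hashtags from a tweet. This is not the Tweetsnet code. The source labels `meta_keywords` and `hashtag` and the function names are hypothetical, since the post only says that tags are recorded by source.

```python
import re
from html.parser import HTMLParser

# Hypothetical source labels: the post says tags are stored by source,
# but not how that source is recorded.
SOURCE_META = "meta_keywords"
SOURCE_HASHTAG = "hashtag"

HASHTAG_RE = re.compile(r"#(\w+)")


class MetaKeywordsParser(HTMLParser):
    """Collects keywords from <meta name="keywords" content="a, b, c">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # HTMLParser lowercases attribute names; values may be None.
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "keywords":
            return
        for keyword in attr_map.get("content", "").split(","):
            keyword = keyword.strip().lower()
            if keyword:
                self.keywords.append(keyword)


def tags_from_page(html):
    """Return (tag, source) pairs from a page's meta keywords."""
    parser = MetaKeywordsParser()
    parser.feed(html)
    parser.close()
    return [(keyword, SOURCE_META) for keyword in parser.keywords]


def tags_from_tweet(text):
    """Return (tag, source) pairs for the hashtags in a tweet."""
    return [(tag.lower(), SOURCE_HASHTAG) for tag in HASHTAG_RE.findall(text)]
```

Both functions return the same (tag, source) shape. The database can therefore keep the source, while a page can simply display `{tag for tag, _ in pairs}`, so the tags look the same regardless of where they came from.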

When that was working, I found myself staring at the “phrases” that I’m capturing from Twitter.  Those are two-word phrases extracted via some very simple rules – end-of-sentence detection, a stopwords list, exclusion of hashtags and user names, and so forth.  I noticed that when the same word showed up in more than one of those phrases, it was often an appropriate tag.  I also noticed that existing tag words often showed up in the phrases, so those get added no matter how often they occur.  Any word that shows up in at least three of the phrases is also added as a tag, although I’m not storing those in the database, since they are sometimes a bit odd.
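
The phrase rules and the two tagging rules above can be sketched in Python roughly as follows. Several details are assumptions, not taken from the post: the tiny stopword list, the regex used for end-of-sentence detection, and the choice to let a stopword, hashtag, @username or URL break a phrase rather than be skipped over. The "at least three phrases" threshold comes from the post; the function names are hypothetical.

```python
import re
from collections import defaultdict

# A tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
             "is", "are", "was", "it", "this", "that", "with", "i"}

# Assumed end-of-sentence rule: ., ! or ? followed by whitespace or the end.
SENTENCE_END_RE = re.compile(r"[.!?]+(?:\s+|$)")


def extract_phrases(tweet):
    """Return adjacent two-word phrases from a tweet.

    Assumed rules: a phrase never crosses a sentence end, and stopwords,
    hashtags, @usernames and URLs break the chain instead of joining it.
    """
    phrases = []
    for sentence in SENTENCE_END_RE.split(tweet):
        prev = None
        for token in sentence.split():
            if token.startswith(("#", "@", "http")):
                prev = None
                continue
            word = token.strip(".,;:!?\"'()").lower()
            if not word.isalpha() or word in STOPWORDS:
                prev = None
                continue
            if prev is not None:
                phrases.append((prev, word))
            prev = word
    return phrases


def phrase_tags(phrases, existing_tags, min_phrases=3):
    """Words in at least `min_phrases` distinct phrases, plus any word
    that is already a known tag, regardless of how often it appears."""
    phrase_count = defaultdict(int)
    for phrase in set(phrases):
        for word in set(phrase):
            phrase_count[word] += 1
    frequent = {w for w, n in phrase_count.items() if n >= min_phrases}
    known = {w for w in phrase_count if w in existing_tags}
    return frequent | known
```

Given three short tweets about the rumored GDrive, "gdrive" and "storage" each appear in four distinct phrases and become tags. "google" appears in only one phrase, but it is also added because it is already a known tag.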

The result is a set of tags and a tag cloud that do a pretty good job of finding articles related to a particular topic.  For example, when an article about the rumored GDrive showed up, it was tagged “gdrive,” and clicking that tag turned up two more articles.  Cool.  That’s why I recently increased the size of the Tweetsnet tag cloud widget.

As you may have noticed, I have added links to sites that are doing things similar to Tweetsnet.  One of those, Twitscoop, offers a tag cloud widget, which gave me the idea that perhaps Tweetsnet should do the same.  Soon, I hope.  That would be in keeping with my idea that one of the secrets to success is to notice when you’ve invented something useful, then package it well.

I would be remiss if I didn’t point out that none of this would have happened if I weren’t using WordPress as my platform.  Although it gets in the way sometimes, the features that come for free, including all the third-party themes and widgets, are terrific.  Ditto for Python and all the libraries people write for it.


26 Jan 09 I think I have developed a Twitter aggregator finder

Blocking aggregators from Tweetsnet is turning into a whack-a-mole game.  As I block them, others pop up – and some of them are brand-new accounts.  This means I’ll have to prioritize creating an algorithm to block them rapidly.  Weird stuff pops into the feed, like a series of wrestling-related posts.

Meanwhile, I’m working on a couple of experimental vertical feeds, on web analytics and social media, since I’m interested in those subjects and I suspect they get a fair bit of attention from Twitter users.

Looking at the most common phrases that Tweetsnet is finding, it seems that the most-talked-about subjects are the generally popular ones – President Obama, the Super Bowl, peanut butter, Steve Jobs and “Slumdog Millionaire.”  But that may be because I’ve only recently pared down the aggregators.  We’ll see what develops.

It seems to me that Twitter is related to blogs and search engines as radio has been related to newspapers.  Radio was usually first to cover breaking news; newspapers covered the same events in greater depth.  I’m using past tense because radio and newspapers are changing these days, already fairly different from when I worked in those businesses.

25 Jan 09 Tweetsnet progress

Still debugging Tweetsnet… found a problem in scoring, which was limiting the variety of people whose cites could score anything at all.  That’s fixed, along with more minor items that I’m finding as I review the code.

I have added Twitter Trends as a source for discovering more URL citations.  The system periodically grabs the Trends list (had to learn a little JSON to do that) and combines the phrases with “http” to find trending tweets that also have URLs in them.
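
A minimal Python sketch of the "combine each trend with http" step. It parses a Trends JSON payload and builds one search query per trend. The payload shape (a top-level `"trends"` list of objects with a `"name"` field) is an assumption, loosely modeled on the Twitter Trends JSON of that era, so check it against a real response first. The search endpoint in `search_url` is a placeholder, not a real API.

```python
import json
from urllib.parse import urlencode

# Assumed payload shape; verify against the real Trends response.
SAMPLE_TRENDS_JSON = (
    '{"trends": [{"name": "Super Bowl"}, {"name": "Steve Jobs"},'
    ' {"name": "Slumdog Millionaire"}]}'
)


def trend_queries(trends_json):
    """Pair each trending phrase with "http" so matches tend to carry URLs."""
    data = json.loads(trends_json)
    queries = []
    for trend in data.get("trends", []):
        name = trend.get("name", "").strip()
        if name:
            queries.append(f"{name} http")
    return queries


def search_url(query, base="https://search.example.com/search"):
    """Build a search URL; `base` is a hypothetical placeholder endpoint."""
    return base + "?" + urlencode({"q": query})
```

`urlencode` escapes spaces as `+`, so `search_url("Super Bowl http")` yields `...?q=Super+Bowl+http`.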

The changes I made earlier today have greatly slowed the rate at which items are published, as the system builds up information about non-aggregator sources.

25 Jan 09 The problem of robots

A handful of robotic news aggregators have taken over Tweetsnet… Twitter users that spew out a high volume of URLs every hour, which makes them appear to be on top of whatever is new.  I’m planning to exclude them by algorithm (just the raw number of tweets is a good clue, but they also usually follow few people), but for now I’m excluding the big ones manually.

The idea of Tweetsnet is to leverage smart people, not dumb robots.
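
The two clues mentioned above (a high tweet rate and following few people), combined with the current manual blocklist, might be sketched like this. The thresholds, the `Account` fields and the function names are all hypothetical. Requiring both signals is a design choice so that a busy human who follows many people, such as someone live-tweeting an event, is not flagged.

```python
from dataclasses import dataclass

# Hypothetical thresholds: tune them against the accounts already
# excluded by hand rather than trusting these numbers.
MAX_TWEETS_PER_HOUR = 20.0
MIN_FOLLOWING = 25


@dataclass
class Account:
    screen_name: str
    tweets_per_hour: float  # recent posting rate
    following: int          # number of accounts this user follows


def looks_like_aggregator(acct, max_rate=MAX_TWEETS_PER_HOUR,
                          min_following=MIN_FOLLOWING):
    """Flag high-volume posters that also follow few people."""
    return acct.tweets_per_hour > max_rate and acct.following < min_following


def should_exclude(acct, manual_blocklist):
    """Manual blocks still win; the heuristic catches the rest.

    `manual_blocklist` is assumed to hold lowercase screen names.
    """
    return (acct.screen_name.lower() in manual_blocklist
            or looks_like_aggregator(acct))
```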

24 Jan 09 You can now follow Tweetsnet on Twitter

You can now receive the stream of Tweetsnet postings by following @Tweetsnet on Twitter. Still having some weirdness with document titles sometimes… still working on it.

@Tweetsnet is also now automatically following anybody who posted a URL that it published. We’ll see how long that works…

11 Jan 09 Most-cited sources on Twitter: frequency v. reach

I have extracted domain names from the URLs that I track on Twitter. The tables below show how many citations and how many unique citing users there are for the top 25 domains since midnight last night.  The numbers are quite different.  For example, Engadget and Digg are cited quite a bit – high frequency – but by relatively few people – low reach. ReadWriteWeb, Mashable and TechCrunch seem to do the best job of achieving frequency and reach.
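
Frequency and reach can be computed side by side with a short Python sketch like the one below. It assumes the raw data is a list of (user, URL) pairs, one per citing tweet. That input shape and the function names are assumptions. As in the tables, the domain is the full host name, so `www.` prefixes and subdomains such as `news.cnet.com` are kept distinct.

```python
from collections import defaultdict
from urllib.parse import urlparse


def domain_stats(citations):
    """citations: iterable of (user, url) pairs, one per citing tweet.

    Returns {domain: (cites, unique_users)}, i.e. frequency and reach.
    """
    cites = defaultdict(int)
    users = defaultdict(set)
    for user, url in citations:
        domain = urlparse(url).netloc.lower()
        if not domain:
            continue
        cites[domain] += 1
        users[domain].add(user)
    return {d: (cites[d], len(users[d])) for d in cites}


def top_domains(stats, by="users", n=25):
    """Sort by reach (by="users") or by frequency (any other value)."""
    index = 1 if by == "users" else 0
    return sorted(stats.items(), key=lambda item: item[1][index],
                  reverse=True)[:n]
```

One user citing a domain three times gives high frequency and low reach (the Engadget pattern). Two users citing a domain once each gives the opposite.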

Here are the top 25 ordered by the number of people who cited pages from each source.

Domain                    Cites  Unique users
www.readwriteweb.com        263           216
mashable.com                210           189
www.techcrunch.com          274           162
news.cnet.com               236           118
friendfeed.com              164            98
www.whoppersacrifice.com     78            78
www.youtube.com              94            74
lifehacker.com              152            72
sethgodin.typepad.com        77            71
twitpic.com                 115            69
twitter.com                 114            69
www.smashingmagazine.com     76            68
digg.com                    490            65
www.cnn.com                 106            63
www.microsoft.com            59            59
www.ustream.tv               57            57
news.bbc.co.uk              102            53
truemors.nowpublic.com       78            52
danzarrella.com              50            50
www.google.com               52            48
www10.nytimes.com            82            47
xr.com                       47            47
museums.alltop.com           47            47
www.engadget.com            534            45
www.mobilecrunch.com         48            40

Here are the top 25 sorted by number of cites.

Domain                    Cites  Unique users
www.engadget.com            534            45
digg.com                    490            65
www.techcrunch.com          274           162
www.readwriteweb.com        263           216
news.cnet.com               236           118
mashable.com                210           189
www.techmeme.com            190            24
friendfeed.com              164            98
lifehacker.com              152            72
twitpic.com                 115            69
twitter.com                 114            69
www.cnn.com                 106            63
news.bbc.co.uk              102            53
www.youtube.com              94            74
www10.nytimes.com            82            47
truemors.nowpublic.com       78            52
www.whoppersacrifice.com     78            78
sethgodin.typepad.com        77            71
www.smashingmagazine.com     76            68
twitrss.dyndns.org           63             4
www.msnbc.msn.com            61            26
www.microsoft.com            59            59
www.ustream.tv               57            57
news.yahoo.com               54            30
www.google.com               52            48

04 Jan 09 Twitter phishing URL made the list

I’m glad I haven’t automated the hot Twitter URL list yet… a “fake Twitter” phishing link showed up, so I changed its TinyURL in the database to “disabled.” I guess I should create a page for that.


04 Jan 09 Taking popular sources out of the mix

As I’ve been working on the algorithm for hot Twitter cites, I’ve noticed that a handful of sources are responsible for most of the top 50 URLs cited.

They are:

  • TechCrunch
  • Engadget
  • Mashable
  • CNET
  • ReadWriteWeb
  • Techmeme
  • CrunchGear
  • Lifehacker

Under the assumption that repeating content from popular sites doesn’t add much value, here’s what the current list looks like without those sites.

  1. YouTube – Broadcast Yourself. (32544 points)
  2. Phishing Scam Spreading on Twitter | Chris Pirillo (29291 points)
  3. Twitter / Tim: ack. how do i tell git to … (11197 points)
  4. Understanding Your Guests | chrisbrogan.com (7140 points)
  5. http://ow.ly/24v (6329 points)
  6. Seth’s Blog: When marketing goes nuclear (6126 points)
  7. Scientists discover true love – Times Online (5967 points)
  8. (5762 points)
  9. Seth’s Blog: Do ads work? (5494 points)
  10. Scientists: True love can last a lifetime – CNN.com (5248 points)
  11. Twitter be Nimble, Twitter be Quick, if you don’t know Jack, try these Twitter Tricks (4734 points)
  12. http://www.jpost.com/servlet/Satellite (4636 points)
  13. (4048 points)
  14. Seth’s Blog: Is everything okay? (3864 points)
  15. New! louisgray.com: Are We Too Connected to Social Media? (3798 points)
  16. New! I am getting a demo of something that will addict me much more than friendfeed in 2009. This is NOT good. – FriendFeed (3500 points)
  17. louisgray.com: 10 Predictions for 2009 In the World of Tech (3336 points)
  18. New! 100+ Remarkably Beautiful Twitter Icons And Buttons | Icons (3286 points)
  19. New! A Quick Public Service Announcement for Twitter Users | TheBusyBrain Blog! (3148 points)
  20. New! Sources: Burris won’t be allowed on Senate floor Tuesday – CNN.com (3075 points)
  21. New! How to Use Twitter to Grow Your Business — Copyblogger (2955 points)
  22. New! Religulous.avi (2955 points)
  23. New!
  24. New! Op-Ed Contributors – The End of the Financial World as We Know It – NYTimes.com (1927 points)
  25. Helloform » On Twply and giving out your Twitter password (updated) (1776 points)
  26. New! Digital Ethnography (1635 points)
  27. New! 10 Promising Free Web Analytics Tools – Six Revisions (1606 points)
  28. New! IPhone Apps: iSteam iPhone Steam Simulation App is Amazingly Cool (1554 points)
  29. New! Using social media in small business | Small Business Marketing Blog from Duct Tape Marketing (1413 points)
  30. Scobleizer — Tech geek blogger » Blog Archive Twitter warning: your account data is being sold « (1387 points)
  31. ReTweet Mapper (1279 points)
  32. New! Happy Tweets (1227 points)
  33. New! The Air Force’s Rules of Engagement for Blogging — Global Nerdy (1222 points)
  34. New! 60 New York Times profiles on Twitter | PRBLOGGER.COM (1077 points)
  35. New! Twitter Blog: Gone Phishing (1072 points)
  36. New! AppleInsider | Apple files for patent on winter-friendly iPhone gloves (1042 points)
  37. New! http://mrtweet.net?c=12 (1035 points)
  38. New! Robbie Madison’s Amazing Record Jump Video (918 points)
  39. New! ideasonideas – Eric Karjaluoto discusses design, brands and experience » Blog Archive » Why your web startup will fail (918 points)
  40. New! YouTube – Privacy Sucks In Social Media (917 points)
  41. New! Truemors :: Wikipedia Meets $6 Million Fundraising Goal (836 points)
  42. New! Israel and Gaza – The Big Picture – Boston.com (808 points)
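
Taking the popular sources out of a ranked list only requires a host-name check, as in this minimal Python sketch. The domain spellings are inferred from the domain tables and the URL list elsewhere on this blog, and subdomains such as `news.cnet.com` are matched by suffix. The points are taken as already computed, because the scoring itself isn't described here. The function names are hypothetical.

```python
from urllib.parse import urlparse

# Domain spellings for the sources listed above (inferred, not confirmed).
POPULAR_SOURCES = {
    "techcrunch.com", "engadget.com", "mashable.com", "cnet.com",
    "readwriteweb.com", "techmeme.com", "crunchgear.com", "lifehacker.com",
}


def is_popular(url, popular=POPULAR_SOURCES):
    """True if the URL's host is a listed domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    return any(host == d or host.endswith("." + d) for d in popular)


def without_popular(ranked, popular=POPULAR_SOURCES):
    """ranked: list of (url, points), already sorted by points."""
    return [(url, pts) for url, pts in ranked if not is_popular(url, popular)]
```

The suffix match requires a leading dot, so a host like `notcnet.com` is not mistaken for `cnet.com`.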

03 Jan 09 Added a page for hot Twitter cites

I’ve added a page to this blog for hot Twitter cites, based on the code I’ve been writing over the last few days.

02 Jan 09 Updated: frequently Twittered URLs

I’ve added the necessary code to report on frequently posted URLs by day, so here’s an updated list of the URLs people have been frequently Twittering since yesterday (beginning 12:01 a.m. New Year’s Day).  Some of these were in the previous list, but some are new.  Comparing the list from day to day gives an idea of what’s gaining momentum.  I’ll be adding an hour-of-day column to the database, too, to get finer-grained acceleration reporting.  It looks like this is coming together into something I’ll automate soon.
Anybody aware of a similar list that I can compare for a reality check?  I’m wondering if my sampling is intelligent enough to catch most, if not all, of the highly popular URLs being Twittered.
  1. 1 URL(s), 529 users: http://twitter.com/toni_stewart/statuses/1083734925
  2. 1 URL(s), 358 users: http://twply.com/
  3. 1 URL(s), 290 users: http://happytweets.com
  4. 1 URL(s), 258 users: http://water.alltop.com/
  5. 1 URL(s), 257 users: http://twitter.com/gfxmonk/statuses/1083729313
  6. 1 URL(s), 213 users: http://twit.pix.ly
  7. 10 URL(s), 121 users: http://www.searchenginejournal.com/top-20-twitter-posts-of-2008/8221/
  8. 13 URL(s), 120 users: http://www.copyblogger.com/grow-business-twitter/
  9. 2 URL(s), 120 users: http://dcortesi.com/tools/my-first-follow/
  10. 10 URL(s), 113 users: http://www.techcrunch.com/2008/12/31/top-social-media-sites-of-2008-facebook-still-rising/
  11. 1 URL(s), 113 users: http://nutrition.alltop.com/
  12. 8 URL(s), 107 users: http://www.chrisbrogan.com/27-blogging-secrets-to-power-your-community/
  13. 7 URL(s), 105 users: http://www.techcrunch.com/2008/12/30/large-form-ipod-touch-to-launch-in-fall-09/
  14. 1 URL(s), 99 users: http://mrtweet.net?c=12
  15. 6 URL(s), 92 users: http://www.prblogger.com/2008/12/60-new-york-times-profiles-on-twitter/
  16. 7 URL(s), 73 users: http://www.crunchgear.com/2008/12/31/all-zune-30s-crapping-out/
  17. 6 URL(s), 71 users: http://mashable.com/2008/12/30/how-to-simplify/
  18. 4 URL(s), 65 users: http://www.boston.com/bigpicture/2008/12/israel_and_gaza.html
  19. 6 URL(s), 63 users: http://gizmodo.com/5121311/30gb-zunes-failing-everywhere-all-at-once
  20. 1 URL(s), 58 users: http://www.zuneboards.com/forums/zune-news/38143-cause-zune-30-leapyear-problem-isolated.html
  21. 3 URL(s), 57 users: http://blog.wired.com/defense/2008/12/israels-info-wa.html
  22. 5 URL(s), 54 users: http://online.wsj.com/article/SB123051100709638419.html
  23. 1 URL(s), 54 users: http://twoogie.com/
  24. 3 URL(s), 51 users: http://mashable.com/2009/01/01/twitter-user-types/
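
A per-day report and a day-over-day momentum check could be sketched with Python's built-in `sqlite3` module. Everything about the schema is hypothetical: the `cites` table, its column names, and the `hour` column standing in for the planned hour-of-day field. The sketch also assumes that "N URL(s)" in the list above counts distinct URLs as tweeted (for example, different short URLs) that resolve to the same page. That reading is a guess.

```python
import sqlite3

# Hypothetical schema; the real Tweetsnet tables aren't described.
SCHEMA = """
CREATE TABLE cites (
    user_name TEXT NOT NULL,
    short_url TEXT NOT NULL,  -- URL as it appeared in the tweet
    long_url  TEXT NOT NULL,  -- URL after following redirects
    day       TEXT NOT NULL,  -- 'YYYY-MM-DD'
    hour      INTEGER NOT NULL -- 0-23, for finer-grained reporting
)
"""

DAILY_REPORT = """
SELECT long_url,
       COUNT(DISTINCT short_url) AS url_variants,
       COUNT(DISTINCT user_name) AS users
FROM cites
WHERE day = ?
GROUP BY long_url
ORDER BY users DESC, long_url
LIMIT ?
"""

MOMENTUM = """
SELECT long_url,
       COUNT(DISTINCT CASE WHEN day = ? THEN user_name END)
     - COUNT(DISTINCT CASE WHEN day = ? THEN user_name END) AS delta
FROM cites
WHERE day IN (?, ?)
GROUP BY long_url
ORDER BY delta DESC, long_url
"""


def daily_report(conn, day, limit=25):
    """(long_url, url_variants, users) rows for one day, most users first."""
    return conn.execute(DAILY_REPORT, (day, limit)).fetchall()


def momentum(conn, today, yesterday):
    """(long_url, change in unique users) from yesterday to today."""
    return conn.execute(MOMENTUM, (today, yesterday, today, yesterday)).fetchall()
```

In the momentum query, a `CASE` without `ELSE` yields NULL for rows from the other day, and `COUNT(DISTINCT ...)` ignores NULLs. Each term therefore counts only that day's unique users, and a URL that is new today simply shows its full count as the delta.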