Social media analytics for decision-making

11 Feb 09 First TwURLed News vertical – social media

I’ve just launched a beta version of a “vertical” slice of TwURLed News (formerly Tweetsnet): TwURLed News – Social Media. It uses the same infrastructure as the general TwURLed News blog, but focuses on people and pages that tend to be about social media.

I seeded the search system with a number of words, phrases, people and tags that are related to social media, such as the phrase “social media” itself. The robot periodically searches Twitter for a handful of those terms, which leads it to people and cited web pages related to the target subject. I also have a fairly long list of other evidence that a tweet or web page is about social media; each tweet, page title and page description is checked against all of it. I’m calling it evidence because, like the other TwURLed News algorithms, this one uses evidential logic to estimate relevancy: the more evidence, the better. Each bit of evidence is assigned a weight, and those weights are combined to measure how related to social media a tweet or page is.
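
To make the weighting idea concrete, here is a minimal sketch of one way weighted evidence could be combined. It is not the actual TwURLed News code: the terms, the weights, and the `relevance` function are all made up for illustration, and the noisy-OR rule used here is an assumption, since the post does not say how the weights are combined.

```python
import re

# Hypothetical evidence list: term -> weight in (0, 1).
# Both the terms and the weights are illustrative assumptions.
EVIDENCE = {
    "social media": 0.6,
    "twitter": 0.3,
    "blogging": 0.2,
}


def relevance(text, evidence=EVIDENCE):
    """Combine the weights of all matched evidence with a noisy-OR rule.

    Each matched term shrinks the chance that the text is *not* relevant,
    so more evidence always pushes the score closer to 1.
    """
    lowered = text.lower()
    miss = 1.0  # probability-like "not relevant" mass still remaining
    for term, weight in evidence.items():
        # Whole-word (or whole-phrase) match only.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            miss *= 1.0 - weight
    return 1.0 - miss
```

A tweet, page title or page description would each be passed through `relevance`. Text that matches nothing scores 0, and every additional match raises the score without ever exceeding 1, which fits “the more, the better.”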

As the system identifies people, pages and tags that have strong correlations to social media, it should be able to figure out additional evidence, particularly words.  We’ll see how much that can be automated, but I’m hoping that TF/IDF (term frequency/inverse document frequency) will reveal at least some such terms.  One of these days I’ll take a deep breath and use SVD (singular value decomposition) and other clustering techniques to find patterns in the people, pages, tags and words.  I’ve had fairly good success with that in the past, although until lately, I could never figure out a good way to fully automate it.  If Twitter continues to be a place where people retweet and repeat the same URL citations, I have high hopes that a fully automated system will be useful.
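
For readers who have not met TF/IDF, the following is a small, self-contained sketch of the basic calculation. The tokenizer and the function names are assumptions chosen for the example, and the formula is the plain textbook variant (term frequency times the log of inverse document frequency), not necessarily what the system will end up using.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


def tf_idf(documents):
    """Return one {term: score} dict per document.

    Score = (term count / document length) * log(N / documents containing term).
    Terms that appear in every document score 0.
    """
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty text
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores
```

Run over tweets and page descriptions that already score well for social media, the highest-scoring terms would be candidates for new evidence. Words common to every document drop to zero, which is the point of the inverse document frequency part.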

If it’s not already obvious, what I’m doing is not very different from Google’s PageRank algorithm, which considers a page more significant if a lot of other pages have links to it.  I’m finding pages cited on Twitter that have a lot of people linked to them, so to speak.  One of Google’s ongoing problems is link spam, which is more or less like the problem TwURLed News faces with aggregators.  It is very easy to spew a ton of URLs, which can make a “person” on Twitter appear more prescient than it really is… but the good news is that it is fairly easy to exclude them.  On the web, it is relatively simple to fake the date and time an article was posted, but not on Twitter.  That means that nobody can fool the system by pretending to have cited a popular page before it became popular.  That’s a big advantage – it prevents quite a few potential spoofing approaches.
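
Here is a rough sketch of that ranking idea, under several assumptions. The `(user, url, timestamp)` input shape, the `rank_urls` name, and the `max_urls_per_user` cutoff for spotting aggregators are all hypothetical; the post does not describe how aggregators are actually detected.

```python
from collections import defaultdict


def rank_urls(citations, max_urls_per_user=50):
    """Rank URLs by how many distinct non-aggregator users cited them.

    citations: iterable of (user, url, timestamp) tuples.
    max_urls_per_user: assumed cutoff; accounts citing more distinct URLs
    than this are treated as aggregators and ignored.
    Returns a list of (url, distinct_user_count, earliest_citing_user).
    """
    citations = list(citations)

    # Pass 1: find accounts that spew too many URLs.
    urls_by_user = defaultdict(set)
    for user, url, _ in citations:
        urls_by_user[user].add(url)
    aggregators = {
        user for user, urls in urls_by_user.items()
        if len(urls) > max_urls_per_user
    }

    # Pass 2: count distinct citers and remember who cited each URL first.
    users_by_url = defaultdict(set)
    first_citation = {}
    for user, url, timestamp in citations:
        if user in aggregators:
            continue
        users_by_url[url].add(user)
        if url not in first_citation or timestamp < first_citation[url][0]:
            first_citation[url] = (timestamp, user)

    # Most-cited first; ties broken alphabetically by URL for stable output.
    ranked = sorted(users_by_url, key=lambda u: (-len(users_by_url[u]), u))
    return [(u, len(users_by_url[u]), first_citation[u][1]) for u in ranked]
```

Because the timestamps would come from Twitter rather than from the citing account, the “earliest citing user” here cannot be faked after the fact, which is the advantage described above.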

I’d love to hear your suggestions for further verticals. I’m working on one that will cover web analytics, though at first glance, there doesn’t seem to be a lot of #wa talk on Twitter. (That was a hashtag, for those who don’t use Twitter yet.)
