msgbartop
Social media analytics for decision-making
msgbarbottom

03 Dec 08 Analytics data warehousing

For many analytics practitioners, how to store data is somebody else’s problem – Google, Omniture, Yahoo/IndexTools or another third-party provider.  As Gary Angel points out, “Sophisticated organizations are increasingly finding good reasons to move data from their web analytics tools to other data processing and analysis platforms.”  Indeed.

Having spent the last few years designing, building and using a terabyte-scale analytics data warehouse, I have to agree with Gary.  He calls it “moving” or “transferring” the data, but I see it as an “also,” rather than “instead of.”  At LiveWorld, our data warehouse was complementary to various third-party analytics systems that our clients used.  Name just about any tag-based solution and we were supporting it.  We were even starting to test Google Analytics in “hybrid” mode, where the data goes to Google and to the local server, potentially allowing tag-based data to go into the data warehouse, which would provide a cross-check, at the very least.  Some of our customers, particularly bankers, banned third-party JavaScript for security reasons, so they relied entirely on the data warehouse.  (We could have hosted the tag scripts locally, but decided not to deal with the resulting maintenance issues.)

One of our clients did something that I suspect many other large businesses will opt for – they asked us to create a daily feed for their data warehouse.  The feed didn’t include full detail, it was aggregated data, relatively easy to customize because the source was a set of queries against our warehouse.  This approach offers the advantage of allowing deep data integration on the client’s systems.  They were able to out-source social media to us, yet run integrated reports while preserving customer privacy.

With a data warehouse, you can go beyond the kind of data you can get from tag- or log-based systems.  Social media is virtually always based on application servers.  There’s a database underlying the app server, which means that you can periodically query the database and stick the results into the data warehouse.  In some cases, you’ll just want a snapshot of the state of the app server.  Much of the time, you’ll want the app server to record events with a timestamp, so you can query for the full detail of what happened and when.

Once you have a data warehouse up and running, the big advantage is reporting flexibility.  The whole point of a data warehouse is to store information in a way that allows fast-running queries to be written easily.  At the most basic level of a typical data warehouse, there are no pre-suppositions about what queries will be run.  However, to gain acceptable performance, aggregates often need to be created (ideally, invisibly to users) for commonly run queries.

Next, I’ll given a quick overview of how data typically gets into and out of a data warehouse.

Tags: ,

blog comments powered by Disqus