<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Social Media Conversation Analyst &#187; data warehouse</title>
	<atom:link href="http://www.nickarnett.net/tag/data-warehouse/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.nickarnett.net</link>
	<description>Social media analytics for decision-making</description>
	<lastBuildDate>Mon, 18 Apr 2011 16:05:49 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.1</generator>
		<item>
		<title>The social media data warehouse</title>
		<link>http://www.nickarnett.net/2008/12/04/the-social-media-data-warehouse/</link>
		<comments>http://www.nickarnett.net/2008/12/04/the-social-media-data-warehouse/#comments</comments>
		<pubDate>Thu, 04 Dec 2008 16:28:54 +0000</pubDate>
		<dc:creator>Nick Arnett</dc:creator>
				<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[data warehouse]]></category>

		<guid isPermaLink="false">http://www.nickarnett.net/?p=39</guid>
		<description><![CDATA[I promised an overview of how data moves into and out of a data warehouse, so here goes.  The short version of &#8220;data in&#8221; is that there are ETL (extract, transform and load) processes that get data from various sources, change it into a structure suitable for the data warehouse, then load it into tables in [...]]]></description>
			<content:encoded><![CDATA[<p>I promised an overview of how data moves into and out of a data warehouse, so here goes.  The short version of &#8220;data in&#8221; is that there are ETL (extract, transform and load) processes that get data from various sources, change it into a structure suitable for the data warehouse, then load it into tables in the database.  The way data comes out is via SQL queries.  Nothing really unusual there, so let me explain how these are different from typical databases.</p>
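<p>As a rough illustration of the extract, transform and load steps described above, here is a minimal Python sketch.  Everything specific in it is an assumption made for the example: the sample rows, the <code>page_views</code> table and its columns, and the use of an in-memory SQLite database as a stand-in for the warehouse.</p>

```python
import sqlite3

# Hypothetical raw source rows: (timestamp, user name, page path).
RAW_ROWS = [
    ("2008-12-04 16:28:54", " Alice ", "/blog/39"),
    ("2008-12-04 16:30:10", "bob", "/blog/39"),
    ("2008-12-03 17:11:47", "ALICE", "/blog/34"),
]


def extract():
    """Extract: a real system would read logs or query a source database here."""
    return list(RAW_ROWS)


def transform(rows):
    """Transform: normalize user names and keep only the calendar day."""
    return [(ts[:10], user.strip().lower(), path) for ts, user, path in rows]


def load(conn, rows):
    """Load: insert the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS page_views (day TEXT, user_name TEXT, path TEXT)"
    )
    conn.executemany("INSERT INTO page_views VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the warehouse
load(conn, transform(extract()))

# "Data out" is a plain SQL query against the loaded table.
views_per_day = conn.execute(
    "SELECT day, COUNT(*) FROM page_views GROUP BY day ORDER BY day"
).fetchall()
print(views_per_day)
```

<p>The three functions are kept separate only to mirror the three ETL stages; production ETL tools add scheduling, error handling and incremental loading on top of this shape.</p>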
<p>In a dimensional data warehouse, the data is extremely &#8220;denormalized,&#8221; meaning that instead of designing tables to eliminate redundancy and maximize integrity, they are designed to be extremely simple.  Ideally, there are only two types of tables &#8211; facts and dimensions &#8211; and every query joins fact tables to dimensions.  This is called a &#8220;star&#8221; schema.  Imagine a fact table at the center with dimensions as points of the star.  A typical web analytics fact table is a clickstream log; associated dimensions might be days, users, visitors, IP addresses and so forth.  That&#8217;s it in a nutshell.  Grab a book about data warehousing if you want details (and there are plenty), but I&#8217;m going to focus on some of the ways that a data warehouse for social media might be different.  There&#8217;s a lot to say about this, so this is just the first post in a series.</p>
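<p>To make the star schema concrete, here is a small Python sketch using an in-memory SQLite database.  The table names (<code>fact_click</code>, <code>dim_day</code>, <code>dim_user</code>), the columns and the sample rows are all hypothetical, chosen only to show one fact table joined to two dimensions.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: the points of the star (hypothetical tables).
CREATE TABLE dim_day (day_key INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, user_name TEXT);

-- Fact table: the center of the star, holding keys plus a measure.
CREATE TABLE fact_click (day_key INTEGER, user_key INTEGER, page_views INTEGER);

INSERT INTO dim_day VALUES (1, '2008-12-03'), (2, '2008-12-04');
INSERT INTO dim_user VALUES (1, 'alice'), (2, 'bob');
INSERT INTO fact_click VALUES (1, 1, 3), (2, 1, 5), (2, 2, 2);
""")

# A typical star-schema query: join the fact table to its dimensions,
# then group by dimension attributes and sum the measure.
rows = conn.execute("""
    SELECT d.calendar_date, u.user_name, SUM(f.page_views)
    FROM fact_click AS f
    JOIN dim_day AS d ON d.day_key = f.day_key
    JOIN dim_user AS u ON u.user_key = f.user_key
    GROUP BY d.calendar_date, u.user_name
    ORDER BY d.calendar_date, u.user_name
""").fetchall()
print(rows)
```

<p>Every report against this schema follows the same pattern: pick the dimensions to group by, join them to the fact table, and aggregate the measures.</p>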
<p>If I were starting today, I think I would completely violate one of the principles of data warehousing and plan for the social media data warehouse to be on-line, a production system, rather than keeping it isolated.  The normal model is that the data warehouse only serves the reporting and analytics needs of managers, clients, etc.  However, &#8220;analytics needs&#8221; are becoming part of the user experience.</p>
<p>We are so accustomed to thinking of analytics and reporting as a management tool, a way to keep clients and advertisers happy, that we forget that our communities can benefit from analytics, too.  Increasingly, when we discover something interesting in the data, there is value in exposing it to the community.  Unfortunately, that tends to be very slow to happen because analytics is usually a step-child in the engineering family.  When resources are tight, the production system gets priority, as it should.  So make the data warehouse a production system, not for analytics job security, but because analytics can add real value to social media.</p>
<p>Picture a community where anybody can blog.  Typically, the only feedback is how many comments a posting gets.  Imagine if each blogger could see how many visitors and page views each post gets.  Imagine if they could see which of their words are generating search engine hits.  In other words, empower each user to do their own SEO by giving them analytics-based feedback.  The vast majority probably won&#8217;t, but those few who do may have a great impact.  In social media, empowering the super-users may be <strong>the</strong> key to success.  Analytics is how you identify them, but don&#8217;t stop there.</p>
<p>The possibilities are enormous.  Social media analytics can tell us which people or groups have the greatest influence.  Feed that data back to the community.  Analytics can tell us which topics are heating up &#8211; feed that back.  Tell the community where the new visitors are coming from; there&#8217;s probably something interesting out there when it changes.</p>
<p>The architectural challenges are not simple, given that data warehousing isn&#8217;t intended to give real-time results the way that live production systems do.  I suspect that the path to this kind of capability is to mirror an aggregated version of the data warehouse in near real-time and let the production system query the mirror.  In any event, I do believe it is the way things are headed.  Analytics isn&#8217;t just a management tool &#8211; or perhaps it is, but we forget that every visitor can be a manager.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickarnett.net/2008/12/04/the-social-media-data-warehouse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Analytics data warehousing</title>
		<link>http://www.nickarnett.net/2008/12/03/analytics-data-warehousing/</link>
		<comments>http://www.nickarnett.net/2008/12/03/analytics-data-warehousing/#comments</comments>
		<pubDate>Wed, 03 Dec 2008 17:11:47 +0000</pubDate>
		<dc:creator>Nick Arnett</dc:creator>
				<category><![CDATA[infrastructure]]></category>
		<category><![CDATA[analytics]]></category>
		<category><![CDATA[data warehouse]]></category>

		<guid isPermaLink="false">http://www.nickarnett.net/?p=34</guid>
		<description><![CDATA[For many analytics practitioners, how to store data is somebody else&#8217;s problem &#8211; Google, Omniture, Yahoo/IndexTools or another third-party provider.  As Gary Angel points out, &#8220;Sophisticated organizations are increasingly finding good reasons to move data from their web analytics tools to other data processing and analysis platforms.&#8221;  Indeed. Having spent the last few years designing, [...]]]></description>
			<content:encoded><![CDATA[<p>For many analytics practitioners, how to store data is somebody else&#8217;s problem &#8211; Google, Omniture, Yahoo/IndexTools or another third-party provider.  As <a href="http://semphonic.blogs.com/semangel/2008/11/warehousing-web-analytics-data---interest-profiling.html" target="_blank">Gary Angel points out</a>, &#8220;Sophisticated organizations are increasingly finding good reasons to move data from their web analytics tools to other data processing and analysis platforms.&#8221;  Indeed.</p>
<p>Having spent the last few years designing, building and using a terabyte-scale analytics <a title="Wikipedia - data warehouse" href="http://en.wikipedia.org/wiki/Data_warehouse" target="_blank">data warehouse</a>, I have to agree with Gary.  He calls it &#8220;moving&#8221; or &#8220;transferring&#8221; the data, but I see it as an &#8220;also,&#8221; rather than &#8220;instead of.&#8221;  At <a href="http://www.liveworld.com/solutions/reporting.html" target="_blank">LiveWorld</a>, our data warehouse was complementary to various third-party analytics systems that our clients used.  Name just about any tag-based solution and we were supporting it.  We were even starting to test Google Analytics in &#8220;hybrid&#8221; mode, where the data goes to Google and to the local server, potentially allowing tag-based data to go into the data warehouse, which would provide a cross-check, at the very least.  Some of our customers, particularly bankers, banned third-party JavaScript for security reasons, so they relied entirely on the data warehouse.  (We could have hosted the tag scripts locally, but decided not to deal with the resulting maintenance issues.)</p>
<p>One of our clients did something that I suspect many other large businesses will opt for &#8211; they asked us to create a daily feed for <em>their</em> data warehouse.  The feed didn&#8217;t include full detail; it was aggregated data, relatively easy to customize because the source was a set of queries against our warehouse.  This approach offers the advantage of allowing deep data integration on the client&#8217;s systems.  They were able to outsource social media to us, yet run integrated reports while preserving customer privacy.</p>
<p>With a data warehouse, you can go beyond the kind of data you can get from tag- or log-based systems.  Social media is virtually always based on application servers.  There&#8217;s a database underlying the app server, which means that you can periodically query the database and stick the results into the data warehouse.  In some cases, you&#8217;ll just want a snapshot of the state of the app server.  Much of the time, you&#8217;ll want the app server to record events with a timestamp, so you can query for the full detail of what happened and when.</p>
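<p>One way to picture the timestamped-event approach is an incremental pull: each run copies only the events newer than the last one already loaded.  The Python sketch below assumes a hypothetical <code>events</code> table on the app server side and a <code>fact_event</code> table in the warehouse, with two in-memory SQLite databases standing in for both systems.</p>

```python
import sqlite3

# Stand-in for the application server database (hypothetical schema).
app_db = sqlite3.connect(":memory:")
app_db.execute("CREATE TABLE events (event_time TEXT, user_name TEXT, action TEXT)")
app_db.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2008-12-03 09:00:00", "alice", "post"),
        ("2008-12-03 09:05:00", "bob", "comment"),
        ("2008-12-04 10:00:00", "alice", "comment"),
    ],
)

# Stand-in for the data warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_event (event_time TEXT, user_name TEXT, action TEXT)"
)


def pull_new_events(last_seen):
    """Copy events newer than last_seen; return the new high-water mark."""
    rows = app_db.execute(
        "SELECT event_time, user_name, action FROM events "
        "WHERE event_time > ? ORDER BY event_time",
        (last_seen,),
    ).fetchall()
    warehouse.executemany("INSERT INTO fact_event VALUES (?, ?, ?)", rows)
    warehouse.commit()
    return rows[-1][0] if rows else last_seen


mark = pull_new_events("")  # first run copies everything
app_db.execute("INSERT INTO events VALUES ('2008-12-04 11:00:00', 'bob', 'post')")
mark = pull_new_events(mark)  # second run copies only the new event
total = warehouse.execute("SELECT COUNT(*) FROM fact_event").fetchone()[0]
print(mark, total)
```

<p>The strict comparison on <code>event_time</code> would skip an event that shares a timestamp with the previous high-water mark, so a real extract would more likely track a unique, increasing event ID or re-read a small overlap window.</p>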
<p>Once you have a data warehouse up and running, the big advantage is reporting flexibility.  The whole point of a data warehouse is to store information in a way that allows fast-running queries to be written easily.  At the most basic level of a typical data warehouse, there are no pre-suppositions about what queries will be run.  However, to gain acceptable performance, aggregates often need to be created (ideally, invisibly to users) for commonly run queries.</p>
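<p>As a minimal sketch of what such an aggregate can look like, the Python example below rebuilds a daily summary table from a detail-level fact table.  The <code>fact_click</code> and <code>agg_daily_views</code> tables and the sample rows are hypothetical, and SQLite is used only so the example runs on its own.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fact_click (day TEXT, user_name TEXT, page_views INTEGER)")
conn.executemany(
    "INSERT INTO fact_click VALUES (?, ?, ?)",
    [("2008-12-03", "alice", 3), ("2008-12-04", "alice", 5), ("2008-12-04", "bob", 2)],
)


def rebuild_daily_aggregate(conn):
    """Recompute the per-day summary so common reports avoid scanning the detail rows."""
    conn.execute("DROP TABLE IF EXISTS agg_daily_views")
    conn.execute(
        "CREATE TABLE agg_daily_views AS "
        "SELECT day, SUM(page_views) AS page_views FROM fact_click GROUP BY day"
    )
    conn.commit()


rebuild_daily_aggregate(conn)

# Reports that only need daily totals read the small aggregate table.
daily = conn.execute(
    "SELECT day, page_views FROM agg_daily_views ORDER BY day"
).fetchall()
print(daily)
```

<p>In this sketch the report has to name the aggregate table explicitly.  Making aggregates invisible to users requires something extra, such as a query layer or database feature that redirects matching queries to the summary table.</p>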
<p>Next, I&#8217;ll give a quick overview of how data typically gets into and out of a data warehouse.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.nickarnett.net/2008/12/03/analytics-data-warehousing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

