In a lengthy conversation with a fairly well-known analytics thought leader earlier this year, I was startled when he mentioned that he didn’t know much about statistics. We were talking about my preference for simplicity, and I was describing methods of eliminating redundancy. At that point, I gave up. While I would not claim to be an expert on statistics, I took the essential classes in college, I’ve been mentored by a couple of Ph.D. statisticians, and I do know the parts that matter for the kind of analytics I do.
If there is one thing I’d urge on anyone doing this kind of work, it is to be in the habit of examining the assumptions behind your metrics and models. Bad assumptions lead to ambiguous or meaningless data.
Here is my favorite bad assumption: it is good when visitors view more pages. No. That behavior might indicate bad site design. If people can accomplish their goals by clicking through fewer pages, that would generally be considered an improvement. “More page views” is good for increasing the frequency of advertising, which was pretty much an unquestioned good thing before the Internet came along. That is not necessarily true anymore, so it has become a bad assumption.
Here’s a bad assumption that should be thought-provoking: at least some of the values will be equal to the average. Wrong. My favorite example is that nobody has the average number of arms. Think about it.
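To make the arms example concrete, here is a small Python sketch. The population is hypothetical: 1,000 people with counts invented purely for illustration, not real data. It computes the mean and then counts how many people actually have exactly that many arms.

```python
# Hypothetical population of 1,000 people; these counts are made up for illustration.
arms = [2] * 996 + [1] * 3 + [0] * 1

mean_arms = sum(arms) / len(arms)  # 1995 / 1000

# Count people whose arm count equals the mean, and people above it.
at_mean = sum(1 for a in arms if a == mean_arms)
above_mean = sum(1 for a in arms if a > mean_arms)

print(f"average arms: {mean_arms}")
print(f"people with exactly the average: {at_mean}")
print(f"people above the average: {above_mean}")
```

Because a few people in this made-up population have fewer than two arms, the mean lands just below 2. Arm counts are whole numbers, so nobody can match it, and nearly everyone ends up “above average.”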
Perhaps the most common bad assumption that leads to statistical trouble is that data is normally distributed. Plenty has been written about this subject and I won’t try to summarize it, but here are some things to remember: