Thursday, December 3, 2009

Missing Data

I have seen so many cases of data being misused, underused, or simply not used at all that I thought it important to talk about it in this blog. Companies fail to make the best of information they already have. Sometimes this is simply because, in their day-to-day activities, they do not get the time to review data, or perhaps they do not realize the importance of doing some basic data analysis. Where companies do use data, I see broadly four types of problems:


1. Data analysts are so overwhelmed with the data available to them from various sources that they spend an eternity trying to unravel it and make some sense of it. Data that is not analyzed in time loses all its meaning. Agreed, it is not always easy to analyze data or to draw meaning from it. But to someone who understands the data, making sense of it is fairly simple. Some easy ways to make sense of data without spending too much time on its accuracy are:

• Using approximations for data, which can simplify the analysis
• Using surrogates for data which is not easily available or retrievable
• Simplifying the data analysis problem itself

2. Managers are not clear about what they want. The problem definition is sometimes so weak that it is impossible to conduct an intelligent data analysis around it. For data analysts, there is no clear answer to a question such as “What do I do if…?”. At best, they can suggest what could happen if…. The what-do-I-do question is best answered by decision makers. It requires an understanding of both the direction of the data and the applicability of the data to the given solution.

3. Data analysts are prone to deriving conclusions which are spurious. A good example is the one most often quoted: the correlation between ladies' hemlines and the stock market index. Again, data analysts who do not have a complete business understanding of the situation are likely to arrive at incorrect conclusions. They may read a meaning into two sets of data where none exists. The result is fallacious conclusions, a loss of managers' confidence in the analysis, or an analysis that lacks direction.

4. And that brings us to the last of the concerns with data usage. Managers typically manipulate data to suit their conclusions. The entire purpose of data analysis is then lost since the end result is already known to the manager, and all conclusions are necessarily biased towards establishing his or her own point of view.
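
The spurious-correlation problem in point 3 can be illustrated with a minimal sketch. The two series below are made-up numbers, not real hemline or stock market data, and the names `pearson` and `changes` are only illustrative helpers. Both series drift upward over time, so their levels correlate strongly, yet their period-to-period changes move in opposite directions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def changes(series):
    """Period-to-period differences of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Made-up illustrative data: two unrelated measures that both trend upward.
series_a = [1, 3, 2, 5, 4, 7, 6, 9, 8, 11]
series_b = [10, 9, 13, 12, 16, 15, 19, 18, 22, 21]

level_r = pearson(series_a, series_b)                     # driven by the shared trend
change_r = pearson(changes(series_a), changes(series_b))  # trend removed

print(f"correlation of levels:  {level_r:.2f}")
print(f"correlation of changes: {change_r:.2f}")
```

A strong correlation in the levels combined with a negative correlation in the changes is a hint that the two series merely share a trend. It is a sanity check, not a proof either way, and it does not replace business understanding of the situation.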

So, what should data typically be used for? There are a few simple, not-so-earth-shattering things which should be done:

1. Look to data for confirmation of what you, as a decision maker, have in mind.

2. Be as precise as possible in setting the question that you would like answered. The more precise the question, the greater the likelihood that the data will give you a meaningful answer.

3. Do not expect data analysis to tell you what to do. As a manager, you must make the decision yourself. You can, at best, rely on data analysis to give you directions as to what to do.

4. Finally, double-check the analysis from at least two or three different viewpoints to confirm that the answer you have is a meaningful one, and that you are not being misled by incorrect data analysis.
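
One possible way to read point 4 in practice is to ask the same question with more than one summary of the data and see whether the answers agree. The sketch below assumes a hypothetical before/after comparison with made-up figures. The function name `viewpoints` and the three checks it runs are illustrative choices, not a prescribed method.

```python
from statistics import mean, median


def viewpoints(before, after):
    """Ask 'did the figure go up?' in three different ways."""
    return {
        "mean": mean(after) > mean(before),
        "median": median(after) > median(before),
        # Drop the single largest value from each period, then compare means.
        "mean_without_max": mean(sorted(after)[:-1]) > mean(sorted(before)[:-1]),
    }


# Made-up illustrative figures: one unusually large value in the "after" period.
before = [100, 110, 95, 105, 90]
after = [96, 99, 92, 97, 400]

checks = viewpoints(before, after)
all_agree = len(set(checks.values())) == 1

for name, went_up in checks.items():
    print(f"{name}: went up = {went_up}")
print(f"viewpoints agree: {all_agree}")
```

Here the mean says the figure went up, while the median and the mean without the largest value say it did not. When the viewpoints disagree like this, the conclusion rests on a single data point and deserves a closer look before anyone acts on it.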

To assume that data analysis is the cure for all ills is a fallacy. At the same time, to decide not to rely on the wealth of information already available to you is a crime.

This blog is also posted on: http://mydatawise.com/missing-data
