Your Data Probably Sucks
Are you sure your social listening data doesn’t suck?

I think the title says it all. Everyone is finally scrambling to figure out how they are going to listen to and learn from social data, but the care with which some programs are being set up is shoddy at best. I know that sounds controversial, but this is not a condemnation of all programs. I certainly don’t know everything about social analytics (no one does), but in an effort to be provocative I am going to make that statement. Why? Because it is my job at NetBase to evangelize the concept of social analytics (and, by the nature of my role, to make outlandish statements that some don’t agree with). Frankly, from my experience working within and selling to companies setting up programs, this is what I see each and every day.

Why am I choosing to call out social listening data? Because I think that when we rang in 2012, people were arguing too much about the importance of their tools having the right features. Today, those same people are spending too much time making sure they can see every single sound bite.  Forget a representative data set; the pervasive idea is that more is better.  Essentially, 2012 involved what I call the battle for content in social analytics tools.  I posted about this on my blog in March 2012, so I have been trying to figure out how to think about it for a while now.

Yes, when it comes to social analytics tools, having great features that help you measure what you want will always be a part of the dialogue. Whether it is visualizing the data over time, being able to see influencers, or having the ability to engage, all these things will never stop evolving or being critical to the discussion. What needs to change is how people argue about their social media content. If they don’t go deeper into what makes a content set “good,” then they risk the validity of social media adoption within their company.

Why would I make this statement? Because I have long believed in using social media to make business decisions, and I am concerned that those who just now finally believe in the power of social media could be making a fatal error. The error is this: if you are not being critical about the quality of your social content, or better yet, not educating yourself on what quality data is, you could be making flawed business decisions. The point of this post is to get you to stop and think about whether more data is really better.

So why might your data suck? 

In social media analytics, most vendors want you to ignore the computer industry maxim, “Garbage in, garbage out.” Ignoring it only works if you are stuck in the “selling eyeballs to advertisers” model. In the ad game, “more is better” is true – if advertising were free, we’d show every ad to everybody on the planet, which is the spammer’s business model.

The idea that more is better describes addiction, not efficiency. Like any other drug, its rewards are fleeting. Think of this the next time somebody invites you to drink from their social media firehose: “more is better” thinking will reward tools that fail to filter out the pond scum that we all know is floating atop the social media ocean – junk, spam, duplicates, etc.

Duplicate Posts

Let’s start with the most innocuous of these – duplicate posts. Unchecked, they creep into systems for several reasons:

  • The same content is available under more than one domain name. This can be as simple as a site that achieves scale and redundancy by having multiple identical servers, each under its own hostname. This is much more common than you might imagine. Or there simply might be more than one domain name for the same site – that takes about 10 seconds to set up and is often done for marketing reasons, so several names can point to the same server.
  • In discussion venues, the same posts often appear on multiple pages to make it easier for visitors to read a conversation thread. Pages are optimized for humans to read, not for web robots to collect.
  • Most social media analytics companies buy content, especially blogs, forums and news, from third-party suppliers. For redundancy, the larger companies will buy the same content from more than one vendor and may also collect it themselves. However, even small differences in the way the vendors format data, such as inconsistent conversion of timestamps to a common time zone, will lead to duplicates.

Further complicating things, legitimate duplicates need to be preserved – press releases that are reproduced on multiple sites, messages that are actually reposted by multiple people – anything that actually garners additional attention.
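To make the duplicate problem concrete, here is a minimal sketch in Python of one way to collapse the accidental copies described above while keeping the legitimate ones. It is an illustration under stated assumptions, not how NetBase or any other vendor actually does it: the field names (`author`, `domain`, `posted_at`, `text`), the sample posts, and the choice to key on author, UTC minute, and normalized text are all hypothetical. It also assumes each timestamp carries its own time zone; if a vendor mislabels the zone, this will not catch it.

```python
import hashlib
import re
from datetime import datetime, timedelta, timezone


def normalize_text(text: str) -> str:
    """Lowercase and collapse whitespace so formatting quirks don't matter."""
    return re.sub(r"\s+", " ", text).strip().lower()


def to_utc_minute(ts: datetime) -> datetime:
    """Convert a time-zone-aware timestamp to UTC and drop the seconds."""
    return ts.astimezone(timezone.utc).replace(second=0, microsecond=0)


def fingerprint(post: dict) -> str:
    """Key on author + UTC minute + normalized text; the domain is ignored on purpose."""
    key = "|".join([
        post["author"].strip().lower(),
        to_utc_minute(post["posted_at"]).isoformat(),
        normalize_text(post["text"]),
    ])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


def dedupe(posts: list[dict]) -> list[dict]:
    """Keep the first copy of each fingerprint, preserving input order."""
    seen = set()
    unique = []
    for post in posts:
        fp = fingerprint(post)
        if fp not in seen:
            seen.add(fp)
            unique.append(post)
    return unique


# Hypothetical sample: the first two rows are the same post delivered by two
# suppliers with different hostnames and time zones; the third is a genuine
# repost by a different person and should survive.
sample_posts = [
    {"author": "Ann", "domain": "a.example.com",
     "posted_at": datetime(2013, 1, 5, 9, 30, tzinfo=timezone.utc),
     "text": "Love the new  bottle design!"},
    {"author": "ann", "domain": "b.example.com",
     "posted_at": datetime(2013, 1, 5, 4, 30, 12, tzinfo=timezone(timedelta(hours=-5))),
     "text": "love the new bottle design!"},
    {"author": "Bob", "domain": "a.example.com",
     "posted_at": datetime(2013, 1, 5, 10, 15, tzinfo=timezone.utc),
     "text": "Love the new bottle design!"},
]

unique_posts = dedupe(sample_posts)
```

Because the author and the time are part of the key, a repost by someone else or a press release reprinted later still counts as its own item, which is the behavior you want for anything that garners additional attention. Rounding to the minute is a blunt choice; two copies that straddle a minute boundary would slip through, so a real system would need something more forgiving.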

Junk and Spam

Now, onto junk and spam.  Do you really want all of Tumblr in your analytics?  Really? Read this. What percentage of Tumblr – or blogs in general – is meaningful? Go to Google, turn off SafeSearch and query this: “” and look through the results.  Still think “more is better”?

The dirty little secret of all the vendors who claim to monitor the world’s 165 million or so blogs is that about 164 million of them are crap – pharmaceutical sales, Amazon link spam, etc. The good news is that it isn’t so hard to filter out much of the junk, if you’re willing to make the effort – but you also have to be willing to forgo the myth that more is better.

It’s not just blogs.  Search the Internet for “free forum hosting” and then go explore the domains that you find.  Any site that makes it easy to create a new forum site also makes it easy to create a new spam site.  Try searching those domains for popular pharmaceuticals or words such as “blackjack” or “make money fast.”  Still think more is better?  If so, you are playing a numbers game, gambling that if the numbers are just big enough, you’ll strike gold. In other words, you’re thinking like a spammer!

What can be the implications of a “more is better” mindset?

Now let’s apply this concept to your day-to-day usage of social data. We all are working to uncover how to apply the data. In fact, many companies like the one I work for will come to tell you about all the ways this can help your business. They talk about use cases like campaign management, crisis management, innovation, customer service and many others. These use cases are valid extensions of how to make it work (we all know that), but the point here is about the hidden dangers of not committing yourself to understanding your content. To best illustrate why you must have good, accurate content rather than simply voluminous content, let’s take a look at a few implications of the “more is better” mindset.

You overinflate the importance of data shifts in your campaign tracking

Everyone wants to know how their latest campaign is doing. In fact, most are starting to realize that social can help them see the results of the campaign in near real time. How are you doing this? Are you simply tracking the buzz, or are you going deeper to understand what people actually liked or disliked about your campaign? Think about it. What do you really get from having all the data to show you movements if the data you have is not right?

Recently, I met with a customer that was telling me about how important it was to have every single piece of data to determine the success of their campaign. They were considering our product versus another (and my point isn’t about selling here but about their process) and were giving us the inquisition about whether we could do every single thing to fit their current process. What they described was this: for their campaigns, they would compare the changes in buzz over time and even versus past time periods. They were insistent that to do so, they needed all the data they could possibly get. They wanted to know the change in sound bites, what the increase in Twitter followers was, and other quite simple changes in the data. They also didn’t care about sentiment; they only wanted to make sure they had all the data they could to make sure they captured the changes. If they didn’t have it all, then it didn’t follow their process.

Why was this problematic? Well, firstly, by only looking at the change in buzz, they are missing the richness of the why. Secondly, having a “more is better” mindset here, without considering data quality, could very easily give you movements in your data that are not real. You might think you have moved mountains, but the reality is that you may have simply added a ton of duplicate content or spam that makes it look like you got a lot of volume. The point is that when the data is dirty, “more is better” is a very dangerous false positive waiting to happen. Stop. Make sure your data is clean. Then come up with a better way to think more holistically through the problem.
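To put rough numbers on that false positive, here is a small sketch in Python. Every count in it is made up for illustration (none of it comes from the customer above), and `percent_change` is just a hypothetical helper name. It compares the lift in buzz you would report from the raw counts with the lift left over once the duplicates and spam are taken out.

```python
def percent_change(before: int, after: int) -> float:
    """Percentage change from one period's count to the next."""
    if before == 0:
        raise ValueError("the baseline count must be non-zero")
    return 100.0 * (after - before) / before


# Hypothetical mention counts for the period before and during a campaign.
raw_before, raw_after = 1000, 1800

# Hypothetical share of those counts that is duplicate content or spam.
junk_before, junk_after = 100, 800

# What the dashboard shows when everything is counted.
raw_lift = percent_change(raw_before, raw_after)

# What is left once the junk is removed from both periods.
clean_lift = percent_change(raw_before - junk_before, raw_after - junk_after)

print(f"raw lift:   {raw_lift:.1f}%")
print(f"clean lift: {clean_lift:.1f}%")
```

With these invented numbers, the raw data shows an 80% jump while the cleaned data shows roughly 11%. Most of the mountain that appeared to move was junk arriving during the campaign window.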

You can’t react quickly in a crisis because you won’t really know why

So you might be thinking, that’s great for a campaign to see movement, but how does your argument hold water when I am in a crisis?  And you are right, in a crisis, having all the content is critical.  It is about content.  You want to make sure nothing gets past the goalie so to speak.  When listening during a crisis, you do need to understand where the problem originated, who said it, how much acceleration there is and other things we can think of.  If you don’t have total data clarity you can have a problem.

So let’s address the “more is better” concept of content in a crisis.  You see something.  It looks bad.  You know where it started.  You start to track it.  You are tracking everything as a means of understanding where things are moving.  You see it getting worse.  You come up with your strategy and respond.  You see movements again.  It looks like it is still getting worse.  You may have made it worse.

But do you know why? And if you want to know why, can you get to the origin of that why when the data has grown so much so fast? I bet you can’t. As a crisis unfolds, the nature of it is to go viral, which means more data. Let’s assume for a minute that even though your data is junk, it is generally accurate enough to watch the trends. So you know the situation got worse. But you won’t be able to get to the why quickly because it is like looking for a needle in a haystack. Yes, it takes good sentiment analysis to do that, but wouldn’t it be way easier if your dataset were accurate, with less crap in it? Of course it would. Again, what I am challenging here is your focus. This post is about creating awareness to make sure your social media data isn’t full of crap so you can do a better job with your use case implementation. In a crisis, quality data helps you find the needle in your data haystack when seconds may count and your job may be on the line.

Your internal customers won’t trust your recommendations because your accuracy sucks

This post is getting long, so I will keep this short. I spend a lot of my time championing the cultural aspects of driving change. It is no different when it comes to social. You are trying to get your company to listen to you about the dangers of ignoring social data as a new source for decision making. They already don’t trust it. If you are not militant about the data quality of your social data set, someone else will be. You know who that is? The risk-averse boss or executive who likes the way things are done. Now play out the scenarios that are already happening. You are social media Chicken Little. They are tired of listening to you. If your content sucks and you have to say “oops” too many times, Chicken Little will become fried Chicken Little.

So again I ask you this… Are you sure your social listening data doesn’t suck?  And if you are a bit worried about it now, what are you going to do to make sure it doesn’t?  Get your head out of feature fantasy and content complacency.  Bring a little accuracy to your content and know that accurate data helps you make better decisions faster.  Don’t be a fried Chicken Little.

Special thanks to Nick Arnett, Director of Product Management at NetBase Solutions, Inc., who helped contribute to this post.

About the Author

Malcolm De Leo
Malcolm, Chief Evangelist at NetBase Solutions, Inc.,  is a subject matter expert in the area of applying social media in an effort to build the marketplace for this powerful new consumer data source. Previously, Malcolm was the Global Vice President of Innovation at Daymon Worldwide and prior to that Malcolm spent 10 years at the Clorox Company managing partnerships with technology companies, developing innovation processes and building new innovation infrastructure.
