Analytics, Halloween edition – trick or “tweet”

It’s that time of year…when ghastly ghouls appear to haunt your every step and zombies sneak through the streets in search of tasty prey. Cauldrons bubble and boil at the witching hour, promising more than just a little trouble for anyone who dares to look upon the sight. Suddenly, you hear the far-off howl of a werewolf and try to take shelter in the nearest convenience store – only to find yourself face to face with an arachnid of monstrous proportions.

Such is the horror of Halloween!

Luckily, Halloween is usually far less frightening than that for most of us. It’s typically a fun celebration, especially in the United States, where over $2B is spent annually on just candy for the occasion. For many households, planning starts well in advance, with parents wondering what candy and costumes to buy their kids, and even pets.

But what does Halloween have to do with analytics?

If you want your house to be the trick-or-treat hot spot, you’d better know which candy is most popular. And how about outfitting your kids in the coolest costumes? Do you know which ones are in vogue?

These sound like simple questions, but the answers are not always readily apparent. Clearly, a company selling costumes would have a sense of what’s hot, based on its sales data. But what about the rest of us mere mortals? How can we track Halloween preferences?

Analytics for the mere mortal a.k.a. the self-service user

The use of analytics to inform decisions and predict the future has been lauded by many. Analytics can help predict anything from when a dangerous storm will hit to which patient is most likely to be readmitted for further treatment. However, until recently, leveraging analytical tools to derive these kinds of insights was largely confined to those with an extensive analytics background and programming skills – such as data scientists or IT analysts.

But a revolution is underway, fueled by the desire for users of all skill levels – especially those without a technical background – to have “easy” ways to derive insights. As a result, there is a growing market for easy-to-use, self-service analytics tools, which enable users to explore data, visualize and predict faster than ever before – without requiring extensive training in data science methods or IT support.

Tools like IBM Watson Analytics make it easy for anyone, like you and me, to take data and derive interesting insights. Analytics is not just for technical users or data scientists anymore. In fact, Gartner predicts that amateur analytics users will outpace trained data scientists 5:1.

However, analytics without the “right” source of data is meaningless. Social media data is increasingly becoming one of the most valuable data sources because it comes directly from consumers. Today, almost 65 percent of all American adults are on social media. Moreover, social media use for top platforms like Twitter is even higher in other countries such as India, Indonesia, the Philippines, Mexico and Brazil. This suggests there is vast potential to leverage social data from platforms like Twitter to gain insights about our global population.

One feature of IBM Watson Analytics is the ability to explore Twitter data, by searching for hashtags over a specific time period and generating a dataset composed of tweets that can be analyzed further using the self-service capabilities of the tool.

Which candy rules?

In the “spirit” of Halloween season, let’s run a simple Twitter analysis in IBM Watson Analytics to answer the following question: Which candy is king – candy corn, Snickers, gum, lollipops or Reese’s?

I entered the following hashtags in the Twitter data upload box, and selected a date range (in this case all of September):

#halloween #candycorn #snickers #gum #lollipop #reeses

Figure 1: Upload the tweets we want

Notice that the data upload box shows the number of tweets that reference at least one of these hashtags – in this case, there are more than 48,000 tweets in total.

After uploading the data, you can begin to explore it by creating simple visuals. I’d like to see how sentiment compares for each of my candies of choice. Watson Analytics uses an internal algorithm to score sentiment of tweets. Sentiment refers to the attitude of the tweet writer with respect to the topic at hand.

In order to compare candies, we’ll need to filter our data using two steps. First, select #halloween to be “1”, indicating that we only want to include tweets that also reference Halloween (see Figure 2). This is to ensure that we’re only looking at Halloween-related candy tweets. As a result, our data set has been condensed quite a bit (to a total of 211 tweets).

It’s possible that many of the other tweets in the data set that reference the five candy hashtags (which comprise about 11,000 of the original 48,000) are also written by Halloween-candy enthusiasts. However, for accuracy, I prefer to take the smaller set, which is likely to be hyperfocused on Halloween since its tweets were tagged as such. You could also expand the time horizon for analyzing tweets, including time periods before and after the event, to get a larger sample set.
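For readers curious about what this filter does behind the scenes, here is a rough Python sketch of the same idea: keep only tweets that carry #halloween plus at least one candy hashtag. This is not how Watson Analytics is implemented. The function names and the sample tweets are made up for illustration, and the tool itself does all of this through point-and-click.

```python
import re

# The five candy hashtags from the search above.
CANDY_TAGS = {"candycorn", "snickers", "gum", "lollipop", "reeses"}


def hashtags(text):
    """Return the set of lowercase hashtags found in a tweet."""
    return {tag.lower() for tag in re.findall(r"#(\w+)", text)}


def halloween_candy_tweets(tweets):
    """Keep tweets tagged #halloween plus at least one candy hashtag."""
    kept = []
    for text in tweets:
        tags = hashtags(text)
        candies = sorted(tags & CANDY_TAGS)
        if "halloween" in tags and candies:
            kept.append((text, candies))
    return kept


# Made-up sample tweets, for illustration only (not from the real data set).
sample = [
    "Stocking up on #candycorn for #Halloween!",
    "New pack of #gum for the road",
    "#halloween costumes are done",
    "#Snickers or #reeses? #halloween dilemma",
]

filtered = halloween_candy_tweets(sample)
for text, candies in filtered:
    print(candies, text)
```

In this toy sample, the #gum tweet drops out because it never mentions #halloween, and the costume tweet drops out because it names no candy.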

Figure 2: Filter on Halloween tagged candy tweets

Next, we’ll create a simple bar chart, filtering the tweets by candy hashtag (see Figure 3). You’ll notice that gum drops out of the filter list, indicating that there were no tweets that referenced both #gum and #halloween.

Figure 3: Select candies to view sentiment

Now let’s also filter our chart by type of sentiment (see Figure 4). We want to include positive, neutral and negative sentiment, so that we can compare.

Figure 4: Filter on sentiment types to view

Now, let’s take a look at our winner – candy corn! Not only does it have the largest total sentiment feedback, indicating that more people are discussing it, but the ratio of negative to positive sentiment is very low, suggesting that our Halloween candy connoisseurs overwhelmingly feel love for candy corn (see Figure 5).

Figure 5: Our candy winner
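
The comparison behind the chart boils down to counting tweets per candy and per sentiment label. Here is a minimal Python sketch of that tally, assuming each tweet has already been labeled positive, neutral or negative. Watson Analytics scores sentiment with its own internal algorithm, which is not reproduced here, and the rows and function names below are invented purely for illustration.

```python
from collections import Counter, defaultdict


def sentiment_by_candy(rows):
    """Count tweets per candy and per sentiment label."""
    counts = defaultdict(Counter)
    for candy, sentiment in rows:
        counts[candy][sentiment] += 1
    return counts


def negative_share(counter):
    """Fraction of a candy's tweets that were labeled negative."""
    total = sum(counter.values())
    return counter["negative"] / total if total else 0.0


# Made-up (candy, sentiment) rows, for illustration only.
rows = [
    ("candycorn", "positive"),
    ("candycorn", "positive"),
    ("candycorn", "neutral"),
    ("candycorn", "negative"),
    ("snickers", "positive"),
    ("snickers", "negative"),
]

counts = sentiment_by_candy(rows)

# The most-discussed candy is the one with the most tweets overall.
most_discussed = max(counts, key=lambda candy: sum(counts[candy].values()))

for candy, counter in counts.items():
    print(candy, dict(counter), round(negative_share(counter), 2))
print("Most discussed:", most_discussed)
```

A candy wins on both fronts when it has the highest total count and a small negative share, which is the pattern the chart shows for candy corn.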

Candy corn rocks – who knew?

I have to admit I was a bit surprised by the results of our Twitter analysis. After all, why would a relatively old-fashioned, arguably plain candy be held in such high esteem?

However, I did some searching and compared my results with a consumer survey from the National Confectioners Association ranking potential 2015 candy sales. It suggests that the second highest seller – right after the general category of “chocolate” – will be candy corn.

Not bad for a little analytics magic!

While a thorough exploration of Halloween candy preferences using social media would require more analysis and should consider data sources beyond Twitter, my quick exercise does tie in nicely with public opinion. And it reinforces the importance of using social media sources to better understand consumer preferences.

Hopefully, this tasty example has you bewitched about the endless possibilities of using social data and self-service analytics tools to search for answers. Analytics really is for everyone, not just trained data scientists who understand complex algorithms and sophisticated tools. And that’s not just “witch”-ful thinking!
