On my process…Christopher’s take on #makeovermonday week 3

By popular demand (read: because Ann practically BEGGED me to) I will go through my process for this week’s Makeover Monday…and my process in general for how I interact with and present social data.  A quick shoutout to my friend and Tableau social buddy Michelle Wallace, who gave an amazing presentation at TC16 on some of the concepts I’ll cover here in greater detail.

I have been working with social media data (Twitter, Facebook, blogs, etc.) for the past year.  Typically there are two ways to access this data: through a third-party vendor or through the service itself.  Third-party vendors usually aggregate things and export nice, neat CSVs or XLSXs, but they can be expensive.  I personally like going directly to the source.  When I first started working with this data, it wasn’t long before my searching on Google led me to a host of other Tableau developers wrestling with the same question: how to glean value out of social media data.  I noticed very quickly that a lot of folks were utilizing Tableau’s Web Data Connector to connect directly to the Twitter and Facebook data stores.

A great starting point is hitting up the Tableau Junkie (also known as Alex Ross) for his blog on creating a Twitter Web Data Connector.

A forewarning: this connects to Twitter’s public Search API, which is limited to the latest 5,000 tweets; for anything more than that you will have to purchase access to Twitter’s “firehose” data stream through GNIP.
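If you’d rather skip the Web Data Connector and hit the Search API yourself, a minimal sketch in Python might look like the following.  The bearer token and query are placeholders for your own credentials and search, this uses the v1.1 standard search endpoint, and the same cap on recent tweets applies here too:

```python
import requests

BEARER_TOKEN = "YOUR_BEARER_TOKEN"  # placeholder: your own app credentials

def search_tweets(query, count=100):
    """Pull a page of recent tweets from the public v1.1 Search API."""
    resp = requests.get(
        "https://api.twitter.com/1.1/search/tweets.json",
        headers={"Authorization": f"Bearer {BEARER_TOKEN}"},
        params={"q": query, "count": count, "result_type": "recent"},
    )
    resp.raise_for_status()
    return resp.json()["statuses"]

# Example: grab recent #MakeoverMonday tweets and peek at a few
tweets = search_tweets("#makeovermonday")
for t in tweets[:5]:
    print(t["user"]["screen_name"], "-", t["text"])
```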

Once you bring in the data you get a set of fields similar to what we saw on this week’s Makeover Monday.  The key to social media analysis is asking the five basic questions: who, what, where, when, and why.  We want to know who said it, what they were saying, where they said it, and when they said it, all of which hopefully combines to get at why they said it.  A key ingredient in this equation is the “what” portion…in our data that was the “Status Text”.  There are a lot of ways to glean keywords and hashtags using regex and other methods, but I wanted to see the frequency and prevalence of the exact words…ALL of the exact words.
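For contrast, if keywords and hashtags were all you wanted, a couple of simple regexes would do the job.  Here’s a quick sketch over a made-up Status Text (the variable name is mine):

```python
import re

status_text = "Today is a good day for #MakeoverMonday, right @someuser?"

# Hashtags and @mentions each fall out of a one-line pattern
hashtags = re.findall(r"#\w+", status_text)  # ['#MakeoverMonday']
mentions = re.findall(r"@\w+", status_text)  # ['@someuser']
```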

My first thought was to use the “Text to Columns” function in Excel.  However, I quickly realized that each tweet might contain 10, 15, 20+ words; multiply that by 5,000 tweets at a minimum (if only using the Twitter public API) and we are talking 100,000+ rows of data, not the easiest thing to drag down in Excel.  Even if we could get all those words, we would just have the words.  I wanted to be able to see what those words connected to, who said them, and when they were said…and eventually be able to derive my own sentiment analysis from those words.  A little more googling and VOILA: ALTERYX!

Alteryx not only has a text-to-columns option, but the key for me was its text-to-“rows” function.  For example, if you have a tweet that reads:

“Today is a good day”

What should return is:

Text                           Word
Today is a good day            Today
Today is a good day            is
Today is a good day            a
Today is a good day            good
Today is a good day            day
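If you don’t have Alteryx handy, the same split-to-rows reshaping can be sketched in Python with pandas; explode does the heavy lifting, and the column names below simply mirror the example:

```python
import pandas as pd

df = pd.DataFrame({"Text": ["Today is a good day"]})

# Split each tweet into a list of words, then fan out one row per word --
# the pandas equivalent of Alteryx's text-to-rows behavior
words = df.assign(Word=df["Text"].str.split()).explode("Word")
print(words)
```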

You get the picture.  A little cleanup and employing the NLTK (Natural Language Toolkit) stop words, ridding us of those nasty “the”s and “me”s and “and”s, and now we’re starting to get down to the meat!
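In Python, that stop word pass might look like this minimal sketch (the nltk.download call is a one-time fetch of the word lists):

```python
import nltk
nltk.download("stopwords")  # one-time download of the stop word lists
from nltk.corpus import stopwords

stop_words = set(stopwords.words("english"))

# Filter the exploded words down to the meaningful ones
words = ["today", "is", "a", "good", "day"]
keepers = [w for w in words if w not in stop_words]
print(keepers)  # ['today', 'good', 'day']
```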

One of my absolute favorite features in Tableau is the word cloud…a very simple visual built by utilizing the treemap and changing the mark type to “Text”.  For more info, see a step-by-step how-to here.

A final piece that I could not get to work in a timely fashion was the search box parameter (thank you, Matt Francis, for the great trick); there’s some magic needed to create an action on the word cloud and the trend line AND make the text searchable.

Needless to say, I think this is a very helpful way for anyone to explore the data at a high level and then drill down into the content of the posts.