Using Tableau 9.0’s Features: data prep and editing calculated fields

This weekend was the 60th anniversary of the polio vaccine, which removed the potentially deadly disease from the public consciousness of now three generations of children and parents in the US and most of the developed world. I thought this might be a good opportunity to explore the impact that the vaccine has had on childhood mortality worldwide, as well as play with some new features of Tableau Public 9.0 before I start training people on it next month.

I found data on polio fairly easily from the World Health Organization. Their data repository is quite comprehensive, but its format is kind of nasty. Not nasty like unstructured data can be, but its structure needs some work: there’s an empty row before the field headers (which isn’t so common in CSV files), and each year gets its own column, the fairly common practice of treating a time dimension as if it were a set of measures.

You’ve probably dealt with a lot of data sources like this:

Do you hate bad design as much as I do?

Help! I’m not a columnar data store!

So, the way I used to solve this problem was by loading the data that I wanted into SQL Server (my “work husband”) and then writing a ton of union statements to transform it into a new fact table. There’s probably an easier way than this, but it’s what I know. And I’ve gotten fast at it.

But I was super excited to see Tableau’s new Transpose feature in the data engine because–as much as I love proving my nerdy skills by writing intro-level SQL–it might free me from such tasks. So, I loaded the CSV into Tableau 9.0 and tried to delete the pesky empty row with the new data prep tools (which are awesome), but they didn’t work on my file…because it isn’t Excel. That was annoying–I had to open the file in Excel to remove the row anyway.

But! Pivoting the data made up for it. It’s actually very easy, as you can see in this video–I highlighted the fields that I wanted, which were ALL of the years, shown below…

Wait for it…

And then I right-clicked and selected “Pivot”. Easy-peasy!

The fields pivoted automagically!

Right-clicking and renaming the fields is a good idea, too.
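If you’d rather do the cleanup outside of Tableau (and outside of Excel), here is a minimal sketch of the same two steps in plain Python: drop the blank row above the headers, then turn one column per year into one row per country and year. The function name, the Year and Value column names, and the sample values are all made up for illustration; they are not the WHO file’s actual layout, and the blank-line check assumes no quoted fields span multiple lines.

```python
import csv
import io


def unpivot(csv_text, id_columns):
    """Turn one-column-per-year rows into one row per (id columns, Year)."""
    # Drop lines that are empty or contain only commas and spaces,
    # such as the blank row sitting above the header row.
    lines = [ln for ln in csv_text.splitlines() if ln.strip(" ,")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)))
    # Every column that is not an identifier is assumed to be a year.
    year_columns = [c for c in reader.fieldnames if c not in id_columns]
    long_rows = []
    for row in reader:
        for year in year_columns:
            if row[year] == "":
                continue  # nothing reported for that year
            new_row = {c: row[c] for c in id_columns}
            new_row["Year"] = int(year)
            new_row["Value"] = float(row[year])
            long_rows.append(new_row)
    return long_rows


# Made-up sample values, not WHO figures.
SAMPLE = """,,,
Country,2011,2012,2013
Atlantis,60,65,71
Erewhon,90,,93
"""

rows = unpivot(SAMPLE, id_columns=["Country"])
for r in rows:
    print(r)
```

This is roughly what the pile of union statements does, too: one pass over the table per year column, stacked into a single long table.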

Once I got the data loaded, I set about developing a useful analysis. I was curious how the rate of polio vaccinations had changed over time, and then how it was related both to other vaccination rates and to childhood mortality. In 1988, the World Health Assembly created a new initiative to dramatically lower the rates of polio: in 1988 there were 350,000 cases in 125 countries (about 1 in 200 infections results in complete paralysis); in 2013, there were only 416 new cases, with almost half of them in Afghanistan, Pakistan, and Nigeria. (You can read more about it here.)

Those numbers speak for themselves, so I didn’t visualize them. I did want to create a viz that lets you click on a country and see the top three causes of mortality in children under five years old, as well as the change in childhood mortality and vaccination rates since 1988, which necessitated a table calculation. I love table calculations–we teach them from almost the beginning of our Tableau Desktop curriculum, because they’re incredibly useful. So, I set about adding some nice and simple ranking functions.

I often customize table calculations, usually to show the percent difference from the average, but when I went to customize this one, the “Customize” button was not there. I could only edit the calc in the shelf, which is not very useful: shelf widths are limited to about 200px, and I actually really like the calculated field editing box–its new design is great. Why can’t we right-click on a table calc to customize it?
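For anyone who hasn’t met these two calculations yet, here is a rough sketch of what they compute, written in plain Python rather than Tableau’s calculation language. The country names and coverage numbers are invented, and I’ve assumed competition ranking (ties share a rank and the next rank is skipped), which I believe matches Tableau’s default RANK behavior.

```python
def rank_descending(values):
    """Competition rank: 1 = largest value, ties share a rank."""
    return {
        key: 1 + sum(1 for other in values.values() if other > value)
        for key, value in values.items()
    }


def pct_diff_from_average(values):
    """Percent difference of each value from the average of all values."""
    average = sum(values.values()) / len(values)
    return {key: (value - average) / average * 100 for key, value in values.items()}


# Invented coverage percentages; the average works out to 90.
coverage = {"Atlantis": 70.0, "Erewhon": 90.0, "Ruritania": 100.0, "Freedonia": 100.0}

ranks = rank_descending(coverage)
pct_diff = pct_diff_from_average(coverage)
for country in coverage:
    print(country, ranks[country], round(pct_diff[country], 1))
```

In Tableau, both of these are computed across whatever is in the view, which is why the addressing and partitioning of the table calc matters so much.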

customize button

And it’s entirely possible that I missed something during the Tableau 9.0 Roadshow that showed me how to do this more effectively. Which is a good segue back to polio…

Since polio is communicated from person to person through contaminated food or water, I added in some additional data points from the most recent set of World Bank Indicators, which I transformed using the aforementioned SQL unioning technique last year. It’s a highly informative data set, and Tableau ships a version of it, but I prefer to work from the data in its native state, if possible, and since the other data sets in use are aggregated at the Country-Year-Metric level, blending them is very efficient.
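The reason the shared grain matters can be sketched in a few lines of Python. This is a simplification, not how Tableau implements blending: it assumes each source has at most one row per Country and Year, keeps every row from the primary source, and looks up the matching row in the secondary one. The field names and numbers are invented.

```python
def blend(primary, secondary, keys=("Country", "Year")):
    """Keep every primary row; attach secondary fields where the keys match."""
    lookup = {tuple(row[k] for k in keys): row for row in secondary}
    # All non-key fields that appear anywhere in the secondary source.
    extra_fields = sorted({f for row in secondary for f in row} - set(keys))
    blended = []
    for row in primary:
        match = lookup.get(tuple(row[k] for k in keys), {})
        merged = dict(row)
        for field in extra_fields:
            merged[field] = match.get(field)  # None when there is no match
        blended.append(merged)
    return blended


# Invented rows at the Country-Year grain.
vaccination = [
    {"Country": "Atlantis", "Year": 2012, "PolioVaccinationPct": 65.0},
    {"Country": "Atlantis", "Year": 2013, "PolioVaccinationPct": 71.0},
    {"Country": "Erewhon", "Year": 2013, "PolioVaccinationPct": 93.0},
]
indicators = [
    {"Country": "Atlantis", "Year": 2013, "CleanWaterPct": 29.0},
    {"Country": "Erewhon", "Year": 2013, "CleanWaterPct": 88.0},
]

blended = blend(vaccination, indicators)
for row in blended:
    print(row)
```

Because both sources are already one row per country and year, the lookup is a straight match and nothing has to be re-aggregated first.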

Here’s what the dashboard looks like when you click on Afghanistan: it’s ranked #176, out of about 180 countries, with 71% of one-year-old children receiving the polio vaccine. What’s a little bit more shocking is that in a country where the US has “invested” billions of dollars, the healthcare spend per capita is only $51. That’s not much. In Norway, it’s, like, $10K. (Now that you know where Afghanistan is, go to this dashboard and click on it.) Oh, and 29% of the country has access to clean water. (Check out Rory Stewart’s documentation of life in rural Afghanistan if you want a really good feel for what things are like there.)

The trend lines for vaccination rates versus early childhood mortality in Afghanistan don’t look like they do for other countries, either: most countries keep making progress once they start. And fortunately, most countries aren’t overtaken by repressive regimes like the Taliban, which most likely accounts for the fluctuation in vaccination rates in the late 80’s and early 90’s. After their defeat in 2001/2002, it’s good to see that polio vaccination rates went from about 25% back up to 70%, which is nearly triple (an increase of about 180%).

How’s your geography?

Other good news for Afghanistan: the childhood mortality rate has decreased about 50% since the polio initiative started in 1988, though the vaccine is just one of many factors, which we’ll discuss more in upcoming blog posts. One in ten Afghan children don’t make it to the age of five, which is still shockingly high, but it’s better than one in five.

Check out the dashboard, shown below.

 

Personal note: my high school Latin teacher, who introduced me to Greek and Monty Python, has struggled valiantly for many years with the recurrence of polio that he contracted as a child. I really admire him, and I hope that raising awareness of the impact that a single vaccine can have will help prevent children from suffering unnecessarily.

And thanks to my mother for cultivating in her children an appreciation of our luxurious lives through literary exploration of other cultures.

Where is the Rattling and Rumbling?

I was flying out of Seattle on the way to southern California this morning, and when I looked out the window, we were rapidly approaching Mount St. Helens, which is quite beautiful and shocking. I remember reading about it as a child and am still in awe of the volcanic power of the earth. And that got me thinking–there are a lot of people who live fairly close to Mount St. Helens…and even closer to its dormant brethren, Mount Hood and Mount Rainier.

While those are humbling, I was a bit more enlightened/intrigued by the power of meteors when Neil deGrasse Tyson (@neiltyson) spoke about them in his keynote at the Tableau Conference in Seattle last week (ah, Seattle again…). A decently sized meteor could cause a tsunami that more or less wipes out the west coast of the US. (Thanks to the NOAA for reminding me who’s the boss–check out the data at http://www.ngdc.noaa.gov/hazard/)

So! I thought that some earthquake and volcano data would be perfect for the Tampa Tableau User Group’s online viz contest, which officially starts with this post and concludes with our meeting at Kforce next Tuesday, 9/23. We highly encourage you to download the data from GitHub at https://github.com/aohmann/Tampa-Bay-TUG and then to combine it with other freely available data sources (like World Bank or USAID data…hint, hint…) and, when you have finished, publish it to Tableau Public. Mention me on Twitter @ashleyswain with the hashtag #TampaBayTUG by noon on 9/23.

Oh, and sign up for the in-person Tampa Bay Tableau User Group! http://www.tableausoftware.com/learn/usergroups/tampa-bay-user-group/09-23-2014

We’ll have judging online, via WebEx, so everyone can join. And we’ll provide updates regularly!

 

Two Jedis + Starbucks Tweets + cookies + 45 minutes = a winning viz!

I had the great fortune to be in Seattle yesterday for the Seattle Tableau User Group. A few user groups have been organizing viz contests lately, and Jen Vaughn (@butterflystory) and I teamed up for this one, which I have reformatted slightly for the blog.

The premise was that we would have 45 minutes to create a story (in Tableau) from the data set, which was provided at the beginning of the event, and then we would present it to the audience. The data set was a week’s worth of tweets about Starbucks’ biggest hit, the Pumpkin Spice Latte (which actually doesn’t have any pumpkin in it). The story is, Starbucks has an online promo for its most loyal fans and started a Twitter account, @TheRealPSL, that apparently gets around 3,000 mentions a day. (Having tweeted about coffee perhaps more than once, I’m not in a position to judge.)

Jen and I started with a map, even though only about half of the 14K tweets had geolocators, and then we studied some other attributes in the data, like the number of followers of the users, their genders, and their retweets. You can filter the other visualizations from the map. Other than #PSL, there were very few other hashtags used in Europe or even in Hawaii, but the tweeters from Africa (who appear mostly to be male) have some clever ones. And Americans create #entire #conversations with #hashtags–they are to #millennials what #skateboards and #skinnyjeans are to #hipsters–but there is actually little meaning that one can derive from them in this case. While this promotion probably did drive the attachment rate for the more mundane but no less delicious goods, it would be hard to analyze sentiment (which by default is good, in this case, or else people wouldn’t join the promotion) or attachment from this data set.

We did some analysis over time, too. We chose to show the trend by hour of day, which we have adjusted to be in PDT. (Twitter defaults to GMT.) It’s fairly easy to use a parameter and the DATEADD function to enable your users to adjust for their timezone, but all that does is change the numbers along the axis–the pattern of usage remains the same. And not surprisingly, most of the tweets were in the morning, no matter where the tweeters were.
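Here is a small sketch of that point in plain Python, with a handful of invented tweet timestamps standing in for the real data. The offset plays the role of the parameter; in Tableau the equivalent would be something along the lines of DATEADD('hour', [UTC Offset], [Tweet Time]), where both field names are hypothetical. The fixed offset also ignores daylight saving changes, which is fine for a single week of data.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_by_local_hour(utc_timestamps, utc_offset_hours):
    """Count tweets per hour of day after shifting from UTC by a fixed offset."""
    counts = Counter()
    for timestamp in utc_timestamps:
        local = timestamp + timedelta(hours=utc_offset_hours)
        counts[local.hour] += 1
    return counts


# Invented timestamps in UTC/GMT, not the contest data.
tweets_utc = [
    datetime(2014, 9, 10, 14, 5),
    datetime(2014, 9, 10, 15, 30),
    datetime(2014, 9, 10, 15, 45),
    datetime(2014, 9, 11, 2, 10),
]

by_hour_utc = tweets_by_local_hour(tweets_utc, 0)
by_hour_pdt = tweets_by_local_hour(tweets_utc, -7)  # PDT is UTC-7

for hour in sorted(by_hour_pdt):
    print(f"{hour:02d}:00", by_hour_pdt[hour])
```

The counts per bucket are identical in both versions; only the hour labels move, which is exactly why the shape of the curve doesn’t change.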

If we had had more time, we would have related activity to proximity to a store or to a marketing campaign. We did get store data (thanks, Chris Toomey!), but we ran out of time.

I’ll be donating my prize to US Hunger–check it out at http://ushunger.com. Oh, and the Tampa Tableau User Group is sponsoring an online viz contest that starts on September 15. The Chicago TUG has just started theirs–tweet @KKMolugu to find out more.