Using Tableau 9.0’s Features: data prep and editing calculated fields

This weekend was the 60th anniversary of the polio vaccine, which removed the potentially deadly disease from the public consciousness of now three generations of children and parents in the US and most of the developed world. I thought this might be a good opportunity to explore the impact that the vaccine has had on childhood mortality worldwide, as well as play with some new features of Tableau Public 9.0 before I start training people on it next month.

I found data on polio fairly easily from the World Health Organization. Their data repository is quite comprehensive, but its format is kind of nasty. Not nasty like unstructured data can be, but its structure needs some work, like the removal of the empty row before the field headers (which isn’t so common in CSV files), and the fairly common use of time dimensions as if they were measures.

You’ve probably dealt with a lot of data sources like this:

Do you hate bad design as much as I do?

Help! I’m not a columnar data store!

So, the way I used to solve this problem was by loading the data that I wanted into SQL Server (my “work husband”) and then writing a ton of union statements to transform it into a new fact table. There’s probably an easier way than this, but it’s what I know. And I’ve gotten fast at it.

But I was super excited to see Tableau’s new Transpose feature in the data engine because–as much as I love proving my nerdy skills by writing intro-level SQL–this new data engine feature might free me from such tasks. So, I loaded up the CSV into Tableau 9.0, but the new data prep tools (which are awesome) couldn’t remove the pesky empty row from my file…because it isn’t Excel. So that was annoying–I had to open the file in Excel to remove the row anyway.

But! Pivoting the data made up for it. It’s actually very easy, as you can see in this video–I highlighted the fields that I wanted, which were ALL of the years, shown below…

Wait for it…

And then I right-clicked and selected “Pivot”. Easy-peasy!

The fields pivoted automagically!

Right-clicking and renaming the fields is a good idea, too.
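If you want to see what the pivot is doing under the hood (or you need to do it outside of Tableau), here is a minimal Python sketch of the same reshaping. It is not what Tableau or my SQL unions actually run; the column names (Country, Year, Value) and the sample numbers are assumptions made up for illustration. It also drops the empty leading row that I had to remove by hand.

```python
import csv

# Hypothetical wide-format extract: an empty first row, then one column per year.
# The country names and values are made up for illustration only.
WIDE_CSV = ",,,\nCountry,2011,2012,2013\nAfghanistan,68,71,71\nNorway,94,95,94\n"


def pivot_years(wide_text, id_column="Country"):
    """Turn one-column-per-year rows into one row per (country, year)."""
    lines = wide_text.splitlines()
    # Drop leading rows that are blank or contain only delimiters.
    while lines and not lines[0].strip(", \t"):
        lines.pop(0)
    reader = csv.DictReader(lines)
    long_rows = []
    for row in reader:
        for field, value in row.items():
            if field == id_column:
                continue  # the identifier stays on every output row
            long_rows.append(
                {id_column: row[id_column], "Year": int(field), "Value": float(value)}
            )
    return long_rows


long_rows = pivot_years(WIDE_CSV)
for r in long_rows:
    print(r)
```

Each year column becomes its own row, so the time dimension ends up in a single Year field instead of being spread across the header.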

Once I got the data loaded, I set about developing a useful analysis. I was curious how the rate of polio vaccinations had changed over time, and then how it was related both to other vaccination rates and to childhood mortality. In 1988, the World Health Assembly created a new initiative to dramatically lower the rates of polio: the infection rate in 1988 was 350,000 cases in 125 countries (about 1/200 results in complete paralysis); in 2013, there were only 416 new cases, with almost half of them in Afghanistan, Pakistan, and Nigeria. (You can read more about it here.)

Those numbers speak for themselves, so I didn’t visualize them, but I did want to create a viz that allowed you to click on a country and to see the top three causes of mortality in children under five years old, as well as the change in childhood mortality and vaccination rates since 1988, which necessitated a table calculation. I love table calculations–we teach them from almost the beginning of our Tableau Desktop curriculum, because they’re incredibly useful. So, I set about adding some nice and simple ranking functions. I often customize table calculations, usually to show the percent difference from the average, but when I went to customize this one, the “Customize” button was not there. I could only edit the calc in the shelf, which is not very useful: shelf widths are limited to about 200px, and I actually really like the calculated field editing box–its new design is great. (Why can’t we right-click on a table calc to customize it?)

customize button
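For anyone who hasn’t used table calculations, the two I mention (a rank, and the percent difference from the average) boil down to simple arithmetic over the values in the view. Here is a rough Python sketch of that arithmetic; the function names and the sample numbers are hypothetical, and this is only an approximation of what Tableau computes (it ignores ties, partitioning, and addressing).

```python
def percent_diff_from_average(values):
    """Percent difference of each value from the mean of all values."""
    avg = sum(values.values()) / len(values)
    return {name: (v - avg) / avg * 100 for name, v in values.items()}


def rank_descending(values):
    """Rank 1 goes to the largest value; ties are not handled specially."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: position + 1 for position, name in enumerate(ordered)}


# Made-up vaccination rates for three hypothetical countries.
rates = {"A": 90, "B": 60, "C": 30}
print(percent_diff_from_average(rates))
print(rank_descending(rates))
```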

And it’s entirely possible that I missed something during the Tableau 9.0 Roadshow that showed me how to do this more effectively. Which is a good segue back to polio…

Since polio is communicated from person to person through contaminated food or water, I added in some additional data points from the most recent set of World Bank Indicators, which I transformed using the aforementioned SQL unioning technique last year. It’s a highly informative data set, and Tableau ships a version of it, but I prefer to get to the native state, if possible, and since the other data sets in use are aggregated at the Country-Year-Metric level, blending them is very efficient.

Here’s what the dashboard looks like when you click on Afghanistan: it’s ranked #176, out of about 180 countries, with 71% of one-year-old children receiving the polio vaccine. What’s a little bit more shocking is that in a country where the US has “invested” billions of dollars, the healthcare spend per capita is only $51. That’s not much. In Norway, it’s, like, $10K. (Now that you know where Afghanistan is, go to this dashboard and click on it.) Oh, and 29% of the country has access to clean water. (Check out Rory Stewart’s documentation of life in rural Afghanistan if you want a really good feel for what things are like there.)

The trend lines for vaccination rates versus early childhood mortality in Afghanistan don’t look like they do for other countries, either: most countries keep making progress once they start. And fortunately, most countries aren’t overtaken by repressive regimes like the Taliban, which most likely accounts for the fluctuation in vaccination rates in the late 80’s and early 90’s. After their defeat in 2001/2002, it’s good to see that polio vaccination rates went from about 25% back up to 70%, which is nearly a threefold jump (an increase of about 180%).

How’s your geography?

Other good news for Afghanistan: the childhood mortality rate has decreased about 50% since the polio initiative started in 1988, but it’s just one of many factors, which we’ll discuss more in upcoming blog posts. One in ten Afghan children don’t make it to the age of five, which is still shockingly high, but it’s better than one in five.

Check out the dashboard, shown below.

 

Personal note: my high school Latin teacher, who introduced me to Greek and Monty Python, has struggled valiantly for many years with the recurrence of polio that he contracted as a child. I really admire him, and I hope that raising awareness of the impact that a single vaccine can have will help prevent children from suffering unnecessarily.

And thanks to my mother for cultivating in her children an appreciation of our luxurious lives through literary exploration of other cultures.

What Impacts Maternal Mortality?

It’s been a while since my last post—so sorry about that! I have been working on this post for a few days, and it was prompted by a segment I saw on GMA recently about Christy Turlington-Burns’ efforts to bring awareness to maternal mortality throughout the world through her organization, Every Mother Counts. It really hit home with me—if it weren’t for great medical care, my son and I most likely would not be alive right now. (We’re so grateful!) Her work really resonated with me, and so I wanted to share it with you.

So, here’s my agenda for this post, and a few more to follow it: I want to show how easy it is to use data visualization to tell a compelling story about how cultural circumstances can have major impacts on the lives of women and children around the world. I was teaching a class last week, and when I showed a draft of this, people frowned. It’s sad stuff! No doubt about it. But it’s very real.

The first person I talked to about this was my mom. (Hi, Mom!) She’s a nurse practitioner, and she has worked in some of the very remote areas in the map below that have abysmal rates of preventable maternal deaths. I asked her—what do you think the causes are? Without hesitation, she said, “Teenagers giving birth, and the lack of skilled help during births.” That makes a lot of sense to me.

So, I got a hold of the most recent World Bank Indicators, a version of which ships with Tableau, and then spent an eon transforming it in SQL Server so that I could load only the most recent numbers for each country for the metrics in question. (More on that tomorrow!) It’s a very rich data source, and it includes economic measures that, along with literacy and health data, describe some of the living conditions in a country fairly well.

My first question is which areas of the world have higher instances of maternal death. I started with a familiar map, which is a great way of showing disparities across the world. The countries are ranked in descending order by the likelihood that a woman will die in or after childbirth, so countries with high ranks (like #1, South Sudan) are really bad places to be pregnant. (The US is in the middle…below several former Eastern Bloc countries, which is a surprise.) My friend Nelson Davis (@nelsondavis) recently blogged about the relationship between life expectancy and war: there have been several notable genocides and civil wars in Sub-Saharan Africa, and consequently, they are not places one should expect to live very long or in good health.

The countries are colored by percentiles (great new table calc in Tableau 8.2, along with rank) of maternal mortality rates. When you click on a country, the scatter plots below, which show correlations between the percentage of maternal deaths that are preventable and other public health measures, will highlight. The area map of our aid to those countries also filters.

The scatter plots show significant correlations, and they support numerically what my mother told me about the links between teenaged pregnancies, unattended births, and maternal mortality. I added in literacy rate; notice that its trend line is nearly identical to that of unattended births, though the median is a little bit lower. The relationship with percent of GDP spent on health is less significant, though the clustering is obvious. There are some outliers that I would question, like Liberia and Sierra Leone in the upper right, especially given what we know about the quick spread of Ebola there recently.

Talk to me about your thoughts on this and what you think I should add in the future.

Join our Online Tableau Viz Contest!

What is an online viz contest? It’s an opportunity for you to share your work with a large community of interested people.

It’s easy to play:

  1. Go to Github (https://github.com/aohmann/Tampa-Bay-TUG) and download the data source. There actually are two, in this case—one on volcanic eruptions, and one on earthquakes.
  2. Play with the data—and join other data sources to it. Did you know that the World Bank (and several other NGOs, like USAID) have tons of data? So does the EPA—emissions data might be fun to add.
  3. Think of awesome ways to spend the $200 gift card that you could win—or, we can donate it to charity on your behalf.
  4. Use Tableau Public to visualize something interesting, something that most of us probably don’t realize but is nonetheless important.
  5. Publish your workbook in Tableau Public before noon on 9/23, and tweet the URL to @ashleyswain with the hashtag #TampaBayTUG.
  6. Register for the Tampa Bay Tableau User Group meet-up at http://www.tableausoftware.com/learn/usergroups/tampa-bay-user-group/09-23-2014. (We need your name and email so that we can track your visualization—and the votes that you get!)
  7. We’ll send you the WebEx if you can’t be there in person. (If you’re there in person, you can enjoy snacks and camaraderie.)
  8. At 5pm EDT on 9/23, log in to the WebEx. Check out Jen Underwood’s presentation, and then around 5:45, we’ll give everyone who entered the contest five minutes to present. (So when you log in to the WebEx, be sure that your name matches the name you used to register; otherwise we won’t know who you are…)
  9. When everyone has shown their visualizations, we’ll distribute the URL that people can use to vote (yes, you can vote for yourself…)
  10. Hang around for the next presentation, and we’ll present the winners at the end.
  11. Have fun!

Where is the Rattling and Rumbling?

I was flying out of Seattle on the way to southern California this morning, and when I looked out the window, we were rapidly approaching Mount St. Helens, which is quite beautiful and shocking. I remember reading about it as a child and am still in awe of the volcanic power of the earth. And that got me thinking–there are a lot of people who live fairly close to Mount St. Helens…and even closer to its dormant brethren, Mount Hood and Mount Rainier.

While those are humbling, I was a bit more enlightened/intrigued by the power of meteors when Neil deGrasse Tyson (@neiltyson) spoke about them in his keynote at the Tableau Conference in Seattle last week (ah, Seattle again…). A decently sized meteor could cause a tsunami that more or less wipes out the west coast of the US. (Thanks to the NOAA for reminding me who’s the boss–check out the data at http://www.ngdc.noaa.gov/hazard/)

So! I thought that some earthquake and volcano data would be perfect for the Tampa Tableau User Group’s online viz contest, which officially starts with this post and concludes with our meeting at Kforce next Tuesday, 9/23. We highly encourage you to download the data from Github at https://github.com/aohmann/Tampa-Bay-TUG and then to combine it with other freely available data sources (like World Bank or US AID data…hint, hint…) and, when you have finished, publish it to Tableau Public. Mention me on Twitter @ashleyswain with the hashtag #TampaBayTUG by noon on 9/23.

Oh, and sign up for the in-person Tampa Bay Tableau User Group! http://www.tableausoftware.com/learn/usergroups/tampa-bay-user-group/09-23-2014

We’ll have judging online, via WebEx, so everyone can join. And we’ll provide updates regularly!

 

Two Jedis + Starbucks Tweets + cookies + 45 minutes = a winning viz!

I had the great fortune to be in Seattle yesterday for the Seattle Tableau User Group. A few user groups have been organizing viz contests lately, and Jen Vaughn (@butterflystory) and I teamed up for this one, which I have reformatted slightly for the blog.

The premise was that we would have 45 minutes to create a story (in Tableau) from the data set, which was provided at the beginning of the event, and then we would present it to the audience. The data set was a week’s worth of tweets about Starbucks’ biggest hit, the Pumpkin Spice Latte (which actually doesn’t have any pumpkin in it). The story is, Starbucks has an online promo for its most loyal fans and started a Twitter account, @TheRealPSL, that apparently gets around 3,000 mentions a day. (Having tweeted about coffee perhaps more than once, I’m not in a position to judge.)

Jen and I started with a map, even though only about half of the 14K tweets had geolocators, and then we studied some other attributes in the data, like the number of followers of the users, their genders, and their retweets. You can filter the other visualizations from the map. Other than #PSL, there were very few other hashtags used in Europe or even in Hawaii, but the tweeters from Africa (who appear mostly to be male) have some clever ones. And Americans create #entire #conversations with #hashtags–they are to #millennials what #skateboards and #skinnyjeans are to #hipsters, and there actually is little meaning that one can derive from them, in this case–while this promotion probably did drive the attachment rate for the more mundane but no less delicious goods, it would be hard to analyze sentiment (which by default is good, in this case, or else people wouldn’t join the promotion) or attachment from this data set.

We did some analysis over time, too. We chose to show the trend by hour of day, which we have adjusted to be in PDT. (Twitter defaults to GMT.) It’s fairly easy to use a parameter and the DATEADD function to enable your users to adjust for their timezone, but all that does is change the numbers along the axis–the pattern of usage remains the same. And not surprisingly, most of the tweets were in the morning, no matter where the tweeters were.
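In Tableau, the timezone adjustment is roughly a calculated field along the lines of DATEADD('hour', [UTC Offset], [Created At]), where both field names are placeholders I made up for this example. To show why the shape of the curve doesn’t change, here is a small Python sketch with invented timestamps: shifting by a fixed offset only relabels the hours.

```python
from collections import Counter
from datetime import datetime, timedelta


def tweets_by_hour(utc_timestamps, offset_hours):
    """Count tweets per hour of day after shifting UTC times by a fixed offset."""
    counts = Counter()
    for ts in utc_timestamps:
        local = ts + timedelta(hours=offset_hours)
        counts[local.hour] += 1
    return counts


# Invented sample timestamps in GMT/UTC (not from the contest data set).
utc = [
    datetime(2014, 9, 9, 14, 5),
    datetime(2014, 9, 9, 14, 40),
    datetime(2014, 9, 9, 15, 10),
    datetime(2014, 9, 9, 3, 0),
]

gmt = tweets_by_hour(utc, 0)
pdt = tweets_by_hour(utc, -7)  # PDT is UTC-7
print(sorted(gmt.items()))
print(sorted(pdt.items()))
```

The counts per bucket are identical in both outputs; only the hour labels move, which is what we saw along the axis in the viz.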

If we were to spend more time on this, we would have related activity to proximity to store or marketing campaign. We did get store data (thanks, Chris Toomey!), but we ran out of time.

I’ll be donating my prize to US Hunger–check it out at http://ushunger.com. Oh, and the Tampa Tableau User Group is sponsoring an online viz contest that starts on September 15. The Chicago TUG has just started theirs–tweet @KKMolugu to find out more.

 

 

Game of Thrones, in Tableau’s Story Points

I know that Game of Thrones Season 4 ended quite a while ago, but I have presented this Story Points (dashboard? story?) a couple of times to different user groups and wanted to post it to my blog for others.

I collected tweets with #GameOfThrones, #GoT, and #GoT Season4 through ScraperWiki, which no longer offers this service, for several months in 2014. You’ll notice that the tweet volume is wildly inconsistent; this is both my fault and ScraperWiki’s. Twitter rate-limited their searches for some weeks, so I am missing a fair amount of tweets. (What better prompt for me to start using the Twitter API?)

Another reason I am missing tweets actually is something I admonish people about when I am training them in Tableau: case sensitivity! Turns out that hashtags are case-sensitive, too. While I searched for #GameOfThrones, I did not search for #Gameofthrones. (Tableau Public limits me to 1mm records, so it probably would not change much in this viz.) And in the Top 5 Hashtags list for each episode, I filtered out the hashtags for which I searched, because that would be redundant, and I normalized the tweets by using UPPER. (You should avoid showing members of a dimension in all upper-case in a visualization, if you can—it looks angry and makes people think that your MDM folks are lame.)
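The normalization step is easy to sketch outside of Tableau, too. The Python below is only an illustration of the idea (upper-case every hashtag, drop the ones I searched for, then keep the top five); the sample tweets are invented, and the regular expression is a simplifying assumption about what counts as a hashtag.

```python
import re
from collections import Counter

# Simplified hashtag pattern: '#' followed by letters, digits, or underscores.
HASHTAG = re.compile(r"#\w+")


def top_hashtags(tweets, searched, n=5):
    """Top-n hashtags after upper-casing, excluding the searched-for tags."""
    excluded = {tag.upper() for tag in searched}
    counts = Counter()
    for text in tweets:
        for tag in HASHTAG.findall(text):
            normalized = tag.upper()  # same idea as UPPER() in the calc
            if normalized not in excluded:
                counts[normalized] += 1
    return counts.most_common(n)


# Invented tweets for illustration.
tweets = [
    "Loved it #GameOfThrones #Tyrion",
    "#gameofthrones #tyrion wow",
    "#GoT #Arya",
]
print(top_hashtags(tweets, searched=["#GameOfThrones", "#GoT"]))
```

Because both the tags and the exclusion list are upper-cased before comparing, #GameOfThrones and #gameofthrones are treated as the same tag and both are filtered out.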

This data set is a good candidate for Story Points in Tableau because it is sequenced over time, and there are many opportunities to comment on the causation of the fluctuation in tweets. For instance, when Mark Gatiss joined the cast as a very minor character, there was an unusual spike that had nothing to do with HBO, but rather with his popularity as Mycroft Holmes, on the BBC’s epic hit, “Sherlock”. (Maybe a good topic for using the Twitter API?)

There are some things I like about Story Points—it allows me to guide the user’s navigation very carefully, and it looks great. It does require analysts to think about what their audience really needs to take away from an analysis, which is part of the vocation of data analysis that often gets lost.

I don’t like the inability to modify the appearance of the Story Points controllers, and it doesn’t write changes back to the dashboard or visualization in use. I actually did not need to use it for this dashboard, because the horizontal bar of episode numbers serves as a filter, too. Building a dashboard also was more efficient for me, probably because I have done it so many times. I’m curious what you all think about Story Points.

I’m going to use the dashboard I built previously from this data set, and not Story Points, for my upcoming client demo. Story Points is good for presentations, but it’s not an enterprise analytics tool.

(I did redact some of the spoilers in this Story, for those of you who haven’t seen Season 4–like my husband, who has been ultra-helpful setting up this blog :))

Enjoy!

Age is a number, but what does it mean?

The genesis for this blog post actually is my father. He had a birthday recently, which coincided with the announcement that he is engaged (congratulations, Dad!) to a lady who is from the country where he resides. So I was curious—how would his age compare in the country where they live? And what would his age in America be if he had been born elsewhere? And now that I’m entering middle-age, what would that look like in, for instance, Africa?

I used this World Bank data, which actually ships with Tableau, to create a multiplication factor for each country that relates its life expectancy, both for men and for women, to that of the US, and then I used a couple of parameters to allow you to select the country where you’re from—or one that holds your interest—and then input your age and gender. The labels over the countries tell you what your age would be if you lived there. If the local age is less than your current age, then their life expectancy is less than ours.
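In case the multiplication factor sounds mysterious, here is my best reconstruction of the arithmetic as a small Python sketch. The exact formulas are assumptions for illustration (the workbook has the real calculations), and the life expectancy numbers are made up rather than taken from the World Bank data.

```python
def local_age(age, home_life_expectancy, other_life_expectancy):
    """Scale an age by the ratio of the two countries' life expectancies."""
    factor = other_life_expectancy / home_life_expectancy
    return age * factor


def percent_of_life_complete(age, life_expectancy):
    """Share of the expected lifespan already lived, as a percentage."""
    return age / life_expectancy * 100


# Made-up life expectancies: 80 years at home, 60 years in the other country.
print(local_age(40, 80, 60))             # a 40-year-old at home maps to 30.0 there
print(percent_of_life_complete(40, 60))  # but 40 years is about two thirds of 60
```

With these numbers the local age (30) is lower than the actual age (40), which is the signal that the other country’s life expectancy is shorter than the home country’s.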

The countries are colored by the percentage of life that would be complete if you lived in a specific country. The news for our friends in Africa isn’t so good—their life expectancies are significantly shorter than ours in The Americas and in Europe. There’s a significant relationship between birth rate and life expectancy in each country. I added trend lines to the scatter plot below the map, and the relationships are logarithmic.

The birth rates translate into children per woman: for some perspective, 49 births per 1,000 in Niger translates into 7.6 births per woman; in the Netherlands, 9-ish births per 1,000 translates into 1.9 births per woman. (I found the births per woman data and will add it later this week.)

Click on a country or region to filter the scatter plot and the histogram. (I tested our friendly p-values and R2 values to confirm that this is the best model. If you love stats and Tableau, send me a message!)
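For the stats-curious: a logarithmic trend line is a straight-line fit of the measure against the natural log of the other axis, y = b0 + b1 * ln(x). Here is a minimal Python sketch of that fit and its R2, using ordinary least squares. The data points are synthetic (generated from an exact log curve, so R2 comes out at essentially 1), and this is an illustration of the model rather than a reproduction of Tableau’s trend line output.

```python
import math


def fit_log_trend(xs, ys):
    """Least-squares fit of y = intercept + slope * ln(x); returns (intercept, slope, r2)."""
    log_xs = [math.log(x) for x in xs]
    n = len(xs)
    mean_lx = sum(log_xs) / n
    mean_y = sum(ys) / n
    sxx = sum((lx - mean_lx) ** 2 for lx in log_xs)
    sxy = sum((lx - mean_lx) * (y - mean_y) for lx, y in zip(log_xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_lx
    ss_res = sum((y - (intercept + slope * lx)) ** 2 for lx, y in zip(log_xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return intercept, slope, 1 - ss_res / ss_tot


# Synthetic points: birth rate per 1,000 (x) against a made-up life expectancy (y).
birth_rates = [9, 20, 49]
life_expectancies = [85 - 10 * math.log(x) for x in birth_rates]

intercept, slope, r2 = fit_log_trend(birth_rates, life_expectancies)
print(round(intercept, 3), round(slope, 3), round(r2, 3))
```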

Feel free to download the workbook and check out what I did with the parameters. This data is from 2010. I have found an updated data set, but it needs some major transformations; the metrics here haven’t changed significantly since they were published, but I will be updating the viz later in the week, and over the next few weeks I plan to add more analyses of the infrastructure, educational, and public health factors that contribute to such wide variations in birth rates and life expectancies.

First Blog Post…ever!

It’s taken me a while to get into blogging, and as someone whose only real athletic talents are talking and typing, my recent motivation came from the availability of a .NINJA address. (Because who doesn’t want to be a ninja, even just a little bit?)

My work focuses mostly on data visualization, and I really enjoy talking with people about user interface design in more traditional analytics applications. Good user interface design really revolves around how you like to travel around a webpage with your eyes and mouse, and what prompts you to move from one place to the next.

In fact, our Kforce team was just talking with a client yesterday about good UX design, and since it’s something that a lot of people forget when they’re designing a dashboard, I wanted to share with you all some of the basic principles of good design, along with some great resources. (And I threw in a Game of Thrones dashboard to keep things fun!)

1.      Keep the end user in mind, and add context to the dashboard accordingly.

2.     Context means making it very clear what everything is and how it relates to your user…use big fonts for titles, and label your axes. If there’s a point or area of interest, annotate it, and include metric definitions somewhere that’s easy to access.

3.     Keep it simple. Humans can retain only three pieces of information at a time, so presenting what’s most important, and then guiding the transition to detailed answers, is important.

4.     A single dashboard does not need to answer all of the questions that a user might have: it just needs to guide them through the answers to this question, and you can use multiple dashboards or visualizations to do this–what you’re building is actually an analytics user experience, not just a dashboard. It’s your data set’s elevator speech.

5.     Consistency is very important, for three big reasons: consistently designed dashboards are easier for users to navigate, they’re less distracting, and they build trust. New analytics tools or applications require a lot of change management, and a big element of successful change management is gaining the trust of your end users.

6.     Design the size of your dashboard appropriately for its end use. If it’s going onto a PowerPoint slide, it shouldn’t be bigger than 900×550 px. And it shouldn’t ever auto-size, unless you think it will be used primarily on mobile devices. Less is more.

The Game of Thrones dashboard below is relatively simple–it shows the geographical and chronological distribution of Game of Thrones tweets (in Spanish) after episode air time. (This is a heavily filtered data set.) I’m a big fan of histograms (thanks, Six Sigma!), and this one is pretty useful–we can see that half of the tweets were made within 32 hours of air time, and we can also see how usage patterns vary across the southeastern US and Latin America. (I’m not the only one who can’t stay up past ten on a Sunday night and watches it the next day!)
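If you’re wondering how a histogram gets you to a statement like “half of the tweets were made within 32 hours,” here is a tiny Python sketch. The hours below are invented (I picked them so the median lands on 32, to mirror the number above), and the 24-hour bin width is an arbitrary choice for the example.

```python
from collections import Counter
from statistics import median


def histogram(hours, bin_width):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter(int(h // bin_width) * bin_width for h in hours)
    return dict(sorted(counts.items()))


# Invented hours between episode air time and each tweet.
hours_after_air = [1, 2, 3, 20, 30, 34, 40, 50, 70, 90]

bins = histogram(hours_after_air, 24)
half_point = median(hours_after_air)
print(bins)
print(half_point)
```

The median is the point where the cumulative count crosses 50%, so reading it off the histogram is the same as asking how many hours it took for half of the tweets to arrive.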

I do have a pretty awesome Story Points dashboard that tells the whole story of Season 4…definitely has some spoiler alerts 🙂

Check out this viz: