“This is the best talk I’ve attended in over a year.” - Harrison Schramm
You may know Harrison Schramm from his “5 Minute Analyst” articles and blog posts. When he isn’t thinking about the cost of the Death Star or solving the logistics problems of Harry Potter, he is also one of CANA Advisors’ Principal Operations Research Analysts. Recently he had the opportunity to attend a FiveThirtyEight talk on telling stories (at rstudio::conf). In Harrison’s words, “[t]his is the best talk I’ve attended in over a year.” Instead of writing a blog post or article on the talk, we asked Harrison if he would share his notes on the event, and he was kind enough to pass them along. We hope these notes spark your interest in not just the ‘how’ but the ‘why’ of statistical analysis.
From the Event Notebook of Harrison Schramm
Data Journalism Principles:
The story leads and the data follows. Use rigorous but interpretable methods. Be accurate, be fast, and be transparent.
Useful tools for R:
tidyverse is the tool of choice for data. (The tidyverse is a set of packages that work in harmony because they share common data representations and API design. https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/)
In the interest of transparency, FiveThirtyEight has created an R package. (Nate Silver’s FiveThirtyEight uses statistical analysis, meaning hard numbers, to tell compelling stories about politics, sports, science, economics and culture. https://github.com/fivethirtyeight) For example, if you would like to see a breakdown of Avengers characters by longevity and gender, you can do the following:
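# Install the data package once, then load the plotting, pipe, and data packages
install.packages("fivethirtyeight")
library(ggplot2); library(magrittr); library(fivethirtyeight)
# death1 records whether the character has died at least once, so the x-axis is labeled accordingly
avengers %>% ggplot(aes(factor(death1), years_since_joining)) + geom_violin() + facet_wrap(~gender) + xlab("Died at Least Once?") + ylab("Years Since Joining") + ggtitle("Avengers Characters Violin Plot - Status vs. Years")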
The Six Types of Data Stories
Novelty
Outlier
Archetype
Trend
Debunking
Forecast
Novelty Stories
Basic questions come first.
Danger: Triviality
Tactic: Simple Summaries
Ask yourself: Is this data meaningful to others?
Outlier Stories
Danger: Spurious Result
Tactic: Characters. Talk about who the outlier is: which person, which company, etc.
Profile one of the characters from the outlier group, then introduce the statistics.
Ask yourself: Is this really so different?
Archetype Stories
Danger: Oversimplification
Tactic: Modeling
Ask yourself: What variables am I leaving out?
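To make the "what variables am I leaving out?" question concrete, here is a minimal sketch in base R. It is not from the talk: the data are simulated, and the variable names x, y, and z are made up for illustration. A hidden variable z drives both x and y, so a model that leaves z out credits x with an effect it does not have.

```r
set.seed(1)  # arbitrary seed so the run is repeatable
n <- 500

# Hypothetical simulated data: z drives both x and y
z <- rnorm(n)
x <- 0.8 * z + rnorm(n)
y <- 2 * z + rnorm(n)  # by construction, y does not depend on x

# Slope on x when z is left out of the model
coef_without_z <- coef(lm(y ~ x))["x"]

# Slope on x once z is included
coef_with_z <- coef(lm(y ~ x + z))["x"]

# Compare the two estimates side by side
round(c(without_z = unname(coef_without_z), with_z = unname(coef_with_z)), 2)
```

The slope on x shrinks toward zero once z enters the model, which is the kind of oversimplification the notes warn about.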
Trend
Example: Terrorism overall is declining in the EU, but religiously inspired attacks are rising.
Done using dplyr and ggplot2: data %>% group_by() %>% summarize() %>% ggplot()
Danger: Variance - regression to the mean
Tactic: Be Conservative
Ask yourself: Is this signal or noise?
Fun quote: If you can always tell a valid trend, you should be trading on Wall Street, not telling data stories.
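The group_by, summarize, ggplot pattern from the trend notes might look like the sketch below. The incidents data frame is a made-up toy table (not real figures, and not the data used in the talk), and the column names year, motive, and n_attacks are our own assumptions.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per incident (illustrative only)
incidents <- data.frame(
  year   = c(2010, 2010, 2010, 2011, 2011, 2012, 2012, 2012, 2012),
  motive = c("religious", "other", "other", "religious", "other",
             "religious", "religious", "other", "other")
)

# Count incidents for each year and motive
trend_counts <- incidents %>%
  group_by(year, motive) %>%
  summarize(n_attacks = n(), .groups = "drop")

# Draw one line per motive over time
trend_plot <- trend_counts %>%
  ggplot(aes(x = year, y = n_attacks, colour = motive)) +
  geom_line() +
  geom_point() +
  xlab("Year") +
  ylab("Number of incidents")

print(trend_plot)
```

With only a handful of points per line, any apparent trend here could easily be noise, which is why the notes recommend being conservative.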
Debunking
Bechdel test: Examines how women are portrayed in movies. (1) Are there two or more women? (2) Do they talk to each other? (3) Do they talk to each other about something other than men?
Danger: Confirmation Bias - your own belief in the debunking action.
Tactic: Showcase Failures
Ask yourself: How much do I want to debunk this?
Quote about p-hacking: “Warning: This is evil (statistical) work. Do not go to the dark side. Do not try this at home.” Note: You can read Harrison’s piece on p-hacking in OR/MS Today here: https://www.informs.org/ORMS-Today/Public-Articles/June-Volume-43-Number-3/P-value-Primer-P-OR-P-values-in-operations-research-M-N-O-P-Q-R-S-T
Example of p-hacking: Eating potato chips leads to higher SAT Math scores.
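A result like the potato chip example can appear by chance alone when enough variables are tested. The sketch below is our own illustration in base R, not code from the talk: the outcome sat_math and the 200 "snack" predictors are pure random noise, and every name and number in it is hypothetical.

```r
set.seed(538)  # arbitrary seed so the run is repeatable

n_students   <- 100
n_predictors <- 200

# Outcome and predictors are independent random noise by construction
sat_math <- rnorm(n_students, mean = 500, sd = 100)
snacks   <- matrix(rnorm(n_students * n_predictors), nrow = n_students)

# p-value of a correlation test between each predictor and the outcome
p_values <- apply(snacks, 2, function(x) cor.test(x, sat_math)$p.value)

# "Discoveries" at the usual 0.05 threshold, despite there being no real effect
n_hits <- sum(p_values < 0.05)
n_hits

# The same count after a Bonferroni correction for multiple comparisons
n_hits_adjusted <- sum(p.adjust(p_values, method = "bonferroni") < 0.05)
n_hits_adjusted
```

At a 0.05 threshold, roughly one in twenty pure-noise predictors is expected to look "significant", so reporting only the hits is exactly the dark side the quote warns about. Showing the full set of tests, failures included, is the honest alternative.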
Forecast (You work a narrow path here)
Danger: Overfitting
Tactic: Simulations and scenarios
Ask yourself: Am I properly conveying the uncertainty in my model?
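One way to convey forecast uncertainty through simulation is sketched below in base R. This is our own illustration under simple assumptions, not the method from the talk: the history series is made up, and future changes are assumed to be independent normal draws with the drift and volatility estimated from that history.

```r
set.seed(2017)  # arbitrary seed so the run is repeatable

# Hypothetical history: 24 made-up monthly observations
history <- 50 + cumsum(rnorm(24, mean = 1, sd = 2))

# Estimate drift and volatility from the month-to-month changes
changes    <- diff(history)
drift      <- mean(changes)
volatility <- sd(changes)

n_sims  <- 5000  # number of simulated futures
horizon <- 6     # months ahead

# Each column holds one simulated sequence of future monthly changes
shocks <- matrix(rnorm(n_sims * horizon, mean = drift, sd = volatility),
                 nrow = horizon)

# Cumulate the changes and add them to the last observed value
paths <- tail(history, 1) + apply(shocks, 2, cumsum)

# 10th, 50th, and 90th percentiles for each month ahead
forecast_summary <- apply(paths, 1, quantile, probs = c(0.1, 0.5, 0.9))
forecast_summary
```

Reporting the 10% to 90% band alongside the median shows the reader how quickly the range of plausible outcomes widens with the horizon. The sketch ignores uncertainty in the estimated drift and volatility themselves, so the real uncertainty is larger than the band suggests.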
We hope these notes from Harrison Schramm on R, and on how to use it to tell a story with your statistical and analytical data, are useful.
Follow Harrison (@5MinuteAnalyst on Twitter) and the rest of the CANA Advisors’ Team (@CANAADVISORS on Facebook and Twitter) for more insight, blog posts and articles delving into data, logistics and analytics in creative and helpful ways.
Other interesting CANA Articles on R:
Blog Article: Document Preparation... in R? http://www.canallc.com/single-post/2016/09/02/Document-Preparation-in-R
Blog Article: Notes on The Seven Pillars of Statistical Wisdom http://www.canallc.com/single-post/2016/09/16/Notes-on-The-Seven-Pillars-of-Statistical-Wisdom