At CANA Advisors, we use R daily for everything from exploratory analysis to generating publication-quality documents with R Markdown and building interactive web apps with Shiny. In January, I made the trek down to Austin, TX to attend the RStudio conference. This was the largest RStudio conference to date, buzzing with nearly 2,000 attendees. Here are a few reflections from those two enlightening days.
R in Production: Not only possible, but easy!
The first keynote speaker, Joe Cheng, broke down how the RStudio team is working to make R in production effortless for R programmers with little to no experience in web development. New tools like shinyloadtest, plot caching, and async are now available to supplement old standbys RStudio Connect and profvis. Throughout his presentation, Cheng stressed the importance of addressing the cultural and organizational barriers to scaling R. The ability to swiftly take analysis from an exploration in R to production presents a new role for data scientists, one that must be taken on mindfully, respecting and relying on the expertise of IT and engineering teammates.
The theme of production carried throughout the conference, with several presentations on the topic. Of note was a presentation from a team at T-Mobile, who shared a familiar story with the audience: presenting a Shiny application to high-level leadership gave credibility to their project, sparked interest, and eventually earned them additional resources to continue their work. From there, their engineering and data science teams worked together to put a Keras model into production, which is now responding to customer requests in real time. This is one of the most robust R models I’ve seen in terms of scale. Seeing how they overcame technical and cultural barriers to create a fast, compact app was incredibly valuable.
Defining Data Science
In the second keynote talk of the conference, Felienne Hermans shared her joys and challenges in studying how people learn programming languages. Although this may be changing for the next generation, the majority of practicing data scientists do not have a shared memory of what it looks like to learn the tools of our trade. In part, this may be because data science programs didn’t exist until a few years ago, but the issue is even more fundamental. The view that learning how to code is exploratory, done individually with great struggle, is very common, because that is how so many of us learned how to code. What if, as with math or reading, there is worth in structured education for this line of work?
Data science is supposedly the “sexiest job of the 21st century.” Despite all of the hype surrounding this career, there is little consensus within the analytics community on what actually defines a data scientist, apart from a high salary. As operations research (OR) analysts, we see great overlap between our two disciplines. OR can certainly be considered a necessary predecessor of data science. While at the RStudio conference, I was surprised to see how many presenters and conference attendees, from a variety of disciplines, identified as data scientists. To an extent, it feels as if “data scientist” is an attribute that we can append to those who have mastered modern analytics techniques, whether they be biologists, geographers, or operations research analysts. Angela Bassa gave an inspiring talk about growing data science teams and taking advantage of the unique strengths individuals may bring to an organization. A panel discussion with industry leaders also addressed this topic with a conversation about how to manage a diverse set of team members.
Strength in Community
As with most R events, Hadley Wickham opened the conference with a Code of Conduct, reinforcing the importance of inclusivity and diversity. The “Pacman Rule” of always leaving space for a new face to join a conversation was honored throughout the conference. On the final evening, I connected with other R-Ladies at a social event. I picked up great tips on running successful events from folks at chapters around the globe to take back to my own chapter. This event continued a theme of openness and inclusivity felt throughout the conference.
Photo credit: JD Long
Interested in learning more? Many of these talks are available online. Check out https://resources.rstudio.com/rstudio-conf-2019 to experience the conference.
Lucia Darrow is an Operations Research Analyst at CANA Advisors. To find more content on our favorite professional events, continue to visit our CANA Connection.