Many of the speakers at FutureStack14, New Relic’s annual technology and user conference in San Francisco on October 8th and 9th, will talk about how they’re changing the world with software and data. Sometimes those effects can be hard to see, however, which is why Chris Whong’s work with data visualization and civic technology casts such a bright light. Perhaps his best-known effort is NYC Taxis: A Day in the Life, which maps the wanderings of a New York taxi over a single day. Chris also maintains chriswhong.com, where he hosts some of his other experiments with data visualization, including a video where visitors can track 32,000 trips on shared bikes in the nation’s capital.
For his day job, Chris is a data solutions architect with Socrata, a “data experience company” focused on helping public sector organizations improve transparency, citizen service, and data-driven decision-making. Before joining Socrata, Chris managed data at the transit information app developer Roadify; he was also the founder of Charm City Networks, a Baltimore-based IT services company.
We talked with Chris about his background and what he’ll be discussing in his 5 p.m. Wednesday talk at FutureStack14, entitled “Visualizing Civic Data: A Day in the Life of a NYC Taxi.”
When did you first realize you were a data nerd?
I’ve always been a technologist and always dabbled in Web development and a little programming here and there. But in 2011, when I started grad school in urban planning at NYU, I found myself sitting in front of GIS (geographic information system) software for the first time in a decade. I realized there is so much more data available now than ever before, for whatever civic issue you’re interested in. I call GIS “the gateway drug” that got me into Processing, Web mapping, D3, and other visualization tools.
You’ve been combining data collection and analysis with geography for a long time. What makes these two fields particularly well matched?
Especially in cities, location matters. Whether you’re studying property values, crime, transportation, public health, construction, or whatever, the spatial component is always critical. Maps are a great way to convey that information in a way people can relate to.
What do you plan to talk about at FS14?
I’ll be talking about the tools and techniques I’ve used to turn data into meaningful experiences. Most data visualization projects require a lot of “data munging” to clean, parse, scrape, aggregate, or otherwise prep the data. Once it’s ready for consumption, there are a variety of ways to bring the data to life. I will be telling the story of how I came to be in possession of data for 173 million cab rides via a Freedom of Information Act request, how I shared the data, where it went once I shared it, and what I chose to build with it.
What’s an example of how you’ve had to “munge” data to prepare it for visualization?
When I made a visualization of MTA turnstile data, I had to do quite a bit of munging. First, the published data comes in four-hour chunks for each set of turnstiles. Some stations contain many turnstile sets, so I had to combine records for the same station to get an accurate count spread out over time.
Next, in order to map it, I had to geocode each station ID. A friend and I spent a few hours manually doing this, as there was no “answer key” associating the station identifiers with latitude/longitude coordinates. I published the results on GitHub, and lots of people still use it when working with this data.
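As a rough illustration of the two munging steps Chris describes (combining turnstile sets per station, then attaching coordinates from a hand-built lookup), here is a minimal Python sketch. It is not his actual code: the record layout, the station IDs, the coordinate values, and the function names `aggregate_by_station` and `attach_coordinates` are all hypothetical, and it assumes each record already holds a per-interval entry count.

```python
from collections import defaultdict

# Hypothetical records: (station_id, turnstile_set, interval_start, entries).
# The real published format may differ; these rows are made up.
records = [
    ("R001", "A", "2014-09-01 00:00", 120),
    ("R001", "B", "2014-09-01 00:00", 80),
    ("R001", "A", "2014-09-01 04:00", 40),
    ("R002", "A", "2014-09-01 00:00", 60),
]

# Hypothetical hand-built lookup of station ID -> (lat, lon).
# Placeholder coordinates; R002 is left out on purpose to show a missing match.
station_coords = {"R001": (40.75, -73.98)}


def aggregate_by_station(rows):
    """Sum entries across all turnstile sets for each station and interval."""
    totals = defaultdict(int)
    for station_id, _turnstile_set, interval_start, entries in rows:
        totals[(station_id, interval_start)] += entries
    return dict(totals)


def attach_coordinates(totals, coords):
    """Join per-station totals with coordinates; unmatched stations get None."""
    joined = []
    for (station_id, interval_start), entries in sorted(totals.items()):
        lat, lon = coords.get(station_id, (None, None))
        joined.append({
            "station_id": station_id,
            "interval_start": interval_start,
            "entries": entries,
            "lat": lat,
            "lon": lon,
        })
    return joined


totals = aggregate_by_station(records)
mappable = attach_coordinates(totals, station_coords)
```

Stations missing from the lookup keep `None` coordinates rather than being dropped, which makes it easy to see which identifiers still need to be geocoded by hand.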
Why are you interested in speaking at FutureStack14?
I was flattered that they reached out to me, and the audience and venue looked great. I was up for the challenge (this is probably the biggest audience I’ve ever presented in front of).