We’ve created an updated version of this blog post with much more detail. – Editor
At Strata+Hadoop World, James Burkhart, technical lead on real-time data infrastructure at Uber, shared how Uber supports millions of analytical queries daily across real-time data with Apollo, Uber’s internal analytics querying language.
James covers architectural decisions and lessons learned from building an exactly-once ingest pipeline that captures raw events across in-memory row storage and on-disk columnar storage. He also details how Uber uses a custom metalanguage and query layer by leveraging partial OLAP result set caching and query canonicalization. Putting all the pieces together provides thousands of Uber employees with subsecond p95 latency analytical queries spanning hundreds of millions of recent events.
Video and Slides:
About James
James Burkhart, is the technical lead on real-time data infrastructure at Uber. James has a strong background in time series data storage, processing, and retrieval. Previously, he worked on Blueflood, a time series database on top of Cassandra, while at Rackspace.
Additional Resources
Uber Engineering Blog: eng.uber.com
Uber Open Source: uber.github.com
Uber Eng Twitter: twitter.com/ubereng
Uber Slides: http://msql.co/uberscale