Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Columnar storage at Uber

This will be presented in Chinese

Zhenxiao Luo (Uber)
14:50–15:30 Saturday, 2017-07-15
Data engineering and architecture
Location: Function Room 6A+B
Level: Non-technical

What you'll learn

Learn columnar storage concepts, techniques, and query optimizations

Description

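As Uber continues to grow, our big data systems must keep improving in scalability, stability, and performance so that they can help Uber make business decisions, provide recommendations to users, and analyze experiments across multiple data sources.

At Uber, our Hadoop data warehouse uses columnar storage, with Parquet as the default file format. We use Presto as the interactive query engine and Hive and Spark as the batch engines. We have also developed a number of performance optimizations for columnar storage in these query engines, which deliver very good performance to our users.

In this talk, we will describe our engineering work on columnar storage, including nested column pruning, predicate pushdown, dictionary pushdown, columnar reads, and lazy reads. Our benchmark results show a more than 5x performance improvement in every query engine that uses columnar storage. We are also happy to share our experience with columnar storage.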
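For readers new to these ideas, column pruning and predicate pushdown can be illustrated with a minimal sketch. It is not Uber's, Presto's, or Parquet's implementation: the `RowGroup` class, the `scan` function, and the column names (`trip_id`, `fare`, `distance`) are all hypothetical, and the min/max statistics are computed on demand here, whereas a real columnar file would store them in its metadata.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RowGroup:
    """A hypothetical row group: each column is stored as its own list."""
    columns: Dict[str, List[int]]

    def stats(self, name: str) -> tuple:
        # Min/max statistics; computed on demand in this sketch, whereas a
        # real columnar file would read them from precomputed metadata.
        values = self.columns[name]
        return min(values), max(values)


def scan(row_groups, projected, filter_col, lower_bound):
    """Return the projected columns for rows where filter_col >= lower_bound."""
    rows = []
    skipped = 0
    for group in row_groups:
        _, max_value = group.stats(filter_col)
        if max_value < lower_bound:
            # Predicate pushdown: no row in this group can match, so skip it.
            skipped += 1
            continue
        keep = [i for i, v in enumerate(group.columns[filter_col]) if v >= lower_bound]
        # Column pruning: only the filter column and the projected columns
        # are touched; "distance" is never read.
        rows.extend({name: group.columns[name][i] for name in projected} for i in keep)
    return rows, skipped


# Hypothetical data: two row groups with three columns each.
groups = [
    RowGroup({"trip_id": [1, 2, 3], "fare": [5, 7, 9], "distance": [1, 2, 3]}),
    RowGroup({"trip_id": [4, 5, 6], "fare": [20, 25, 30], "distance": [8, 9, 10]}),
]
result, skipped_groups = scan(groups, ["trip_id"], "fare", 20)
```

In this sketch the first row group is skipped entirely because its largest `fare` is below the filter bound, which is the kind of saving that statistics-based pushdown aims for.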


As Uber continues to grow, its big data systems must also grow in scalability, reliability, and performance to help Uber make business decisions, give user recommendations, and analyze experiments across all data sources. Zhenxiao Luo shares his experience running columnar storage in production at Uber and discusses query optimization techniques in SQL engines.

Uber’s Hadoop warehouse uses columnar storage with Parquet as the default file format, Presto as its interactive query engine, and Hive and Spark as its batch engines. Zhenxiao explains how Uber developed a number of performance optimizations for columnar storage in all of these query engines, including nested column pruning, predicate pushdown, dictionary pushdown, columnar reads, and lazy reads. Together, these deliver a more than 5x performance improvement in every engine.
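Dictionary pushdown and lazy reads may be less familiar, so here is a minimal sketch of the general idea. It makes no claim about how Uber, Presto, or Parquet implement them: `DictionaryColumn`, `LazyColumn`, `select_fares`, and the sample city and fare values are all hypothetical.

```python
from typing import Callable, List


class DictionaryColumn:
    """Hypothetical dictionary-encoded column: distinct values plus per-row codes."""

    def __init__(self, values: List[str]):
        self.dictionary = sorted(set(values))
        index = {v: i for i, v in enumerate(self.dictionary)}
        self.codes = [index[v] for v in values]


class LazyColumn:
    """Hypothetical lazily loaded column: decoded only on first access."""

    def __init__(self, loader: Callable[[], List[int]]):
        self._loader = loader
        self._values = None
        self.loaded = False

    def get(self) -> List[int]:
        if self._values is None:
            self._values = self._loader()
            self.loaded = True
        return self._values


def select_fares(city_col, fare_col, city):
    # Dictionary pushdown: check the small dictionary before touching any rows.
    if city not in city_col.dictionary:
        return []
    code = city_col.dictionary.index(city)
    matches = [i for i, c in enumerate(city_col.codes) if c == code]
    # Lazy read: the fare column is decoded only once rows are known to match.
    fares = fare_col.get()
    return [fares[i] for i in matches]


cities = DictionaryColumn(["Beijing", "Shanghai", "Beijing"])
fares_miss = LazyColumn(lambda: [30, 45, 25])
fares_hit = LazyColumn(lambda: [30, 45, 25])

miss = select_fares(cities, fares_miss, "Chengdu")  # value absent from dictionary
hit = select_fares(cities, fares_hit, "Beijing")    # value present in dictionary
```

When the filter value is absent from the dictionary, the sketch returns without decoding the fare column at all; when it is present, the fare column is loaded once and only the matching rows are returned.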


Zhenxiao Luo

Uber

Zhenxiao Luo is a software engineer at Uber working on Presto and Parquet. Previously, he led the development and operations of Presto at Netflix and worked on big data and Hadoop-related projects at Facebook, Cloudera, and Vertica. He holds a master’s degree from the University of Wisconsin-Madison and a bachelor’s degree from Fudan University.
