Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Apache Kylin 2.0: From an OLAP engine on Hadoop to a real-time data warehouse

This will be presented in Chinese

Dong Li (Kyligence)
11:15–11:55 Saturday, 2017-07-15
Data engineering and architecture
Location: Grand Hall B Level: Intermediate
Average rating: 4.00 (1 rating)

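Prerequisite Knowledge

A basic understanding of Hadoop fundamentals and OLAP concepts

What you'll learn

Understand precomputation-based data engines, as exemplified by Kylin, and consider from a different angle how to meet the challenge of ever-growing data volumes

Description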

Apache Kylin v2.0 is coming soon. As a leading OLAP engine for big data, Apache Kylin is now more capable than ever: it supports snowflake schemas, more comprehensive SQL syntax, a new Spark cubing engine that has shown good early results, and better ingestion of real-time streaming data. Apache Kylin is gradually evolving from a traditional OLAP platform on Hadoop into a real-time data warehouse on Hadoop. Dong Li introduces the new features of Apache Kylin v2.0 and explains the technical architecture and design principles behind them.

Since v1.5, Apache Kylin has supported loading Kafka data in micro-batches, enabling near-real-time analysis with minute-level latency. In v2.0, Apache Kylin's support for Kafka data sources is more stable and user friendly, so users can analyze streaming data and historical data on the same platform.
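
As a rough illustration of the micro-batch idea (not Kylin's actual implementation), the sketch below groups a stream of timestamped events into fixed one-minute windows and aggregates each window as a small batch. The event layout, the `micro_batches` helper, and the one-minute window are assumptions made for illustration; a real deployment would consume from Kafka rather than from an in-memory list.

```python
from collections import defaultdict


def micro_batches(events, window_seconds=60):
    """Group (timestamp, key, value) events into fixed time windows.

    Hypothetical helper for illustration only.
    Returns {window_start: {key: summed_value}}.
    """
    batches = defaultdict(lambda: defaultdict(int))
    for ts, key, value in events:
        # Floor the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        batches[window_start][key] += value
    # Convert to plain dicts, ordered by window start.
    return {w: dict(batches[w]) for w in sorted(batches)}


# Illustrative events: (epoch seconds, page, view count)
events = [
    (0, "home", 1),
    (30, "home", 2),
    (61, "cart", 1),
    (119, "home", 4),
    (120, "cart", 5),
]

result = micro_batches(events)
for window_start, totals in result.items():
    print(window_start, totals)
```

Each window becomes available for analysis only after it closes, which is why this style of ingestion gives latency on the order of the window length rather than per-event latency.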

In the past, Apache Kylin supported only star schemas, which limited some applications. Starting with v2.0, Apache Kylin will support snowflake schemas, so users can model directly in Kylin on their existing data model without a model conversion, making Kylin easier to apply to complex cases.

Precomputation-based analytics platforms require an offline data preprocessing step. For Apache Kylin, that step is cube building (cubing), and the team has been using Spark to substantially improve the existing build engine.

Apache Kylin's SQL support is also improving, with support for time functions, window functions, percentiles, and other complex functions. The requirements for these improvements originated in the community and were ultimately implemented through community effort.

Unlike other SQL-on-Hadoop technologies, Apache Kylin has always focused on replacing online computation with offline precomputation wherever possible. In an era of rapidly growing data volumes, Apache Kylin may well be your first choice for handling data challenges of any scale with stable performance.
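
The offline precomputation idea behind cubing can be sketched in a few lines. This is a minimal, in-memory illustration under stated assumptions, not Kylin's build engine: the `build_cube` and `query` helpers, the row layout, and the choice of SUM as the only measure are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Precompute SUM(measure) for every subset of dimensions (every cuboid).

    Hypothetical helper for illustration only.
    Returns {dims_tuple: {values_tuple: total}}.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(int)
            for row in rows:
                group_key = tuple(row[d] for d in dims)
                totals[group_key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, group_by):
    """Answer a GROUP BY with a lookup instead of scanning the raw rows.

    group_by must list dimensions in the order given to build_cube.
    """
    return cube[tuple(group_by)]


# Illustrative fact rows.
rows = [
    {"country": "CN", "year": 2016, "sales": 10},
    {"country": "CN", "year": 2017, "sales": 20},
    {"country": "US", "year": 2017, "sales": 5},
]

cube = build_cube(rows, ["country", "year"], "sales")
print(query(cube, ["country"]))
print(query(cube, ["year"]))
print(query(cube, []))
```

The trade-off is visible even at this scale: a full cube over n dimensions has 2^n cuboids, so build time and storage are spent offline in exchange for query cost that no longer depends on the number of raw rows.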

Dong Li

Kyligence

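Dong Li is a technical partner and senior software architect at Kyligence Inc., an Apache Kylin committer and PMC member, and the technical lead for KyBot, with a focus on big data technology research and development. He graduated from the computer science department of Shanghai Jiao Tong University. Previously, he was a senior engineer in eBay's Global Analytics Infrastructure group and a software development engineer in Microsoft's Cloud and Enterprise division. He was also a core member of the Microsoft Dynamics Asia-Pacific team, where he helped develop a new generation of cloud-based ERP solutions.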
