Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Driving Southeast Asia forward with big data

This will be presented in English.

Feng Cheng (Grab), Edwin Law (Grab)
11:15–11:55 Friday, 2017-07-14
Data engineering and architecture, Presented in English
Location: Grand Hall B
Level: Non-technical
Average rating: 2.00 (2 ratings)

Prerequisite knowledge

A basic understanding of ride-hailing platforms, distributed computing, SQL on Hadoop, Spark, and stream processing

What you'll learn

Understand how Grab improved the performance, reliability, and availability of its data infrastructure; migrated from Redshift to Presto, cutting query running time from 30 minutes to 5 minutes at only 20% of the cost; and built a real-time big data platform with Spark Streaming and key-value storage
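
To make the figures above concrete, the short Python sketch below works out the speedup and cost saving they imply. It assumes the 20% figure is the ratio of new cost to old cost for the same workload, which the abstract does not spell out; the variable names are illustrative only.

```python
# Figures quoted in the abstract.
old_minutes = 30      # query running time before the migration
new_minutes = 5       # query running time after the migration
cost_ratio = 0.20     # assumption: new cost divided by old cost

# How many times faster the query runs.
speedup = old_minutes / new_minutes

# Percentage of the original cost that is saved.
cost_saving_pct = round((1 - cost_ratio) * 100)

print(f"{speedup:.0f}x faster, {cost_saving_pct}% cheaper")
```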

Description

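In Southeast Asia, Grab sits at the intersection of the digital and physical worlds. Our vision is to drive the region's transportation forward and to transform its mobile internet ecosystem. Grab leads more than 600,000 drivers, whose task is to improve the travel experience of 620 million users across Southeast Asia and to advance economic growth. This straightforward business plan gives us a huge opportunity to use data to fundamentally improve that process.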

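Broadly, Grab aims to create and sustain a data-driven culture, using data to solve the hardest problems across the company. The Data Engineering team is responsible for building a reliable data analytics platform shared by the whole company. We therefore play an important role in helping different teams find product and consumer insights in a petabyte-scale data warehouse and data lake. Their use cases include ad hoc queries (bookings, logs, and so on), user experience analysis, and training machine-learning models.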

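In this session, Cheng Feng describes some of the challenges Grab faced in scaling its back-office applications and how we met that demand. He also shares the history of the architecture's move from Redshift to EMR + S3. In the early days, Redshift was a simple and cost-effective solution for analyzing our data. But as our data volume grew explosively in recent years, it became expensive and slow, so we decided to make a major change to the architecture. We now use AWS EMR + S3 as our data warehouse. This architecture lets us separate the compute layer from the data storage layer. It also lets multiple clusters share the same data on S3, and clusters can be long-running or, for flexibility, only temporary. Our users typically write Spark or Presto jobs for ETL and data analysis.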

Topics include:

  • Grab's analytics infrastructure
  • Redshift versus data lakes
  • Presto: Background and scenarios
  • Presto on EMR
  • Grab's use cases for Spark Streaming


Grab sits at the junction of the digital and physical worlds. Its vision is to drive Southeast Asia (SEA) forward and transform the way people travel and pay across the region. With more than 700,000 drivers and 36 million app downloads, the Grab app has become a platform with one of the highest usage and transaction rates among the 620 million people in SEA, and it is growing every day. That gives the company an incredible opportunity to perfect the way it uses data to make lives easier across SEA.

In general, Grab aims to create and sustain a data-driven culture, using data to solve the toughest problems. The Data Engineering team is responsible for building a reliable data analytics platform and plays a big role in helping different teams gain product and consumer insights from a multipetabyte-scale data warehouse. Those teams' work ranges from running ad hoc queries (bookings, logs, etc.) to analyzing user experience and training machine-learning models.

Feng Cheng and Edwin Law explain Grab’s data architecture and offer a history of its data platform migration and stream-processing apps. They describe some of the challenges the company has faced in getting its back-office applications to scale and what it has done to meet demand. They also trace the evolution of its architecture from Redshift to EMR + S3. In the early stage, Redshift was a simple and cost-effective solution for analyzing all of Grab’s data. But when data volumes grew exponentially over the last year and data processing became more complicated, the company decided to make a big change in the architecture, using AWS (EMR + S3) for its data warehouse. This architecture offers many advantages, including separating the computing and storage layers and allowing multiple clusters to share the same data on S3 for data analytics.

Topics include:

  • Data infrastructure at Grab
  • Redshift versus data lakes
  • Presto: Background and context
  • Presto on EMR
  • Use cases for Spark Streaming at Grab
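
The separation of compute and storage described above can be pictured with a toy Python model. This is only a sketch under stated assumptions: a plain dictionary stands in for S3, a small class stands in for a compute cluster, and the names `ObjectStore`, `Cluster`, and `count_by`, along with the sample rows, are hypothetical and not taken from Grab's platform or any AWS API.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Hypothetical stand-in for a shared storage layer such as S3."""
    objects: dict = field(default_factory=dict)

    def put(self, key, rows):
        self.objects[key] = list(rows)

    def get(self, key):
        return self.objects[key]


@dataclass
class Cluster:
    """Hypothetical stand-in for a compute cluster; it keeps no data itself."""
    name: str
    store: ObjectStore

    def count_by(self, key, column):
        # Read rows from the shared store and count values in one column.
        counts = {}
        for row in self.store.get(key):
            counts[row[column]] = counts.get(row[column], 0) + 1
        return counts


# Made-up sample data, written once to the shared store.
store = ObjectStore()
store.put("bookings/sample", [
    {"city": "Singapore"},
    {"city": "Jakarta"},
    {"city": "Singapore"},
])

# Two independent clusters read the same stored data.
long_running = Cluster("adhoc", store)
short_lived = Cluster("nightly-etl", store)

adhoc_result = long_running.count_by("bookings/sample", "city")
etl_result = short_lived.count_by("bookings/sample", "city")

# Discarding a cluster leaves the stored data untouched.
del short_lived
still_stored = "bookings/sample" in store.objects
```

The point of the sketch is the design choice rather than the code: because the clusters hold no data, one can be shut down or replaced without moving or copying anything in storage.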

Feng Cheng

Grab

Cheng Feng is a data engineer at Grab, where he works on the big data platform, distributed computing, stream processing, and data science. Previously, he was a data scientist at the Lazada Group, working on Lazada’s tracker, customer segmentation and recommendation systems, and fraud detection.

Edwin Law

Grab

Edwin Law was the third person and first engineer on the Data team at Grab (formerly MyTeksi and Grab Taxi), which encompasses data engineering, data science, and data analytics. As engineering manager, Edwin leads the Data Engineering and Database Operations teams, which together have almost 15 members.
