In-Person Training
Apache Spark advanced practice and principles

Carson Wang (Intel), Yucai Yu (Intel), Zhichao Li (Intel), Yiheng Wang (Intel), Daoyuan Wang (Intel)
09:00–17:00 Wednesday, 2017-07-12
Location: Function Room 3B
Level: Intermediate
Participants should plan to attend both days of this 2-day training course.

Hardware and/or installation requirements:

  • A laptop with Java (7 or above) and Scala (2.10.4 or above) installed

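This course is designed for intermediate and advanced Spark users and offers an in-depth look at Spark internals along with advanced practices. You should have some working knowledge of Spark and an interest in, or some familiarity with, streaming analytics and machine learning. We hope this course helps you accelerate your Spark analytics and machine learning work and opens a new chapter in your data science learning and career.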

As big data analytics and machine learning become more widely used in industry, more and more people are choosing big data platforms such as Apache Spark to build large-scale data processing, analytics, and machine learning applications that take advantage of large volumes of raw data and a scalable architecture. But how can you gain a deeper understanding of key big data technologies and make better use of them?

Carson Wang, Yucai Yu, Zhichao Li, Yiheng Wang, and Daoyuan Wang explore current trends in big data technologies and offer an overview of advanced Apache Spark practices and principles. You'll gain a deeper understanding of the core ideas behind Apache Spark's design and learn how Spark integrates closely with streaming analytics, machine learning, and deep learning to provide a consistent, integrated set of advanced practices across data acquisition, analysis and processing, feature extraction, and machine learning.


Apache Spark internals

  • Spark internal overview
  • Spark shuffle
  • Spark memory management

Spark SQL

  • What’s Spark SQL?
  • Spark SQL features
  • How does Spark SQL work?

Spark Streaming

  • Streaming data
  • Spark Streaming design
  • How to implement high availability

Machine learning on Spark

  • Scaling out ML algorithms on Spark
  • End-to-end machine-learning pipelines

Deep learning on Spark using BigDL

  • Introduction to BigDL
  • How to use BigDL on Spark

About your instructors

Carson Wang is a big data software engineer at Intel, focusing on developing and improving new big data technologies. He is an active open source contributor to the Spark and Alluxio projects. Prior to Intel, Carson was an engineer at Microsoft working on cloud computing technologies.

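Zhichao Li is a member of the Big Data Technology team at Intel, where he focuses on big data analytics, and is a Spark contributor. He and his colleagues develop distributed machine learning algorithms on Apache Spark to meet the machine learning needs of big data workloads. He also optimizes these distributed algorithms for Intel platforms and helps Intel customers develop big data analytics applications for their businesses.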

Yiheng Wang is a software development engineer on the Big Data Technology team at Intel working in the area of big data analytics. Yiheng and his colleagues are developing and optimizing distributed machine learning algorithms (e.g., neural network and logistic regression) on Apache Spark. He also helps Intel customers build and optimize their big data analytics applications.

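Daoyuan Wang is a senior software engineer at Intel Asia-Pacific Research & Development Ltd. He has worked on Spark SQL development since 2014 and is an active contributor to the Apache Spark open source community. Before working on Spark, he worked on the IDH version of Hive. He translated 《Spark快速大数据分析》, the Chinese edition of Learning Spark.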


