Scaling R faster and larger using Apache Spark

This will be presented in Chinese

Xiaoyong Zhu (Microsoft)
11:15–11:55 Friday, 2017-07-14
Data science & advanced analytics
Location: Function Room 5B+C
Level: Intermediate
Prerequisite Knowledge

A basic understanding of R, Spark, and machine learning

What you'll learn

Learn how to use R to analyze terabytes of data

Description


- We will introduce the design principles and architecture of Microsoft R Server, and its integration with Apache Spark.

- We will demonstrate how to use R Server for scalable machine learning on Apache Spark, and how to use the R language to analyze terabyte-scale data.

R is a popular tool for data analysis. However, drawbacks such as its memory utilization and single-threaded design limit its use for big data analysis. Xiaoyong Zhu explains how to use R to analyze terabytes of data, covering the design principles and architecture of Microsoft R Server and its integration with Apache Spark. He also leads a demo showing how to use R Server to perform scalable machine learning on top of Apache Spark.

Xiaoyong Zhu


Xiaoyong Zhu is a program manager at Microsoft focusing on scalable machine learning and advanced analytics.



