Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Scaling R faster and larger using Apache Spark

This will be presented in Chinese

Xiaoyong Zhu (Microsoft)
11:15–11:55 Friday, 2017-07-14
Data science & advanced analytics
Location: Function Room 5B+C
Level: Intermediate
Average rating: *****
(5.00, 1 rating)

Prerequisite knowledge

A basic understanding of R, Spark, and machine learning

What you'll learn

Learn how to use R to analyze terabytes of data

Description

R is a popular tool for data analysis. However, it has notable drawbacks, such as its memory usage and single-threaded design, that limit its use for big data analysis. Xiaoyong Zhu explains how to use R to analyze terabytes of data. The talk:

- Covers the design principles and architecture of Microsoft R Server and its integration with Apache Spark.

- Demonstrates how to use R Server to perform scalable machine learning on top of Apache Spark and to analyze terabyte-scale data in R.

Xiaoyong Zhu

Microsoft

Xiaoyong Zhu is a program manager at Microsoft focusing on scalable machine learning and advanced analytics.
