Presented by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Distributed deep learning at scale on Apache Spark with BigDL

This will be presented in Chinese

Zhichao Li (Intel), Shengsheng Huang (Intel), Yiheng Wang (Intel)
14:00–14:40 Friday, 2017-07-14
AI applications
Location: Auditorium  Level: Intermediate
Average rating: 4.33 (3 ratings)

Prerequisite Knowledge

A basic understanding of deep learning, Spark, Scala, and Python

Description

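BigDL is an open source distributed deep learning framework for Apache Spark (https://github.com/intel-analytics/BigDL). It brings native support for deep learning to Spark, delivers an orders-of-magnitude speedup over out-of-the-box open source deep learning frameworks (such as Caffe and Torch) on a single-node Xeon CPU, and efficiently scales out deep learning workloads on the Spark architecture. In addition, it lets data scientists run distributed deep learning analysis on big data using familiar tools, including Python and notebooks.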

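In this talk, we demonstrate how big data users and data scientists can use BigDL to run deep learning analysis (such as image recognition, object detection, and NLP) on massive data sets in a distributed fashion. This lets them use their existing big data clusters (for example, Apache Hadoop and Spark) as a unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.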

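We also present a Python API for training and inference whose style is similar to that of existing deep learning frameworks such as PyCaffe and TensorFlow. Developing deep learning applications with the BigDL Python API is simple and straightforward. BigDL also provides rich visualization features that let users understand, monitor, inspect, and manipulate their models and processes. We demonstrate these conveniences through examples.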


BigDL, an open source distributed deep learning framework for Apache Spark, brings native support for deep learning functionality to Spark. It provides an orders-of-magnitude speedup over out-of-the-box open source DL frameworks, such as Caffe, Torch, or TensorFlow, on a single-node Xeon, and it efficiently scales out deep learning workloads based on the Spark architecture. In addition, it allows data scientists to perform distributed deep learning analysis on big data using familiar tools, including Python.

Zhichao Li, Shengsheng Huang, and Yiheng Wang explore how data scientists have adopted BigDL for deep learning analysis on large amounts of data in a distributed fashion, allowing them to use their big data cluster as a unified data analytics platform for data storage, data processing and mining, feature engineering, traditional (non-deep) machine learning, and deep learning workloads.

Zhichao, Shengsheng, and Yiheng also share a Python API that is similar to existing deep learning frameworks such as PyCaffe and TensorFlow. Using this API, it’s easy and straightforward to use BigDL to develop deep learning applications. They conclude by demonstrating how BigDL provides rich visualizations for users to understand, monitor, inspect, and manipulate their models and processes.
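To make the idea of scaling out training over a partitioned data set concrete, here is a minimal single-process sketch of synchronous data-parallel SGD in plain Python. It is an illustration only, not BigDL's API or implementation: the functions `partition`, `local_gradient`, and `train`, the linear model, and the synthetic data are all assumptions made for this example.

```python
# Illustrative only: a single-process simulation of synchronous
# data-parallel SGD. This is NOT the BigDL API; every name is hypothetical.

def partition(data, num_partitions):
    """Split the data set round-robin, like records spread across partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]

def local_gradient(w, b, shard):
    """Gradient of mean squared error for the model y = w * x + b on one shard."""
    n = len(shard)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in shard) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in shard) / n
    return grad_w, grad_b

def train(data, num_partitions=4, lr=0.05, steps=2000):
    shards = partition(data, num_partitions)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # "Map" step: each simulated worker computes a gradient on its shard.
        grads = [local_gradient(w, b, shard) for shard in shards]
        # "Reduce" step: average the gradients, then apply one shared update.
        grad_w = sum(g[0] for g in grads) / len(grads)
        grad_b = sum(g[1] for g in grads) / len(grads)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data drawn from y = 3x + 1 (an assumption for this demo).
data = [(x / 10.0, 3 * (x / 10.0) + 1) for x in range(20)]
w, b = train(data)
```

Because every shard here has the same size, averaging the per-shard gradients gives the same update as a single pass over the full data set, so `w` and `b` approach 3 and 1. In a real cluster the map step would run on separate executors and the reduce step would involve network communication.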

Zhichao Li

Intel

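Zhichao Li is a member of the Big Data Technology team at Intel, where he focuses on big data analytics, and is a Spark contributor. He and his colleagues develop distributed machine learning algorithms on Apache Spark to meet machine learning needs in big data settings. He also optimizes these algorithms for Intel platforms and helps Intel customers build big data analytics applications for their businesses.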

Shengsheng Huang

Intel

Shengsheng (Shane) Huang is a software architect at Intel and an Apache Spark committer and PMC member, leading the development of large-scale analytical applications and infrastructure on Spark at Intel. Her area of focus is big data and distributed machine learning, especially deep (convolutional) neural networks. Previously, at the National University of Singapore (NUS), her research interests were large-scale vision data analysis and statistical machine learning.

Yiheng Wang

Intel

Yiheng Wang is a software development engineer on the Big Data Technology team at Intel who works in the area of big data analytics. He and his colleagues are developing and optimizing distributed machine learning algorithms (e.g., neural network and logistic regression) on Apache Spark. He also helps Intel customers build and optimize their big data analytics applications.
