Presented by O'Reilly and Cloudera
Make Data Work

Powering robotics clouds with Alluxio

This will be presented in Chinese

Shaoshan Liu (PerceptIn)
14:50–15:30 Saturday, 2017-07-15
AI applications
Location: Auditorium
Level: Intermediate

Prerequisite Knowledge

A general understanding of AI and distributed computing

What you'll learn

Understand how PerceptIn designed and implemented a cloud architecture on Alluxio to support the emerging requirements of robotics applications

Description



The rise of robotics applications demands new cloud architectures that deliver high throughput and low latency. Shaoshan Liu explains how PerceptIn designed and implemented a cloud architecture to support these emerging user requirements using Alluxio.

Shaoshan discusses in-home service robots that act as surveillance robots, which require the following features from the cloud:

  • Online object detection: as video feeds come in, the cloud extracts object labels in real time and sends an alarm when an abnormal object is detected.
  • Video streaming: the cloud supports on-demand video streaming so that users can remotely patrol their homes in real time.
  • Storage: the video feeds are stored in the cloud along with their timestamps, locations, and extracted labels.
  • Search and video playback: using extracted labels, locations, and timestamps as keys, the cloud enables users to quickly retrieve target videos.

These requirements necessitate a storage layer that can handle an enormous amount of incoming data, which may end up in different storage systems (including S3, GCS, Swift, HDFS, OSS, GlusterFS, and NFS). In addition, the storage layer must provide high throughput and low latency when writing and retrieving video feeds.

To fulfill these requirements, PerceptIn designed and implemented a cloud architecture consisting of the following components:

  • PerceptIn client devices: capture video feeds and send them to the cloud along with their metadata <sessionID, timestamp, location>.
  • PerceptIn streaming server: streams on-demand live video feeds to users.
  • Object recognition: extracts object labels from each incoming video feed.
  • KV store: organizes the video feeds; the key for each video has the format <sessionID, timestamp, duration, location, list of labels>.
  • Query engine: supports retrieval of video feeds; users can search using any combination of time, location, and extracted labels. For example, you could search for videos recorded between 1/1/2017 and 1/2/2017 in your bedroom that contain the object “dog.”
  • Business analytics engine: generates high-level statistics over all video data (for example, the most common objects that appear in living rooms).
  • Alluxio: provides two key features that are critical to the success of this architecture. First, it provides high throughput and low latency to support fast retrieval of video feeds. Second, it provides a unified namespace over many popular storage systems, including S3, GCS, Swift, HDFS, OSS, GlusterFS, and NFS.
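To make the KV key format and the query engine's "any combination" search more concrete, here is a minimal Python sketch. It is not PerceptIn's implementation: the `VideoKey` class, the `search` function, and the sample records are hypothetical, and a linear scan over an in-memory list stands in for whatever index the real query engine uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class VideoKey:
    """Hypothetical key for one stored video segment."""
    session_id: str
    timestamp: datetime        # start time of the segment
    duration: timedelta
    location: str
    labels: FrozenSet[str]     # object labels extracted from the segment


def search(keys: Iterable[VideoKey],
           start: Optional[datetime] = None,
           end: Optional[datetime] = None,
           location: Optional[str] = None,
           label: Optional[str] = None) -> List[VideoKey]:
    """Return the keys that match every supplied filter (None means 'no filter')."""
    hits = []
    for key in keys:
        if start is not None and key.timestamp < start:
            continue
        if end is not None and key.timestamp >= end:   # end is exclusive
            continue
        if location is not None and key.location != location:
            continue
        if label is not None and label not in key.labels:
            continue
        hits.append(key)
    return hits


# Made-up sample records, for illustration only.
index = [
    VideoKey("s1", datetime(2017, 1, 1, 9, 0), timedelta(minutes=5),
             "bedroom", frozenset({"dog", "bed"})),
    VideoKey("s2", datetime(2017, 1, 1, 10, 0), timedelta(minutes=5),
             "living room", frozenset({"dog"})),
    VideoKey("s3", datetime(2017, 1, 3, 9, 0), timedelta(minutes=5),
             "bedroom", frozenset({"dog"})),
]

# The example query from the list above: bedroom videos containing "dog"
# recorded between 1/1/2017 and 1/2/2017.
matches = search(index,
                 start=datetime(2017, 1, 1),
                 end=datetime(2017, 1, 2),
                 location="bedroom",
                 label="dog")
print([key.session_id for key in matches])  # prints ['s1']
```

Leaving a filter as `None` drops that condition, which is how the sketch models searching by any subset of time, location, and label.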

Alluxio delivers a write throughput of more than 650 MB/s, whereas the native filesystem achieves only 120 MB/s (a more than 5x increase). Write throughput is critical because it determines how fast you can write a video feed to storage. If it is too low, the storage layer may become the bottleneck of the whole multimedia data pipeline.
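The arithmetic behind the speedup, and why throughput bounds the pipeline, can be sketched as follows. The two throughput figures come from the abstract; the per-stream bitrate is a purely hypothetical value chosen for illustration, and the function names are not from any library.

```python
ALLUXIO_WRITE_MB_S = 650.0   # lower bound quoted in the abstract
NATIVE_WRITE_MB_S = 120.0    # native filesystem figure quoted in the abstract


def speedup(fast_mb_s: float, slow_mb_s: float) -> float:
    """Ratio between two throughput figures."""
    if slow_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return fast_mb_s / slow_mb_s


def max_concurrent_streams(storage_mb_s: float, per_stream_mb_s: float) -> int:
    """How many video streams the storage layer can absorb before it saturates."""
    if per_stream_mb_s <= 0:
        raise ValueError("per-stream bitrate must be positive")
    return int(storage_mb_s // per_stream_mb_s)


# Hypothetical per-stream bitrate; not a number from the talk.
PER_STREAM_MB_S = 1.0

print(f"speedup: {speedup(ALLUXIO_WRITE_MB_S, NATIVE_WRITE_MB_S):.1f}x")
print("streams (Alluxio):", max_concurrent_streams(ALLUXIO_WRITE_MB_S, PER_STREAM_MB_S))
print("streams (native): ", max_concurrent_streams(NATIVE_WRITE_MB_S, PER_STREAM_MB_S))
```

Under these assumptions, the number of feeds the storage layer can ingest scales directly with its write throughput, which is why a slow storage layer becomes the pipeline bottleneck.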

Alluxio also supports fast retrieval: with Alluxio, you can retrieve a video within 500 milliseconds, whereas retrieving a video stored on remote machines can take as long as 20 seconds. Using Alluxio to buffer “hot” video data can therefore reduce retrieval latency by as much as 40-fold.
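The idea of buffering hot video data in front of slow remote storage can be illustrated with a small least-recently-used cache. This is a sketch of the general technique only, not of how Alluxio manages its storage tiers; `HotVideoCache` and `fake_remote` are hypothetical names, and the fake fetch function stands in for a slow remote read.

```python
from collections import OrderedDict
from typing import Callable, List


class HotVideoCache:
    """Keeps the most recently used videos in memory; falls back to remote storage."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._fetch_remote = fetch_remote
        self._capacity = capacity
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, video_id: str) -> bytes:
        if video_id in self._entries:
            # Cache hit: mark the entry as most recently used.
            self._entries.move_to_end(video_id)
            self.hits += 1
            return self._entries[video_id]
        # Cache miss: pay the slow remote read, then keep the data hot.
        self.misses += 1
        data = self._fetch_remote(video_id)
        self._entries[video_id] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used
        return data


remote_calls: List[str] = []


def fake_remote(video_id: str) -> bytes:
    """Stand-in for a slow read from a remote machine."""
    remote_calls.append(video_id)
    return f"frames-of-{video_id}".encode()


cache = HotVideoCache(fake_remote, capacity=2)
for video_id in ["a", "a", "b", "c", "a"]:
    cache.get(video_id)

print(cache.hits, cache.misses)  # prints 1 4
print(remote_calls)              # prints ['a', 'b', 'c', 'a']
```

Only the second request for "a" is served from memory here; with the latencies quoted above, each such hit would cost roughly 500 milliseconds instead of up to 20 seconds, which is where the 40-fold figure comes from.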

In addition, different users require different persistent storage systems underneath Alluxio: some may use HDFS; others may use S3. Without Alluxio, PerceptIn would have to manage multiple interfaces, one for each persistent storage system. With Alluxio’s unified namespace, PerceptIn only has to maintain one major interface while supporting many different underlying storage systems.
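The benefit of a single interface over several storage systems can be sketched with a toy mount table. This does not use Alluxio's API: `UnderStore`, `InMemoryStore`, and `UnifiedNamespace` are hypothetical names, and the in-memory stores merely stand in for backends such as HDFS or S3.

```python
from typing import Dict, Protocol, Tuple


class UnderStore(Protocol):
    """Minimal interface that every backing store is assumed to offer."""
    def write(self, path: str, data: bytes) -> None: ...
    def read(self, path: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a real backend such as HDFS or S3."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._blobs: Dict[str, bytes] = {}

    def write(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def read(self, path: str) -> bytes:
        return self._blobs[path]


class UnifiedNamespace:
    """Routes each path to the store mounted at its longest matching prefix."""

    def __init__(self) -> None:
        self._mounts: Dict[str, UnderStore] = {}

    def mount(self, prefix: str, store: UnderStore) -> None:
        self._mounts[prefix.rstrip("/") + "/"] = store

    def _resolve(self, path: str) -> Tuple[UnderStore, str]:
        for prefix in sorted(self._mounts, key=len, reverse=True):
            if path.startswith(prefix):
                return self._mounts[prefix], path[len(prefix):]
        raise KeyError(f"no store mounted for {path}")

    def write(self, path: str, data: bytes) -> None:
        store, relative = self._resolve(path)
        store.write(relative, data)

    def read(self, path: str) -> bytes:
        store, relative = self._resolve(path)
        return store.read(relative)


hdfs_like = InMemoryStore("hdfs-stand-in")
s3_like = InMemoryStore("s3-stand-in")

namespace = UnifiedNamespace()
namespace.mount("/customers/a", hdfs_like)
namespace.mount("/customers/b", s3_like)

# Application code uses one interface regardless of the backend.
namespace.write("/customers/a/video1", b"feed-a")
namespace.write("/customers/b/video1", b"feed-b")
print(namespace.read("/customers/a/video1"), namespace.read("/customers/b/video1"))
```

The application only ever calls `namespace.read` and `namespace.write`; supporting another backend means adding one more mount rather than another code path.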


Shaoshan Liu


Shaoshan Liu is the cofounder and president of PerceptIn, a company working on developing a next-generation robotics platform. Previously, he worked on autonomous driving and deep learning infrastructure at Baidu USA. Shaoshan holds a PhD in computer engineering from the University of California, Irvine.


