HBase multi-data-center solutions and the upcoming incremental backup feature

This talk will be presented in Chinese.

Biao Chen (Cloudera)
14:00–14:40 Saturday, 2017-07-15
Track: Hadoop internals & development
Location: Function Room 2
Audience level: Intermediate

Prerequisite knowledge

A working knowledge of HBase

What you'll learn


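This session introduces the Backup feature planned for HBase 2.0. Its incremental backups avoid the full-table scans that existing approaches require, which greatly improves backup performance, and they also provide the consistency that replication lacks. The session covers the feature's architecture, an analysis of its internals, why it matters for multi-data-center deployments, and how to use it.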

Hadoop cannot yet be fully integrated into core business systems because it lacks a mature, stable multi-data-center solution. For use cases such as disaster recovery, HBase clusters that store critical data must be backed up across data centers. (China's banking regulator imposes even stricter mandatory requirements for off-site, multi-data-center backup.) However, HBase is normally deployed in a single data center, and the replication and snapshot copy/export mechanisms in current HBase releases meet neither the regulator's requirements nor those of off-site disaster recovery.

Biao Chen explores the challenges of deploying HBase across multiple data centers and analyzes existing solutions, such as replication and snapshot copy/export. He discusses how each is implemented, the scenarios it suits, its pros and cons, the overall architecture, and best practices. Biao then introduces the backup feature planned for HBase 2.0. Its incremental backups avoid the full-table scans that current approaches require and significantly improve backup performance. They also provide the consistency that replication lacks. Biao covers the feature's architecture and internals, explains why it matters for multi-data-center deployments, and shows how to use it.

Biao Chen


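Biao Chen is a presales technical manager, industry consultant, and senior solutions architect at Cloudera. He was formerly a core developer of the Intel Hadoop distribution. He joined Intel's compiler division in 2006, where he developed server middleware, and he specializes in debugging and optimizing server software. Since 2010 he has worked on Hadoop product development and solution consulting, successively leading Hadoop productization, HBase performance tuning, and industry solution consulting. He has successfully deployed and supported multiple Hadoop clusters of more than a hundred nodes each in industries including transportation and telecommunications.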