Hyperledger与CDH大数据生态系统的融合以及应用实践 (Hyperledger’s integration with CDH's big data ecosystem and its real-world applications)

此演讲使用中文 (This will be presented in Chinese)

蒋守壮 (Shouzhuang Jiang) (万达网络科技集团有限公司), 丛宏雷 (万达网络科技集团有限公司)
13:10–13:50 Saturday, 2017-07-15
企业应用 (Enterprise adoption)
地点: 多功能厅6A+B(Function Room 6A+B) 观众水平 (Level): 中级 (Intermediate)
必要预备知识 (Prerequisite Knowledge)

Cloudera Manager、CDH 和 Hyperledger (Cloudera Manager, CDH, and Hyperledger)

描述 (Description)


2.Cloudera Manager背景介绍
Cloudera Manager 是 CDH 市场领先的大数据管理平台,对 CDH 的每个部件(包括HDFS,YARN,HBase,Hive和Impala等等)都提供了细粒度的可视化和控制,从而设立了企业部署的标准。通过 Cloudera Manager,运维人员得以提高集群的性能,提升服务质量并降低管理成本。

Cloudera Manager 设计的目的是为了使得对于企业数据中心的管理变得简单和直观。通过 Cloudera Manager,可以方便地部署,并且集中式的操作完整的大数据软件栈。该应用软件会自动化安装过程,从而减少了部署集群的时间。通过 Cloudera Manager可以提供一个集群范围内的节点实时运行状态视图。同时,还提供了一个中央控制台,可以用于配置集群。不仅如此, Cloudera Manager 通过包含一系列的报告和诊断工具,可以帮助您优化集群性能,并且提高利用率。 Cloudera Manager 能够为您提供以下的功能:
● 自动化 Hadoop 安装过程,大幅缩短部署时间;
● 提供实时的集群概况,例如节点、服务的运行状况;
● 提供了集中的中央控制台对集群的配置进行更改;
● 包含全面的报告和诊断工具,帮助优化性能和利用率。

3.Hyperledger on CDH项目
因为使用Cloudera Manager非常方便将一个新的组件加入CDH大数据平台上面,所以我们想到使用CM将Hyperledger部署到企业级的大数据平台环境并且打通数据流的整合和高效分析,即利用已有的大数据平台存储和分析Hyperledger产生的数据,我们设立Hyperledger on CDH项目,专注于快速部署,监控和管理Hyperledger。

Hyperledger on CDH项目涉及以下几个方面工作内容:
• Hyperledger Parcel 包含了所有组件依赖以及所需的Docker镜像
• CSD 如何被部署以及如何启动,停止以及监控
• Cloudera Manager Agent负责分发Parcel包并启动服务
• 用户选择需要部署服务各个角色的节点
• Cloudera Manager通知Agent产生一个用于运行的沙盒环境,这个环境包含了生成的配置文件,环境变量以及控制脚本
• 当用户点击启动服务,Agent调用控制脚本启动Hyperledger服务







Hyperledger is an open source project, initiated by the Linux Foundation in 2015, to advance blockchain technology and transaction authentication. Hyperledger aims to promote cross-industry collaboration to build an open platform that meets the requirements of multiple domains and simplifies business processes. All nodes in a Hyperledger network together form a peer-to-peer distributed ledger that is fully shared and decentralized, and whose contents are jointly updated through a consensus mechanism. Hence, it is well suited to financial applications and to many other industries, such as manufacturing, banking, insurance, and the internet of things. By creating an open standard for distributed ledgers and implementing virtual and digitized value exchange (e.g., asset contracts and energy transactions), it becomes possible to track and conduct transactions securely, efficiently, and cost effectively.

CDH’s Cloudera Manager provides fine-grained visualization and control of every component in CDH, including HDFS, YARN, HBase, Hive, and Impala, which helps set an enterprise-level deployment standard. Leveraging Cloudera Manager, operations personnel can improve their cluster’s performance and service quality while lowering management costs. The design goal of Cloudera Manager is to make management of an enterprise’s data center easy and intuitive. It automates the installation procedure, reducing the time needed to deploy a cluster. Cloudera Manager offers a real-time view of the runtime status of all nodes within a cluster and a central console from which users can configure the cluster. In addition, Cloudera Manager includes a set of reporting and diagnostic tools that help users optimize cluster performance and increase utilization.

Shouzhuang Jiang and 丛宏雷 discuss Hyperledger’s integration with CDH’s big data ecosystem and its real-world applications. Cloudera Manager makes it easy to add new components to CDH’s big data platform, so it can be used to deploy Hyperledger into an enterprise-level big data environment and to integrate its data flows for efficient analysis; that is, you use your existing big data platform to store and analyze the data that Hyperledger generates.

You’ll also explore real-world applications of Hyperledger, including:

  • Membership services that manage identification, privacy, and confidentiality on the network. Participants register to obtain identities, which enables the attribute authority to issue security keys for transacting. By using the blockchain to record the update history of a ledger, auditors can view transactions pertaining to a participant, assuming that each auditor has been granted proper access authority by the participants.
  • Blockchain services that manage the distributed ledger via a peer-to-peer protocol based on gRPC. An optimized data structure can effectively represent participants’ states. Different consensus algorithms may be plugged in to guarantee strong consistency among different networks (tolerating misbehavior with Byzantine fault tolerance, tolerating delays and outages with crash tolerance).
  • Chaincode services, which provide a secure and lightweight way to sandbox chaincode execution on validating nodes. The environment is a “locked down” and secured container with a set of signed base images that contain a secure OS and chaincode language, runtime, and SDK images for Golang, Java, etc.
  • How Wanda Internet Technology Group applies Hyperledger in its Digital Right Exchange Platform to ensure all transaction history remains confidential and cannot be tampered with. Applying Hyperledger in a supply chain management system and leveraging its tracking capacity can link producing and sales parties, creating a brand-new sales model.
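
To illustrate why a ledger that records its update history resists tampering, here is a minimal sketch of a generic hash chain. It is not Hyperledger's actual data structure or API; the function names (`entry_hash`, `append`, `verify`) and the payload fields are hypothetical and chosen only for illustration.

```python
import hashlib
import json


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash an entry together with the hash of the entry before it."""
    body = json.dumps(payload, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(chain: list, payload: dict) -> None:
    """Append a payload, linking it to the current tail of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    chain.append({
        "prev": prev_hash,
        "payload": payload,
        "hash": entry_hash(prev_hash, payload),
    })


def verify(chain: list) -> bool:
    """Recompute every link; an edited payload no longer matches its hash."""
    prev_hash = "0" * 64
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != entry_hash(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True
```

Because each entry's hash covers the previous entry's hash, changing any historical payload invalidates that entry and every later link, which an auditor can detect by calling `verify`.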

The talk concludes with a consideration of how to bridge Hyperledger’s data to the big data platform’s components, implement data stream pass-through and data mining, and extract maximum value from the data.
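
One way to picture this bridging step is to flatten ledger blocks into one row per transaction, serialized as newline-delimited JSON that tools on the big data platform could then ingest. The sketch below is an assumption made for illustration, not the speakers' implementation: the block layout and the field names (`number`, `transactions`, `tx_id`, `chaincode`, `timestamp`) are hypothetical.

```python
import json


def flatten_block(block: dict) -> list:
    """Turn one block (hypothetical layout) into one flat row per transaction."""
    rows = []
    for tx in block.get("transactions", []):
        rows.append({
            "block_number": block["number"],
            "tx_id": tx["tx_id"],
            "chaincode": tx["chaincode"],
            "timestamp": tx["timestamp"],
        })
    return rows


def to_json_lines(blocks: list) -> str:
    """Serialize all transaction rows as newline-delimited JSON."""
    lines = []
    for block in blocks:
        for row in flatten_block(block):
            lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines)
```

The resulting text could be written to a file on HDFS and exposed as an external table, for example; the choice of storage and query engine is left open here.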

Topics include:

  • Hyperledger Parcel, which includes all dependent components and all required Docker images
  • How the CSD is deployed and how it is started, stopped, and monitored
  • Cloudera Manager Agent, which is in charge of distributing Parcel packages and starting the service
  • How users can configure the role of nodes on which services will be deployed
  • How Cloudera Manager requests an agent to create a runtime sandbox, which contains all generated configuration files, environment variables, and control scripts
  • How the agent invokes the control scripts to start Hyperledger services
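
The last steps in this list (the agent prepares a sandbox with configuration and environment variables, then calls a control script) can be sketched as follows. This is a hedged illustration only, not the project's actual script: the environment variable names `HL_CONTAINER_NAME` and `HL_IMAGE`, the image name, and the mount path `/etc/hyperledger` are hypothetical, and `CONF_DIR` is assumed to point at the sandbox directory that holds the generated configuration.

```python
import os
import subprocess
import sys


def build_command(action: str, env: dict) -> list:
    """Translate a control action into a docker command line."""
    # Hypothetical variables; a real CSD would define its own.
    name = env.get("HL_CONTAINER_NAME", "hyperledger-peer")
    image = env.get("HL_IMAGE", "example/hyperledger-peer")
    # Assumed to be the sandbox directory with the generated config files.
    conf_dir = os.path.abspath(env.get("CONF_DIR", "."))
    if action == "start":
        return ["docker", "run", "--rm", "--name", name,
                "-v", conf_dir + ":/etc/hyperledger:ro", image]
    if action == "stop":
        return ["docker", "stop", name]
    raise ValueError("unknown action: " + action)


def main(argv: list) -> int:
    """Entry point a wrapper script could call with sys.argv."""
    if len(argv) != 2:
        print("usage: control.py start|stop", file=sys.stderr)
        return 2
    return subprocess.call(build_command(argv[1], dict(os.environ)))
```

Keeping command construction in `build_command` separate from execution makes the start and stop logic easy to check without a Docker daemon; a wrapper would end with `sys.exit(main(sys.argv))`.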

蒋守壮 (Shouzhuang Jiang)


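Shouzhuang Jiang focuses on source-code study and enterprise practice of big data technologies such as Hadoop, Spark, Flink, Kafka, Elastic, HBase, Hive, and Kylin. He is the author of the book 《基于Apache Kylin构建大数据分析平台》 (Building a Big Data Analytics Platform with Apache Kylin).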

