Hosted by O'Reilly and Cloudera
Make Data Work
July 12-13, 2017: Training
July 13-15, 2017: Conference
Beijing, China

Big data modeling

This will be presented in English.

Ted Malaska (Capital One)
09:00–12:30 Thursday, 2017-07-13
Data engineering and architecture, Presented in English
Location: Function Room 5B Level: Beginner

Prerequisite Knowledge

A working knowledge of SQL

What you'll learn

A basic understanding of relational data models

Description

Recent advances in distributed processing engines, from Spark to Impala to Spark Streaming and Storm, have been exciting. However, if your design focuses only on the processing layer for speed and power, you may be missing half the story and leaving a significant amount of optimization untapped.

Ted Malaska looks down the stack and describes a set of storage design patterns and schemas implemented on Cassandra, HBase, Kudu, Kafka, Solr, Elasticsearch, HDFS, and S3. By carefully tailoring how data is stored for each use case, processing and access times can be reduced by two to three orders of magnitude.

While the strategies and principles you’ll learn in this class can be applied in many software environments, examples will be shown using HDFS, HBase, Cassandra, Kudu, Kafka, Elasticsearch, and S3.

Ted Malaska

Capital One

Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.
