The move from batch processing to streaming architectures is a revolution in how companies use data. But what is the state of the art for a real-time data stack? Sijie Guo and Maosong Fu explore the typical challenges in a modern real-time data stack and explain how modern technology will shape streaming architectures and applications in the future.
09:00-12:30 (3h 30m)
Data engineering and architecture, Presented in English
Big data modeling
Ted Malaska (Blizzard Entertainment)
Recent advances in distributed processing engines, from Spark and Impala to Spark Streaming and Storm, have been exciting. Ted Malaska explains why, if your design focuses only on the processing layer to gain speed and power, you may be missing half the story and leaving significant optimization untapped.
Ted Malaska walks you through building a fraud-detection system, using an end-to-end case study as a concrete example of how to architect and implement real-time systems with Apache Hadoop components such as Kafka, HBase, Impala, and Spark.