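In this talk, we give a brief history of Apache Hadoop releases, followed by an overview of the features in Apache Hadoop 3.0. We also give an update on the current status of the 3.0 release and on the community's efforts to test and stabilize the code, work intended to make Hadoop 3 the best major release yet.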
Apache Hadoop has been synonymous with open source big data analytics for over a decade. With the upcoming 3.0 major release, Apache Hadoop continues to evolve with the addition of significant new features like HDFS erasure coding, YARN Timeline Service v2, and MapReduce task-level optimization. Together, these new features improve the performance, scalability, and multitenancy capabilities of Hadoop.
Andrew Wang and Daniel Templeton offer an overview of new features and discuss current release management status and community testing efforts dedicated to making Hadoop 3.0 the best Hadoop major release yet.
Andrew Wang is a software engineer at Cloudera on the HDFS team, an Apache Hadoop committer and PMC member, and the release manager for Hadoop 3.0. Previously, he was a PhD student in the AMPLab at UC Berkeley, where he worked on problems related to distributed systems and warehouse-scale computing. He holds a master’s and a bachelor’s degree in computer science from UC Berkeley and UVA respectively.
Daniel Templeton has a long history in high-performance computing, open source communities, and technology evangelism. Today, Daniel works on the YARN development team at Cloudera, where he focuses on the resource manager, fair scheduler, and Docker support.
©2017, O'Reilly Media, Inc.