It’s one thing to write an Apache Spark application that gets you to an answer. It’s another thing to know you used all the tricks in the book to make it run as fast as possible. Ted Malaska shares some of those tricks.
Join Ted to discover patterns and approaches that may not be apparent at first glance but that can be game-changing when applied to your use cases. You’ll learn about nested types, multithreading, skew, reducing, Cartesian joins, and other fun stuff.
Ted Malaska is a director of enterprise architecture at Capital One. Previously, he was the director of engineering in the Global Insight Department at Blizzard; principal solutions architect at Cloudera, helping clients find success with the Hadoop ecosystem; and a lead architect at the Financial Industry Regulatory Authority (FINRA). He has contributed code to Apache Flume, Apache Avro, Apache YARN, Apache HDFS, Apache Spark, Apache Sqoop, and many more. Ted is a coauthor of Hadoop Application Architectures, a frequent speaker at many conferences, and a frequent blogger on data architectures.