Apache Spark is the fast data processing of large document stores and databases. Spark is highly distributed, optimized, and redundant for large clustering manipulation and aggregation.
This talk is an introduction to Apache Spark, it's architecture, and it's programming API. We start with an introduction to DataFrames, the Catalyst Optimizer, and Spark SQL. We will then venture onto DataSets, discuss the DataSet API and the functional programming aspects of it. We will touch lightly on RDD and the pros and cons of using the API. We will then finish with how to connect to data sources like HDFS, S3, Cassandra, Elastic Search, and Kafka. This presentation will have samples that you can try out at home or at the office.
Daniel is a programmer, consultant, instructor, speaker, and recent author. With over 20 years of experience, he does work for private, educational, and government institutions. He is also currently a speaker for No Fluff Just Stuff tour. Daniel loves JVM languages like Java, Groovy, and Scala; but also dabbles with non JVM languages like Haskell, Ruby, Python, LISP, C, C++. He is an avid Pomodoro Technique Practitioner and makes every attempt to learn a new programming language every year. For downtime, he enjoys reading, swimming, Legos, football, and barbecuing.