Getting started with Hadoop

Apache Hadoop is a powerful, sometimes complex tool for working with Big Data and high-throughput data applications. It can enable some existing applications to finally run well, and it opens the door to entirely new types of applications and analysis. So how does one get started with Hadoop? This presentation explores the introductory aspects of Hadoop infrastructure, data sources, and query strategies and planning so you can get started.

In this introductory, no-nonsense presentation we will explore environment options for designing your initial cluster, such as physical versus virtual environments. We will also explore data ingestion and modeling strategies so you can populate your new cluster with the data required for your analysis in an Agile way. Finally, we will review strategies for processing and querying your data so you can get value from the cluster.

About Mark Johnson

Mark Johnson is a Director of Consulting at Hortonworks, where he spends his day helping people get value from their large and complex data repositories. Mark has worked with a wide range of technologies during his career, focusing most recently on the Hadoop ecosystem. He is active in the software community as the President of the New England Java Users Group (NEJUG) and as a regular presenter at user groups and conferences. When not working, Mark can be found riding his mountain bike on local trails and playing with his family.
