Speaker Topics - No Fluff Just Stuff

Applying Testing Techniques for Big Data and Hadoop

More and more companies are relying on timely and accurate analytics from their “Big Data” systems. Unfortunately, the testing toolsets and concepts common in other technical disciplines are lacking. A common but incorrect perception exists in “Big Data” that proper testing is impossible due to dataset size and tool complexity. This overview session will examine a sampling of the Hadoop testing tools and processes that you can use on your projects today.

Testing “Big Data” can mean a big time investment: several hours are often spent just to discover that you made a simple typo. You fix the typo and then wait another couple of hours for your script to hopefully complete. Even if the Big Data script or program ran to completion, are you sure your data analysis is functionally correct? Getting programs to run to completion and gaining confidence that the analytic output is functionally correct is one of the biggest hidden problems in “Big Data” today. We all know we need to test and verify our scripts and programs, but given the large dataset sizes we all work with, each test run can take hours, which makes proper testing impractical under the demand for quick turnarounds.

This overview presentation will focus on surveying two areas of the testing problem: (1) establishing efficient test datasets and (2) Hadoop-centric testing tools such as PigUnit, JUnit for Pig and Hive UDF testing, and BeeTest for Hive testing, among other tools of interest.
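
As an illustration of the first area, one common way to build a small test dataset is to draw a uniform random sample from a large input in a single pass. The sketch below uses reservoir sampling in plain Python. It is not taken from the session, and the function name `reservoir_sample` and its parameters are hypothetical, chosen only for this example.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Return up to k records chosen uniformly at random in one pass.

    Hypothetical helper for illustration; only k records are held in
    memory, so the input can be far larger than available RAM.
    """
    rng = random.Random(seed)  # a fixed seed makes the test dataset repeatable
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Pick a slot in [0, i]; the record is kept only if the
            # slot falls inside the reservoir.
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

For example, `reservoir_sample(open("events.log"), 1000, seed=42)` would keep 1,000 lines from a log file of any size (the file name is an assumption). A script that runs in seconds against such a sample can catch typos and logic errors before a multi-hour run on the full dataset.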


About Mark Johnson

Mark Johnson is a Director of Consulting at Hortonworks, where he spends his day helping people achieve value from their big and complex data repositories. Mark has worked on a wide range of technologies during his career. Most recently he has focused on the Hadoop ecosystem. Mark is active in the software community as the President of the New England Java Users Group (NEJUG) and as a regular presenter at user groups and conferences. When not working, Mark can be found riding his mountain bike on local trails and playing with his family.
