Oleg Zhurakousky
ÜberConf
Denver · July 16 - 19, 2013

Principal Architect w/Hortonworks
Oleg is a Principal Architect with Hortonworks, responsible for architecting scalable Big Data solutions using various open-source technologies available within and outside the Hadoop ecosystem. Before Hortonworks, Oleg was part of SpringSource/VMware, where he was a core engineer working on the Spring Integration framework, leading the Spring Integration Scala DSL and contributing to other projects in the Spring portfolio. He has 17+ years of experience in software engineering across multiple disciplines, including software architecture and design, consulting, business analysis, and application development. Oleg has focused on professional Java development since 1999. Since 2004 he has been heavily involved in using several open-source technologies and platforms across a number of projects around the world, spanning industries such as telecom, banking, law enforcement, the US DoD, and others.
As a speaker, Oleg has presented seminars at dozens of conferences worldwide (e.g., SpringOne, JavaOne, JavaZone, Jazoon, Java2Days, Scala Days, ÜberConf, and others).
Presentations
High Speed Continuous & Reliable Data Ingest into Hadoop
This talk will explore the area of real-time data ingest into Hadoop, present the architectural trade-offs, and demonstrate alternative implementations that strike the appropriate balance across common challenges.
Go Beyond "Debug": Wire Tap your App for Knowledge with Hadoop
Today, application developers devote roughly 80% of their code to persisting roughly 20% of the total data flowing through their applications. That means two things:
- 80% of the data flowing through our applications is, at best, lost in rolling log files and, at worst, never collected; either way, it is never analyzed or accounted for.
- Application-level database programming, licensing, storage, administration, and ETL processing have maxed out IT budgets and have constrained app development teams from keeping pace with the rate of change in the business.
The other 80% of the data is “Event Data” that can no longer be ignored if you want to stay competitive. Changes to application state are already stored as a sequence of events in application and middleware logs. In fact, since this data never held value to anyone but the developer in the past, a lot of potentially valuable information is often never collected. With Hadoop, we can:
- store and query these events (transaction tracing),
- use the event log to reconstruct the application domain at any point in time (ETL),
- use the same event log to construct new domains we haven't planned for (ELT), and
- automatically adjust our data domains to cope with retroactive changes.
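The idea of reconstructing the application domain at any point in time by replaying an event log can be sketched in a few lines. This is a minimal, hypothetical illustration, not code from the talk: the `Event` record, the account/amount fields, and the `replay` method are all assumed names, and the fold happens in memory rather than in Hadoop.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replay an append-only event log to reconstruct
// derived state (per-account balances) as of an arbitrary point in time.
public class EventReplay {

    // A hypothetical domain event: at 'timestamp', 'amount' was applied to 'account'.
    record Event(Instant timestamp, String account, long amount) {}

    // Fold all events with timestamp <= asOf into a map of account balances.
    static Map<String, Long> replay(List<Event> log, Instant asOf) {
        Map<String, Long> balances = new HashMap<>();
        for (Event e : log) {
            if (!e.timestamp().isAfter(asOf)) {
                balances.merge(e.account(), e.amount(), Long::sum);
            }
        }
        return balances;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(
            new Event(Instant.parse("2013-07-01T00:00:00Z"), "acct-1", 100),
            new Event(Instant.parse("2013-07-10T00:00:00Z"), "acct-1", -30),
            new Event(Instant.parse("2013-07-18T00:00:00Z"), "acct-1", 50));

        // State "as of" July 15: only the first two events apply.
        System.out.println(replay(log, Instant.parse("2013-07-15T00:00:00Z")).get("acct-1")); // 70
        // Current state: all three events apply.
        System.out.println(replay(log, Instant.now()).get("acct-1")); // 120
    }
}
```

Because the log itself is the source of truth, constructing a new, unplanned-for domain (the "ELT" bullet) is just another fold over the same events with a different reducing function.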