Building a Data Lake using Apache Hadoop: A Proof of Concept

Objective of the workshop
A Data Lake is an exciting new idea, but one often needs more clarity on ‘why’ to build it and ‘how’ to build it. We want to answer these two questions not only through slides, but also through a Proof of Concept. In the process, we will:
a) Drive home a few differentiators that a Data Lake offers vis-à-vis a traditional Enterprise Data Warehouse:
i. Retain huge volumes of raw data from numerous sources (aka ‘Schema on Read’)
ii. Enable predefined as well as exploratory analysis of data from the same data source
iii. Empower a variety of business stakeholders with analyzable data, and
b) Explain the building blocks of a Data Lake architecture

Who can attend this workshop?
The workshop is primarily for Developers, Architects and Technical Managers, but Business Managers considering an investment in a Data Lake for their organizations may also benefit from it.

What will be covered in the workshop?
In this workshop we will demonstrate a Proof-of-Concept of a Data Lake built on top of Apache Hadoop and Pentaho Data Integration, using data from financial markets. In the context of a Data Lake, we will talk about:
- Setting up a scalable HDFS layer
- Data ingestion in batch and streaming mode
- Data curation, discovery and metadata
- Designing an extensible, pluggable data analytics infrastructure
- Data integration into downstream applications, including dashboards and decision support systems
- Multi-tenant access for various stakeholders and departments in an organization

Benefits/Takeaways for the attendees
Attendees will learn about open-source platforms available for building a Business Data Lake. They will also take away a documented set of steps to replicate the Proof of Concept on their own.

Pre-requisites to attend the workshop
A very basic understanding of the following topic will suffice:
- Data Warehouse

Speaker/Instructor Profile:
Monojit Basu is the Founder and Director of TechYugadi IT Solutions & Consulting, Bangalore. The company is engaged in technical enablement in Cloud Computing, Analytics and new software architectures, and in consulting services for open-source software. Before setting up the company, Monojit worked in Senior Product Manager and Technical Architect roles at various software product companies, including Sun Microsystems, webMethods and IBM. He has played a key role in the development, as well as the customer success, of many innovative enterprise software platforms, including a highly available J2EE application server, an SOA platform, an Application Lifecycle Management platform and a Big Data technology platform. He has been a speaker at conferences and events in India, China and the US, including JavaOne, IBM Innovate and a NASSCOM-IIMB workshop. Monojit is also a Cloudera-certified Hadoop developer.
