Genie – Hadoop Platform as a Service at Netflix


Recently on our tech blog, we discussed the architecture of our petabyte-scale data warehouse in the cloud. Salient features include the use of Amazon's Simple Storage Service (S3) as our “source of truth”, the elasticity of the cloud, which lets us run multiple dynamically resizable Hadoop clusters to support various workloads, and our implementation of a horizontally scalable Hadoop Platform as a Service called “Genie”. In this presentation, we will focus on Genie, which provides job and resource management for the Hadoop ecosystem in the cloud and is the core service that the various components of the enterprise ecosystem at Netflix use to integrate with Hadoop in the cloud. From the perspective of the end user, Genie abstracts away the physical details of the various (potentially transient) Hadoop resources in the cloud and provides RESTful APIs to submit and monitor Hadoop, Hive, and Pig jobs without having to install any Hadoop clients. We will describe how Genie is used in production at Netflix to process hundreds of terabytes of data every day, running thousands of ETL (extract, transform, load) jobs plus hundreds of ad hoc jobs from our visualization tools and our web interface. Finally, we will discuss our plans for open sourcing Genie.
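To make the "submit and monitor over REST, with no Hadoop client installed" idea concrete, here is a minimal sketch of what a caller might do. The abstract does not describe Genie's actual endpoints or payload format, so the host name, the `/jobs` path, and the field names `jobType`, `script`, and `user` are all hypothetical placeholders chosen for illustration. The sketch only builds the HTTP requests and does not send them.

```python
import json
from urllib.request import Request

# Hypothetical values: the abstract does not specify Genie's URL layout
# or request fields, so everything below is an illustrative assumption.
GENIE_BASE_URL = "http://genie.example.com"  # placeholder host
JOBS_PATH = "/jobs"                          # assumed resource path
SUPPORTED_JOB_TYPES = {"hadoop", "hive", "pig"}  # job kinds named in the abstract


def build_submit_request(job_type: str, script: str, user: str) -> Request:
    """Build (but do not send) an HTTP POST describing a job to run."""
    if job_type not in SUPPORTED_JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    # Assumed JSON body; the caller names the work, not the cluster.
    payload = {"jobType": job_type, "script": script, "user": user}
    return Request(
        GENIE_BASE_URL + JOBS_PATH,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_status_request(job_id: str) -> Request:
    """Build (but do not send) an HTTP GET that polls a job's status."""
    return Request(f"{GENIE_BASE_URL}{JOBS_PATH}/{job_id}", method="GET")


submit = build_submit_request("hive", "s3://example-bucket/report.q", "analyst")
status = build_status_request("42")
```

The point of the sketch is the shape of the interaction: the client needs only an HTTP library, and the request says what to run rather than which (possibly transient) cluster to run it on.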


