Hadoop 2.0 is here. Five and a half years after the initial proposal, the Hadoop community has delivered the next major version of the world’s most popular big data stack. Though it looks like a single-number upgrade, it is going to redefine how we use Hadoop.

It’s the right time to get on the next-generation platform. If you are still not convinced, the following reasons should get you excited.

Hadoop becomes Big Data OS

Hadoop is the kernel of a distributed operating system – Doug Cutting

With Hadoop 2.0, Hadoop becomes the kernel of a big data operating system. Hadoop 1.0 supported only one way of processing data, Map/Reduce, which limited it to a few specific kinds of applications. With the introduction of YARN, Hadoop can now support any kind of data processing model.
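To make the "kernel" analogy a bit more concrete, here is a minimal Python sketch of the idea: a cluster-wide resource manager that hands out containers without knowing which processing model will run in them. The class `ToyResourceManager`, its methods, and the application names are all hypothetical and written only for illustration. This is not the YARN API, just a toy model of the separation between resource management and processing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a cluster resource manager (NOT the real YARN API)."""

    total_containers: int
    # Maps application name -> number of containers it currently holds.
    allocations: dict = field(default_factory=dict)

    def free(self) -> int:
        """Number of containers not currently held by any application."""
        return self.total_containers - sum(self.allocations.values())

    def request(self, app_name: str, containers: int) -> int:
        """Grant up to `containers` containers and return how many were granted.

        The manager never asks what the application does with them, so a
        batch job and a streaming job are treated exactly the same way.
        """
        granted = min(containers, self.free())
        if granted > 0:
            self.allocations[app_name] = self.allocations.get(app_name, 0) + granted
        return granted

    def release(self, app_name: str) -> None:
        """Return all containers held by an application to the pool."""
        self.allocations.pop(app_name, None)


rm = ToyResourceManager(total_containers=10)
print(rm.request("mapreduce-job", 6))       # batch job asks for 6 containers
print(rm.request("streaming-topology", 6))  # asks for 6, only 4 are left
rm.release("mapreduce-job")                 # batch job finishes
print(rm.free())                            # its containers are free again
```

The point of the sketch is that nothing in the resource manager mentions Map/Reduce. In this toy model, any framework that can ask for containers can share the cluster.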

Evolving Hadoop ecosystem independent of core

Over the years, ecosystem projects like Hive and Pig have struggled to support ever-changing Hadoop versions. Because Map/Reduce was tightly coupled to the Hadoop version, they ended up supporting only the 1.x versions, leaving the 0.21 and 0.22 versions unsupported. Now that Map/Reduce is becoming a userland library, independent of the Hadoop core, ecosystem projects can evolve faster and support any Hadoop 2.x version.

Innovation at Scale

More processing paradigms, like MPI and Spark, are landing on Hadoop with YARN

Map/Reduce is a great model for certain kinds of problems, but it fares poorly on problems like iterative processing and machine learning. Projects like Mahout and RHadoop tried to build machine learning on Map/Reduce, but it proved too hard. Over the years, projects like Spark and Storm have emerged to solve this problem, but until Hadoop 2.0 they could not run natively on a Hadoop cluster. From 2.0 onward they can run natively on Hadoop and deliver high performance on the same cluster. This enables frameworks to innovate at scale.

Hadoop becomes Real time

Hadoop was built for batch processing. But as its popularity grew, so did the need for real-time processing. Because Map/Reduce is inherently a batch processing system, it was very difficult to support real-time workloads on it. From 2.0 onward, frameworks like Apache Storm and Apache Tez are available to support real-time and low-latency processing.

With the Hadoop 2.0 release, Hadoop finally comes out of the shadow of Google's Map/Reduce. Hadoop 2.0 is built for the enterprise from the ground up. Hadoop 2.0 is ready, are you?
