Thursday, June 26, 2014

Comparing the top Hadoop distributions (Networkworld)

Hadoop introduced a new way to simplify the analysis of large data sets, and in a very short time reshaped the big data market. In fact, today Hadoop is often synonymous with the term big data.

Since Hadoop is an open source project, a number of vendors have developed their own distributions, adding new functionality or improving the code base. This article by Altoros, a big data specialist, provides an overview of the major distributions, describing how they differ from the standard edition.

A standard open source Hadoop distribution (Apache Hadoop) includes:

  • The Hadoop MapReduce framework for running computations in parallel
  • The Hadoop Distributed File System (HDFS)
  • Hadoop Common, a set of libraries and utilities used by other Hadoop modules

This is only a basic set of Hadoop components. Other solutions, such as Apache Hive, Apache Pig, and Apache ZooKeeper, are widely used to solve specific tasks, speed up computations, and optimize routine work.
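
To make the MapReduce component listed above more concrete, here is a minimal sketch of the map, shuffle, and reduce steps as a word count. It is a single-process Python illustration of the programming model only, not the Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen for this example. In a real cluster, the framework would run many map and reduce tasks in parallel over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence on an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
```

Calling `word_count` with a list of text lines returns a dictionary that maps each lowercased word to the number of times it occurs.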

Vendor distributions are designed to overcome issues with the open source edition and to provide additional value to customers, with a focus on:

  • Reliability. The vendors react faster when bugs are detected. They promptly deliver fixes and patches, which makes their solutions more stable.
  • Support. A variety of companies provide technical assistance, which makes it possible to adopt the platforms for mission-critical and enterprise-grade tasks.
  • Completeness. Very often Hadoop distributions are supplemented with other tools to address specific tasks.

In addition, vendors help improve the standard Hadoop distribution by contributing updated code back to the open source repository, which fosters the growth of the overall community.

Three of the top Hadoop distributions are provided by Cloudera, MapR and Hortonworks. The chart below, based on the market research “Big Data Vendor Revenue and Market Forecast 2012–2017,” compares the 2012 revenue of these major Hadoop vendors.
