As Big Data Takes Off, the Hadoop Wars Begin
IBM already has a Hadoop business that includes its own distribution, which it says is better suited for commercial users than the open-source Apache Hadoop distribution, though both IBM’s and Cloudera’s distributions are based on the Apache distribution. IBM’s offering also provides an application called InfoSphere BigSheets, which hides the complexities of Hadoop underneath a variety of advanced analytics, BI and visualization tools. Based on a few sources I spoke with at Structure: Big Data, and after reading into an advertisement in the conference program, it looks like EMC is getting into the game. The ad hints that EMC will announce a Hadoop product involving its new Greenplum database on May 9. It read, “05.09.11: EMC Greenplum. Apache Hadoop.” Also at the event, two independent sources suggested that members of Yahoo’s Hadoop team will be spinning off their own separate business, and there is speculation that this move is somehow tied to EMC’s Hadoop plans.
IBM isn’t to be taken lightly, nor is EMC on its own, but a Yahoo spinoff would be potentially market-changing given the Hadoop know-how within Yahoo, which has contributed the majority of the code now included in Apache Hadoop. During a panel at Structure: Big Data, Yahoo’s VP of Cloud Architecture, Todd Papaioannou, quipped to Cloudera’s Awadallah that Yahoo will keep innovating on Hadoop and everyone else can keep reselling it. Papaioannou declined to comment on the rumors of a Hadoop spinout, but did tell me via email, “I think Apache Hadoop will remain the go-to place to get access to new improvements and innovation in the core Hadoop platform. That’s exactly why we announced our ‘double down’ strategy and the work we are doing on the next generation of both Map Reduce and HDFS.”

