M. Sashi Kala, Nancy Jasmine Goldena
Currently, organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Every day, Google alone processes about 24 petabytes (or 24,000 terabytes) of data. Yet very little of this information is formatted in the traditional rows and columns of conventional databases. Analyzing and working with Big Data can be very difficult using classical means such as relational database management systems or desktop software packages for statistics and visualization. Instead, Big Data requires large clusters with hundreds or even thousands of computing nodes. This paper highlights the software tools used successfully and widely for storage and processing of Big Data sets on clusters of commodity hardware. The primary purpose of this paper is to provide an in-depth analysis of the different platforms available for performing Big Data analytics. The tools used for extraction, storage, cleaning, mining, visualizing, analyzing and integrating are discussed in detail.
Hadoop, HDInsight, Spark, Mozenda, D3, RapidMiner, Orange, KNIME, Highcharts, jHepWork