Chetan Balaji
Sentiment analysis has become an active field in both research and industry. The term sentiment refers to a person's feelings or opinions about a particular issue, and sentiment analysis is regarded as a direct application of opinion mining. The enormous volume of unstructured data has become the main source of textual data and one of the most important kinds of data; this data covers many topics, such as business, industrial, or social subjects, depending on the data requirements and the processing needed. The volume of this data is huge and grows rapidly every second. This is known as big data, and it requires special processing methods and high computational power to perform the necessary mining tasks. We propose performing sentiment analysis with Apache Spark, an open-source distributed data processing platform that uses a distributed memory abstraction. The goal of using Apache Spark's machine learning library (MLlib) is to handle very large amounts of data efficiently. We recommend a set of pre-processing and machine learning text feature extraction steps that improve sentiment classification results. The effectiveness of the proposed approach is demonstrated against other approaches, achieving better classification results with the Naïve Bayes and Decision Tree classification algorithms. Finally, we evaluate the performance of Apache Spark in terms of its scalability.
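The abstract names three stages: pre-processing, text feature extraction, and classification with Naïve Bayes. The following is a minimal, dependency-free Python sketch of those stages under stated assumptions. It is not the author's implementation and does not use Spark. The stop-word list, the regular-expression tokenizer, the toy training sentences, and the `MultinomialNaiveBayes` class are hypothetical illustrations. In the Spark setting described above, the same roles would be played by distributed MLlib components such as `Tokenizer`, `StopWordsRemover`, and `HashingTF` from `pyspark.ml.feature` and `NaiveBayes` from `pyspark.ml.classification`.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "it", "this", "and", "was", "i"}


def preprocess(text):
    """Lowercase, keep runs of letters/apostrophes, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def term_frequencies(text):
    """Bag-of-words term-frequency features for one document."""
    return Counter(preprocess(text))


class MultinomialNaiveBayes:
    """Hypothetical multinomial Naive Bayes with additive (Laplace) smoothing."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)      # term counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tf = term_frequencies(text)
            self.word_counts[label].update(tf)
            self.vocab.update(tf)
        # Total number of term occurrences in each class.
        self.total_words = {
            c: sum(self.word_counts[c].values()) for c in self.class_counts
        }
        return self

    def log_posterior(self, text, label):
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)  # log prior
        denom = self.total_words[label] + self.smoothing * len(self.vocab)
        for word, count in term_frequencies(text).items():
            if word not in self.vocab:
                continue  # skip words never seen during training
            p = (self.word_counts[label][word] + self.smoothing) / denom
            score += count * math.log(p)
        return score

    def predict(self, text):
        # Pick the class with the highest (unnormalized) log posterior.
        return max(self.class_counts, key=lambda c: self.log_posterior(text, c))


# Toy, made-up training data for illustration.
train_texts = [
    "I love this phone, it is great",
    "What a wonderful and happy day",
    "This movie was terrible and boring",
    "I hate the awful service",
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_texts, train_labels)
print(model.predict("great phone, happy with it"))
print(model.predict("boring movie and awful acting"))
```

The sketch sums log probabilities instead of multiplying raw probabilities, which avoids floating-point underflow on long documents. Additive smoothing keeps a word that never appeared with a given class from forcing that class's probability to zero.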
Keywords: Apache Spark, Unstructured Data