IJTRD

Title of Paper:
Using Apache Spark for Analysing the Sentiments of Unstructured Data with Logistic Regression Algorithm


Authors:
Chetan Balaji

Cite This Article :

Chetan Balaji "Using Apache Spark for Analysing the Sentiments of Unstructured Data with Logistic Regression Algorithm" Published in International Journal of Trend in Research and Development (IJTRD), ISSN: 2394-9333, Volume-7 | Issue-5, October 2020, URL: http://www.ijtrd.com/papers/IJTRD22297.pdf

Abstract :
Sentiment analysis has become an active field in both research and industry. The term sentiment refers to a person's feelings or opinions about a particular issue, and sentiment analysis is also regarded as a direct application of opinion mining. The enormous volume of unstructured data is a major source of textual data and one of the largest data volumes; this data covers various topics, such as business, industrial, or social topics, depending on the data requirements and the processing needed. This data is huge and grows rapidly every second. Such data is called big data, and it requires special processing techniques and high computational power to perform the necessary mining tasks. Here we propose performing sentiment analysis with the Apache Spark framework, an open-source distributed data processing platform that uses a distributed memory abstraction. The goal of using Apache Spark's machine learning library (MLlib) is to handle very large amounts of data efficiently. We recommend several pre-processing and machine learning text feature extraction steps to obtain better results in sentiment classification. The effectiveness of the proposed approach is demonstrated against other approaches, achieving better classification results when using the Naïve Bayes and Decision Tree classification algorithms. Finally, our solution evaluates the performance of Apache Spark with respect to its scalability.
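The abstract mentions pre-processing, text feature extraction, and classification but does not list the exact steps. The sketch below shows, in miniature and on a single machine, one common pipeline of that kind: tokenisation, stop-word removal, TF-IDF features, and binary logistic regression (the algorithm named in the title), fitted by plain gradient descent. It is an illustration under stated assumptions, not the author's implementation, and it does not exercise Spark's distribution. In Spark's DataFrame-based API (`pyspark.ml`) the corresponding stages would be `RegexTokenizer`, `StopWordsRemover`, `CountVectorizer` (or `HashingTF`), `IDF`, and `LogisticRegression`, chained in a `Pipeline`. The stop-word list, toy sentences, learning rate, and epoch count are hypothetical choices; the IDF smoothing `log((n + 1) / (df + 1))` follows the formula documented for Spark ML's `IDF`.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration (not the paper's list).
STOP_WORDS = {"i", "a", "an", "the", "is", "it", "this", "and", "was", "what"}


def preprocess(text):
    """Pre-processing: lower-case, keep alphabetic tokens, drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def fit_tfidf(docs):
    """Feature extraction: build a vocabulary and Spark-style IDF weights."""
    vocab = sorted({t for d in docs for t in d})
    df = Counter(t for d in docs for t in set(d))  # document frequency
    n = len(docs)
    # Same smoothing as Spark ML's IDF: log((n + 1) / (df + 1)).
    idf = {t: math.log((n + 1) / (df[t] + 1)) for t in vocab}
    return vocab, idf


def tfidf_vector(tokens, vocab, idf):
    """Term frequency times IDF; tokens outside the vocabulary are ignored."""
    tf = Counter(tokens)
    return [tf[t] * idf[t] for t in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=1000):
    """Binary logistic regression fitted by full-batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one example: (p - y) * x.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def predict(text, model):
    """Return 1 (positive) if P(positive) >= 0.5, else 0 (negative)."""
    vocab, idf, w, b = model
    x = tfidf_vector(preprocess(text), vocab, idf)
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Toy labelled data (1 = positive, 0 = negative), invented for illustration.
TRAIN = [
    ("I love this phone, it is great", 1),
    ("What a wonderful and happy day", 1),
    ("Great service, I love it", 1),
    ("This is terrible, I hate it", 0),
    ("Awful product and bad support", 0),
    ("I hate the bad battery", 0),
]

docs = [preprocess(text) for text, _ in TRAIN]
vocab, idf = fit_tfidf(docs)
X = [tfidf_vector(d, vocab, idf) for d in docs]
w, b = train_logreg(X, [label for _, label in TRAIN])
MODEL = (vocab, idf, w, b)

print(predict("Great phone, I love it", MODEL))  # prints 1
print(predict("Bad battery, I hate it", MODEL))  # prints 0
```

Words that occur in every training document receive an IDF of zero under this smoothing, so they contribute nothing to the classifier; in a Spark deployment the same stages would run over a distributed DataFrame instead of Python lists.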

Keywords :
Apache Spark, Unstructured Data

Publication Details:

Published In :
Volume-7 | Issue-5, October 2020

e-ISSN Number :
2394-9333

Unique Identification Number :
IJTRD22297

International Journal of Trend in Research and Development

International Peer Reviewed, Open Access Journal ISSN: 2394-9333
