The project requires building NLP classifiers and running a sentiment analysis according to the guidelines below. I have attached the R Markdown file and the input files required for the analysis. Please use Python for the steps below if possible. The ‘cleaned_subtitles’ file is the dataset for the analysis, and the ‘movie reviews’ file is the one to use for the sentiment analysis.
– Pick several nouns and/or verbs from your previous assignment. Create a column in the dataframe that indicates whether each line from the movie/TV show includes that word. You can use 0 and 1 or any labels that make sense to you. Remember, we covered regular expression detection and deletion in the raw text assignments!
– Once you have created this column, use string replacement to delete that word from your subtitles. We take the word out to see whether we can predict when it is used; if you leave it in, it is a perfect predictor!
– Use *two* feature extraction methods and *two* machine learning algorithms to determine whether you can predict when your noun or verb will be used. You should include four different classification reports below, one for each combination of feature extraction method and algorithm.
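The classification steps above can be sketched in Python roughly as follows. This is a minimal sketch, not a required solution: the text column name `text`, the target word `"run"`, and the sixteen toy subtitle lines are all hypothetical stand-ins for the real ‘cleaned_subtitles’ data. The choice of bag-of-words and TF-IDF as the two feature extraction methods, and multinomial Naive Bayes and logistic regression as the two algorithms, is one reasonable option among many.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def add_target_and_mask(df, words, text_col="text"):
    """Flag lines that contain any target word (1/0), then delete the word(s)."""
    # \b word boundaries so "run" does not match "running" or "brunch".
    pattern = r"\b(?:" + "|".join(words) + r")\b"
    out = df.copy()
    text = out[text_col].fillna("").astype(str)
    out["has_word"] = text.str.contains(pattern, case=False, regex=True).astype(int)
    # Remove the word so it cannot act as a perfect predictor, then tidy spaces.
    out["masked"] = (
        text.str.replace(pattern, "", case=False, regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )
    return out


def run_experiments(df, random_state=42):
    """Return one classification report per (features, algorithm) pair."""
    X_train, X_test, y_train, y_test = train_test_split(
        df["masked"], df["has_word"], test_size=0.25,
        stratify=df["has_word"], random_state=random_state)
    vectorizers = {"bag_of_words": CountVectorizer, "tfidf": TfidfVectorizer}
    models = {"naive_bayes": MultinomialNB,
              "logistic_regression": lambda: LogisticRegression(max_iter=1000)}
    reports = {}
    for vec_name, make_vec in vectorizers.items():
        vec = make_vec()
        X_tr = vec.fit_transform(X_train)  # learn the vocabulary on training lines only
        X_te = vec.transform(X_test)
        for model_name, make_model in models.items():
            clf = make_model().fit(X_tr, y_train)
            reports[(vec_name, model_name)] = classification_report(
                y_test, clf.predict(X_te), zero_division=0)
    return reports


# Hypothetical toy subtitles; replace with the real cleaned_subtitles dataframe.
toy = pd.DataFrame({"text": [
    "We have to run before they see us.", "Run, the bridge is collapsing!",
    "I can't run any faster.", "Did you see him run away?",
    "They run the shop downstairs.", "Run back to the car now.",
    "You run, I will hold the door.", "We run out of time tonight.",
    "I think the coffee is cold.", "Where did you put the keys?",
    "That was a great dinner.", "Please close the window.",
    "She is reading in the library.", "The train leaves at noon.",
    "Call me when you land.", "I love this old song.",
]})

labelled = add_target_and_mask(toy, ["run"])
reports = run_experiments(labelled)
for (vec_name, model_name), report in reports.items():
    print(f"=== {vec_name} + {model_name} ===\n{report}")
```

On real subtitles the target word is usually rare, so the classes will be imbalanced; the recall for class 1 in each report is then more informative than overall accuracy.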
– Use *one* of the unsupervised lexicon techniques to create sentiment scores for your movie/TV show.
– What is the overall sentiment of your movie/TV show? How would you interpret the scores provided?
– Using the movie reviews mini dataset provided online, create a sentiment tagging model (one feature extraction method + one algorithm).
– With this new model, create sentiment scores for your movie/TV show.
– What is the overall sentiment according to the new sentiment tagging model? How would you interpret the scores provided?
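Both sentiment steps above can be sketched in Python as follows. This is a minimal sketch under loudly stated assumptions: `TOY_LEXICON` is a hypothetical ten-word lexicon standing in for a real one such as AFINN or VADER, and the subtitle lines, review texts, and the column names `text`, `review`, and `sentiment` (with labels `pos`/`neg`) are invented placeholders, since the layout of the ‘movie reviews’ file is not given here. The lexicon part sums word polarities per line (AFINN-style); the supervised part uses TF-IDF features with logistic regression as the one feature extraction method and one algorithm.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical mini lexicon (word -> polarity); a stand-in for AFINN, VADER, etc.
TOY_LEXICON = {"love": 3, "great": 3, "good": 2, "happy": 2, "fine": 1,
               "bad": -2, "sad": -2, "hate": -3, "terrible": -3, "afraid": -2}


def lexicon_score(text, lexicon=TOY_LEXICON):
    """Unsupervised score: sum of the polarities of lexicon words in the line."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(tok, 0) for tok in tokens)


# Hypothetical subtitle lines; replace with the cleaned_subtitles dataframe.
subs = pd.DataFrame({"text": [
    "I love this town.",
    "This is terrible, I hate it here.",
    "The train leaves at noon.",
    "We had a great day, I am happy.",
]})
subs["lexicon_score"] = subs["text"].map(lexicon_score)
print("Mean lexicon score per line:", subs["lexicon_score"].mean())

# Hypothetical labelled reviews; replace with the movie reviews mini dataset.
reviews = pd.DataFrame({
    "review": [
        "A wonderful, moving film with great acting.",
        "I loved every minute, truly delightful.",
        "Brilliant story and a happy ending.",
        "Great cast, great script, highly recommended.",
        "A boring, terrible mess of a movie.",
        "I hated the plot and the acting was awful.",
        "Dull, slow and painfully bad.",
        "The worst film I have seen this year.",
    ],
    "sentiment": ["pos"] * 4 + ["neg"] * 4,
})

# Supervised tagger: TF-IDF features + logistic regression, trained on reviews.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(reviews["review"], reviews["sentiment"])

# Apply the review-trained model to the subtitle lines.
pos_col = list(model.classes_).index("pos")
subs["p_positive"] = model.predict_proba(subs["text"])[:, pos_col]
subs["predicted"] = model.predict(subs["text"])
print(subs)
print("Share of lines tagged positive:", (subs["predicted"] == "pos").mean())
```

A positive mean lexicon score suggests the dialogue leans positive, but most subtitle lines contain no lexicon words and score 0, which pulls the mean toward neutral. The review-trained model, by contrast, assigns every line a label, so its results can differ noticeably: movie reviews and spoken dialogue use different vocabulary, and that domain shift is worth mentioning when interpreting the two sets of scores. With NLTK installed, `SentimentIntensityAnalyzer` from `nltk.sentiment` (after `nltk.download("vader_lexicon")`) is one real lexicon option; its `polarity_scores` method returns a `compound` score between -1 and 1 per line.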