Simple Sentiment Analysis – DevGin

Sentiment Analysis

This is a very basic sentiment analysis of the last 2,000 Tweets about @realDonaldTrump (as of January 2, 2018).

The positive and negative word lists are in the files in this repository.

Alternatively, scrape your own Tweets by following this guide, or use the WordFreq.csv file provided on the GitHub account.

The Code

First, read in the negative and positive words. The WordFreq.csv file was created in R by querying the Twitter API for specific Tweets. The R program also does the word frequency count.
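
A minimal sketch of this loading step is shown here. It assumes the word lists hold one word per line and that WordFreq.csv has a header row; the column names `word` and `freq` and the helper names are hypothetical, not taken from the repository. Lines starting with `;` are skipped in case the lists carry comment headers.

```python
import csv
import io


def load_words(lines):
    """Return a set of lowercase words, one per line.

    Blank lines and lines starting with ';' (treated as comments) are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith(";"):
            words.add(word)
    return words


def load_word_freq(lines, word_col="word", freq_col="freq"):
    """Return {word: count} from CSV text with a header row.

    The column names are assumptions; adjust them to match the real file.
    """
    reader = csv.DictReader(lines)
    return {row[word_col].lower(): int(row[freq_col]) for row in reader}


# Tiny in-memory stand-ins for the real files, so the sketch runs as-is.
positive = load_words(io.StringIO("; comment line\ngood\ngreat\n"))
negative = load_words(io.StringIO("bad\nsad\n"))
word_freq = load_word_freq(io.StringIO("word,freq\ngood,3\nbad,5\nthe,40\n"))

print(sorted(positive), sorted(negative), word_freq)
```

With the real files, the same helpers can be given open file handles instead, for example `with open("negative.txt") as f: negative = load_words(f)`.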

We lose some information because the words can't be traced back to the Tweets they came from; however, this was just an exercise to start learning Python syntax and data manipulation. The approach is still useful if you don't need to trace words back.


593 positive words.
773 negative words.

43.41% Positive

56.59% Negative
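
The percentages above are each count divided by the combined total of 1,366 matched words. The sketch below reproduces that arithmetic. The `tally` helper, which sums frequencies for words found in a lexicon, is an assumption about how the counts might be produced and is shown only on toy data.

```python
def tally(word_freq, lexicon):
    """Sum the frequencies of every word that appears in the lexicon."""
    return sum(count for word, count in word_freq.items() if word in lexicon)


def sentiment_split(pos_count, neg_count):
    """Return (positive %, negative %) of all matched words."""
    total = pos_count + neg_count
    if total == 0:
        raise ValueError("no positive or negative words matched")
    return 100 * pos_count / total, 100 * neg_count / total


# Toy data to show how tally() would be used (not the real Tweets).
toy_freq = {"good": 3, "bad": 5, "the": 40}
toy_pos = tally(toy_freq, {"good", "great"})  # 3
toy_neg = tally(toy_freq, {"bad", "sad"})     # 5

# The counts reported above: 593 positive and 773 negative words.
pos_pct, neg_pct = sentiment_split(593, 773)
print(f"{pos_pct:.2f}% Positive")  # 43.41% Positive
print(f"{neg_pct:.2f}% Negative")  # 56.59% Negative
```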


I’m interested in adding swear words to the negative.txt file to see how that changes the results.
Results can be skewed by sarcasm or by bigrams: “not cool” is actually negative, even though “cool” on its own is counted as positive. The general assumption is that “not” and similar negating words will roughly balance out across the positive and negative counts.
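
To illustrate the bigram problem, here is a rough sketch of one common workaround: flip a word's polarity when the previous token is a negation. This is not what the analysis above does, and it needs the Tweets as token sequences rather than a word frequency table. The `NEGATIONS` set and the function name are made up for the example.

```python
# Hypothetical, deliberately small set of negating words.
NEGATIONS = {"not", "no", "never"}


def score_tokens(tokens, positive, negative):
    """Count positive and negative hits, flipping polarity after a negation."""
    pos = neg = 0
    prev = ""
    for tok in tokens:
        flipped = prev in NEGATIONS
        if tok in positive:
            if flipped:
                neg += 1
            else:
                pos += 1
        elif tok in negative:
            if flipped:
                pos += 1
            else:
                neg += 1
        prev = tok
    return pos, neg


result = score_tokens("that was not cool".split(), {"cool"}, {"bad"})
print(result)  # (0, 1): "cool" is counted as negative because "not" precedes it
```

Sarcasm is not addressed by this kind of rule.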


Thanks to the authors for contributing the positive and negative datasets:

Minqing Hu and Bing Liu. “Mining and Summarizing Customer Reviews.” Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2004), Aug 22-25, 2004, Seattle, Washington, USA.

Bing Liu, Minqing Hu and Junsheng Cheng. “Opinion Observer: Analyzing and Comparing Opinions on the Web.” Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.

Author: mtgingrass

Author: administrator