Pro, Con, and Affinity Tagging of Product Reviews

Todd Sullivan
Stanford's Natural Language Processing Course
Final Project
Stanford Department of Computer Science

In this project I created several systems for tagging product reviews with the pros, cons, and affinities implied by the review text and by other information, such as the review's rating and the author's location. Although I technically had four weeks to complete the project, I realistically had only three because it overlapped with the class's third project. I explored many techniques, including various preprocessing methods, a bag-of-words Naive Bayes baseline classifier, a maximum entropy classifier, and a combinatorial optimization algorithm for finding optimal tag sets. PowerReviews, which operates the product review portal, provided the product review data.
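To make the idea of a bag-of-words Naive Bayes baseline for tagging concrete, here is a minimal sketch. It is not the report's implementation. Several choices are assumptions made only for illustration: the regex tokenizer, add-one (Laplace) smoothing, training one independent binary classifier per tag, and the toy reviews and tag names (`pro:battery`, `con:durability`), which are hypothetical. It uses only the review text. Rating and location features are not modeled.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase bag-of-words tokenizer (an assumed, deliberately simple preprocessing step)."""
    return re.findall(r"[a-z']+", text.lower())


class BinaryNaiveBayes:
    """Multinomial Naive Bayes that decides whether a single tag applies to a review."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # add-alpha (Laplace) smoothing constant; 1.0 is an assumption

    def fit(self, docs, labels):
        # Word counts per class: True = tag applies, False = tag does not apply
        self.word_counts = {True: Counter(), False: Counter()}
        doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        n = len(labels)
        # Smoothed log prior per class, so a class with no examples never yields log(0)
        self.log_prior = {
            c: math.log((doc_counts[c] + self.alpha) / (n + 2 * self.alpha))
            for c in (True, False)
        }
        self.totals = {c: sum(self.word_counts[c].values()) for c in (True, False)}
        return self

    def log_score(self, doc, c):
        """Log P(c) plus the sum of smoothed log P(word | c) over in-vocabulary tokens."""
        v = len(self.vocab)
        score = self.log_prior[c]
        for w in tokenize(doc):
            if w not in self.vocab:
                continue  # words never seen in training are ignored
            score += math.log(
                (self.word_counts[c][w] + self.alpha) / (self.totals[c] + self.alpha * v)
            )
        return score

    def predict(self, doc):
        # Assign the tag only if the "applies" class scores strictly higher
        return self.log_score(doc, True) > self.log_score(doc, False)


def train_tagger(reviews, tag_sets, tags):
    """Train one independent binary classifier per tag (a hypothetical multi-label setup)."""
    return {t: BinaryNaiveBayes().fit(reviews, [t in s for s in tag_sets]) for t in tags}


def tag_review(models, review):
    """Return the set of tags whose classifier says the tag applies."""
    return {t for t, m in models.items() if m.predict(review)}


# Toy, made-up training data for illustration only
reviews = [
    "battery lasts all day",
    "great battery life",
    "screen cracked quickly",
    "cracked case, poor build",
]
tag_sets = [{"pro:battery"}, {"pro:battery"}, {"con:durability"}, {"con:durability"}]
models = train_tagger(reviews, tag_sets, ["pro:battery", "con:durability"])
print(tag_review(models, "the battery is great"))  # {'pro:battery'}
```

Because each tag is scored independently, this baseline cannot account for interactions between tags, such as a pro and a con that contradict each other. Choosing a whole tag set jointly, rather than one tag at a time, is one way to address that.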

Separately from the NLP course, I also developed a production-capable system that takes a vastly different approach: instead of requiring training data, it relies on another form of human input.

Technical Report