Extending WordNet using Generalized Automated Relationship Induction

Todd Sullivan, Nuwan I. Senaratna, Lawrence McAfee
With guidance from: Rion Snow, Dr. Dan Jurafsky, Dr. Andrew Ng
Stanford's Machine Learning Course Research Project
Stanford Department of Computer Science

GARI was my group research project for Stanford's CS 229 Machine Learning course. In short, we generalized Rion Snow et al.'s technique for automatic hypernym discovery so that it works with any word relationship in a plug-and-play manner. Although our project builds on Snow's technique, we did not use any code or datasets from Snow's projects.

Our system is a Java package that makes it easy to swap out parts of the algorithm and to create new classifiers/relationships. We used the Stanford Parser to parse sentences accurately into dependency trees and context-free phrase structure trees. WordNet supplied our starting set of word pairs for various relationships, including antonyms, holonyms, hyper/hyponyms, meronyms, participles, and synonyms.
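To illustrate what "plug-and-play" could look like, here is a minimal sketch of one way a relationship might be described as a name plus a set of seed word pairs, with the seed set pruned to words that occur in a given corpus (a step mentioned in the contributions list). Since the source code is not released, every name here (RelationshipSketch, WordPair, Relationship, fixed, pruneToCorpus) is a hypothetical stand-in, the seed pairs are typed in by hand rather than read from WordNet, and none of it should be taken as the actual GARI design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch only: these types are illustrative and are not the GARI package.
class RelationshipSketch {

    // An ordered word pair, e.g. (hyponym, hypernym).
    static final class WordPair {
        final String first;
        final String second;

        WordPair(String first, String second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return first + "->" + second;
        }
    }

    // One pluggable relationship: a name plus its seed pairs.
    interface Relationship {
        String name();

        List<WordPair> seedPairs();
    }

    // Convenience factory for a relationship backed by a fixed list of pairs.
    static Relationship fixed(final String name, final List<WordPair> pairs) {
        return new Relationship() {
            public String name() {
                return name;
            }

            public List<WordPair> seedPairs() {
                return pairs;
            }
        };
    }

    // Keep only the pairs whose two words both occur in the corpus vocabulary.
    static List<WordPair> pruneToCorpus(Relationship relationship, Set<String> corpusVocabulary) {
        List<WordPair> kept = new ArrayList<>();
        for (WordPair pair : relationship.seedPairs()) {
            if (corpusVocabulary.contains(pair.first) && corpusVocabulary.contains(pair.second)) {
                kept.add(pair);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hand-typed seed pairs; a real run would draw them from WordNet.
        Relationship hypernym = fixed("hypernym", Arrays.asList(
                new WordPair("dog", "animal"),
                new WordPair("oak", "tree")));
        Set<String> vocabulary = new HashSet<>(Arrays.asList("dog", "animal", "tree"));
        // "oak" is missing from the vocabulary, so only the first pair survives.
        System.out.println(hypernym.name() + ": " + pruneToCorpus(hypernym, vocabulary));
    }
}
```

Under this assumed design, adding a new relationship would only require supplying another Relationship instance; the pruning step and anything downstream of it would stay unchanged.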


Technical Report


Member Contributions

My group worked on LittleDog and GARI simultaneously, so we did not necessarily divide tasks evenly within each project. To gauge each member's relative contribution, one needs to view the breakdown for both projects. The following list details all group contributions to GARI.

  • We collectively designed the system, determined all interacting classes and interfaces, and stubbed all of the Java package's methods.
  • Lawrence created the wrapper for the SVM classifier and implemented the methods for all other classifiers except the "Naive Entropy Score" classifier described in footnote 2 of page 3 of the report.
  • Nuwan developed the "Naive Entropy Score" classifier described in footnote 2 of page 3 of the report.
  • We collectively determined the best method to extract a relevant path between two words in a context-free phrase structure tree.
  • Nuwan implemented the methods for extracting a relevant path between two words in a context-free phrase structure tree.
  • I designed and implemented the methods for extracting a relevant path between two words in a dependency tree.
  • Nuwan and I co-developed the threading functionality within the feature package.
  • I developed the methods for extracting the original training set for each relationship from WordNet and pruning the extracted training set to contain only words that exist in a given corpus.
  • Nuwan and I co-developed the caching methods in the system.
  • Nuwan executed, monitored, and categorized the tests of various corpus sizes.
  • Lawrence manipulated the classifier parameters and analyzed the results of each classifier/parameter combination used on the output of Nuwan's executions.
  • We each wrote one-third of the milestone report.
  • Nuwan and Lawrence wrote the final report, while I alone wrote the LittleDog report.
  • Nuwan and Lawrence co-edited the paper. I was the final editor of the paper.
  • I applied all formatting and presentation features to the paper.
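
Several of the items above concern extracting a path between two words in a parse tree. As a rough illustration of the dependency-tree case, the sketch below walks from each word up to their lowest common ancestor and records the edge labels along the way. It is not the method we implemented and it does not use the Stanford Parser API: the class DependencyPathSketch, the "^"/"v" direction markers, and the hand-built tree with its relation labels are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: not the GARI code and not the Stanford Parser API.
class DependencyPathSketch {
    // child token index -> head token index (the root maps to -1)
    private final Map<Integer, Integer> head = new HashMap<>();
    // child token index -> label of the edge to its head
    private final Map<Integer, String> label = new HashMap<>();

    void addEdge(int child, int headIndex, String relation) {
        head.put(child, headIndex);
        label.put(child, relation);
    }

    // Tokens from the given token up to the root, inclusive (assumes an acyclic tree).
    private List<Integer> chainToRoot(int token) {
        List<Integer> chain = new ArrayList<>();
        for (int node = token; node != -1; node = head.getOrDefault(node, -1)) {
            chain.add(node);
        }
        return chain;
    }

    // Edge labels climbing from `from` to the lowest common ancestor ("^"),
    // then descending to `to` ("v").
    List<String> path(int from, int to) {
        List<Integer> up = chainToRoot(from);
        List<Integer> down = chainToRoot(to);
        // First token on the upward chain that is also an ancestor of (or equal to) `to`.
        int meetUp = 0;
        while (meetUp < up.size() && !down.contains(up.get(meetUp))) {
            meetUp++;
        }
        if (meetUp == up.size()) {
            throw new IllegalArgumentException("tokens are not in the same tree");
        }
        int meetDown = down.indexOf(up.get(meetUp));
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < meetUp; i++) {
            labels.add("^" + label.get(up.get(i)));
        }
        for (int i = meetDown - 1; i >= 0; i--) {
            labels.add("v" + label.get(down.get(i)));
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hand-built tree for "A dog is an animal" (tokens 0..4); labels are illustrative.
        DependencyPathSketch tree = new DependencyPathSketch();
        tree.addEdge(4, -1, "root");
        tree.addEdge(1, 4, "nsubj");
        tree.addEdge(2, 4, "cop");
        tree.addEdge(0, 1, "det");
        tree.addEdge(3, 4, "det");
        System.out.println(tree.path(1, 4)); // path from "dog" to "animal"
        System.out.println(tree.path(0, 3)); // path from "A" to "an"
    }
}
```

A label sequence of this kind could serve as a feature for a word pair; how our feature package actually encoded paths is described in the report rather than here.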

Source Code

I was not able to obtain permission from all of the GARI stakeholders to release our source code. If the situation changes in the future, I will place a link to the source code here.