MASC Word Sense Sentence Corpus, Tab-Separated Format

Title: MASC Word Sense Sentence Corpus, Tab-Separated Format
Publication Type: Miscellaneous
Year of Publication: 2014
Authors: Passonneau, R. J., Ide, N., Baker, C. F., Fellbaum, C., & Xie, B.
Other Numbers: 3781
Abstract

The MASC Word Sense Sentence corpus is distributed as a set of three *.tsv files (tab-separated format) containing the sentences, annotation labels, and senses that make up the sentence corpus: (1) the annotation labels (masc_annotations.tsv), (2) the WordNet word senses (masc_senses.tsv), and (3) the word token-sentence pairs, or instances (masc_sentences.tsv). A total of 116 distinct lemmas were selected. For each lemma, approximately 1000 example sentences were taken from the MASC corpus, and for each word in its sentence context, a trained annotator assigned a WordNet sense (WordNet version 3.1) as the annotation label. The following README describes the data in detail.
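
As a rough illustration of working with the three files named above, the following Python sketch reads each one into a list of rows. Only the file names come from the abstract. The helper names (read_tsv, load_corpus), the UTF-8 encoding, and the directory argument are assumptions made for this example, and no column layout or header row is assumed, since those details are described in the README rather than here.

```python
import csv
from pathlib import Path

# File names as given in the abstract; where they live on disk is an assumption.
MASC_FILES = ("masc_annotations.tsv", "masc_senses.tsv", "masc_sentences.tsv")


def read_tsv(path):
    """Return the rows of a tab-separated file as a list of lists of strings."""
    # QUOTE_NONE keeps quotation marks inside sentences as literal characters.
    # UTF-8 is assumed here; adjust if the README states a different encoding.
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_corpus(directory="."):
    """Load whichever of the three corpus files exist in `directory`."""
    tables = {}
    for name in MASC_FILES:
        path = Path(directory) / name
        if path.exists():
            tables[name] = read_tsv(path)
    return tables


if __name__ == "__main__":
    # Report how many rows were read from each file that was found.
    for name, rows in load_corpus().items():
        print(name, len(rows), "rows")
```

Rows are kept as plain lists of strings so that the sketch does not depend on any particular column order. Once the README's column descriptions are at hand, the rows can be mapped to named fields.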

Bibliographic Notes

Columbia University Academic Commons, available at http://dx.doi.org/10.7916/D80V89XH

Abbreviated Authors

R. Passonneau, N. Ide, C. Baker, C. Fellbaum, and B. Xie

ICSI Research Group

AI

ICSI Publication Type

Technical documentation