MASC Word Sense Sentence Corpus, Tab-Separated Format
Title | MASC Word Sense Sentence Corpus, Tab-Separated Format |
Publication Type | Miscellaneous |
Year of Publication | 2014 |
Authors | Passonneau, R. J., Ide, N., Baker, C. F., Fellbaum, C., & Xie, B. |
Other Numbers | 3781 |
Abstract | The MASC Word Sense Sentence corpus is distributed as a set of three .tsv (tab-separated) files that contain the sentences, annotation labels, and senses that make up the corpus: (1) the annotation labels (masc_annotations.tsv), (2) the WordNet word senses (masc_senses.tsv), and (3) the word token-sentence pairs, or instances (masc_sentences.tsv). A total of 116 distinct lemmas were selected. For each lemma, approximately 1000 example sentences were taken from the MASC corpus, and for each word in its sentence context, a trained annotator assigned a WordNet sense (WordNet version 3.1) as the annotation label. The following README describes the data in detail. |
Bibliographic Notes | Columbia University Academic Commons, available at http://dx.doi.org/10.7916/D80V89XH |
Abbreviated Authors | R. Passonneau, N. Ide, C. Baker, C. Fellbaum, and B. Xie |
ICSI Research Group | AI |
ICSI Publication Type | Technical documentation |
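
The three files named in the abstract can be read with any tab-separated parser. The following is a minimal Python sketch, not part of the distribution. It assumes UTF-8 encoding and makes no assumption about column names or header rows, since the abstract does not state them; `read_tsv` and `load_corpus` are hypothetical helper names introduced only for illustration.

```python
import csv
from pathlib import Path

# File names exactly as given in the abstract.
MASC_FILES = ("masc_annotations.tsv", "masc_senses.tsv", "masc_sentences.tsv")


def read_tsv(path):
    """Return every row of a tab-separated file as a list of string fields."""
    # Assumption: the files are UTF-8 encoded.
    with open(path, newline="", encoding="utf-8") as handle:
        # QUOTE_NONE keeps quotation marks inside sentences as literal text
        # instead of treating them as field delimiters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        return list(reader)


def load_corpus(directory):
    """Map each of the three file names to its list of rows."""
    base = Path(directory)
    return {name: read_tsv(base / name) for name in MASC_FILES}
```

Calling `load_corpus` on the directory holding the three files returns a dictionary keyed by file name. Inspecting the first row of each value is a quick way to check whether a file starts with a header line; the README referred to in the abstract is the authority on what each column means.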