MICUSP for AntConc and WordSmith
The Michigan Corpus of Upper-Level Student Papers (MICUSP) is a collection of advanced undergraduate and graduate student writing. The University of Michigan provides an online interface for the corpus.
For my own research, I wanted to be able to process the data locally, so I compiled it in three versions:
- The bodies of the papers (without abstracts and bibliographies).
- The bodies of the papers tagged with the CLAWS 7 tagset and formatted for AntConc.
- The bodies of the papers tagged with the CLAWS 7 tagset and formatted for WordSmith.
The first is formatted like this:
<The Ecology and Epidemiology of Plague>
<Student Level: Final Year Undergraduate>
<Native speaker status: NS>
<Paper type: Report>
<Paper contains following features: Literature review section, Reference to sources>
Throughout history, plague has been made infamous as the ultimate biological killer.
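If you want to filter papers by level, speaker status or paper type, the bracketed header lines can be read into a dictionary. The sketch below assumes one paper per file, that the first bracketed line is the title, that later bracketed lines are "Key: value" pairs, and that the first unbracketed line starts the body. These are inferences from the sample above, not a documented specification, and the helper names are hypothetical.

```python
import re

# Matches a whole line wrapped in angle brackets, e.g. "<Paper type: Report>".
HEADER_RE = re.compile(r"^<(.+)>$")


def parse_paper(text):
    """Split one plain-format paper into (title, metadata, body).

    Assumptions (inferred from the sample, not documented): one paper per file,
    the first bracketed line is the title, later bracketed lines are
    "Key: value" pairs, and the first non-blank, unbracketed line begins the body.
    """
    lines = text.splitlines()
    title, meta = None, {}
    i = 0
    while i < len(lines):
        stripped = lines[i].strip()
        if stripped:
            m = HEADER_RE.match(stripped)
            if not m:
                break  # first line of the paper body
            content = m.group(1)
            if title is None:
                title = content  # titles may contain colons, so don't split them
            else:
                key, _, value = content.partition(":")
                meta[key.strip()] = value.strip()
        i += 1
    body = "\n".join(lines[i:]).strip()
    return title, meta, body


def feature_list(meta):
    """Split the comma-separated 'Paper contains following features' value into a list."""
    raw = meta.get("Paper contains following features", "")
    return [f.strip() for f in raw.split(",") if f.strip()]
```

Splitting only on the first colon keeps values intact, and the title is never split because paper titles can contain colons of their own.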
The second like this:
Throughout_II history_NN1 ,_, plague_NN1 has_VHZ been_VBN made_VVN infamous_JJ as_II the_AT ultimate_JJ biological_JJ killer_NN1 ._.
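In this format each token is a word joined to its CLAWS 7 tag by an underscore, and punctuation carries itself as its tag (",_,", "._."). A minimal sketch for reading a line back into (word, tag) pairs and counting tags is below. It splits on the last underscore, which assumes tags never contain an underscore; the function name is hypothetical.

```python
from collections import Counter


def parse_tagged(line):
    """Split an AntConc-style CLAWS line into (word, tag) pairs.

    Splits each token on its *last* underscore, so a word containing '_'
    keeps it. Tokens with no underscore are returned with tag None.
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if sep:
            pairs.append((word, tag))
        else:
            pairs.append((token, None))
    return pairs


sentence = (
    "Throughout_II history_NN1 ,_, plague_NN1 has_VHZ been_VBN made_VVN "
    "infamous_JJ as_II the_AT ultimate_JJ biological_JJ killer_NN1 ._."
)
pairs = parse_tagged(sentence)
tag_counts = Counter(tag for _, tag in pairs)
print(tag_counts.most_common(3))
```

For the sample sentence, NN1 and JJ each occur three times and II twice.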
And the third like this:
<w II>Throughout <w NN1>history , <w NN1>plague <w VHZ>has <w VBN>been <w VVN>made <w JJ>infamous <w II>as <w AT>the <w JJ>ultimate <w JJ>biological <w NN1>killer .
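Comparing the two samples suggests a simple mapping from the AntConc format to the WordSmith one: "word_TAG" becomes "<w TAG>word", and punctuation (whose tag is the mark itself) is written bare. The sketch below implements that inferred rule. It is not necessarily how these files were produced. It assumes UTF-8 text, and the function names and file paths are hypothetical.

```python
def antconc_to_wordsmith(line):
    """Convert one AntConc-style line (word_TAG) to WordSmith style (<w TAG>word).

    Inferred rule: tokens whose tag equals the word itself (CLAWS punctuation,
    e.g. ',_,') are written bare, without a <w> tag.
    """
    out = []
    for token in line.split():
        word, sep, tag = token.rpartition("_")
        if not sep:
            out.append(token)  # no underscore: pass through unchanged
        elif word == tag:
            out.append(word)  # punctuation: the tag is the mark itself
        else:
            out.append(f"<w {tag}>{word}")
    return " ".join(out)


def convert_file(src_path, dst_path, encoding="utf-8"):
    """Convert a whole file line by line (UTF-8 is an assumption)."""
    with open(src_path, encoding=encoding) as fin, \
         open(dst_path, "w", encoding=encoding) as fout:
        for line in fin:
            fout.write(antconc_to_wordsmith(line) + "\n")
```

Applied to the AntConc sample sentence, `antconc_to_wordsmith` reproduces the WordSmith sample exactly. A call such as `convert_file("paper_antconc.txt", "paper_wordsmith.txt")` (hypothetical names) would convert a whole paper.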
The files are password-protected (contact me for the password) and can be downloaded from the following links: