Writing Up Your Linguistics Project
If you’re new to linguistics, committing your analysis to paper can be anxiety-producing. What follows are some tips to help you on your way. First, I would suggest that you find models to emulate. For example, you might draw from some of the examples in the article database, or something you found elsewhere. Then, I would apply an analyst’s eye to those models. Note how they are structured, the vocabulary they use (and not just the nouns; check the verbs), and the ways in which they both present and discuss their data.
Regardless of the specific models you consult, here are a few things your paper should do (in rough order):
- Set out the focus and scope of your research
- Situate your research within an ongoing conversation in the field
- Articulate what insights are to be gained from your research
- Describe your data, where it comes from, and how it was collected
- Describe your methods of analysis and explain why those methods were selected and why they have explanatory power
- Present your results
- Analyze those results
- Summarize your findings and, perhaps, reinforce their implications or posit areas of potential future inquiry
As you draft your paper, here are a few quick pointers:
- Working with linguistic data is often a process of discovery. You are not necessarily going to know what your main claims are (your thesis) until after you’ve written up much of your analysis. With that in mind, I recommend that you write your introduction last. It is often easiest to begin with descriptions of your data and your methods, followed by results and discussion — and saving your introduction and conclusion until after you’ve drafted the body of your paper.
- Visual presentation matters. Tables and figures (including charts, graphs, and images) are rhetorical tools. They are as much a part of your argumentation as your prose, and you should attend to them accordingly. They should be clearly designed and labeled (check here for some tips). They should be numbered and captioned for easy reference (e.g., see Fig. 1). And you should choose the appropriate chart for the information you want to communicate (see here and here). If you are new to making charts, my advice is to keep it simple. Don’t overdesign; focus on clarity. If a simple bar chart is appropriate, use one.
- Know and articulate not just the strengths of your data, but also its limitations. All data have limitations. This is not a fatal flaw, but is simply part of doing this kind of work. There is no need, therefore, to oversell your results. In fact, doing so can undermine your credibility.
Now, I want to look at a few issues in more detail and with the help of some examples.
1. Make effective use of subsections.
This may seem odd to you if you are used to writing essays composed of what is effectively a single section of continuous text. Section headings are a highly efficient way for you to orient your reader’s expectations without having to do a lot of rhetorical work. When a reader sees “Methods,” she or he knows what to expect. You don’t need to do any setup: In this section, I describe the tools that I used to generate my analysis… blah, blah, blah. No! You can dive right in. In a typical linguistic analysis you will have the following sections:
- Introduction
- Data & Methods
- Results & Discussion
- Conclusion
In between the Introduction and the Data & Methods sections, there is sometimes a short literature review (though it is most often subtitled something more specific to the research topic). Its purpose is to situate the research within the context of an ongoing conversation in the field (more here). This is important, but within the context of a research paper, it is not as important as your own analysis. In many cases, the work of situating the research is folded into the introduction itself.
In some disciplines, Results & Discussion are rigorously separated, with the former emphasizing the presentation of the findings and the latter offering what Swales and Feak (2012: 157) describe as “an increasingly generalized account of what has been learned in the study.” Perhaps because linguistics is such a diverse discipline, Results and Discussion are not always strictly divided. What you choose to do with these will depend on the kinds of research questions you are pursuing and how you think your findings can most clearly be presented. Often, however, they are going to be woven together in a series of subsections that move through different types of analysis (quantitative and qualitative), with repeated and explicit connections back to the arguments established in the introduction.
Similarly, the Data & Methods are sometimes separated, but it can be more efficient to do these rhetorical tasks together. Here, you want to describe what your data are, where you got them from, how you are analyzing them, and why your analytical approach is appropriate. I’ll say more about the Methods in a bit. But I would stress that it is in your interest to be clear and efficient. This is the setup for the main event — the analysis of your data. When appropriate, you might take advantage of tables as a tool for presenting important information. For example, if you are using a corpus, you will need to provide that information for the reader in the Data & Methods section. And one way to do that is with a table, as Fischer-Starcke does in her study of Pride and Prejudice:
If you are using a corpus compiled by someone else, you should describe and cite it, as in this excerpt from Ng et al.:
This study used the Corpus of Historical American English (COHA), a database of approximately 400 million words that covers 1810–2009 (Davies 2012). To examine age stereotypes in this database, we generated a comprehensive list of synonyms for the term elderly from 1810–2009 using two historical thesauri that included American English (Kay 2009; Waite 2012). This generated a list of 11 elderly synonyms that appeared at least 10 times in the dataset. Some synonyms entered the database in later decades reflecting expected variation over time in language use (e.g., senior citizen first appeared in 1949). Three nouns (i.e., aged, elderly, old people) appeared across all 20 decades; these were analyzed as a subset.
2. Be clear about the purpose of a Methods section.
For anyone coming to linguistics from the sciences, a methods section will be perfectly familiar. However, for those trained in literary studies, it might seem strange. As I said above, you will need to describe your choice of method and explain your motivations for that choice. Why does doing linguistic analysis in the way you’ve selected have explanatory power?
If you are new to this (heck even if you’re not), the chances are you’ll be leaning heavily (if not entirely) on methods developed by other scholars, which you are now applying to a new set of data with (presumably) the purpose of providing some sort of insight. This can be accomplished with some efficient summary and strategic citation. Again, let’s look at Fischer-Starcke’s example:
Keywords indicate dominant topics or themes of a text or corpus since the reason for their frequent occurrence in the data is their significance either for the data’s content or its structure. One function of keywords therefore is to indicate the ‘aboutness’ (Phillips 1985) of the data. This ‘aboutness’ is revealed by words which form semantic fields on a list of keywords, and which in their turn represent topics of the text. These semantic fields consist of words which express semantically related concepts. Consequently, the analysis of keywords from a dominant semantic field on a list of keywords is likely to reveal dominant meanings of the data. The relevance of keywords for detecting meaning in literary and non-literary texts has been shown, for example, by Scott (2002), Rayson (2008) and Culpeper (2009). While keywords cannot be identified intuitively, but only by using a quantitative comparative approach to the analysis, the classification of words into semantic fields is an intuitive process in this paper.
A couple of paragraphs later, that summary is followed by an explanation of her own choices:
The choice of reference corpora was based on three reasons:
a. Comparing the language of P&P with that of Austen’s other five novels allows for the identification of lexical features that are specific to P&P as opposed to Austen’s other novels. Meanings of the text can therefore be identified that are distinct from those of the other novels.
b. Using contempLit as a reference corpus allows for the identification of differences between P&P and its contemporary literature.
c. Comparing Austen to contempLit allows for the identification of features of Austen’s idiolect and authorial style compared to her contemporary literature. This, however, is only a minor issue touched upon in this paper.
Performing two keyword analyses serves two aims. First, words or topics that are identified as dominant on both lists of keywords are doubly legitimized as significant for the data. Second, a comparison between the two lists of keywords helps identify differences between the two reference corpora and shows how the compilation of a reference corpus influences the results of an analysis.
Notice, too, how she uses her explanation of her methods as an opportunity to clarify the scope of her analysis. In her point under (c), she notes that while this method sheds light on issues related to authorial style, that “is only a minor issue touched upon in this paper.”
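To make the underlying technique concrete: keyword analysis compares how often each word occurs in a target corpus against a reference corpus and ranks the words that are statistically overrepresented. The sketch below is a minimal, hypothetical illustration of that comparative logic (using the log-likelihood statistic common in corpus linguistics); it is not Fischer-Starcke’s actual computation, and the toy “corpora” are invented.

```python
import math
from collections import Counter

def keywords(target_tokens, reference_tokens, top_n=10):
    """Rank words unusually frequent in the target corpus relative to
    the reference corpus, using the log-likelihood (G2) statistic."""
    tgt, ref = Counter(target_tokens), Counter(reference_tokens)
    n_tgt, n_ref = sum(tgt.values()), sum(ref.values())
    scored = []
    for word, a in tgt.items():
        b = ref.get(word, 0)
        # expected frequencies under the null hypothesis of equal rates
        e1 = n_tgt * (a + b) / (n_tgt + n_ref)
        e2 = n_ref * (a + b) / (n_tgt + n_ref)
        g2 = 2 * (a * math.log(a / e1) + (b * math.log(b / e2) if b else 0))
        # keep only words overrepresented in the target corpus
        if a / n_tgt > b / n_ref:
            scored.append((word, g2))
    return sorted(scored, key=lambda pair: -pair[1])[:top_n]

# Invented toy data, purely for illustration
target = "dear elizabeth dear jane dear mr darcy said elizabeth".split()
reference = "the ship the sea the captain said the mate storm".split()
print(keywords(target, reference, top_n=3))
```

As Fischer-Starcke notes, the quantitative step only identifies the keywords; grouping them into semantic fields remains an interpretive, intuitive process.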
Here is another example from Lancaster’s examination of academic writing:
The three corpora I examined offer a snapshot of academic writing at three distinct levels. The first corpus is the academic section of the Corpus of Contemporary American English, or COCA (Davies). This is a 91-million-word database of published texts from almost 100 different peer-reviewed journals across disciplines. The second is the Michigan Corpus of Upper-Level Student Papers (MICUSP), an online corpus of 829 high-graded papers written by senior undergraduate students and early graduate students across sixteen fields, totaling 2.2 million words (see Römer and O’Donnell for further details). The third is a corpus of 19,456 directed self-placement (DSP) essays collected from five years at the University of Michigan and Wake Forest University. These are essays written by newly admitted university students in response to varied prompts asking for evidence-based arguments, ones that support or challenge arguments from assigned reading(s). At 19.2 million words, the corpus is to my knowledge the largest of its kind. To be sure, these corpora are not perfectly comparable. The first-year (FY) essays are predisciplinary argumentative essays, while the COCA and MICUSP papers are a mix of essays, reports, critiques, research papers, and other academic genres across disciplines. Nevertheless, all the papers are from academic contexts and thus offer a broad testing ground for whether and how the TSIS templates are used.
The concordancing software I used to examine the corpora is AntConc (Anthony). The procedures involved three recursive steps: (1) targeted searches of wordings taken directly from the TSIS templates and from the literature on evaluation and metadiscourse (e.g., Hyland, Metadiscourse); (2) inductive analysis of word/phrase lists from the corpora generated by AntConc; and (3) qualitative interpretation of concordance results. The purpose of Step 1 was to examine how frequently the exact wordings from the TSIS templates were used, as well as how their frequency compared to alternative wordings for achieving the same function. The purpose of the inductive analysis, Step 2, was to identify other common phraseological patterns for achieving the moves, ones that were not picked up in the targeted searches. The purpose of Step 3 was to ensure form/function matching. For example, in many instances the formulation It could be argued that functions to entertain an objection; in other cases it is used to suggest the writer’s own view in tentative terms. I carefully read the context for each instance to make sure the wording was operating to realize the rhetorical move under analysis.
In the first paragraph he describes his data, and in the second he explains his analytical method. Note that similar to Fischer-Starcke, he takes the opportunity to note some of the limitations of his study, specifically that his datasets “are not perfectly comparable.”
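Lancaster’s Step 1 — counting a fixed wording and normalizing so corpora of different sizes can be compared — can be sketched in a few lines. This is a hypothetical illustration, not his actual procedure (he used AntConc); the sample text is invented, and per-million normalization is a standard corpus-linguistics convention.

```python
import re

def phrase_frequency(text, phrase):
    """Count case-insensitive occurrences of a fixed phrase and
    normalize per million words, so raw counts from corpora of
    different sizes can be compared."""
    tokens = re.findall(r"[\w']+", text.lower())
    hits = len(re.findall(re.escape(phrase.lower()), text.lower()))
    per_million = hits / len(tokens) * 1_000_000
    return hits, per_million

# Invented sample text, purely for illustration
sample = ("It could be argued that the results are robust. "
          "Some researchers say the effect is small, but it could be "
          "argued that the design matters more.")
hits, rate = phrase_frequency(sample, "it could be argued that")
```

Note that a raw count like this cannot do Lancaster’s Step 3: deciding whether each hit actually performs the rhetorical move requires reading the surrounding context.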
3. Provide a robust and thorough analysis of your data.
The analytical portions of your paper are really its heart and soul. In my experience, less successful papers are ones in which the analysis is not fully realized. The writer, for example, may make a broad claim about the data, but fail to follow that up with explanations of exactly how that claim is illustrated or expressed in the data itself. That failure forces the reader to stop and do that interpretive work herself or himself. Presumably you have gone through an interpretive process to arrive at a series of conclusions about what your data is showing. You need to get that information out of your head and onto the page.
Here, for example, is an excerpt from Baker’s analysis of the words bachelor and spinster:
Table 6.3 shows that the relationship between the two collocates always occurs in descriptions of someone’s ‘bachelor days’. However, we can go further than this by looking for discourse prosodies. Lines 5 and 6 refer to ‘happy bachelor days’, whereas lines 4 and 6 refer to ‘memories’ of bachelor days. With line 7 there is ‘a memento’ of bachelor days, again suggesting that they are worth remembering. So bachelor days could be characterized as containing a discourse prosody of happy memories – again, a positive representation of unmarried men (although there is also an implication, particularly regarding the references to memories – that these bachelors were youthful). Interestingly, line 7 includes the phrase her bachelor days. Here, bachelor refers to a woman, suggesting that the fine gender distinction between bachelor and spinster is not absolute.
Here, Baker is unpacking his concordance lines. Note how specifically he refers back to the data — both using numbers and short quotations. In the last sentence of this excerpt, Baker also complicates his primary claim. He calls attention to an exception, adding nuance to the main thrust of his argument. This is what I mean by being thorough. Linguistic data is often messy and complex. Overlooking or omitting the ways in which that complexity is realized in your data can make your analysis seem facile or less than fully considered.
Here is another example from Bucholtz and Lopez’s analysis of Hollywood films:
In Bringing Down the House, language ideology is key to establishing Martin’s character Peter, a workaholic lawyer whose tightly-wound personality is signaled to viewers by his penchant for correcting others’ English. The appearance of Queen Latifah’s character Charlene, a wrongly convicted felon who seeks Peter’s help to clear her name, wreaks havoc on his orderly if emotionally unsatisfying life. Charlene’s speech is the target of Peter’s disapproving metapragmatic commentary early in the film, as they dine together in an upscale restaurant:
Further reinscribing the linguistic division between these characters, Peter’s condemnation of Charlene’s use of slang does little to alter her language: in her next turn, she uses an AAE structure (zero auxiliary in a wh- question; line 4) as she challenges his comment.
Although this is an excerpt rather than a series of concordance lines, notice how clearly it is presented in order to make specific reference back to it (as in the parenthetical in the last sentence).