What Data Do I Need?
The type of data you will need to carry out a particular project will depend on a number of factors:
- The specifics of the course you are taking. Classes in general linguistics, corpus linguistics, and sociolinguistics will all have different areas of emphasis and requirements. Your instructor will give you guidance here.
- The questions you are posing in your research. Examinations of language attitudes will likely point you toward survey instruments and interviews. Investigating how features are realized in student writing will likely require you to gather samples of student writing (surprise!).
- The time you have and the scope of your project. For many projects, you will only have a university term at your disposal, and only a fraction of that can be devoted to collecting data. So think about what will be doable in the time you have. Maybe this means that you take advantage of data that others have already collected. Or you limit your data collection by undertaking a case study (of an individual or communicative event) or a study of a small number of samples, recognizing, of course, that this will limit the conclusions you can draw (which is not a problem, as long as you articulate those limits in your write-up).
- The methods you intend to use. Are they quantitative? Qualitative? Both?
- Whether you want to collect your own data or make use of some of the already available data. (I have a list here.)
I’ve mentioned it in other project guides, but it’s worth emphasizing: As you hammer out the details of your research design, look for models that make sense to you (and that you can cite when you write up your methods). Are you building a specialized corpus and using AntConc as an analytic tool? There has been a ton (!) of great research you can build from. Paul Baker published a study with Tony McEnery using a corpus compiled from The Sun (here). Laura Aull and Zak Lancaster analyze stance markers in a corpus of student writing (here). Alon Lischinsky has a study of annual reports (here). And Ruth Page investigates Twitter hashtags (here). All use AntConc.
There is work analyzing television shows (here), movies (here), music (here), menus (here), even t-shirts (here). Of course there is a lot of work on literature (like this). One of the exciting things about working with language is that we are surrounded by it. And it is always being produced and is changing, so there is new stuff to explore. But this is also what can make this work a little scary for those who haven’t done it before. There is just so much!
In my courses, the most successful projects have generally been strategic with their choices of data:
- A case study of a sibling’s linguistic self-identity as a “hillbilly”
- A comparison of involved vs. informational production on Twitter by users with few followers, a moderate number of followers, and many followers
- A study of social-class-related identity performance in a student-centered online discussion forum
- An autobiographical comparison of one’s academic writing in high school vs. college focusing on hedges and boosters
- A study of the changing use of the word sex in Cosmopolitan magazine
- A study of historical military slang terms in national service “yearbooks”
- A study of subtitling patterns in the television show Here Comes Honey Boo Boo and their ideological implications
- A case study of student-tutor interactions at a university writing center
How the data would be gathered for these kinds of projects is fairly self-evident, with the exception of the Twitter study. That particular student had some programming experience, so he was not intimidated by the prospect of scraping some Twitter data. That said, I have had students with no programming experience compile small, specialized Twitter corpora using freely available tools.
The proposed projects that have needed some re-thinking have sometimes been overly ambitious or more topic- than question-driven:
- I have had a number of students propose to analyze Wikipedia’s accuracy. Sure, this has been attempted, but how could this be accomplished in a term and to what end?
- Students are also often interested in investigating cause/effect relationships, like the effect of popular culture on particular speech communities. This is an interesting question, but the constraints of time make sussing out such relationships extraordinarily difficult.
- Sometimes students are inclined to use linguistic analysis to confirm popular, zeitgeist-y impressions (like "millennials are narcissists"). Trends in social psychology can be compelling. Again, however, such questions are complicated and tough to investigate well without some substantive reading and training. Typically, questions of narcissism, for example, get operationalized as pronoun counting. We see this a lot (examples here, here, and here, and a critique here). Changes in first-person pronoun frequencies don’t necessarily mean a rise in self-involvement, but may result from a variety of pragmatic or genre-related exigencies (see, for example, here). My advice: It’s fine (maybe super interesting) to investigate things like pronouns, but focus on their use at the level of meaning-making in the data that you have.
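For those with a little programming experience, here is a rough Python sketch of what "pronoun counting" amounts to, along with a simple keyword-in-context listing that puts each hit back into its surroundings. It is not taken from any of the studies mentioned above. The pronoun list, the function names (`tokenize`, `pronoun_rate`, `concordance`), the crude tokenizer, and the sample sentence are all assumptions made for illustration; a concordancer like AntConc does the same kind of thing with far more care.

```python
import re

# Illustrative (not exhaustive) list of English first-person pronouns.
FIRST_PERSON = {"i", "me", "my", "mine", "myself",
                "we", "us", "our", "ours", "ourselves"}

def tokenize(text):
    # Crude tokenizer: lowercase runs of letters only.
    # Contractions are split, so "I'm" becomes "i" and "m".
    return re.findall(r"[a-z]+", text.lower())

def pronoun_rate(text, per=1000):
    # First-person pronouns per `per` tokens (a normalized frequency).
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in FIRST_PERSON)
    return hits / len(tokens) * per

def concordance(text, targets=FIRST_PERSON, window=4):
    # Keyword-in-context: (left context, hit, right context) for each hit.
    tokens = tokenize(text)
    rows = []
    for i, tok in enumerate(tokens):
        if tok in targets:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            rows.append((left, tok, right))
    return rows

if __name__ == "__main__":
    sample = "I think we should go. My sister said it was fine."
    print(round(pronoun_rate(sample), 1))
    for left, hit, right in concordance(sample, window=3):
        print(f"{left:>25} [{hit}] {right}")
```

The rate gives you a number to compare across texts, but the concordance lines are where the analysis happens: reading each pronoun in context is what lets you say something about how it is being used, rather than just how often.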