In one of my classes, my students were talking about Sean Penn’s article in which he interviewed the Mexican drug lord and escaped convict Joaquín Guzmán. For those unfamiliar with the piece, it was widely ridiculed after its publication. One target of that ridicule was the very style of the writing. As one outlet put it:

If you know only one thing about the Rolling Stone article itself, it should be this: The writing is terrible. Penn apparently fancies himself some combination of Bob Woodward and Hunter S. Thompson; he’ll tell you he farted when El Chapo shook his hand, but he calls it “expel[ling] a minor traveler’s flatulence.” It is the sort of self-serious dreck that no one but a celebrity could publish in a major American magazine, and that always inspires self-righteous grumbling from bona fide professional journalists.

So one of my students posed the question: If we did some basic corpus analysis, could we tell the writing is bad? This is a really interesting question. So I put together a small corpus for the purposes of comparison. Genre matters, of course. Comparing Penn’s article to famous works of fiction, for example, won’t tell us much, since the conventions of fiction are different from the conventions of magazine journalism. We need to compare apples to apples. I found a list of recommended pieces of journalism that included a set of celebrity profiles. Admittedly, this is hardly an unimpeachably objective list. But for the purposes of our small experiment, it works as a functional benchmark.

From that list, I put together the corpus below. (Note there is an additional folder with versions of the files with the quotations extracted for checking their potential effect on the data.)

When we ran a keyword analysis with Penn’s article as the target corpus and the other profiles as the reference corpus, the pattern that jumped out was Penn’s relative overuse of first-person pronouns.

rank    token     keyness
4       our       106.529
7       I         93.532
9       we        83.238
10      my        78.623
36      myself    20.454
45      us        16.146
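
For readers who want to try this kind of comparison themselves, here is a minimal sketch of a keyness calculation. It assumes the log-likelihood statistic, which is a common default in concordancing tools; the scores in the table may have been produced with a different measure or different settings, so do not expect this to reproduce them exactly. The function names and the very simple tokenizer are my own illustrative choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This is a deliberately crude tokenizer (an assumption for illustration);
    a real study would use whatever the concordancing tool uses.
    """
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a single word.

    a = count in the target corpus, b = count in the reference corpus,
    c = total tokens in the target, d = total tokens in the reference.
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    total = 0.0
    if a > 0:
        total += a * math.log(a / expected_target)
    if b > 0:
        total += b * math.log(b / expected_reference)
    return 2 * total


def keywords(target_text, reference_text, top=10):
    """Rank words that are relatively overused in the target corpus."""
    target = Counter(tokenize(target_text))
    reference = Counter(tokenize(reference_text))
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # Keep only words whose relative frequency is higher in the target.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]
```

Calling `keywords(penn_text, profiles_text)` on two strings holding the respective corpora would give a ranked list of (word, score) pairs; swapping the two arguments gives the words overused in the profiles instead.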

Penn spends a good portion of his narrative describing his own thoughts and actions, or the combined actions of himself and a companion. This is clear in the following passage (first-person pronouns in bold):

We quietly make our plans, sensitive to the paradox that also in our hotel is President Enrique Peña Nieto of Mexico. Espinoza and I leave the room to get outside the hotel, breathe in the fall air and walk the five blocks to a Japanese restaurant, where we’ll meet up with our colleague El Alto Garcia. As we exit onto 55th Street, the sidewalk is lined with the armored SUVs that will transport the president of Mexico to the General Assembly. Paradoxical indeed, as one among his detail asks if I will take a selfie with him. Flash frame: myself and a six-foot, ear-pieced Mexican security operator.

Alternatively, if we use the profiles as the target and Penn’s article as the reference, we find a very different pattern.

rank    token    keyness
1       he       65.634
6       said     31.805
8       they     28.866
17      his      20.155

Penn’s article appears to significantly under-use third-person pronouns, as well as what are called reporting verbs, particularly the most common verb of journalistic attribution, said. (The gender skew of these pronouns is also eyebrow-raising.)
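
Keyness scores tell us which words stand out, but not how large the difference is in everyday terms. One simple complement is a normalized rate, such as occurrences per 1,000 words. The sketch below is only an illustration: the word sets and the function name are my own assumptions, not part of the original analysis.

```python
import re

# Hypothetical word sets chosen for illustration.
FIRST_PERSON = {"i", "me", "my", "myself", "we", "us", "our", "ourselves"}
THIRD_PERSON_AND_SAID = {"he", "his", "him", "she", "her", "they", "their", "said"}


def rate_per_thousand(text, words):
    """Return how often any word in `words` occurs per 1,000 tokens of `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in words)
    return 1000 * hits / len(tokens)
```

Running `rate_per_thousand` on Penn’s article and on the combined profiles, once with each word set, would put the two patterns side by side on the same scale.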

The keywords, therefore, reveal an unconventional focus: away from the words and actions of the purported profile subject and toward the author himself. This doesn’t necessarily measure “badness.” However, it certainly points to a piece of writing that significantly strays from the norms of its genre.
