The Art of Quotation in the Age of Automated Transcription

After smartphones made audio recording ubiquitous and A.I.-powered robots made transcribing fast and cheap, that most sacred element of news, the quote, is going through its most profound aesthetic and ethical transformation since the invention of the portable tape recorder.

Quotation is a sort of writer’s abdication. “The most obvious attraction of quotation,” Janet Malcolm once told The Paris Review, “is that it gives you a little vacation from writing — the other person is doing the work. All you have to do is type.” The price of such a voyage was once the plodding work of transcription, but the advent of automated services — quotation hardly even requires typing anymore — has opened new vistas for ever more sprawling vacations and the further eradication of a writer’s interceding presence. “There is a reason beyond sloth for my liking of quotation at length,” Malcolm said in that interview. “It permits you to show the thing itself rather than the pale, and never quite right, simulacrum that paraphrase is.” When journalists turn to dispassionate machines to process raw audio and provide them with “the thing itself,” is something lost? Some writers have resisted hazarding the experiment. “I don’t trust those things and do it all myself like a Luddite!” New Yorker staff writer Michael Schulman told The Fine Print. But others, whether because of the requirements of expediency, the carpal-tunnel-threatening demands of massive amounts of manual transcription, or pure curiosity, dove in and found themselves wondering how automated transcription has affected their work: not just whether it makes the process easier, but whether it has changed the very shape of the work produced.

(Disclosure: This reporter has become hopelessly addicted to Otter’s A.I. transcription and has been increasingly self-conscious about the extent to which that shows. “This is feeling really Otter-y,” we’ll say at The Fine Print when too many untamed quotes roam freely across a draft. Sometimes, it’s, “Sorry it’s so Otter-y, wish I had more time.”)

The last major technological innovation to inspire a sea change in reporters’ relationships with their quotes was the introduction of easily portable tape recorders in the ’60s, though widespread and consistent adoption lagged for decades. The social anthropologist Jack Goody tracked how recording transformed his profession in 1987’s The Interface Between the Oral and the Written, evoking the parallel effect taping had in journalism. “Before the 1960s, virtually all the oral literature from simple societies was dictated since recording machines required the use of mains electricity or heavy batteries. With the appearance of the transistorized tape-recorder, these problems vanished; within a few years dictation gave way to transcription,” he wrote. “All earlier oral works are known only because they have been written down, usually by a literate member of that very society, possibly by the poet himself, an action that in itself may transform that composition to a greater or lesser extent.” So, while a reporter like Joseph Mitchell could offer up long monologues from his source’s point of view, the quotes were filtered first through his head and then through his notebook. Once interviews started to be regularly recorded, transcription became the standard, and some of those filtering layers were wiped out. The writer still shaped the context, but to a greater extent, subjects were expected to speak for themselves.

New Yorker contributing writer Rachel Monroe has picked up on some of the intricacies of that pre-recording journalism in cases where she’s written down quotes in her notebook while also recording an interview. “The quote that I’ve written down in my handwritten notes is maybe more succinct, or elegant, or logical or compressed or, whatever, a ‘better version.’ When I listen to the transcript, it’s like I’ve written down a distillation of what they said, and what they said is just a little rougher,” she told The Fine Print. When she compares the quotes she uses for accuracy to the more artfully constructed quotes of writers from an earlier era, the old style can feel inaccessible or even inadmissible. “The technology makes it hard to defend doing that kind of thing because you have access to what they really said,” she said. Still, Monroe’s quotes tend to be cleaned up somewhat. “I don’t know if I’m supposed to admit this or not, but I don’t do perfectly verbatim. Even if the A.I. gives me that — am I going to get in trouble? I don’t know what we’re — I didn’t go to journalism school. Now I’m worried I’m telling too many truths. Ideally, the fact checker has the transcriptions and the audio, right? So they’re keeping me honest. I have no fear about that. But some people speak in such nice, concise, full sentences and full paragraphs. I know that I personally do not. I feel very aware of the fact that I do not and most people don’t,” she said. “I want the gist of what they’re saying to come across more than the awkwardness of the phrasing.”

For some reporters, like GQ contributor Rosecrans Baldwin, the limits imposed by recording have been comforting. “I’ve never trained as a journalist. I’m coming at everything as a fiction writer and, therefore, I operate in a state of anxiety and insecurity, feeling like everyone else knows how to do this job and I just keep making it up, figuring it out each time,” he said. “If I’m trying to get at some kind of truth, it helps to have a feeling of support, a sense of backup, that I’m not going to have to just rely on my memory.” He’s retrospectively distrustful of journalism whose form implies that the writer might not have stuck to this ethic. “Creativity with quotes, to me, was always a slippery slope, from the very beginnings of the New Journalism, from Truman Capote doing prison interviews entirely reliant on his supposedly great memory to someone like John McPhee. If you go back and read his book about oranges, when he’s in Florida, and he’s sitting in this orange grower’s office, and we get, it feels like, 14 pages of verbatim dialogue with no paragraph breaks, I do wonder in those moments, what’s the fidelity to the tape recorder?” Baldwin said. “With McPhee, I wouldn’t be surprised if it was 100 percent accurate. I also wouldn’t be surprised, in some cases, if there were liberties taken to smooth it out in a way that sort of wanted to achieve a dramatic or even just a plain information effect.”

And yet, Baldwin admits there’s been at least one instance where he’s smoothed out a monologue with his interviewee’s permission. In his book Everything Now, he lets the creator of a textbook for a literacy class on Skid Row in Los Angeles hold forth. “Something like in the third minute of us talking, I asked him some random question, really left field, something very obtuse, like, ‘Do you think history in Los Angeles is chronological?’ And he responded with a monologue, he just went off. It lasted, it felt like, three and a half minutes. It felt like he only took two gulps of air to get through it. It was very spontaneous, and it was marvelous, but it also was really disjointed,” he said. “I asked him afterward, I was like, ‘I’m gonna preserve all of the ideas, and all the sense of the thing, but I want to be able to put the reader in the seat where I was sitting, when I got to hear you say that. To do so in a way that will hold them, I want to just make a couple of little snips here and there.’ So if you listen back to the tape recording of what he said and how I rendered it on the page, they’re not exactly the same. I took a creative decision to not necessarily improve what he said, just make it better for the page. Especially because I worked with his consent on that, I feel okay about that, and I feel the book was better serviced without any real harm to how he presented himself and what he meant.”

Journalists aren’t the only ones who’ve adopted the new standards of recording everything. Politicians will now regularly record interviews as well. “If you’re doing an in-person interview, and it’s a formal interview, like you’re in someone’s office, normally their press secretary, or an aide will record it as well,” said New York Times Magazine contributing writer Jason Zengerle. One of the first times Zengerle recalls a subject pulling a recorder on him was when he was working on a 2006 story about Bob Woodward for GQ. “It was kind of a critical piece about Woodward and how he’d blown his reporting on Bush and Iraq, and he really didn’t want to do the interview,” he recalled. “I went to his house where his office is, and we were sitting downstairs. I think I was still using probably the little tape recorders and I put it on the table and started to record and he had his research assistant with him who lugged out this giant tape recorder, put it on the table, and pressed play to record as well. It was just such an intimidating move.”

That was far from the last time. “Another time, I did an interview in the Pentagon with some Defense Department official, and I remember they had an even bigger tape recorder,” he said. “I remember it as almost like an intimidation tactic. But now it’s really common, especially with iPhones, it’s so easy.” This dual recording had an amusing echo down the line on one recent story. “The fact-checker didn’t even have to come to me for the transcript or the tape. The politician had given it to him,” he said. “I don’t have a problem with that, as long as it’s the same transcript.”

After William Wordsworth and Samuel Taylor Coleridge published Lyrical Ballads in 1798, Wordsworth claimed to be experimenting with what happens when you introduce “the real language of men in a state of vivid sensation” into poetry. The introduction of the new, more wieldy tape recorders in the ’60s gave rise to new strictures in journalism ethics and a series of similar vernacular experiments in American art and journalism, some of which loudly announced the centrality of recording to their conception. When Studs Terkel published Division Street in 1967, the first in a series of oral histories that would win him a Pulitzer in 1985, he was upfront about his vexed relationship with the new technology. “On occasion, it might have become an inhibiting factor, making for self-consciousness, were it not for my clowning. I’d kick it, not too hard, in the manner of W.C. Fields with a baby,” he wrote in the introduction. “Yet, paradoxically, without my abused mechanical ally, this book would not have been possible.” In 1968, Linda Rosenkrantz, the founding editor of Sotheby’s Auction magazine, published Talk, a novel consisting entirely of dialogue pulled from transcripts. “I had the tape recorder running all summer,” she told Stephen Koch in his introduction to the 2015 New York Review Books edition, “even dragging the bulky monster to the beach. At first, there were about 25 different characters and fifteen hundred pages of single-spaced transcript, which I took close to two years honing down to the three characters and 250 pages.”

Perhaps the longest-running project to emerge from this firmament was Andy Warhol’s magazine, Interview. In 1968, head to head with Rosenkrantz, Warhol published his own transcript novel, but he wouldn’t start the publication based around that method until the following year. One of the side-effects of Interview’s longevity — the magazine’s September cover story made a splash last week — has been to reveal some of the artificiality behind the conceit. “Does anybody say that? So bizarre,” Chloë Sevigny said in a 2021 feature reflecting on something she supposedly said in a 1995 cover story. The writer and cultural theorist Mark Fisher connected Warhol’s project to the larger effects recording had on American society and politics while drawing out the duplicity behind claims of unvarnished transmission in recording projects. “Like Watergate, Interview was made possible by taping. The interviews, which ranged over the trivial minutiae of its subjects’ lives, were transcripts; they weren’t framed by the interposing persona of the writer. Yet Warhol understood that tape recording did not capture an unmediated real,” he wrote in an article about Celebrity Big Brother. “Rather — and as Warhol’s admirer Jean Baudrillard recognized — ubiquitous taping destroyed any illusion that such a real existed. Instead, there would now only be an anxious and unanswerable question: are those who are recorded performing for the tape or the camera? (Some said they felt that Nixon, at the heart of a White House riddled with recording apparatus, would often seem to say things for the benefit of the tape.)”

Though widespread recording made these projects possible, it also created an enormous amount of work. Some batty outliers claim to enjoy manual transcription — “I love transcribing. I’ve always loved transcribing interviews,” Sheila Heti, former interviews editor at The Believer and author of the most interesting transcription-based fiction since Warhol and Rosenkrantz, said on the Longform podcast in 2018. “You’re so attentive when you’re transcribing and there’s a level of attentiveness and care that turns into love.” — but, for most, it’s a grind. “I have weak wrists, as my physical therapist informed me recently. He’s like, ‘Wow, you’re really strong, you can deadlift a lot, but you’ve got these weak little wrists.’ I’m like, ‘Thank you so much, thank you so much for that lovely compliment,’” Daily Beast culture reporter Helen Holmes told The Fine Print. “I get little carpal tunnel things every once in a while just from writing.” And typing out hours of interviews can take forever.

When Zengerle was starting out at The New Republic in the ’90s, he would transcribe directly from his cassette tape recorder. “I would do dutiful transcriptions, I would not skim or paraphrase. I would spend hours doing each one. And it was much, much harder back then, because I’m technologically pretty incompetent — there were probably ways around this, I’m sure I could have used a foot pedal or something — but I would actually have the recorder and press play, and then pause and then rewind, to get it all while typing. So it would take forever,” he told The Fine Print. “I think I’d probably spend more time doing that than I did actually reporting or writing.”

Worse in some ways than the physical and temporal toll is the mental burden of transcription. “The worst part of it is just hearing your own voice,” said Zengerle. “You hear the stupid questions you ask. There’s a pause, and you should just let the pause go on until the person keeps on talking, and you interject and lose the thing you were looking for. That’s the really taxing stuff.”

At one time, Baldwin felt uncomfortable with the persona he’d sometimes take on in interviews. “I’m such a more of a dick to people that I’m writing about or interviewing than I am to my friends and family. I’m a very peaceful, hopefully, friendly, nice person, but sometimes you just have to press people a little bit,” he said. Nowadays, he’s less conscious of what he calls that “nasty little robot” quality when going through his transcripts. “I’ve just done it enough, I have so little ego involved in those moments that I really don’t give a shit. I don’t mind hearing my voice anymore. I don’t mind hearing me stumble through questions. It’s no big deal,” he said. “It’s not fun. I’m not gonna seek it out. I’m not like Chris Rock, listening to his performances over and over to fine-tune himself. But it’s not a big deal.”

While all of that can make automated transcription attractive, manual transcription has some real upsides. “A real downside to hating it is that it’s a really good thing to do,” Baldwin said. “In the person’s responses, in the subtext, I’ll hear a feeling that I didn’t see at the time when I’m interviewing.” That extra processing time isn’t something we have for most conversations, and that kind of sustained attention can grant a reporter authority in their writing. “In my regular life, I have conversations with people and have no idea what I said or what they said or what we were talking about,” Holmes said, “so processing what has gone on in the interview is a huge part of why transcribing things yourself is important.”

Widely available automated transcription services aren’t yet good enough for that kind of processing to be lost. “Depending on the quality of the transcription service, it can be clunky and just straight up get things wrong,” said Holmes. “When that’s been the case, in the past, sometimes I’ve just abandoned the transcript that it spat out at me, and then just gone back and done it myself.”

Monroe shared an example of a particularly chaotic recent Otter transcript: “now I get bored check out big five rock for Brian COVID Bashevis rock I can’t do heavy damn cars really aways freeway for skate gallery yellow tablet we have a beautiful high off it has this little off the far away we move this roof up here and then make a bigger law afterwards that tribal who didn’t lift that or have a bedroom of blurred.” The whole of that transcript, she explained, was just word salad. “He had a really, really thick Texas accent and I think he had also had a stroke of some kind and so the A.I. did not know what to do with him,” she said. “Sometimes, I interview somewhere that’s noisy or a few different people — I just had one where I went out to lunch in Uvalde with five or six people and that transcript is pretty useless. It can’t tell who is who, some people it’s ignoring. It was at an Italian restaurant attached to the Exxon station in Uvalde and it just has loud Frank Sinatra music sometimes intruding. So that one I really did have to go back and relisten to and do myself.”

Baldwin hops between different transcription services, depending on what the magazine he’s working for at the moment is willing to fund. “I haven’t found anything that’s good enough yet. The very best transcription software I’ve seen is either Dragon, which I’m unwilling to pay for — I’m not Richard Powers lying in bed writing novels with my eyes closed — or the Google transcription software which they haven’t made available yet as a standalone app,” he said. “I don’t think Otter is good enough. I don’t think Transcribe is good enough. Once it reaches the point of the same accuracy as a human transcriber then I’ll be glad to adopt it and I’ll pay a monthly fee or whatever the fuck, but I just haven’t seen it yet.”

Zengerle, who started using Otter a couple of years ago at the urging of one of his editors, appreciates its imperfections. “The inexactness of it is actually good for me in the sense that it makes me keep on listening to it. I think if I weren’t listening to it, that would be a problem. If it was so good, that I didn’t have to listen to it, I think it probably would change my process. And I think I might feel a little silly listening to it again, if it was exact, like, ‘Why am I doing this?’ But having to listen to it again, I think, is actually good for the way I write,” he said. “Even when I use a [manual] transcription service, if there’s a part that I’m particularly interested in, I will go back and listen to the tape. Not because I don’t trust the transcriber, but just because I do find it valuable to hear it again. It comes alive a little bit more if you actually hear it, than just what it looks like on the page.”

Monroe also emphasized the importance of relistening to transport yourself back to the moment of the interview. “Listening to the whole interview, I can remember facial expressions and the feeling of being there with a person and sometimes that’ll provoke a nice little moment of description, or it’ll just feed more abstractly into the piece, just trying to capture the quality of what it’s like being with somebody,” she said. “Not all pieces do you necessarily need that, sometimes you just need the quote, sometimes the words are more just like information being transmitted, and I guess that’s the situation in which the automatic transcription is most useful. Then there are the situations where what you’re trying to create is richer than that.”

Most of the writers quoted in this story work on print pieces with relatively long lead times that can sometimes allow for those leisurely listening sessions, but Holmes turned to automated transcription because of the demands of a digital production schedule. “I never really intended to use it, but it was one of those moments where I had way too much to do and not enough time. So I did a little free trial,” she said. “I definitely understand philosophical, existential protests against it, but, honestly, at the end of the day, and there are a lot of nuances to it, I need to transcribe in my job a ton of interviews and it is simply not possible to do it all myself.”

Still, she knows something is different when she relies on an automated transcript. “I feel more confident about how I’m arranging it when I’m transcribing myself because you’re just more engaged with what’s being said. And the copy-and-pasting-ness of using a transcription service and just plunking the words in,” she said, “I don’t know if it has a significant impact on story quality or the way in which I go about writing and arranging it, but I do think it’s probably not nothing.”

Some readers have picked up on what they see as an over-reliance on quotation in recent years. “Everything I read has really bad quotes,” said Robin Kaiser-Schatzlein, a contributor to Harper’s, The New Republic, and The Baffler. “When you look at the way that people talk, it’s not necessarily how they would want themselves to be represented on the page, even as writers, and so I think that it takes a lot of care and consideration to look at what someone said and turn it into prose. And I think that brings up a lot of very important questions: Are you turning it into white American vernacular when you should or should not be? Are you condescendingly making something they said more clear than it really was? I don’t know how people resolve those. Personally, I think that most people just don’t talk about it, because it’s very sticky.”

When we talked about it, this reporter was reminded of what Paris Review softball pitcher Joshua Pashman said after appearing in our Vital Moments social column this summer. “I know I said quote me very literally, and you did, leaving in the ‘totally’s’ and the ‘um’s’ that I do when I’m speaking,” he said. When The Fine Print told him we just like human speech, Pashman said, “Oh, I thought you were kind of owning me.”

“I think about how my source wants to represent themselves and what they would want from it. I’m sure I don’t always get it right, but I don’t think that the transcript is how they want to be represented,” Kaiser-Schatzlein said. “I know for fucking sure, whenever I hear myself, the very few times, on a podcast, I’m just like, ‘I want to disown this entire thing. This makes no sense. This was not the argument that I wanted to make. This is not the idea I wanted to present.’ I was just talking out of my ass.”

Sometimes, what people say when spouting off spontaneously can be more interesting, and tell a reader more, than what they consciously meant to say. “What’s the balance? You want to be true to how a person speaks, right? The actual cadence and grammar, and all of the quirks of the way that they use language. And then, sometimes there are moments when a person might stumble or contradict or repeat themselves or just get kind of weird in their language because of what they’re talking about,” Monroe said. “It’s just being able to read the moment and know if it’s a purposeful quirk of speech or one that doesn’t really add any additional meaning.”

Whatever the individual approach, each is an effort to achieve one of the central tasks of journalism: comprehending voices other than your own. “Transcription is hard, but it’s very simple: You have to pay very, very close attention to what other people are saying. It is about listening, listening to them in conversation, and listening back to what they said, and making really sure that you are representing them correctly,” Holmes said. That includes the tiniest details, like deciding whether your subject said “going to” or “gonna.” “I’m going to try to adhere closest to how that person sounded when they said that word or those words. Particularly if they are anyone but a news anchor from Connecticut, I’m going to want to hold on to some of where they’re coming from and how it shows up in their voice,” Baldwin said. “This job can be done really lazily, and sloppily, and I don’t even know that the end result is that much worse. But, if you’re gonna do it at a high level, you can’t fuck the little things.”