Why computer scientists and linguists don’t always see eye-to-eye

Linguists have many theories about how language works. But how much should the computer scientists who work with language care? (CC image courtesy of Flickr/surrealmuse)

“You’ve just explained my entire life to me.” This was the last thing I was expecting to hear from Lori, my graduate advisor, in the midst of a discussion of my career plans. I gave a stiff smile and shifted uncomfortably in my chair. “What you just said,” she continued, “that’s why I’m here, not in a linguistics department. In a linguistics department in the 80’s, I might have felt like a hypocrite.”

What I’d said hadn’t been a deliberate attempt to enlighten a researcher 30 years my senior. I’d simply mentioned my preference for application-oriented research groups, because I care more about producing useful insights than true theories. Apparently, though, the distinction between usefulness and truth struck a chord with Lori: in the field she and I work in, what’s useful isn’t always true, and what’s true is often not useful. Lori, a linguist in a school of computer science, has found her career path to be largely determined by that distinction.

Choosing Usefulness

IBM's Watson computer system competes against Ken Jennings and Brad Rutter in a practice Jeopardy! match in 2011. (Image credit: IBM)

I study natural language processing (NLP) – how to get computers to understand and produce both speech and written text. A quintessential example is IBM’s Watson. To answer each Jeopardy! question, Watson first had to match the question against thousands of documents (understanding written text). It then had to compose a response (producing text) and speak that response aloud (producing speech). Watson is just one member of the rapidly growing swarm of NLP-based technologies: spam filters, Google Translate, Siri – all blossomed from academic research in NLP.

As you might expect, such projects rely heavily on analyzing language. For Siri to produce speech, it has to translate letters into sounds; to interpret a sentence, it helps to identify the verbs, subjects, and objects. These problems are the bread and butter of linguistics, so naturally, when researchers first started coaxing computers to use human language, they turned to linguists for help.

Lori was one of those linguists. When her struggles with tenure-track publishing drove her from her first academic post, she could have stayed in linguistics, spending her days debating the correctness of various formalizations of English and Japanese grammar. Instead, she accepted a position in automatic translation systems, bringing her expertise to the domain of artificial intelligence.

This was an ideologically laden choice. Linguists have traditionally viewed their job as one of scientific discovery: to construct comprehensive, rigorous descriptions of how human language works, much as a physicist might create a mathematical model of carbon atoms or star formation. Linguists see their contentious, passionate battles over sentences’ underlying structure (and they are sometimes quite passionate) as a struggle to defend the one truth. Lori, meanwhile, had serious doubts about whether any of those theories were fully true – none could account for every linguistic phenomenon. Marching on the front lines of dogmatic, theory-driven linguistics, then, would make her feel like a hypocrite, peddling theories she didn’t believe herself.

What she did see in those theories, though, was usefulness; to quote statistician George E.P. Box, “All models are wrong, but some are useful.” Correct or not, linguistic theories provide a framework for describing language in a precise way, which allows them to make testable predictions about language use. When we test those predictions against real-world data, we can discover linguistic phenomena that had previously gone unnoticed.

Linguistic theories are also useful in another way: their precision is just what you need if you wish to instruct a computer, and even an inaccurate theory can work well for that. In NLP, descriptions of language are judged not by how true they are, but by how well they enable a system to function. Nobody cares whether Google Translate analyzes a sentence based on Chomsky’s linguistic theories, cognitive psychology, or Kabbalistic mysticism, so long as the right English sentence comes out the other end.

This view of theories – as mere tools for understanding and applications – would have been anathema to many traditional linguists, but it’s a core aspect of NLP. It took a linguist with Lori’s perspective, then, to make the leap to this new field.

Still Too True to be Useful

Perhaps unfortunately for Lori, NLP has since become one of the biggest flash points in the tension between truth and usefulness.

At first, researchers in the field carefully translated the rules of language, as divined by linguists, into computer programs. After all, these rules were supposed to be decent approximations of the truth, so however imperfect they were, they should still be useful.

Many computer models treat documents as “bags of words,” ignoring all internal structure. Carnegie Mellon University’s Gates Center for Computer Science sports an art installation that takes this concept quite literally. (Image credit: Stanford NLP Group)

But the researchers began to find that these supposed rules were more like guidelines, and that for some tasks, they could do better by ignoring linguistics altogether. Say you wanted to figure out the topic of this article. You could carefully analyze the structure of the text and its meaning – or you could just count the number of times I’ve written “linguistics,” “NLP,” “mysticism,” and so on. Comparing those numbers would give you a pretty good idea of what topics the article is about. Of course, nobody thinks a bunch of word counts is a “truer” way to represent the document. But if the goal is simply to be useful, then theoretical rigor can be replaced with easier-to-implement mathematical tricks.
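The word-counting approach is simple enough to sketch in a few lines of Python. This is only an illustration under stated assumptions: the keyword lists, the topic names, and the function names (`bag_of_words`, `topic_scores`, `guess_topic`) are invented here, and real topic classifiers typically learn their word weights from data rather than from hand-picked lists.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented for this illustration.
TOPIC_KEYWORDS = {
    "linguistics": {"linguistics", "linguist", "linguists", "grammar"},
    "nlp": {"nlp", "translation", "speech", "statistics"},
    "mysticism": {"mysticism", "kabbalistic"},
}


def bag_of_words(text):
    """Count lowercase alphabetic tokens, discarding order and structure."""
    # Crude tokenizer: any run of ASCII letters counts as a word,
    # so "I've" becomes "i" and "ve".
    return Counter(re.findall(r"[a-z]+", text.lower()))


def topic_scores(text):
    """Score each topic by how often its keywords occur in the text."""
    counts = bag_of_words(text)
    return {
        topic: sum(counts[word] for word in words)
        for topic, words in TOPIC_KEYWORDS.items()
    }


def guess_topic(text):
    """Return the topic with the highest keyword count."""
    scores = topic_scores(text)
    return max(scores, key=scores.get)


sample = (
    "NLP systems for speech and translation often ignore "
    "what linguists say about grammar."
)
print(topic_scores(sample))
print(guess_topic(sample))
```

Nothing in this sketch knows what a verb or a subject is. It throws away word order entirely, which is exactly the sense in which a bag of words ignores linguistics.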

Increasingly, linguistics was to NLP as quantum mechanics was to 1980s-era electronics manufacturing: the industry knew that classical physics had been proven wrong, but it was close enough, and thinking in terms of quantum mechanics was simply impractical. Likewise, for the applications facing NLP researchers, the “wrong” theories – those that ignored linguistics – were proving much more useful.

This shift dramatically altered the dynamic between linguists and computer scientists. The change was best summed up by IBM researcher Frederick Jelinek, who famously said, “Every time I fire a linguist, the performance of the speech recognizer goes up.” What NLP really needed, the thinking went, was not linguists, but statisticians. The same emphasis on usefulness that pulled linguists like Lori to the field had rendered them increasingly irrelevant to it.

Truth Makes a Comeback

Still, all is not lost for linguistics in NLP. Even at the height of this trend, many applications still required input from linguistics (for instance, someone had to suggest non-braindead ways to represent sentence structure). Lori did, after all, have a job all these years. And for some applications, researchers are now pushing up against the limits of what they can do without linguistics. Companies like Google are hiring linguists right and left to help them analyze the language samples they train their systems on. Much like computer manufacturers, who have miniaturized so far that they do have to worry about quantum effects, NLP systems have advanced to the point where the small amount of wrongness introduced by questionable linguistic models is starting to matter.

For me, this new trend is mostly an encouraging one: as a student of Lori’s, I’ve spent much of my Ph.D. on incorporating linguistic insights into NLP, so it’s nice to know there’s a market for that. At the same time, it leaves me struggling to avoid exactly the same sort of conflict that Lori steered clear of herself: a misplaced commitment to theoretical correctness. I told her I value usefulness, but I find it all too easy to slip into theoretical rabbit holes, making every corner of every system as linguistically sound as possible. Periodically, I have to remind myself to heed the insights of the past few decades: sure, make it elegant, and yes, make it right, but most of all, make it work. Because in this field, what’s useful beats what’s merely true every time.

3 thoughts on “Why computer scientists and linguists don’t always see eye-to-eye”

  1. Just curious—when was this written? (I used to do research myself in NLP and computational linguistics, and remember the Jelinek quote from the time. Didn’t he add “…and every time I hire a statistician, the performance goes up again!”?)
