U.S. Supreme Court uses corpus created by BYU professor Mark Davies

His billion-word online corpora aid study of language, culture

Published: Sunday, March 13 2011 10:45 p.m. MDT

BYU linguistics professor Mark Davies

Provided by Mark Davies


PROVO — Linguists like to joke that you can tell a lot about a word by the words it hangs out with. And 50 years ago, "gay" was hanging out with "grave" and "brilliant" while "sex" palled around with "hygiene" and "conflicts." Today, the words "gay" and "sex" have much more controversial companions, illustrating not only a change in the grammatical and lexical structure of the English language, but a cultural shift as well — all seen through the study of words.

"I love linguistics," said BYU linguistics professor Mark Davies. "I love looking at how and why language changes, but I'm equally as interested in history and culture, and language can serve as a beautiful window on that."

Davies, regarded by many in the linguistic community as a standard setter, has created a window of more than 1 billion words, gathered from books, magazines, newspapers, academic sources and transcribed interviews.

His corpora (the plural of corpus, the Latin word meaning a body or collection of writings used for analysis) are the largest free collections of English words on the Internet. Tens of thousands of users search them each month, from linguists, teachers and students to district judges and Supreme Court justices, all trying to make sense of this odd language we call English.

Corpora in courtrooms

Thirty years ago, when lawyers or judges disagreed on a word's meaning, there were two solutions: dictionaries or telephone surveys.

"Both of them are unreliable," said BYU linguistics professor and department chair William Eggington. "Dictionaries are usually way behind the times and usually don't cover the full range of the meaning of the word, and in a dictionary, there's no way to measure frequency, how often this meaning is used. And surveys, they're hit and miss. But then the corpus comes along and changes everything."

Suddenly, instead of relying on stale definitions or unscientific survey methods that left room for doubt, judges and attorneys could turn to hefty databases that painted a much more accurate picture of words in context, he said.

These corpora are even finding their way into high-profile cases, like the March 1 Supreme Court decision in which Chief Justice John Roberts cited corpus data as a foundation for limiting the reach of the word "personal" to people, not corporations.

AT&T had been asking for a "personal exemption" so it would not have to reveal certain financial documents. It made perfect sense, the company argued, because legally a business can already be considered a "person."
