When faced with the daunting task of combing through six years’ worth of Michael Olesker’s columns to check for plagiarism and attribution problems, Sun editors had a choice: They could conduct a painstaking manual search that would take weeks, or rely in part on LexisNexis CopyGuard, a new software product that promises to reduce the plagiarism-detection process to just hours.
The stakes were high for the newspaper. The Sun’s decision last month to ask a venerated metro columnist to resign on the same day it learned that a handful of his recent columns contained arguably minor attribution problems drew heated accusations of overreaction from some journalists inside and outside Calvert Street, as well as from readers. Moreover, Olesker had consulted an attorney and was contemplating legal action against the Tribune Co.-owned daily, according to a newsroom source familiar with the situation. If The Sun were able to establish a long-term pattern of ethical violations by Olesker, it might more quickly put the controversy behind it.
The stakes were also high for LexisNexis, the searchable news and legal database, which is actively marketing the subscription-based CopyGuard, released in August 2005, to its existing media clients. A successful application of CopyGuard to a high-profile plagiarism case would provide real-world evidence of the program’s usefulness.
The science behind CopyGuard was engineered by iParadigms, the Oakland, Calif., software developer of Turnitin, a popular academic plagiarism-detection tool. Using iParadigms’ “pattern-matching” engine, CopyGuard compares each submitted document to more than 6.1 billion documents in LexisNexis and web databases, and then identifies any matching phrases and their respective sources.
After testing CopyGuard to see if it would have detected a lack of originality in Olesker columns already known to be problematic, Sun editors determined the software was unreliable.
“The test that I saw didn’t convince me that it was going to work well,” says Dave Rosenthal, the Sun’s assistant managing editor for administrative matters. “When we tried it on some examples of stories, it didn’t give the results that we expected.”
(The Sun’s initial test followed a LexisNexis sales presentation that had been scheduled before the Olesker incident arose and merely happened to coincide with it.)
Instead, public editor Paul Moore and Olesker’s immediate supervisor, city editor Howard Libit, spent about three weeks combing through more than 600 columns, manually plugging suspicious phrases, along with numerical data likely reported elsewhere first, into the LexisNexis news database, and then checking the search results for violations of the newspaper’s reporting standards.
“I couldn’t begin to tell you how many hours” the review took, says Libit, who reread every column Olesker had written since January 2000. “A lot of time at night, weekends, and during the day. Because I really wanted to be thorough and fair.”
Moore sums up the experience this way: “I’m completely burned out.”
Their work appears to have paid off. In his Feb. 5 column, Moore noted five new incidents in Olesker’s stories where the columnist “used material from other sources without meeting The Sun’s standard for attribution.” Among these were instances of Olesker inappropriately lifting language or reporting from a letter to the editor and an obituary in The New York Times, from a press release, and from syndicated columnist Clarence Page. Moore also reported an incident of plagiarism previously identified, but not published, by City Paper. The public editor wrote that these five instances were a subset of a larger group of newly identified problem columns, but declined to elaborate in an interview on his other findings.
“Rest assured, all of that would have been done well within a day,” insists Malik AboRashid, director of sales at iParadigms. “This is something that took a news staff three weeks to do, and we’re saying it could be done within an hour.”
So why didn’t the software work adequately when The Sun tested it?
IParadigms president and CEO John Barrie says that the initial failure of CopyGuard to detect some problems with Olesker’s columns alerted the company to a technical shortcoming in the software’s “matching engine.” While it was adequate for the academic market, the journalism context would require a more “sensitive” matching algorithm. The CopyGuard technology, says Barrie, has since been refined.
After being told that The Sun had expressed dissatisfaction with its initial test of CopyGuard, LexisNexis agreed to provide City Paper with a trial use of the software. A test conducted last weekend suggests that CopyGuard, in its current form, would have identified 10 out of the 12 Olesker attribution omissions that have already been publicly reported. The test also identified at least five additional instances of Olesker’s unattributed use of reporting or language previously published elsewhere. Among these were multiple new borrowings from the New York Times, and from the weekly Forward and Baltimore Jewish Times newspapers.
It took two people about five hours apiece to use CopyGuard to jointly evaluate the more than 600 Olesker-bylined stories published in The Sun since 2000. (The time it takes an actual editor to evaluate CopyGuard’s analysis will vary, of course. As LexisNexis senior vice president Elizabeth Rector points out, “It’s not a solution where you submit the work product and the horn blows and the light flashes. It is a product that gives you the tool to be able to decide for yourself whether you have a problem.”)
LexisNexis envisions CopyGuard being used by content publishers not only to police their own content but also to determine if their intellectual property is being misappropriated by others. Indeed, the City Paper trial revealed at least one instance where Olesker’s original writing may have been copied by another journalist.
A more detailed summary of City Paper’s test results accompanies this article below.
LexisNexis will not disclose the cost of a CopyGuard subscription, or reveal any current customers, though Rector says the company already has some customers using the software.
IParadigms’ president sees a bright future for plagiarism-detection technology in journalism. “We’ve pretty much got the academic market wrapped up,” John Barrie says of his company’s Turnitin software, which, according to the company, is used by university systems in more than 80 countries. “It’s a done deal. Now we’re moving into other markets, very rapidly. Every single person in every single company that deals with text in any aspect of their job has got the same problem that every academic has, but they’ve got it in spades. If they fail on that problem, then they put the integrity of their institution at risk.”
The use of such software by newspapers would be a positive development, says Craig Silverman, who runs Regret the Error, a web site that catalogs media corrections. “I think these kinds of technological measures should be evaluated for use by newspapers,” says Silverman, who is writing a book about accuracy in the media. “The school I went to as a kid has installed anti-plagiarism software to check student papers. Yet many of North America’s leading newspapers won’t even bother to check the previous work of a staffer found lifting material, let alone invest in technology. Something’s wrong here, and it is in the best interest of the press to raise the bar and institute better practices for preventing and investigating plagiarism.”
Tom Rosenstiel, of the nonprofit Project for Excellence in Journalism, offers a more cautious endorsement. “I think, by and large, this is not one of the biggest problems facing the American news media,” Rosenstiel says. “We have a lot of other issues, starting with declining circulation, declining trust, cutting of resources, a short-term investment mentality, and lack of vision in the future. If you’re going to spend money on software to patrol plagiarism, that’s fine. But it’s not probably issue number one.”
After previewing a refined version of CopyGuard late last week, the Sun’s Dave Rosenthal modified his initial impression of the software. “[LexisNexis] gave us [another] demonstration, and it worked much better than the first time around,” Rosenthal says. “And now we’ll be testing it further, to see if those results hold up.”
City editor Libit says that he would have welcomed the opportunity to supplement his manual review with plagiarism-detection technology, but that he wouldn’t have wanted to rely on it entirely. “There was a lot of value to going back and rereading,” he says, emphasizing a point also made by public editor Moore in his final assessment of Olesker: that the overwhelming majority of the columnist’s prodigious output was enterprising and original. “Mike is and was a very good columnist,” Libit says. “And when he was out in the community reporting, there was a body of very good work.”
Anne Howard contributed additional research to this article.
Following is a summary of the results of a one-day trial by City Paper of LexisNexis CopyGuard. The test attempted to duplicate a three-week manual review by The Sun of six years’ worth of Michael Olesker’s columns, to determine whether the plagiarism-detection software would yield the same or better results in less time.
Released in August 2005, CopyGuard is a joint product of LexisNexis, the aggregator of news, business, and legal content, and iParadigms, the software developer behind Turnitin and iThenticate, two other plagiarism-detection software tools designed for the academic and corporate markets, respectively.
For the purposes of a trial use, LexisNexis permitted City Paper to run only 120 documents through the CopyGuard system, but by merging several of Olesker’s columns into each tested document it was possible to submit for analysis nearly all of the more than 600 stories carrying an Olesker byline published by The Sun since January 2000. It took two people with moderate computer skills about five hours each to jointly complete the evaluation.
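The merging step can be sketched in a few lines of Python. This is a hypothetical illustration, not the procedure City Paper actually followed: the function name `merge_into_batches`, the separator string, and the choice to fill each document with the same number of columns are all assumptions made for the example.

```python
def merge_into_batches(columns: list[str], max_docs: int = 120,
                       separator: str = "\n\n=====\n\n") -> list[str]:
    """Merge columns into at most max_docs documents.

    Each merged document holds up to ceil(len(columns) / max_docs)
    columns; the last one may hold fewer. The separator is an assumed
    marker that makes it easier to tell the columns apart afterward.
    """
    if max_docs < 1:
        raise ValueError("max_docs must be at least 1")
    if not columns:
        return []
    per_doc = -(-len(columns) // max_docs)  # ceiling division
    return [separator.join(columns[i:i + per_doc])
            for i in range(0, len(columns), per_doc)]
```

Under these assumptions, 600 columns and a 120-document cap work out to five columns per submitted document.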
Of the 12 Olesker columns previously reported in either The Sun or City Paper as containing some degree of attribution problem, CopyGuard identified all but two. The software did not discover the columnist’s unattributed use of boilerplate analysis from an August 27, 2004, New York Times article by David Leonhardt. Neither did it identify one of the more noteworthy examples uncovered in the Sun’s recent review: that Olesker apparently relied without attribution in one of his columns on information and language previously published only in a letter to the editor of the Times.
CopyGuard did detect a number of additional instances—not previously reported—where Olesker appears to have neglected to cite his sources. Five of these new cases are detailed below.
Based on this test, it appears conceivable that the Sun’s review of six years’ worth of Olesker columns could have been largely completed in one or two days, if editors were convinced of CopyGuard’s reliability.
The CopyGuard test also revealed at least one instance where Olesker’s original content may have been used without attribution by another journalist. In a March 17, 2002, story, Olesker wrote that a new book by former Newsday columnist Jimmy Breslin “reminds us that the journey to America is more than a Kodak moment at the Statue of Liberty.”
A month later, Charlotte Observer book critic Polly Paddock observed that the same book “offers a reminder that the journey to America is more than a Kodak moment at the Statue of Liberty.”
CopyGuard displays its analysis in a variety of “originality reports,” which allow users to visually compare the submitted text with the corresponding portions of matching text from LexisNexis and web databases. The program registers a match between documents even when the corresponding phrases or paragraphs are not word-for-word matches, making it easy to detect instances of close paraphrase, or what academics sometimes call “plagiaphrase.”
Most of the new instances of Olesker’s attribution problems detected by CopyGuard are of this type:
(Note: CopyGuard does not detect plagiarism per se; rather, it compares text documents and determines the degree of “originality” between them. Because most language duplication across news articles is routine and acceptable—say, when multiple reporters refer to the same publicly available document, or quote from press conferences—CopyGuard doesn’t obviate the need for editors to make judgment calls about ethical violations. The following examples would not necessarily result in a finding by the Sun’s editors of ethical violations. But they would likely raise an editor’s eyebrows and merit further investigation.)
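As a rough illustration of how software can flag close paraphrase rather than only verbatim copying, the Python sketch below measures what share of a text's overlapping three-word runs also appear in a possible source. It is not CopyGuard's or iParadigms' actual matching engine, whose workings are not described here; the function names, the three-word window, and the way punctuation and capitalization are ignored are all assumptions made for the example.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences in text.

    Words are lowercased and punctuation is dropped (an assumed
    normalization), so small edits do not hide a match.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate: str, source: str, n: int = 3) -> float:
    """Share of the candidate's n-word sequences that also occur in source."""
    cand = shingles(candidate, n)
    if not cand:
        return 0.0  # candidate is shorter than n words
    return len(cand & shingles(source, n)) / len(cand)
```

A score near 1.0 means nearly every three-word run in the candidate also occurs in the source, while swapping or reordering a few words lowers the score without erasing it. As the note above makes clear, an editor would still have to judge whether any given overlap is routine or a problem.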
On Nov. 23, 2001, Leonard Fein discussed in the Forward weekly newspaper a Congressional Budget Office report:
“The best available data from the Congressional Budget Office tell us that from 1979 to 1997 the top 1% of the population nearly doubled the share of national income after taxes it receives. In other words, by 1997 the 2.6 million Americans with the highest incomes had as much aftertax income as the 100 million Americans with the lowest incomes. Similarly, the 20% of Americans with the highest incomes earned as much as the remaining 80%. Now, under the House proposal, Americans with incomes of more than $5 million per year would get an additional tax benefit of $500,000 over the next four years, this above and beyond the very substantial tax benefits they received from the original tax-cut package enacted in June.”
Four days later, Olesker offered a similar take on the CBO’s numbers:
“From 1979 to 1997, the top 1 percent of the population nearly doubled the share of national income after tax cuts, according to the Congressional Budget Office. In raw numbers, the 2.6 million Americans with the highest incomes had as much after-tax income as the 100 million Americans with the lowest incomes. And the 20 percent of Americans with the highest incomes earned as much as the remaining 80 percent.
“And the new proposal? Americans with incomes of more than $5 million a year would get an additional tax break of $500,000 over the next four years - over and above the hefty breaks they received from the tax-cut package in June.”
On Dec. 20, 2002, Jonathan Sheir wrote in the Baltimore Jewish Times about a young cancer survivor with a heart of gold:
“They found Enviro Solutions, a South Carolina-based company that refurbishes and resells old printer cartridges and then donates some of the proceeds to a charity of the participant’s choice.”
And a few paragraphs later:
“Eli and his mother drafted a letter headlined ‘Cartridges for a Cure!!’ and sent it out to businesses and organizations in the Baltimore metropolitan area.”
On May 1, 2003, Olesker retold the story of little Eli Kahn:
“His family found the company, Enviro Solutions, a South Carolina-based firm that refurbishes and resells old printer cartridges and then donates some of the proceeds to a charity of the participant’s choice.
“To find local participants, Eli and his mother drafted a letter, titled ‘Cartridges for a Cure,’ and sent it to businesses and organizations in the metro area.”
In his column, Olesker does not mention the earlier Jewish Times article.
On Feb. 3, 2004, New York Times reporter Edmund L. Andrews reported on President Bush’s proposed budget cuts. Three days later, Olesker also discussed the proposals. (The Sun’s Julie Davis and David Greene also wrote about Bush’s budget proposals, but Olesker appears to have relied on the Times’ reporting.)
From Andrews’ article:
“The Bush plan would eliminate 65 programs and cut back 63 others, but the total savings for next year would add up to only $4.9 billion. By comparison, the White House is predicting that the federal deficit will hit $521 billion this year and $364 billion in 2005.”
From Olesker’s column:
“The total savings from these 65 eliminated programs, and these 63 diminished programs, would add up to a piddly $4.9 billion. This, with the White House predicting a $521 billion federal deficit this year and $364 billion in 2005.”
From Andrews’ article:
“At the same time, he is pushing Congress to make permanent most of his tax cuts from 2001 and 2003, which with other proposed tax cuts would increase the federal deficit by an additional $1.1 trillion over the next 10 years.”
From Olesker’s column:
“This, with Bush pushing Congress to make permanent most of his tax cuts to the wealthy—which would increase the federal deficit by an additional $1.1 trillion over the next 10 years.”
From Andrews’ article:
“Though Mr. Bush has proposed big increases in spending on education for the No Child Left Behind initiative, the administration’s list calls for eliminating or cutting back more than a dozen other education programs. Among other actions, it would eliminate $34 million spent to help pay for secondary school counselors; $30 million for a program in schools to combat alcohol abuse; $38 million for projects to provide employment services to people with disabilities; and $18 million for a national writing project.”
From Olesker’s column:
“While grandly promoting his No Child Left Behind initiative, George W. Bush calls for eliminating and downsizing more than a dozen education programs, including $34 million for guidance counselors, $30 million to combat alcohol abuse among adolescents, $18 million for a national writing project.”
On Dec. 6, 2005, in one of his last columns, Olesker profiled Seymour Hersh on the occasion of the celebrated investigative journalist’s speech at Baltimore’s Park School. In his column, Olesker frequently quotes Hersh according to journalistic convention. But in one paragraph, Olesker appears to have copied background material directly from one of Hersh’s New Yorker articles, without indicating its source.
From Hersh’s New Yorker story:
“For example, Murtha reported that the number of attacks in Iraq has increased from a hundred and fifty a week to more than seven hundred a week in the past year. He said that an estimated fifty thousand American soldiers will suffer ‘from what I call battle fatigue’”
From Olesker’s column:
“In his famous Nov. 17 speech, Murtha reported that the number of attacks in Iraq has increased from 150 a week to more than 700 a week in the past year. He said that an estimated 50,000 American soldiers will suffer from ‘what I call battle fatigue.’”
In the March 13, 2000, issue of the American Prospect magazine, Edward Cohn described businessman Michael Milken thus:
“ . . . he has also expanded the Milken Family Foundation, a philanthropic venture best known for its education awards, and is a lead investor in Knowledge Universe, a family of companies specializing in early childhood education.”
In June 2004, this is how Olesker described Milken’s post-jail, post-cancer charitable enterprises:
“Also, he has expanded the Milken Family Foundation, best known for its education awards. He’s a leader in Knowledge Universe, a mix of companies specializing in early childhood education.”