
The hidden messages buried in your documents

The pitfalls of hidden "metadata" have been long known in computer-savvy circles, but these high-profile leaks are driving new efforts to keep a lid on metadata.
Source: The Associated Press

When the New England Journal of Medicine used a word-processing function to reveal that Merck & Co. had deleted study data about Vioxx and heart attacks, the pharmaceutical giant joined a long line of organizations bitten by information lurking in electronic files.

It's happened to no less than the White House, the Pentagon, the British prime minister's office and the United Nations.

Each time, making minor electronic adjustments to documents aired juicy details not meant for public disclosure — such as the true author of a file or sensitive data hacked from a final draft.

The pitfalls of such hidden "metadata" have been long known in computer-savvy circles, but these high-profile leaks are driving new efforts to keep a lid on metadata.

So sensitive is the topic for the U.S. government that the National Security Agency released guidance in December on how agencies can properly redact reports.

For the corporate world, several companies are finding success in selling tools to automatically scan for and remove metadata.

Metadata is data about data. A word-processing document, for instance, has metadata on who authored it, when someone saved it and what that person did to it. Microsoft Corp.'s Word has a "track changes" feature that preserves a file's original text and shows another person's edits. All that is metadata.
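To make that concrete, here is a minimal sketch of how such fields can be read back out of a file. It assumes the newer zip-based .docx format, which keeps author and timestamp fields in a part named docProps/core.xml, rather than the older binary .doc files. The function names `read_core_metadata` and `build_sample_package` and the sample values are made up for illustration.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of a .docx package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def read_core_metadata(docx_bytes: bytes) -> dict:
    """Return the author, last editor and timestamps stored in docProps/core.xml."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    fields = {
        "creator": "dc:creator",
        "last_modified_by": "cp:lastModifiedBy",
        "created": "dcterms:created",
        "modified": "dcterms:modified",
    }
    # Missing elements come back as empty strings rather than raising.
    return {
        name: root.findtext(path, default="", namespaces=NS)
        for name, path in fields.items()
    }


def build_sample_package() -> bytes:
    """Build a tiny stand-in package (hypothetical values) so the sketch runs alone."""
    core = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s" xmlns:dcterms="%s">'
        "<dc:creator>A. Author</dc:creator>"
        "<cp:lastModifiedBy>B. Editor</cp:lastModifiedBy>"
        "<dcterms:created>2006-01-01T09:00:00Z</dcterms:created>"
        "<dcterms:modified>2006-01-02T10:00:00Z</dcterms:modified>"
        "</cp:coreProperties>"
    ) % (NS["cp"], NS["dc"], NS["dcterms"])
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core)
    return buffer.getvalue()


if __name__ == "__main__":
    for name, value in read_core_metadata(build_sample_package()).items():
        print(name, "=", value)
```

None of these fields appear on the printed page, which is why they are easy to overlook.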

This information is designed to stick around because it can help people organize their files and collaborate with one another.

But because it doesn't show up when a document is printed and doesn't appear on screen in normal settings, it's easy to forget about.

Fears about the hazards of metadata led Microsoft to pull back on a planned feature in Vista, the upcoming Windows operating system. Originally, Vista was going to let users drag and drop files into certain spots on the desktop in order to label documents with personalized categories. Currently, grouping files by category is more laborious.

But Vista testers told Microsoft it might become "a little too easy" to apply the categories and have them stick permanently to the document, said Mike Burk, a Windows product manager. That could get ugly if a file categorized in "projects I hate" were e-mailed, say, to a boss.

(MSNBC.com is a Microsoft-NBC joint venture.)

Meanwhile, the next generation of Microsoft's widely used Office software, which includes Word, Excel and PowerPoint, will make it simpler to strip metadata from files before they are disseminated.

Even so, Gartner Inc. analyst Michael Silver says the problem will remain — metadata will exist in documents unless users make a point of getting rid of it.

"It's still a manual process. It's still something you have to remember to do," Silver said. "Any time you're relying on the user to remember something, there's a good chance that they'll forget."

Government agencies run into trouble

And so it wasn't surprising that when the White House posted a policy paper about strategy in Iraq last year, a quick command revealed the author. What was significant was that it was penned outside the administration, by Duke University political scientist Peter Feaver, a National Security Council adviser.

Earlier, a United Nations report on the assassination of former Lebanese Prime Minister Rafik Hariri developed new layers of intrigue when it was revealed that damaging accusations about Syria's involvement had been removed before publication.

The NSA's December paper ("Redacting with Confidence: How to Safely Publish Sanitized Reports Converted from Word to PDF") could have helped the Pentagon prevent its hidden-data episode last May.

Before posting a report in Adobe Systems Inc.'s Portable Document Format about a U.S. soldier who had accidentally killed an Italian secret service agent in Iraq, officials covered up classified information with black bars. But there's a difference between covering and deleting information. Readers simply uncloaked the text by cutting it from under the black and pasting it elsewhere.
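The difference between covering and deleting can be shown with a toy model. This is not real PDF code: the `Page` class and the helpers `cover_with_bar` and `delete_text` are hypothetical, and the sketch simply assumes a page holds a text layer plus shapes drawn on top of it, with copy-and-paste reading only the text layer.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Page:
    """Toy page: a text layer plus the indexes of runs hidden under black bars."""
    text_runs: List[str] = field(default_factory=list)
    covered: Set[int] = field(default_factory=set)

    def rendered(self) -> str:
        # What a reader sees on screen or paper: covered runs show as a bar.
        return " ".join(
            "#" * len(run) if i in self.covered else run
            for i, run in enumerate(self.text_runs)
        )

    def extracted(self) -> str:
        # What copy-and-paste returns: the text layer, ignoring drawn shapes.
        return " ".join(self.text_runs)


def cover_with_bar(page: Page, index: int) -> Page:
    """Draw a bar over one run; the underlying text is left in place."""
    return Page(list(page.text_runs), page.covered | {index})


def delete_text(page: Page, index: int, placeholder: str = "[REDACTED]") -> Page:
    """Replace the run itself, so nothing sensitive remains to extract."""
    runs = list(page.text_runs)
    runs[index] = placeholder
    return Page(runs, set(page.covered))


if __name__ == "__main__":
    page = Page(["Report", "names", "SENSITIVE-NAME", "here"])
    print(cover_with_bar(page, 2).extracted())
    print(delete_text(page, 2).extracted())
```

In the covered version the sensitive run disappears from view but still comes back on extraction; in the deleted version it is gone from both.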

Automated tools to help protect against metadata releases have existed for a while, but they are beginning to see wider use.

For example, Workshare Inc. sells a product called Trace (a free version can be downloaded) that scans documents for metadata and ranks the findings by risk level. One way a high risk is assigned is when a document lists all its authors and storage locations, because such data can guide hackers.

For most of Workshare's six years in existence, the company's customers were primarily lawyers, who are particularly sensitive about client information escaping to the opposing side.

But in the past year, Workshare has seen business expand to 60 percent of the Fortune 1000, said CEO Joe Fantuzzi. Revenue has surpassed $25 million, and Fantuzzi believes metadata protection is on the verge of being a must-have for corporate technology buyers.

"I think it could just break out, just like antivirus," he said.

Unlike antivirus protection, however, the tools to strip metadata from documents rest in everyday users' hands. Microsoft has long made it possible for users to erase telling metadata before documents are disseminated. The problem is that the steps are often cumbersome or obscure.

Microsoft hopes to resolve that in part by adding a metadata scan as a "file" menu option in Office programs. And a tweak in Vista could help.

Despite having yanked the drag-and-drop metadata tool, Microsoft is adding new labeling functions to Vista, including an ability to slap comments and a rating of one to five stars on documents. The twist is that the box that will facilitate this will also include an option for erasing the information.

"We've worked really hard to find a middle ground," Burk said. "We feel like we've got a good kind of solution that gives customers the benefit of metadata without exposing them to too much risk."

Gartner's Silver suggests that Microsoft could go further, by upgrading its Exchange e-mail server software to let companies automatically strip metadata from documents before they are e-mailed or posted online — a trick offered by third-party metadata protectors.

Burk said Microsoft might consider it.
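As a rough sketch of what an automatic stripping step might do before a file leaves the building, the code below blanks the identifying fields in a zip-based .docx package and copies everything else unchanged. It is not how Exchange or any third-party product works. The function name `strip_personal_metadata` and the choice of fields are assumptions, and the sketch does not touch tracked changes or comments stored in the document body.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the core-properties part; registered so prefixes survive rewriting.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

# Fields treated as identifying in this sketch (an assumed, incomplete list).
PERSONAL_FIELDS = ("dc:creator", "cp:lastModifiedBy")


def strip_personal_metadata(docx_bytes: bytes) -> bytes:
    """Return a copy of the package with author and last-editor names blanked."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            data = source.read(item.filename)
            if item.filename == "docProps/core.xml":
                root = ET.fromstring(data)
                for path in PERSONAL_FIELDS:
                    for element in root.findall(path, NS):
                        element.text = ""  # keep the element, drop the name
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # Every other part (body text, styles, images) is copied as-is.
            target.writestr(item.filename, data)
    return output.getvalue()
```

A mail gateway could run a step like this on each attachment, which removes the need for the sender to remember it.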

Of course, wider use of metadata-scanning tools will reduce the juicy finds that have benefited journalists and others.

Richard M. Smith, a computer privacy expert at Boston Software Forensics, mined metadata to determine who in the British prime minister's office worked on a 2003 dossier on Iraq. He also has picked up clues about the provenance of video clips on al-Qaida-related Web sites.

Even in a world more attuned to the perils of metadata, however, Smith doesn't think the material will dry up.

"There are simply too many people who work in governments around the world and there is no way to educate them all about metadata," he wrote in an e-mail. "I expect to see a steady stream of slip-ups in the future."