Last Updated: 07/27/2016

Advanced Scanners Lending Historians a Hand

BERLIN -- Throughout the 1980s, Sascha Anderson was one of the leading voices to speak out against the East German government and its dreaded secret police, the Stasi.

But his credibility gradually evaporated after the Communist government's collapse as rumors about him acquired the weight of proof: he had been informing on his dissident compatriots all along.

He had been told that his Stasi file had been destroyed. In fact, it was manually reconstructed from some of the millions of shreds of paper that panicked Stasi officials threw into garbage bags during the regime's final days in the fall of 1989.

Now, if all goes as planned by the German government, the remaining contents of those 16,000 bags will also be reconstructed.

Advanced scanning technology makes it possible to reconstruct documents previously thought safe from prying eyes. And although a great deal of sensitive information is stored digitally these days, recent corporate scandals have shown that the paper shredder is still very much in use.

"People perceive it as an almost perfect device," said Jack Brassil, a researcher for Hewlett-Packard who has worked on making shredded documents traceable. If people put a document through a shredder, "they assume that it's fundamentally unrecoverable," he said. "And that's clearly not true."

The art of reconstructing shredded documents has been around for as long as shredders have. After the takeover of the U.S. Embassy in Tehran in 1979, Iranian captors laid pieces of documents on the floor, numbered each one and enlisted local carpet weavers to reconstruct them by hand, said Malcolm Byrne of the National Security Archive at George Washington University.

That episode helped persuade the U.S. government to update its procedures for destroying documents. The expanded battery of techniques now includes pulping, pulverizing and chemically decomposing sensitive papers. Yet these more complex methods are not always at hand in an emergency, which is why the vagaries of de-shredding will be of interest to intelligence officials for some time to come.

Modern image-processing technology has made the rebuilding job a lot easier. A Houston-based company, ChurchStreet Technology, already offers a reconstruction service for documents that have been conventionally strip-shredded into thin segments.

The Stasi archives are a useful reference point for researchers tackling the challenge. In 1995 the German government commissioned a team in Bavaria to reassemble the torn Stasi files one by one. Yet by 2001, the three dozen archivists had gone through only about 300 bags, so officials began a search for another way to piece together the remaining 33 million pages a bit faster.
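To put the manual pace in perspective, here is a rough back-of-the-envelope sketch. It assumes the 16,000 bags are the total, that the work ran for about six years (1995 to 2001), and that the pace stayed constant; none of these assumptions come from the archivists themselves.

```python
# Rough estimate only; figures are taken from the article, the
# constant-pace assumption is ours.
total_bags = 16_000            # bags of torn Stasi files
bags_done = 300                # bags processed by hand so far
years_elapsed = 2001 - 1995    # assumed duration of the manual effort

bags_per_year = bags_done / years_elapsed
years_remaining = (total_bags - bags_done) / bags_per_year

print(f"Manual pace: {bags_per_year:.0f} bags per year")
print(f"Years needed for the rest at that pace: {years_remaining:.0f}")
```

At that rate the remaining bags would take more than three centuries by hand, which helps explain the search for an automated approach.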

Four companies remain candidates for the job, including Fraunhofer IPK of Berlin, part of the Fraunhofer Gesellschaft research institute, which helped develop the MP3 music format. The institute is drafting plans to sort, scan and archive the millions of pages within five years, drawing on expertise in office automation, image processing, biometrics and handwriting analysis as well as sophisticated software.

"It's more than just the algorithms about the puzzles," said Bertram Nickolay, the head of the security and testing technologies department. Indeed, the archive is a massive grab bag of randomly torn documents, many with handwritten and typewritten text on the same page. Combining all these technologies in a project of this scope "is on the borders of what's possible," Nickolay said.

His system's accuracy rate is about 80 percent. "It will take time for the algorithms to be optimized," Nickolay said.

Some of the companies competing for the job concentrated on the shape, color and perforations of the shreds, while other contenders opted for semantically driven systems, which looked for keywords and likely text matches.

The Fraunhofer plan is to combine its smart scanning software with the know-how of the Zirndorf archivists, who have amassed years of experience working with these tiny pieces of history. After all the shreds have been scanned (at 500 dots per centimeter), the interactive software will suggest possible matches, which an operator can accept or reject.
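The suggest-and-confirm workflow described above can be illustrated with a minimal sketch. This is not Fraunhofer's software: the function names, the string "edge signatures" and the scoring rule are all hypothetical stand-ins chosen only to show the idea of ranking candidates and letting an operator decide.

```python
def suggest_matches(target, candidates, score, top_n=3):
    """Return the top_n candidates ranked by similarity to the target."""
    ranked = sorted(candidates, key=lambda c: score(target, c), reverse=True)
    return ranked[:top_n]

def review(target, candidates, score, operator_accepts, top_n=3):
    """Offer suggestions one at a time; return the first one accepted."""
    for candidate in suggest_matches(target, candidates, score, top_n):
        if operator_accepts(target, candidate):
            return candidate
    return None  # operator rejected every suggestion

# Hypothetical data: each fragment is reduced to a short edge signature.
def shared_positions(a, b):
    """Toy score: number of positions where two signatures agree."""
    return sum(1 for x, y in zip(a, b) if x == y)

fragments = ["abxx", "abcd", "zzzz", "abcx"]
ranked = suggest_matches("abcd", fragments, shared_positions)

# Stand-in for a human operator who only confirms one specific match.
chosen = review("abcd", fragments, shared_positions,
                operator_accepts=lambda t, c: c == "abcx")
```

Keeping the operator's decision as a separate callback mirrors the division of labor in the plan: the software narrows the field, and the person makes the final call.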

While Fraunhofer IPK's system is still in the planning stage, several companies say they can already reconstruct shredded documents with similar techniques.

ChurchStreet's software analyzes the graphical patterns that go to the edge of each piece. First, workers paste the random shreds onto standard sheets of paper, which takes three to seven minutes per page. The pages are scanned, and software analyzes the shreds for possible matches.
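As an illustration of matching on patterns that run to the edge of each piece, here is a toy sketch for strip-shredded pages. It is not ChurchStreet's method: the pixel-grid model, the function names and the greedy ordering are assumptions made for the example, and it presumes the leftmost strip is already known.

```python
# Toy model (hypothetical): a page is a grid of pixels, 0 = white, 1 = ink.
# A strip is a vertical slice of that grid.

def cut_into_strips(page, width):
    """Slice a page into vertical strips of the given pixel width."""
    return [[row[c:c + width] for row in page]
            for c in range(0, len(page[0]), width)]

def edge_mismatch(left_strip, right_strip):
    """Count rows where the touching edge pixels of two strips differ."""
    return sum(1 for l_row, r_row in zip(left_strip, right_strip)
               if l_row[-1] != r_row[0])

def order_strips(strips, first=0):
    """Greedily chain strips, always appending the best-fitting neighbor."""
    order = [first]
    remaining = set(range(len(strips))) - {first}
    while remaining:
        last = strips[order[-1]]
        best = min(sorted(remaining),
                   key=lambda i: edge_mismatch(last, strips[i]))
        order.append(best)
        remaining.remove(best)
    return order

def reassemble(strips, order):
    """Join strips side by side in the given order."""
    return [sum((strips[i][r] for i in order), [])
            for r in range(len(strips[0]))]

page = [
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
a, b, c, d = cut_into_strips(page, 2)
shuffled = [a, d, b, c]            # simulate the shredder's jumble
order = order_strips(shuffled)     # assumes strip 0 is the left edge
restored = reassemble(shuffled, order)
```

Real shreds are noisier than this: ink rarely lines up pixel for pixel, and a greedy chain can go wrong after one bad match, which is one reason a service might aim for blocks of recovered content rather than a perfect page.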

Cody Ford, the company founder, said the ChurchStreet service can recover up to 70 percent of a document's content, although he stressed that the goal was to get blocks of information rather than to recreate the original formatting.

Cross-shredding makes the job a lot trickier. "The problem is not whether it's possible with the software, which is possible," said Werner Vugeli, the managing director of the German office of SER Solutions, a company in Dulles, Virginia. "The problem is how to scan these documents."

In Germany, meanwhile, a decision about whether to proceed with the reconstruction of Stasi documents is not expected before September.

Anderson, the dissident discredited by the files, is among those who hope the project goes forward. "Of course I would have preferred that they weren't found," he said by phone from Frankfurt. "But I realize that it's a unique chance for a society to have access to this information."