Why don't you tell everybody what the fuck you gotta say?

I'm sure some of you have read this story (or in French) about using dictionary attacks to figure out what the blacked-out text is in partially declassified documents. To put it simply: since you know how long the hidden word or phrase is, you can vastly reduce the number of possibilities and make an educated guess about what is hidden.

I've written software in the past for detecting graphical collisions in text layout, so the sizing and matching part was easy. Text measurement is actually part of the Windows API, so it was really easy to create a prototype of the software described in the article (and it's a lot faster than I thought it would be).
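Here is a rough, platform-independent sketch of the sizing-and-matching step. On Windows the widths would come from a text-measurement call such as GDI's `GetTextExtentPoint32`, though which call the prototype actually uses isn't stated. This sketch instead assumes a hypothetical per-character width table with invented pixel values. It sums the character widths of each dictionary word and keeps only the words whose rendered width matches the measured redaction box within a tolerance.

```python
# Hypothetical advance widths in pixels for a proportional font.
# These numbers are invented for illustration; real values would come
# from actual font metrics (e.g. a Windows text-measurement call).
CHAR_WIDTHS = {
    " ": 4, "a": 7, "b": 7, "c": 6, "d": 7, "e": 7, "f": 4, "g": 7,
    "h": 7, "i": 3, "j": 3, "k": 6, "l": 3, "m": 11, "n": 7, "o": 7,
    "p": 7, "q": 7, "r": 4, "s": 6, "t": 4, "u": 7, "v": 6, "w": 9,
    "x": 6, "y": 6, "z": 6,
}
DEFAULT_WIDTH = 7  # fallback for characters missing from the table


def text_width(text: str) -> int:
    """Sum the advance widths of each character (ignores kerning)."""
    return sum(CHAR_WIDTHS.get(ch, DEFAULT_WIDTH) for ch in text.lower())


def width_candidates(words, box_width: int, tolerance: int = 1):
    """Return the words whose rendered width fits the redaction box."""
    return [w for w in words if abs(text_width(w) - box_width) <= tolerance]


if __name__ == "__main__":
    # With these made-up widths, "bat" and "hat" are both 18 px wide,
    # so width alone cannot decide between them.
    print(width_candidates(["bat", "hat", "cat", "dog"], 18, tolerance=0))
```

As the example shows, several words often share a width. That is why width matching only narrows the list, and some kind of context is still needed to pick the winner.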

You may remember my goofy little program that spit out new text based on your diary. It uses Markov chain analysis to build a statistical model of language (even capturing the nuances of an individual writer). I've done a lot of work with Markov chain software, and I think that coupled with other contextual tools (à la automated grammar checkers, as well as the “guesser” functions in OCR and voice recognition), you could get far, far better accuracy than they were able to get in the article.
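To show how a Markov model could break ties between width-matched candidates, here is a minimal sketch. It assumes a simple word-level bigram (first-order Markov) model with add-one smoothing; the `BigramModel` class and the tiny training corpus are hypothetical stand-ins, not the diary program itself. Each candidate is scored by how likely it is to follow the word before the gap and to precede the word after it.

```python
from collections import Counter, defaultdict


class BigramModel:
    """A first-order (bigram) Markov model over words with add-one smoothing."""

    def __init__(self, text: str):
        tokens = text.lower().split()
        self.vocab = set(tokens)
        self.bigrams = defaultdict(Counter)
        for prev, nxt in zip(tokens, tokens[1:]):
            self.bigrams[prev][nxt] += 1

    def prob(self, prev: str, word: str) -> float:
        """P(word | prev), smoothed so unseen pairs are never zero."""
        counts = self.bigrams.get(prev, Counter())
        total = sum(counts.values())
        # +1 in the denominator reserves mass for one unknown word type.
        return (counts[word] + 1) / (total + len(self.vocab) + 1)

    def rank(self, before: str, after: str, candidates):
        """Order candidates for a gap by P(cand | before) * P(after | cand)."""
        def score(cand: str) -> float:
            return self.prob(before, cand) * self.prob(cand, after)
        return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    corpus = ("the bat flew out of the cave the bat slept in the cave "
              "the hat sat on the shelf")
    model = BigramModel(corpus)
    # Fill the gap in "the ___ flew" from three width-matched candidates.
    print(model.rank("the", "flew", ["hat", "bat", "cat"]))
```

A real system would train on a large corpus of similar documents and would likely use longer contexts. Still, the principle is the same: the geometry proposes candidates and the language model orders them.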

My automated rhyming dictionary is built around a word database that knows whether a word is a noun, a verb, or whatever, as well as its tense. I think adding this could tune the output even more. I'm also pretty sure that in a lot of these documents, once you get a few of the censored bits, the rest start falling into place. So you find the single-word ones first, prioritize those in your dictionary, and then attempt to build the phrases around them. A lot of the time you already have a shortlist of potential words (e.g., in the document above, you can shortlist the names of all the suspects). Et cetera. If you follow the general idea of this entry, I'm sure you can see just how simple this is!
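The ideas in that paragraph could be sketched roughly like this. It assumes a hypothetical `LEXICON` that tags words with a part of speech, standing in for a database like the one behind the rhyming dictionary. It also assumes the part of speech each gap needs has already been inferred from the surrounding grammar. Solving the gaps with the fewest candidates first is the "most constrained first" heuristic from constraint solving, used here as one interpretation of "find the single-word ones first".

```python
# Hypothetical part-of-speech lexicon; a real one would be far larger.
LEXICON = {
    "smith": "noun", "jones": "noun", "report": "noun",
    "met": "verb", "called": "verb", "quickly": "adverb",
}


def filter_candidates(candidates, required_pos, shortlist=()):
    """Keep candidates with the required part of speech, shortlisted words first."""
    fitting = [w for w in candidates if LEXICON.get(w) == required_pos]
    preferred = [w for w in fitting if w in shortlist]
    others = [w for w in fitting if w not in shortlist]
    return preferred + others


def solve_order(gaps):
    """Return gap ids from fewest to most candidates (most constrained first)."""
    return sorted(gaps, key=lambda gap_id: len(gaps[gap_id]))


if __name__ == "__main__":
    suspects = {"jones", "smith"}  # e.g. a shortlist of suspect names
    print(filter_candidates(["report", "smith", "met", "jones"], "noun", suspects))
    print(solve_order({"gap1": ["a", "b", "c"], "gap2": ["x"], "gap3": ["p", "q"]}))
```

Once the most constrained gap is fixed, its chosen word becomes context for the Markov scoring of its neighbours. That is how the remaining redactions could start "falling into place."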

Does anyone know if it's illegal to release something like this?

Wow Shannon, that's really annoying! What is it, 1997 on GeoCities? Retroweb is NOT cool!
