Now I'm definitely getting arrested…

Steganography is the art of communicating in a way that hides the existence of the communication itself; that is, it relies on obfuscation rather than encryption. Most commonly this involves things like hiding data in the least significant bits of an image's pixel values. It is said that terrorists have used this method to communicate using images in USENET newsgroups.
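For anyone who hasn't seen the least-significant-bit trick, here is a minimal Python sketch of the idea. It assumes the image has already been unpacked into a flat sequence of 8-bit pixel values; the function names `embed_lsb` and `extract_lsb` are made up for this illustration, and a real tool would also have to deal with file formats and compression.

```python
def embed_lsb(pixels: bytes, message: bytes) -> bytes:
    """Hide message bits (most significant bit first) in the lowest bit of each pixel value."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover data is too small for this message")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # clear the lowest bit, then set it to the message bit
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits, eight pixel values per byte."""
    out = bytearray()
    for k in range(n_bytes):
        value = 0
        for p in pixels[8 * k: 8 * k + 8]:
            value = (value << 1) | (p & 1)
        out.append(value)
    return bytes(out)


cover = bytes(range(256))          # stand-in for real pixel data
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))       # b'Hi'
```

Each pixel value changes by at most 1, which is why the change is hard to see.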

One of my interests as an artist is the work of Russian mathematician Andrey Andreyevich Markov, after whom Markov chains are named. In short, a Markov chain models a sequence of events in which the probability of each possible next event depends only on the current one. I thought it might be possible to encode secret messages in subtle variations in the statistical output of a text generator built on such chains. For example, the string below, generated from a model seeded with the Book of Genesis, encodes the simple phrase “Hello, world!” followed by a carriage return.


Jahzeel, and replenish the fowl of every living creature after their hand of Hagar the land by his father, and Accad, and Calah, And they brought forth his voice, and she conceived, that Pharaoh for my power I pray you, saying, Jacob: all these words? God saw Rachel envied him. And I offended their possession: he and thou shalt thou in the sight because I shall be gone. And they might not toward Israel's left communing with thy brethren, sons of the ark; And he could we may preserve seed of Pharaoh awoke. And Abram said to interpret it. And the nakedness of the earth. Make ready the word of the face to him a well; whose land ye shall tell me; that he saw that betwixt me swear, saying, He is Hiddekel: that I have made the tree, of Canaan.
= Hello, world!

Basically, it splits the possible next words for each word into two sets rather than a single set, with one set representing a '0' bit and the second set representing a '1' bit. Encoding is fairly simple. Here's the metacode (the input is a seed word, that is, the last word rendered, and a bit value):


find word
 if word has 0 markovian sets (ie. terminating word)
  select random start word
  add word to buffer
  seed array with new word and start over
 if word has 1 markovian set (ie. single possibility)
  add word to buffer
  seed array with new word and start over
 if word has 2 markovian sets (ie. normal data)
  for bit 0, select random word (statistically balanced)
   from first set, add to buffer, and exit
  for bit 1, select random word (statistically balanced)
   from second set, add to buffer, and exit
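
The metacode above can be sketched in Python roughly as follows. This is not the code from markenc.exe, just an illustration under some assumptions: I split each word's successors into two sets by sorting them and alternating (any deterministic split that the decoder can rebuild would do), I treat any word starting with a capital letter as a start word, and the names `build_sets`, `pick` and `encode` are hypothetical. Note that a seed text with no branching words would make `encode` loop forever; the toy seed below avoids that.

```python
import random
from collections import Counter, defaultdict


def build_sets(text):
    """Return (model, starts): model maps each word to a list of 0, 1 or 2 weighted successor sets."""
    words = text.split()  # whitespace is the only separator between words
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    model = {}
    for word in set(words):
        counts = followers[word]
        successors = sorted(counts)
        if len(successors) < 2:
            # 0 sets (terminating word) or 1 set (single possibility)
            model[word] = [dict(counts)] if successors else []
        else:
            # ASSUMPTION: alternate the sorted successors between the '0' set and the '1' set
            model[word] = [
                {w: counts[w] for w in successors[0::2]},
                {w: counts[w] for w in successors[1::2]},
            ]
    # ASSUMPTION: a start word is any word whose first character is upper case
    starts = sorted(w for w in model if w[0].isupper())
    return model, starts


def pick(weighted, rng):
    """Choose a word at random, weighted by how often it appeared in the seed text."""
    words = sorted(weighted)
    return rng.choices(words, weights=[weighted[w] for w in words])[0]


def encode(model, starts, bits, rng):
    """Generate words until every bit has been consumed by a two-set word."""
    word = rng.choice(starts)
    out = [word]
    i = 0
    while i < len(bits):
        sets = model.get(word, [])
        if len(sets) == 0:        # terminating word: restart from a random start word
            word = rng.choice(starts)
        elif len(sets) == 1:      # single possibility: no bit can be stored here
            word = pick(sets[0], rng)
        else:                     # normal data: this transition carries one bit
            word = pick(sets[bits[i]], rng)
            i += 1
        out.append(word)
    return out


seed_text = "The cat sat. The dog sat. The cat ran. The dog ran. A cat sat. A dog ran."
model, starts = build_sets(seed_text)
print(" ".join(encode(model, starts, [1, 0, 1, 1, 0], random.Random())))
```

Only the transitions out of two-set words carry data, so the output length depends on how often the seed text branches.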

What's interesting, if it's not obvious, is that the output data will always be different: there are billions of different ways that the “Hello, world!” example could have been encoded, each one just as valid. It's not the characters that are important, but the subtle variations their selections make in the statistical flow of the words. The decode process is of course the same in reverse. Again, here is the metacode (this time the input is two words, a potential Markov chain which may or may not contain data):


find word1
 if 0 or 1 markovian sets, return -1 (no meaningful data)
 if 2 markovian sets,
  return 0 if word2 is in set 1
  return 1 if word2 is in set 2
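
A matching Python sketch of the decode step, again as an illustration rather than the actual markenc.exe code. It assumes the two sets are rebuilt from the same seed text by sorting each word's successors and alternating them between the '0' set and the '1' set; that split rule and the names `build_sets`, `decode_pair` and `decode` are my own inventions for the example.

```python
from collections import defaultdict


def build_sets(text):
    """Map each word to a list of 0, 1 or 2 successor sets, built from the seed text."""
    words = text.split()
    followers = defaultdict(set)
    for current, nxt in zip(words, words[1:]):
        followers[current].add(nxt)
    model = {}
    for word in set(words):
        successors = sorted(followers[word])
        if len(successors) < 2:
            model[word] = [set(successors)] if successors else []
        else:
            # ASSUMPTION: same alternating split the encoder would have used
            model[word] = [set(successors[0::2]), set(successors[1::2])]
    return model


def decode_pair(model, word1, word2):
    """Return 0 or 1 for a data-carrying pair, or -1 if the pair holds no meaningful data."""
    sets = model.get(word1, [])
    if len(sets) < 2:
        return -1
    if word2 in sets[0]:
        return 0
    if word2 in sets[1]:
        return 1
    return -1  # word2 never followed word1 in the seed text


def decode(model, words):
    """Walk consecutive word pairs and keep only the bits that carry data."""
    results = (decode_pair(model, a, b) for a, b in zip(words, words[1:]))
    return [bit for bit in results if bit != -1]


seed_text = "The cat sat. The dog sat. The cat ran. The dog ran. A cat sat. A dog ran."
model = build_sets(seed_text)
print(decode(model, "The dog sat. A cat ran.".split()))  # [1, 1, 0, 0, 0]
```

Both sides have to build their sets from exactly the same seed text, otherwise the decoded bits come out as garbage.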

Simple! If you'd like to play with it, you can use any seed data you'd like, although I've used the book of Genesis personally. This version is case sensitive, and requires some words in the set to have capital letters (since it considers them potential 'start words'). Only spaces and linebreaks are considered terminators between words. Here are the links to the software and the seed data I use:


genesis.txt (seed data, text of Genesis)
markenc.exe (Win32 command line executable)

Note that it's a rather incomplete piece of code (the last option below doesn't work, there's minimal error checking, and it's far from optimized), just a proof of concept really… The usage is pretty simple:

Usage: markenc seedfile infile action outfile [options]
Actions:   e  encode   ...or...  d  decode
Options:   y  confirm all
           t  use timer as random seed
           w=word  force start word
           b  byte mode

The command I used to encode the above data was 'markenc genesis.txt hw.txt encode hw.enc t y' and to test the decode (successful), I used 'markenc genesis.txt hw.enc decode hw.dec t y'. Here hw.txt is the file containing the input text, genesis.txt is of course the data that's used to seed the Markov chains, and hw.enc and hw.dec are the encoded and subsequently decoded files.

Well, I thought it was amusing.

I'm thinking about using it to write a BRAINF*CK-style language using three bits to encode each command… I'm going to call the compiler “THE BIBLE CODE”. If that's got you laughing as hard as it got me laughing, well, first of all you're probably an autistic savant, and second of all, you can have a beer with me any time you want because I'd definitely enjoy your company.

3 Comments

  1. soroush wrote:

    Hi,
    all of your links are broken :(
    Could u please send me the cover and your exec file?
    I wrote it in java but my output is very longer than u in hiding something like Hello! World.
    Thank you very much.

    Tuesday, August 25, 2009 at 4:46 am | Permalink
  2. Soroush wrote:

    Finally, I could increase the capacity of hiding in my application. Now, I can hide/find everything in text ;) thank you for your clues in this page.

    Friday, September 18, 2009 at 2:45 pm | Permalink
  3. Shannon wrote:

    Links are all fixed now, sorry.

    Sunday, July 15, 2012 at 10:11 pm | Permalink
