Deogol: an HTML steganography tool
The definition of steganography is, roughly, "hiding a secret message within a larger one in such a way that others cannot discern the presence or contents of the hidden message."
As a concept, it's not confined to the computer world; an example of a "real-world" application of steganography is invisible ink. However, with the recent massive growth in the quantity of available and alterable information accessible by computer, the number of practical implementations of steganography has increased immensely.
Note that steganography is distinct from cryptography: steganography seeks to hide a message in such a way that it will not be discovered; cryptography seeks to encode a message in such a way that it may be discovered but not decoded.
Deogol embeds a small message into an HTML container file without increasing its size or changing the non-tag text of the file. An HTML container file produced by Deogol (containing an embedded secret message) will be indistinguishable by a browser from the original HTML file used to produce it.
An HTML file consists of text interspersed with tags, which have the form
<tagname attribute1=value1 attribute2=value2 ... >
What the tags actually mean and do is left to the browser to handle, and is unimportant for our purposes. The key idea is that the attributes can have arbitrary order within the tag. Let's consider a simple example:
<IMG SRC="picture.jpg" ALT="A picture">
Here we have a simple HTML instruction to display a JPEG file, or the text "A picture" if the file picture.jpg cannot be displayed for some reason. The above tag puts the SRC attribute before ALT, but it could just as well have been:
<IMG ALT="A picture" SRC="picture.jpg">
So we have two equally valid ways of expressing the same HTML tag. When designing a webpage, we can pick one or the other at our whim. However, we can also pick some convention for associating each of the two forms with a piece of information, say 0 or 1. Then, if we write an HTML file using one or the other of these two equivalent tags, and give the file to someone, we can say we're using this file as a container (or carrier) for a one-bit message.
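The one-bit idea can be sketched in a few lines of Python. This is only an illustration, not Deogol's code: the convention used here (alphabetical attribute order means 0, reverse order means 1) and the helper names encode_bit and decode_bit are assumptions made for the example.

```python
import re

# Matches NAME=value pairs, where value is either quoted or a bare word.
ATTR = re.compile(r'(\w+)=("[^"]*"|[^\s>]+)')

def encode_bit(attrs, bit):
    """Write a two-attribute IMG tag whose attribute order carries one bit.

    Assumed convention (hypothetical): sorted order = 0, reversed = 1.
    """
    names = sorted(attrs)
    if bit:
        names.reverse()
    body = " ".join(f"{name}={attrs[name]}" for name in names)
    return f"<IMG {body}>"

def decode_bit(tag):
    """Read the bit back from the order of the attribute names."""
    names = [name for name, _ in ATTR.findall(tag)]
    return 0 if names == sorted(names) else 1

attrs = {"SRC": '"picture.jpg"', "ALT": '"A picture"'}
tag0 = encode_bit(attrs, 0)   # ALT first
tag1 = encode_bit(attrs, 1)   # SRC first
print(tag0, decode_bit(tag0))
print(tag1, decode_bit(tag1))
```

Both tags render identically in a browser; only the reader who knows the convention sees the bit.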
That's the basic idea. One might argue that one bit is not very much, which is quite true. Consider this example, however:
<TD NOWRAP ROWSPAN=1 COLSPAN=4 ALIGN=left VALIGN=top HEIGHT=40 WIDTH=40 id=col1>
This tag has 8 attributes. Some high-school math tells us that the number of distinct permutations of these attributes is 8! = 40320. This means that the tag can be written in 40320 equivalent ways (counting the original), differing only in the order of the attributes. Thus, by choosing a particular one of these 40320 orderings in a container document, we can record log2(8!) ≅ 15.3 bits, or slightly less than 2 bytes.
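The arithmetic is easy to check. The short Python snippet below (an illustration only; the function name tag_capacity_bits is made up for this example) computes the number of orderings and the corresponding number of bits for a few attribute counts.

```python
import math

def tag_capacity_bits(n_attributes):
    """Bits that can be recorded by choosing one of n! attribute orderings."""
    return math.log2(math.factorial(n_attributes))

for n in (2, 4, 8):
    orderings = math.factorial(n)
    print(f"{n} attributes: {orderings} orderings, "
          f"{tag_capacity_bits(n):.1f} bits")
```

For n = 8 this reports 40320 orderings and 15.3 bits, matching the figures above.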
At first glance this seems pretty unimpressive, considering we could record 80 bytes directly in the space it took us to express the tag. But that's true only if the information is in plain sight; our system of encoding information in tags is not foolproof, but is at least obscure. And, as the above example shows, the content-to-noise ratio of the container does considerably increase as the average number of attributes per tag increases.
We've seen the idea for a single tag; how does Deogol operate on a document? First, it ignores but preserves all non-tag content and any tags with 0 or 1 attributes (as these tags cannot contain any information).
It then converts the message to be encoded into a large number M, and proceeds through the container file one tag at a time. On encountering a tag with n attributes, it computes M' = M div n! and p = M mod n!. The number p lies between 0 and n!-1, and Deogol transforms it into a permutation of the n attributes. It then reorders the attributes of the current tag according to this permutation, and outputs the new tag. The number M is then replaced by the new, strictly smaller number M', and the process continues.
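The loop described above might look roughly like the following Python sketch. Several details are assumptions, since the text does not specify them: the message is turned into M by reading its bytes as a big-endian integer, each tag is represented simply as a list of attribute strings, and p is mapped to a permutation by numbering the permutations of the alphabetically sorted attributes in lexicographic order. Deogol itself may use a different mapping; the names message_to_int, nth_permutation and embed are hypothetical.

```python
import math

def message_to_int(message: bytes) -> int:
    """Assumed conversion: bytes read as one big-endian integer.
    (Leading zero bytes would be lost by this naive conversion.)"""
    return int.from_bytes(message, "big")

def nth_permutation(attrs, p):
    """Return permutation number p (0 <= p < n!) of the sorted attributes."""
    pool = sorted(attrs)
    result = []
    for remaining in range(len(pool), 0, -1):
        index, p = divmod(p, math.factorial(remaining - 1))
        result.append(pool.pop(index))
    return result

def embed(tags, M):
    """Permute each tag's attributes to record M; return (new_tags, leftover).

    A nonzero leftover means the container was too small for the message.
    """
    out = []
    for attrs in tags:
        n = len(attrs)
        if n < 2 or M == 0:
            out.append(list(attrs))        # nothing to record in this tag
            continue
        M, p = divmod(M, math.factorial(n))  # M' = M div n!, p = M mod n!
        out.append(nth_permutation(attrs, p))
    return out, M

tags = [["SRC=a.jpg", "ALT=x", "WIDTH=40"], ["BORDER=0"], ["HREF=b", "ID=c"]]
new_tags, leftover = embed(tags, 7)
print(new_tags, leftover)
```

With M = 7, the first tag records p = 7 mod 3! = 1 and the last tag records p = 1 mod 2! = 1, after which M has reached 0.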
This process continues until one of two criteria is met: either M=0 (i.e. the message has been completely transcribed), or there are no more tags left to permute (i.e. the container is full). Deogol checks for the latter case in advance, and issues a warning if the container capacity is too small to encode the message.
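A capacity check of this kind can be sketched as follows. This is an assumption about how such a check could work, not a description of Deogol's internals: the container can distinguish as many values as the product of n! over all tags with at least two attributes, and the sketch reports the largest whole number of bytes that is guaranteed to fit. The names capacity_bytes and fits are hypothetical.

```python
import math

def capacity_bytes(attribute_counts):
    """Largest k such that every k-byte message fits in the container.

    attribute_counts: number of attributes in each tag of the container.
    """
    total = 1
    for n in attribute_counts:
        if n >= 2:                       # tags with 0 or 1 attributes carry nothing
            total *= math.factorial(n)
    # Every k-byte message is below 256**k, so we need 256**k <= total.
    return (total.bit_length() - 1) // 8

def fits(message: bytes, attribute_counts):
    """True if the message is no longer than the container's capacity."""
    return len(message) <= capacity_bytes(attribute_counts)

counts = [8, 1, 0, 8]                    # e.g. two 8-attribute tags plus two empty ones
print(capacity_bytes(counts), fits(b"Hi!", counts), fits(b"Hello", counts))
```

Two 8-attribute tags hold about 30.6 bits between them, so three bytes fit but five do not.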
Other ways of hiding data in HTML, such as encoding it in extra whitespace or appending it after the EOF character, each offer easily-testable clues. It's simple to write a program to search for excessive whitespace in HTML documents on the web, and simple to write a program to test for data after the EOF character. Both approaches result in "unusual" files.
On the other hand, Deogol's output is an unremarkable HTML file. One might notice a rather unusual convention for ordering attributes inside tags, but given the lack of any clear convention among HTML authors and HTML-producing programs for ordering attributes, it is probably rather difficult to write a computer algorithm to test for this.
The following examples illustrate the basic use of Deogol. They assume some basic knowledge of UNIX syntax on the part of the reader; note that Deogol is not restricted to UNIX platforms: it will run anywhere Perl does.
Lines below starting with % indicate commandline input under UNIX.
First, grab a large HTML file with a lot of tags. (News sites are generally good for this.) Save this HTML file locally as container.html. Check its capacity with
% deogol.pl -c < container.html
Deogol will print "Container capacity:" followed by the capacity in bytes. For the following examples to work, this value must be at least 25.
In this example we create a message, encode it, then decode it and check that the encoded message is correct.
Create a short message, and check its size:
% echo "Hello, world!" > message1.txt
% deogol.pl --size message1.txt
Having confirmed this message is less than the container capacity, embed this message into the HTML container:
% deogol.pl message1.txt < container.html > full_container_1.html
So now we've generated our HTML container with an embedded message. Look at it in your browser, or in a text editor, and note the differences from the original container.html.
Now extract the encoded message from the container to a file:
% deogol.pl -d message1-decoded.txt < full_container_1.html
% cat message1-decoded.txt
We see that the encoded message was what we expected.
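Conceptually, decoding reverses the arithmetic used for encoding: each tag's attribute order is turned back into a number p, and the p values are recombined into M. The Python sketch below shows one way this could be done. It is not Deogol's code and rests on assumptions: permutations are numbered lexicographically relative to the alphabetically sorted attributes, every tag with two or more attributes contributes a digit (so unused tags must be in sorted order, giving p = 0), and M is turned back into bytes as a big-endian integer. The text does not say how Deogol itself marks the end of the message.

```python
import math

def permutation_rank(attrs):
    """Number of this ordering among all orderings of sorted(attrs)."""
    pool = sorted(attrs)
    rank = 0
    for attr in attrs:
        index = pool.index(attr)
        rank += index * math.factorial(len(pool) - 1)
        pool.pop(index)
    return rank

def extract(tags):
    """Recombine the per-tag numbers p into the message number M."""
    M = 0
    weight = 1                           # product of n! over earlier tags
    for attrs in tags:
        n = len(attrs)
        if n < 2:
            continue                     # such tags carry no information
        M += permutation_rank(attrs) * weight
        weight *= math.factorial(n)
    return M

def int_to_message(M: int) -> bytes:
    """Assumed inverse of reading the message as a big-endian integer."""
    return M.to_bytes((M.bit_length() + 7) // 8, "big")

tags = [["a", "c", "b"], ["only"], ["y", "x"]]
print(extract(tags))
```

Here the first tag has rank 1 and the last tag has rank 1 with weight 3! = 6, so the recovered number is 1 + 6 = 7.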
In this example we write a message, compress it to save space, and then encode it in our container file. Then we decode and decompress it, and check the result.
Enter the following to create a message file:
% echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa > message2.txt
First, let's check the message size with Deogol:
% deogol.pl --size message2.txt
So message2.txt is a file with 40 characters. This may be too much for our container size, so we'll use the compression utility gzip to compress it. (We'll assume the recipient knows to decompress the file before reading it.)
% deogol.pl message2.txt --size --filter="gzip -f"
Thus, after filtering our message through gzip, we see that our rather contrived example can be fitted into a 25-byte container. So we go ahead with encoding:
% deogol.pl message2.txt --filter="gzip -f" < container.html > full_container_2.html
Thus we have generated a file full_container_2.html with the embedded compressed message. Again, compare this to the original container.html and note the differences.
Let's decode our embedded message now. To do this correctly, we'll need to invert the gzip compression, using gunzip:
% deogol.pl -d message2_decoded.txt --filter="gunzip -f" < full_container_2.html
% cat message2_decoded.txt
which is the original message, as required.
“Þu wast gif hit is
swa we soþlice secgan hyrdon
þæt mid Scyldingum sceaðona ic nat hwylc,
deogol dædhata, deorcum nihtum eaweð þurh egsan uncuðne nið, hynðu ond hrafyl.” — Beowulf, 272-277
“We hear, thou knowest
if truly we speak, the saying of men,
that amid the Scyldings a scathing monster,
hidden ill-doer, in dusky nights shows terrific his rage unmatched.”
More information on Deogol's commandline parameters may be found in the accompanying man page.
For the most recent code and documentation, consult the Deogol webpage.