Introduction to Deogol

an HTML steganography tool

  1. What is Deogol?
  2. What is steganography?
  3. What does Deogol do?
  4. How does it work?
    1. Basic idea
    2. Generalization
    3. Across a document
  5. The importance of preserving size
  6. Examples of Using Deogol:
    1. Preliminaries
    2. Example 1
    3. Example 2
  7. Why is it called “Deogol”?
  8. Container capacities of some websites
  9. More about Deogol

What is Deogol?

Deogol is a commandline Perl program implementing basic steganography on HTML files. Its current version, as of the last revision of this document, is 0.11.

What is steganography?

The definition of steganography is, roughly, "hiding a secret message within a larger one in such a way that others cannot discern the presence or contents of the hidden message."

As a concept, it's not confined to the computer world; an example of a "real-world" application of steganography is invisible ink. However, with the recent massive growth in the quantity of available and alterable information accessible by computer, the number of practical implementations of steganography has increased immensely.

Note that steganography is distinct from cryptography: steganography seeks to hide a message in such a way that it will not be discovered; cryptography seeks to encode a message in such a way that it may be discovered but not decoded.

What does Deogol do?

Deogol embeds a small message into an HTML container file without increasing its size or changing the non-tag text of the file. An HTML container file produced by Deogol (containing an embedded secret message) will be indistinguishable by a browser from the original HTML file used to produce it.

How does it work?

Basic idea

An HTML file consists of text interspersed with tags, which have the form

<tagname attribute1=value1 attribute2=value2 ... >

What the tags actually mean and do is left to the browser to handle, and is unimportant for our purposes. The key idea is that the attributes can have arbitrary order within the tag. Let's consider a simple example:

<IMG SRC="picture.jpg" ALT="A picture">

Here we have a simple HTML instruction to display a JPEG file, or the text "A picture" if the file picture.jpg cannot be displayed for some reason. The above tag puts the SRC attribute before ALT, but it could just as well have been:

<IMG ALT="A picture" SRC="picture.jpg">

So we have two equally valid ways of expressing the same HTML tag. When designing a webpage, we can pick one or the other at our whim. However, we can also pick some convention for associating each of the two forms with a piece of information, say 0 or 1. Then, if we put one or the other of these two equivalent tags into an HTML file and give the file to someone, we can say we're using this file as a container (or carrier) for a one-bit message.
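
As a minimal sketch of such a convention, written in Python rather than Deogol's Perl, assume that alphabetical attribute order stands for 0 and reverse alphabetical order for 1. The convention and the helper names embed_bit and extract_bit are hypothetical and are not taken from Deogol itself.

```python
def embed_bit(tagname, attrs, bit):
    """Render a two-attribute tag whose attribute order encodes one bit.

    Assumed convention (not Deogol's): alphabetical order by attribute
    name means 0, reverse alphabetical order means 1.
    """
    if len(attrs) != 2:
        raise ValueError("this sketch handles exactly two attributes")
    ordered = sorted(attrs, key=lambda a: a[0].lower(), reverse=bool(bit))
    body = " ".join(f'{name}="{value}"' for name, value in ordered)
    return f"<{tagname} {body}>"


def extract_bit(attr_names):
    """Recover the bit from the order in which the attribute names appear."""
    first, second = (name.lower() for name in attr_names)
    return 0 if first < second else 1


attrs = [("SRC", "picture.jpg"), ("ALT", "A picture")]
print(embed_bit("IMG", attrs, 0))  # ALT comes first
print(embed_bit("IMG", attrs, 1))  # SRC comes first
```

Both renderings have the same length, which is the property the rest of this document relies on.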

Generalization

That's the basic idea. One might argue that one bit is not very much, which is quite true. Consider this example, however:

<TD NOWRAP ROWSPAN=1 COLSPAN=4 ALIGN=left VALIGN=top HEIGHT=40 WIDTH=40 id=col1>

This tag has 8 attributes. Some high-school math tells us that the number of possible distinct permutations of these attributes is 8!=40320. This means that there are 40320 tags equivalent to this one, but with attributes permuted. Thus, by choosing a particular tag of the 40320 possible in a container document, we can record log2(8!) ≅ 15.3 bits, or slightly less than 2 bytes.
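
The arithmetic above is easy to check. The following Python sketch computes the number of orderings and the corresponding number of bits for a few attribute counts; the function name tag_capacity_bits is only illustrative.

```python
import math


def tag_capacity_bits(n_attributes):
    """Bits that can be stored by choosing one of n! attribute orderings."""
    return math.log2(math.factorial(n_attributes))


for n in (2, 4, 8):
    # Columns: attribute count, number of orderings, capacity in bits
    print(n, math.factorial(n), round(tag_capacity_bits(n), 1))
```

For n = 8 this reproduces the 40320 orderings and roughly 15.3 bits quoted above.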

At first glance this seems pretty unimpressive, considering we could record 80 bytes directly in the space it took us to express the tag. But that's true only if the information is in plain sight; our system of encoding information in tags is not foolproof, but is at least obscure. And, as the above example shows, the content-to-noise ratio of the container does considerably increase as the average number of attributes per tag increases.

Across a document

We've seen the idea for a single tag; how does Deogol operate on a document? First, it ignores but preserves all non-tag content and any tags with 0 or 1 attributes (as these tags cannot contain any information).

It then converts the message to be encoded into a large number M, and proceeds through the container file one tag at a time. On encountering a tag with n attributes, it computes M' = M div n! and p = M mod n!. The number p lies between 0 and n!-1, and Deogol transforms it into a permutation of the tag's n attributes. It then reorders the attributes of the current tag according to this permutation, and outputs the new tag. The number M is then replaced by the smaller number M', and the process continues.
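
The procedure above can be sketched in Python as follows. This is an illustration under stated assumptions, not Deogol's actual Perl implementation: each tag is represented as a list of distinct attribute strings, the "unpermuted" reference order is taken to be sorted order, the number p is mapped to a permutation through the factorial number system, and the decoder is assumed to know the message length in bytes. All function names are hypothetical.

```python
import math


def index_to_permutation(p, items):
    """Return the p-th ordering (0 <= p < n!) of the sorted items."""
    pool = sorted(items)  # assumed reference order
    result = []
    for remaining in range(len(pool), 0, -1):
        digit, p = divmod(p, math.factorial(remaining - 1))
        result.append(pool.pop(digit))
    return result


def permutation_to_index(ordered):
    """Inverse of index_to_permutation."""
    pool = sorted(ordered)
    p = 0
    for item in ordered:
        digit = pool.index(item)
        p += digit * math.factorial(len(pool) - 1)
        pool.pop(digit)
    return p


def embed(message, tags):
    """Spread the integer form of the message across the attribute lists."""
    m = int.from_bytes(message, "big")
    out = []
    for attrs in tags:
        if len(attrs) < 2:  # tags with 0 or 1 attributes carry nothing
            out.append(list(attrs))
            continue
        m, p = divmod(m, math.factorial(len(attrs)))  # M' and p
        out.append(index_to_permutation(p, attrs))
    if m:
        raise ValueError("container too small for message")
    return out


def extract(tags, length):
    """Rebuild the message; its byte length is assumed to be known."""
    m = 0
    for attrs in reversed(tags):
        if len(attrs) >= 2:
            m = m * math.factorial(len(attrs)) + permutation_to_index(attrs)
    return m.to_bytes(length, "big")


tags = [
    ["NOWRAP", "ROWSPAN=1", "COLSPAN=4", "ALIGN=left",
     "VALIGN=top", "HEIGHT=40", "WIDTH=40", "id=col1"],
    ['SRC="picture.jpg"', 'ALT="A picture"'],
    ["BORDER=0"],
]
hidden = embed(b"Hi", tags)
print(extract(hidden, 2))
```

Because each output tag contains exactly the same attribute strings as its input, only in a different order, the total length of the tags is unchanged.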

This process continues until one of two criteria is met: either M=0 (i.e. the message has been completely transcribed), or there are no more tags left to permute (i.e. the container is full). Deogol checks for the latter case in advance, and issues a warning if the container capacity is too small to encode the message.
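
One plausible way to perform such an advance check is sketched below in Python. How Deogol itself computes the capacity is not stated in this document, so the formula here (the largest whole number of bytes k such that 256^k does not exceed the product of all n!) is an assumption, as are the function names.

```python
import math


def container_capacity_bytes(attribute_counts):
    """Whole bytes that fit in a container, given each tag's attribute count."""
    combos = 1
    for n in attribute_counts:
        if n >= 2:  # tags with 0 or 1 attributes add no capacity
            combos *= math.factorial(n)
    # bit_length() - 1 is floor(log2(combos)); divide by 8 for whole bytes
    return (combos.bit_length() - 1) // 8


def message_fits(message, attribute_counts):
    """True if every message of this length can be embedded."""
    return len(message) <= container_capacity_bytes(attribute_counts)


print(container_capacity_bytes([8]))        # a single 8-attribute tag
print(container_capacity_bytes([8, 2, 1]))  # one extra bit from the 2-attribute tag
```

A single 8-attribute tag holds only one whole byte (15.3 bits), while adding a 2-attribute tag brings the total just past 16 bits, or two bytes.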

The importance of preserving size

Aside from the benefit of the container being an ordinary HTML file, Deogol's other appeal lies in the fact that each of the alternate expressions of a tag is the same length, so the total container size is unchanged after the insertion of the message. This contrasts with some other steganographic tools, which insert information in places where it won't affect the rendered output, but where it is still visible in the HTML source. Examples are padding the document with extra whitespace and appending data after the EOF character.

Each of these approaches offers easily-testable clues. It's simple to write a program to search for excessive whitespace in HTML documents on the web, and simple to write a program to test for data after the EOF character. Both approaches result in "unusual" files.
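
To illustrate how simple such tests can be, here is a rough Python sketch of a checker. It rests on two assumptions that are not from the original text: that "excessive whitespace" shows up as spaces or tabs at the ends of lines, and that "data after the EOF" means non-blank content following the closing </html> tag. The function name suspicious_features is hypothetical.

```python
import re


def suspicious_features(html):
    """Count two crude indicators of naive HTML steganography."""
    lines = html.splitlines()
    # Lines that end in spaces or tabs
    trailing_ws = sum(1 for line in lines if line != line.rstrip(" \t"))
    # Non-blank content after the first closing </html> tag, if any
    match = re.search(r"</html\s*>", html, flags=re.IGNORECASE)
    after_close = html[match.end():].strip() if match else ""
    return {
        "lines_with_trailing_whitespace": trailing_ws,
        "chars_after_closing_tag": len(after_close),
    }


clean = "<html>\n<body>hi</body>\n</html>\n"
padded = "<html> \n<body>hi</body>\t \n</html>\nsecret"
print(suspicious_features(clean))
print(suspicious_features(padded))
```

A file whose attributes have merely been reordered would score zero on both counts.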

On the other hand, Deogol's output is an unremarkable HTML file. One might notice a rather unusual convention for ordering attributes inside tags, but given the lack of any clear convention among HTML authors and HTML-producing programs for ordering attributes, it is probably rather difficult to write a computer algorithm to test for this.

Examples of Using Deogol

The following examples illustrate the basic use of Deogol. They assume some basic knowledge of UNIX syntax on the part of the reader; note that Deogol is not restricted to UNIX platforms: it will run anywhere Perl does.

Lines below starting with % indicate commandline input under UNIX.

Preliminaries

First, grab a large HTML file with a lot of tags. (News sites are generally good for this.) Save this HTML file locally as container.html. Check its capacity with

% deogol.pl -c < container.html

Deogol will print "Container capacity:" followed by the capacity in bytes. For the following examples to work, this value must be at least 25.

Example 1:

In this example we create a message, encode it, then decode it and check that the encoded message is correct.

Create a short message, and check its size:

% echo "Hello, world!" > message1.txt
% deogol.pl --size message1.txt
14

Having confirmed this message is less than the container capacity, embed this message into the HTML container:

% deogol.pl message1.txt < container.html > full_container_1.html

So now we've generated our HTML container with an embedded message. Look at it in your browser, or in a text editor, and note the differences from the original container.html.

Now extract the encoded message from the container to a file:

% deogol.pl -d message1-decoded.txt < full_container_1.html
% cat message1-decoded.txt
Hello, world!

We see that the decoded message matches the original.

Example 2:

In this example we write a message, compress it to save space, and then encode it in our container file. Then we decode and decompress it, and check the result.

Enter the following to create a message file:

% echo aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa > message2.txt

First, let's check the message size with Deogol:

% deogol.pl --size message2.txt
40

So message2.txt is a file with 40 characters. This may be too large for our container, so we'll use the compression utility gzip to compress it. (We'll assume the recipient knows to decompress the file before reading it.)

% deogol.pl message2.txt --size --filter="gzip -f"
25

Thus, after filtering our message through gzip, we see that our rather contrived example can be fitted into a 25-byte container. So we go ahead with encoding:

% deogol.pl message2.txt --filter="gzip -f" < container.html > full_container_2.html

Thus we have generated a file full_container_2.html with the embedded compressed message. Again, compare this to the original container.html and note the differences.

Let's decode our embedded message now. To do this correctly, we'll need to invert the gzip compression, using gunzip:

% deogol.pl -d message2_decoded.txt --filter="gunzip -f" < full_container_2.html
% cat message2_decoded.txt
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

which is the original message, as required.
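
The effect of the two filters on the message can be mimicked in Python with the standard gzip module. This sketch only mirrors what gzip and gunzip do to the message bytes, not Deogol itself; it assumes message2.txt holds 39 "a" characters plus a newline, in line with the reported size of 40. The compressed size it prints may differ from the 25 bytes shown above, because gzip headers and compression settings vary between implementations.

```python
import gzip

message = b"a" * 39 + b"\n"         # assumed contents of message2.txt
packed = gzip.compress(message)     # stands in for --filter="gzip -f"
restored = gzip.decompress(packed)  # stands in for --filter="gunzip -f"

print(len(message), len(packed))
print(restored == message)
```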

Why is it called "Deogol"?

Deogol is an Old English word meaning "hidden".

“Þu wast gif hit is swa we soþlice secgan hyrdon þæt mid Scyldingum sceaðona ic nat hwylc,
deogol dædhata, deorcum nihtum eaweð þurh egsan uncuðne nið, hynðu ond hrafyl.” — Beowulf, 272-277

“We hear, thou knowest if truly we speak, the saying of men, that amid the Scyldings a scathing monster,
hidden ill-doer, in dusky nights shows terrific his rage unmatched.”

Container capacities of some websites

To provide some idea of what scale of information can be stored in a file using Deogol, here is a list of container capacities reported by Deogol for some popular websites, as of 2002/10/28:

Site                 Capacity (bytes)
news.google.com      346
www.netscape.com     324
www.cbc.ca           212
www.cnn.com          183
www.microsoft.com    119
www.uwaterloo.ca     116
slashdot.org         127
www.theonion.com     104
www.ebay.com         76
www.memepool.com     59
www.google.com       18

More about Deogol

More information on Deogol's commandline parameters may be found in the accompanying man page.

For the most recent code and documentation, consult the Deogol webpage.
