Warning: this post contains partial spoilers for the 2009 MIT Mystery Hunt puzzle "Invasion of the Micronauts." You've been warned!
This post is an IPython Notebook, which you can also view online through nbviewer here.
I'm a big fan of puzzles of all kinds, especially cryptic crosswords, and I've been participating in the MIT Mystery Hunt for the last few years. Mystery Hunt is a weekend marathon of puzzle solving, consisting of puzzles of every style imaginable, from sudoku to physics to Minecraft. Puzzle solving generally involves some mix of cleverness and persistence, and perhaps a bit of brute force.
This week, I've been going through the Mystery Hunt archives from the 2009 Hunt, and I found a puzzle I'd heard of but never solved: Invasion of the Micronauts. This puzzle consists entirely of an (apparently) blank PDF, with no instructions or hints. I remembered hearing from friends of mine that the PDF actually contains a tiny image or description of some kind, but it's too small to see normally. But after a few minutes of zooming and scrolling, I got bored and decided that this was a perfect opportunity for some brute force. A record of my adventures follows.
%pylab inline
from IPython.display import Image
If the PDF actually contains an embedded image or text, then we should be able to find it in the PDF source. Opening puzzle.pdf in vim gives the following:
display(Image('pdf_text.png'))
I had never actually looked inside a PDF file before, but there's a lot we can learn from this already. The basic format is a text file, with some blobs of binary data embedded in it. Scrolling through the file, I didn't see anything that looked like the text or image the PDF supposedly contained, so the data must be hidden inside those binary blobs. Time to go exploring...
Each binary blob is contained between two readable tags, stream and endstream, but I can't see any obvious patterns in them. This is where we break out the big guns: the Adobe Portable Document Format Reference, third edition, from Addison-Wesley. The PDF reference has pages and pages of data on these Content Stream types. Most of that is going to be irrelevant, so we need something more specific to look for.
Just before the stream is a tag /FlateDecode. Aha! According to the manual, that means:
(PDF 1.2) Decompresses data encoded using the zlib/deflate compression method, reproducing the original text or binary data.
That sounds promising! Let's try it out.
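Before touching the real file, here is a minimal sketch of what that filter description implies, using made-up data rather than anything from puzzle.pdf. The payload, the `fake_object` wrapper, and the variable names are all assumptions for illustration; the only claim is that a /FlateDecode stream body is zlib data, so `zlib.compress` and `zlib.decompress` should round-trip it.

```python
import zlib

# Hypothetical stand-in for a stream's contents (not taken from puzzle.pdf).
original = b"0 0 m\n0 10 l\n5 10 l\nS\n"

# A /FlateDecode stream body is zlib-compressed data.
compressed = zlib.compress(original)

# Frame the payload roughly the way a PDF stream object does.
header = b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
fake_object = header + compressed + b"\nendstream"

# Slice out the bytes between the two keywords and inflate them.
start = fake_object.index(b"stream\n") + len(b"stream\n")
end = fake_object.rindex(b"\nendstream")
recovered = zlib.decompress(fake_object[start:end])

print(recovered.decode("ascii"))
```

If the round trip works on fake data, the same slice-and-inflate step should work on the real blob once we know where it starts and ends.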
# First, let's get the PDF file contents as a byte stream
data = open('puzzle.pdf', 'rb').read()
Next, I manually searched through the data array to find the chunk between the stream and endstream tags, which turned out to start at index 78 and last for 324 bytes. Decompressing it is as easy as this:
import zlib
decompressed = zlib.decompress(data[78:(78+324)])
And here's the result:
print(''.join(chr(x) for x in decompressed))
Cool! I have no idea what any of that means, but the last two numbers in every row look a lot like coordinates. Let's try graphing them:
lines = ''.join(chr(x) for x in decompressed).split('\n')
elements = [l.split(' ') for l in lines]
x = []
y = []
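# Skip the last three rows, which don't end in a coordinate pair
for row in elements[:-3]:
    # Convert to float so the values are plotted numerically, not as string labels
    x.append(float(row[-2]))
    y.append(float(row[-1]))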
plot(x, y)
axis('equal')
That looks a lot like a letter 'L'. Maybe the rest of the streams will also come out as letters.
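The coordinate extraction can also be written so that it doesn't depend on knowing how many trailing rows to skip. This is just a sketch under my own assumptions: `parse_points` is a hypothetical helper, the sample rows are invented (they are not the real stream contents), and skipping any row whose last two tokens aren't numbers is a swapped-in rule rather than the slicing used above.

```python
def parse_points(text):
    """Return (x, y) pairs from rows whose last two tokens are numbers."""
    points = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 2:
            continue  # too short to hold a coordinate pair
        try:
            points.append((float(tokens[-2]), float(tokens[-1])))
        except ValueError:
            continue  # last two tokens are not numeric; skip the row
    return points


# Invented rows in the same spirit as the decompressed output.
sample = "a b 0 10\na b 0 0\na b 5 0\nend of path\nQ\n"
points = parse_points(sample)
xs = [p[0] for p in points]
ys = [p[1] for p in points]
print(points)
```

The resulting `xs` and `ys` lists can be passed straight to plot.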
Manually specifying the indices for the streams is a pain, so let's automate that with some regular expression magic.
import re
matches = list(re.finditer(b'stream\n((.|\n)*?)\nendstream', data))
We now have a list matches containing each binary stream blob. Let's look at a few (printing just the first 200 characters from each match):
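for m in matches:
    # group(1) is the raw stream body between the stream and endstream tags
    decompressed = zlib.decompress(m.group(1))
    print('============ MATCH =============')
    # Show only the first 200 characters of each decompressed stream
    print(''.join(chr(x) for x in decompressed[:200]))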
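One caveat with a loop like this: a PDF can also contain streams that aren't zlib-compressed, and `zlib.decompress` raises `zlib.error` on those. Here is a slightly more defensive sketch, tested only on synthetic bytes. The names `STREAM_RE`, `inflate_streams`, and `fake_stream` are hypothetical, and the pattern is still a heuristic (it would be fooled by a payload that happens to contain the bytes "endstream"). It uses `re.DOTALL` in place of `(.|\n)` and a `zlib.decompressobj()` so that the line break before endstream is ignored.

```python
import re
import zlib

# Heuristic: everything between "stream" + EOL and the next "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(pdf_bytes):
    """Return the decompressed bodies of all zlib-compressed streams found."""
    results = []
    for m in STREAM_RE.finditer(pdf_bytes):
        inflater = zlib.decompressobj()
        try:
            body = inflater.decompress(m.group(1))
        except zlib.error:
            continue  # not zlib data (e.g. an uncompressed stream)
        if inflater.eof:  # keep only complete zlib streams
            results.append(body)
    return results


def fake_stream(payload):
    """Wrap a payload in a made-up, PDF-like stream object."""
    return b"<< /Filter /FlateDecode >>\nstream\n" + payload + b"\nendstream\nendobj\n"


# Synthetic test data, not taken from puzzle.pdf.
texts = [b"first stream 1 2\n", b"second stream 3 4\n"]
pdf = b"%PDF-1.4\n" + b"".join(fake_stream(zlib.compress(t)) for t in texts)
pdf += fake_stream(b"plain text, not compressed")

for body in inflate_streams(pdf):
    print(body[:200].decode("latin-1"))
```

The uncompressed third stream is skipped rather than crashing the loop.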