Reading Raw Memory with Julia

Note: This post was written as a Jupyter notebook file. You can view it here, or you can see its original notebook form on GitHub here.

My girlfriend Michele recently started a new project over at http://learningjulia.com. You should check it out! I'll be here when you get back...

Okay, welcome back. In a recent post, Michele tried constructing an array of Julia color types:

[Image: constructing an uninitialized array]

So where did the cool pattern come from? Why isn't that image just black?

The reason is that constructing an array in Julia does not (currently) set the values contained in that array to anything. Instead, those values correspond to whatever happened to be in the chunk of memory that was allocated for that array. The image is mostly black, because when the Julia process receives a new block of memory from the operating system, that block is initially set to zero (to avoid exposing leftover data from other programs). But it looks like in this case, some of the memory that was allocated for that array had already been used in Julia for something. Maybe we can find out what...
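To make the difference concrete, here is a minimal sketch of uninitialized versus zeroed construction. It assumes a current Julia release (1.0 or later), where uninitialized construction is spelled with an explicit undef argument; the names raw and clean and the 4x4 size are made up for illustration.

```julia
# Uninitialized: the contents are whatever bytes were already in that memory.
# (The explicit `undef` argument is the Julia 1.0+ spelling.)
raw = Array{UInt8}(undef, 4, 4)

# Explicitly zeroed: every element is guaranteed to be 0x00.
clean = zeros(UInt8, 4, 4)

println(size(raw))            # the shape is defined, the values are not
println(all(iszero, clean))   # true
```

Printing raw itself will often show zeros too, for the reason described above, but nothing guarantees it.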

Disclaimer

Please note: exposing your raw memory data to the internet is a terrible, terrible idea. I'm only posting this with Michele's permission, and because we've both looked through the data for anything that we don't want saved on the internet forever. Even then, we've probably missed something and might eventually come to regret this. Don't put random memory dumps from your computer on the internet.

Whew. Okay. Back to the fun.

Reading the Raw Data

First, we need to turn that image back into raw memory data, which starts with loading the image. The Images.jl package makes that easy:

In [2]:
using Images
In [179]:
img = load("raw.png")
Out[179]:

This is, indeed, a pretty cool pattern. There is a lot of structure in this data, and I don't understand all of it. I do see a stripe with a lot of entropy (randomness) that looks interesting, though. Let's try to extract it:

In [25]:
stripe = img[:,42:47]
Out[25]:

It's easy to extract that stripe as an image, but we want to reconstruct the raw bytes that were in that chunk of memory. To do that, we need to look at how this image is being stored:

In [26]:
typeof(img)
Out[26]:
Array{ColorTypes.RGB4{FixedPointNumbers.Normed{UInt8,8}},2}

As I discussed in a previous post, an image in Julia is just a 2-dimensional Array of pixels, where each pixel is some kind of ColorType. The type of this particular image is ColorTypes.RGB4{FixedPointNumbers.Normed{UInt8,8}}. Let's unpack that:

In [27]:
?ColorTypes.RGB4
Out[27]:

RGB4 is a variant of RGB which has a padding element inserted at the end. In some applications it may have useful memory-alignment properties.

Like all other AbstractRGB objects, the constructor is still called RGB4(r, g, b).

In [29]:
# The RGB4 type has a dummy element inserted after the red, 
# green, and blue values. That means that this image has no
# transparency (alpha) data. We can see the actual fields 
# of a single pixel to confirm:
dump(img[1,1])
ColorTypes.RGB4{FixedPointNumbers.Normed{UInt8,8}}
  r: FixedPointNumbers.Normed{UInt8,8}
    i: UInt8 72
  g: FixedPointNumbers.Normed{UInt8,8}
    i: UInt8 50
  b: FixedPointNumbers.Normed{UInt8,8}
    i: UInt8 228
  alphadummy: FixedPointNumbers.Normed{UInt8,8}
    i: UInt8 255

The FixedPointNumbers.Normed{UInt8, 8} type indicates that each of the red, green, and blue values is stored as a single byte (a UInt8) which is being used to represent 256 evenly spaced values between 0 and 1.
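As a rough sketch of that mapping in plain Julia (no packages needed): the value a byte stands for is the byte divided by 255. The helper name normed_value is hypothetical, and the three bytes are the r, g, and b values from the dump above.

```julia
# Value represented by one Normed{UInt8,8} byte: the raw integer divided by 255.
# (`normed_value` is a made-up helper name, not part of FixedPointNumbers.)
normed_value(byte::UInt8) = byte / 255

# Raw channel bytes of img[1,1] from the dump above: 72, 50, 228.
r, g, b = 0x48, 0x32, 0xe4

println(normed_value(0x00))   # 0.0
println(normed_value(0xff))   # 1.0
println((normed_value(r), normed_value(g), normed_value(b)))
```

Going the other way, the byte is the only thing actually stored, which is why recovering the raw memory just means pulling that byte back out of each channel.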

That's all the information we need: assuming that the Julia image corresponds to the available data in the raw image that we got from Michele's blog, we now know that each pixel in the image gives us exactly three bytes of data (from the red, green, and blue channels). Let's extract that data:

In [47]:
open("stripe_data", "w") do f
    for pixel in stripe
        write(f, reinterpret(UInt8, red(pixel)))
        write(f, reinterpret(UInt8, green(pixel)))
        write(f, reinterpret(UInt8, blue(pixel)))
    end
end

The strings program is shipped with most Unix variants (like Linux and macOS). It's a simple tool that knows how to do one thing: look through a big blob of memory for things that look like readable strings. Let's try it out on our data:

In [55]:
stripe_strings = split(readstring(`strings stripe_data`))
stripe_strings[1:25]
Out[55]:
25-element Array{SubString{String},1}:
 "[A\\A]A^A_]"
 "AWAVAUATSH" 
 "[A\\A]A^A_]"
 "AWAVAUATSP" 
 "[A\\A]A^A_]"
 "[A\\A]A^A_]"
 "[A\\A]A^A_]"
 "[A\\A]A^A_]"
 "[A\\A]A^A_]"
 "fffff."     
 "AWAVAUATSH" 
 "[A\\A]A^A_]"
 "ffffff."    
 "AWAVAUATSH" 
 "[A\\A]A^A_]"
 "AWAVATSH"   
 "[A\\A^A_]"  
 "ffffff."    
 "AWAVAUATSH" 
 "[A\\A]A^A_]"
 "fffff."     
 "AWAVAUATSH" 
 "[A\\A]A^A_]"
 "fffff."     
 "AWAVATSH"   

Those first few strings are interesting. There are several instances of AWAVAUATSH and AWAVATSH. What's up with that? Fortunately, Google knows: http://stackoverflow.com/a/39335836/641846. The short answer is that certain machine instructions happen to be encoded as bytes that are all printable ASCII, and these are a few common examples. Google found 9,820 hits for AWAVAUATSH at the time of this writing.
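Here is a small sketch of how that can happen. It is my own reading of the x86-64 encodings, not something taken from the linked answer: pushing and popping the callee-saved registers at the start and end of a function produces exactly these byte sequences. The helper printable_runs is a hypothetical, rough stand-in for what strings does (runs of at least four printable ASCII bytes), and the bytes between the prologue and epilogue are an invented function body.

```julia
# Hypothetical helper: a rough stand-in for `strings`. It collects runs of
# printable ASCII bytes that are at least `minlen` bytes long.
function printable_runs(bytes::AbstractVector{UInt8}; minlen::Int = 4)
    runs = String[]
    current = UInt8[]
    for b in bytes
        if 0x20 <= b <= 0x7e            # printable ASCII range
            push!(current, b)
        else
            # A non-printable byte ends the current run.
            length(current) >= minlen && push!(runs, String(copy(current)))
            empty!(current)
        end
    end
    # Flush a run that reaches the end of the data.
    length(current) >= minlen && push!(runs, String(copy(current)))
    return runs
end

# Function prologue: save callee-saved registers.
prologue = UInt8[
    0x41, 0x57,   # push r15 -> "AW"
    0x41, 0x56,   # push r14 -> "AV"
    0x41, 0x55,   # push r13 -> "AU"
    0x41, 0x54,   # push r12 -> "AT"
    0x53,         # push rbx -> "S"
    0x48          # REX.W prefix starting the next instruction -> "H"
]

# Function epilogue: restore the same registers in reverse order.
epilogue = UInt8[
    0x5b,         # pop rbx -> "["
    0x41, 0x5c,   # pop r12 -> "A\"
    0x41, 0x5d,   # pop r13 -> "A]"
    0x41, 0x5e,   # pop r14 -> "A^"
    0x41, 0x5f,   # pop r15 -> "A_"
    0x5d          # pop rbp -> "]"
]

# Invented body: the rest of `sub rsp, 0x18`, a nop, `add rsp, 0x18`,
# and a final `ret` after the epilogue.
body = UInt8[0x83, 0xec, 0x18, 0x90, 0x48, 0x83, 0xc4, 0x18]
code = vcat(prologue, body, epilogue, UInt8[0xc3])

found = printable_runs(code)
foreach(println, found)   # prints AWAVAUATSH, then [A\A]A^A_]
```

The shorter AWAVATSH variant would correspond to a function that simply does not need to save r13.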

The rest of the strings in that stripe are all related to Julia:

In [57]:
stripe_strings[end-25:end]
Out[57]:
26-element Array{SubString{String},1}:
 "julia_write_71320"          
 "sizeof"                     
 "sizeof;"                    
 "elsize"                     
 "elsize;"                    
 "julia_print_71321"          
 "julia_escape_string_71318"  
 "escape_string"              
 "julia_limitstringmime_71312"
 "limitstringmime"            
 "julia_limitstringmime_71322"
 "julia_limitstringmime_71324"
 "julia_limitstringmime_71326"
 "julia_limitstringmime_71328"
 "julia_limitstringmime_71330"
 "julia_limitstringmime_71332"
 "julia_limitstringmime_71334"
 "Int64"                      
 "data"                       
 "#self#"                     
 "#temp#"                     
 "kws..."                     
 "Char"                       
 "UInt32"                     
 "UInt8"                      
 "mime"                       

We can see some Julia data types (like Int64 and UInt8), some internal Julia names (like #self#), and some C functions which are part of Julia itself (like julia_print_...). That all makes sense: this is memory that has been used by Julia, so it's full of Julia-related data.

The Rest of the Image

We've gotten some interesting strings out of that high-entropy stripe in the image. Is there anything else to find in the rest? Let's take a look:

In [60]:
img[:,1:41]
Out[60]:

Reshaping this image shows some interesting patterns:

In [80]:
reshaped = reshape(img[:,1:41], (32, 656))
Out[80]:
In [82]:
reshaped[:, 370:402]
Out[82]:
In [84]:
reshaped[:, 570:602]
Out[84]:
In [83]:
reshaped[:,256:288]
Out[83]:

If you look carefully at the last image above, you might see an interesting pattern: there is a gradient that repeats across several rows, showing up in the red, green, or blue color channel. Seeing the same pattern in different color channels makes me wonder if maybe there's an alignment issue. The image gives us three bytes of data per pixel (red, green, and blue), so our pixels are aligned with a spacing of three bytes. But essentially everything in a computer is aligned in powers of two. What if we were to reorganize the data so that pixels are aligned to 4-byte boundaries?
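As a toy illustration of that suspicion (made-up bytes, not the real image data): if memory holds 4-byte records but we read it three bytes at a time, each value drifts from one channel to the next. The variable names here are hypothetical.

```julia
# Made-up data: six 4-byte records, each (10, 20, 30, 0).
bytes = repeat(UInt8[10, 20, 30, 0], 6)

# Reading three bytes per pixel: each pixel starts 3 bytes after the last,
# so the record boundaries drift and the values move between channels.
three_byte = [bytes[i:i+2] for i in 1:3:length(bytes)-2]

# Reading with a 4-byte stride (dropping every fourth byte): every pixel
# lines up with a record.
four_byte = [bytes[i:i+2] for i in 1:4:length(bytes)-3]

println(three_byte[1:4])
println(four_byte[1:4])
```

In the 3-byte view the value 10 shows up first as red, then as green, then as blue, which is the same kind of channel-hopping visible in the image. In the 4-byte view every pixel is identical.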

In [135]:
block = img[:,1:64]

# Extract the raw data bytes from the image block
data = UInt8[]
for px in block
    for val in (red(px), green(px), blue(px))
        push!(data, reinterpret(UInt8, val))
    end
end

# Then create a new image using 4-byte aligned sections
# of that data. This means we actually skip every fourth
# data byte. Interestingly, those bytes, outside of the 
# stripe we looked at before, are almost all zero anyway. 
new_block = RGBA{N0f8}[]
i = 1
while i < length(data)
    push!(new_block, RGBA{N0f8}(
            reinterpret(N0f8, data[i]),
            reinterpret(N0f8, data[i+1]),
            reinterpret(N0f8, data[i+2]),
            reinterpret(N0f8, 0xff)))
    i += 4
end

Doing that re-alignment gives a much clearer picture of the structure of our data:

In [136]:
new_block = reshape(new_block, (length(new_block) ÷ 128, 128))
Out[136]:

There's so much more to learn about what this data represents, but I'm afraid it will have to wait until another time.

Until then, here's one more cool result. What happens if I try allocating an uninitialized image of my own?

I tried running Array{RGB{N0f8}, 2}(size(img)) a few times. Most of the time I got all zeros, but eventually I found something more interesting:

[Image: my raw memory data]

Yup, that's Michele's image, recovered from the used memory data of my Julia process. Cool!

Robin Deits 27 February 2017