PREHISTORIC VIRTUAL MACHINES (September 2010)
When people in the industry talk about intentional obfuscation using virtual machines, the two examples that are most likely to come to mind are VMProtect and Themida. (Note that these are not virtual machines in the sense of Virtual PC or VMware; the term was in use long before those products came into existence.) Both of them have been around since about 2004, only six years ago.
I'm trying to choose my terms carefully here, because by their nature, virtual machines provide a level of obfuscation as a side-effect of their very existence. Converting native code into pseudo-code (or "p-code", for short) results in something that is much harder to read. Of course, the most common use of virtual machines is portability. Code that is translated to p-code can be run wherever an interpreter exists. Thus, a single copy of the p-code can run on every platform, with only the interpreter written separately for each one, instead of a separate version of the code for each platform. Some of those interpreters also allowed the execution of native code on the appropriate platform, to perform actions that the virtual machine could not provide. For example, the Magnetic Scrolls interpreter allowed the execution of Motorola 68000 code directly on the Amiga platform. The game named "Amnesia" from Electronic Arts allowed the execution of Intel x86 code directly on the IBM PC platform.
We can also find early examples of virtual machines in some adventure games from companies such as Infocom since the late 1970s, and Magnetic Scrolls since the early 1980s. Some of those games had copy-protection built into the code that ran in the virtual machines. I consider those, too, to be a kind of obfuscation by side-effect.
So, almost back to the intentional obfuscation. Just a little diversion first. I was browsing through my collection of Apple II stuff recently, and I noticed that I had a disk image in "nibble"* format. A disk image is the contents of a floppy disk saved as a file, for use with an emulator because I no longer have the hardware to run the original disk. Disk images come in one of two formats, because of how the disk drive works. We'll have to forgo the primer to explain the details, but the point is that a disk image in "nibble" format is the contents of the disk exactly as the disk drive would read it, before decoding it into the "disk" format. The "nibble" format is used to store images of disks that are copy-protected by changes to the disk structure.
Now really back to the intentional obfuscation. Here was a game from 1983. That's 27 years ago. It contained a virtual machine devoted to implementing the copy protection. The virtual machine supported only 18 instructions, including add, subtract, increment, load, store, arithmetic shift left, move, branch if equal, branch if not equal, call, return, jump, decrypt, and execute native code. The p-code hooked the reset vector, then copied and decrypted the next layer, which was another virtual machine. The second virtual machine supported only 13 instructions, and contained a funny twist: most of the tokens were the same between the two virtual machines, but the branch instructions in particular were reversed. That meant that a parser or emulator that understood the code of the first virtual machine would misbehave when reading the code of the second virtual machine. It caught me, at first. The virtual machine called the native code to read the disk sectors. The sectors used a modified data header, and that's why the "nibble" format was needed.
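To make the branch twist concrete, here is a minimal sketch of a p-code interpreter. It is not the game's actual virtual machine: the opcode numbers, the two-byte instruction layout, the single accumulator, and the reading of "equal" as "accumulator is zero" are all assumptions made up for illustration. The point is only that the same bytes take a different path when the two branch tokens trade meanings.

```python
# Hypothetical token values; the post does not give the real ones.
LOAD, ADD, BEQ, BNE, HALT = range(5)


def run(code, swap_branches=False):
    """Interpret a tiny accumulator-based p-code program.

    With swap_branches=True the BEQ and BNE tokens trade meanings,
    mimicking the second virtual machine described above.
    Returns the accumulator when HALT is reached.
    """
    acc, pc = 0, 0
    while True:
        op = code[pc]
        if swap_branches and op in (BEQ, BNE):
            op = BNE if op == BEQ else BEQ
        if op == HALT:
            return acc
        arg = code[pc + 1]   # every other instruction has a one-byte operand
        pc += 2
        if op == LOAD:
            acc = arg
        elif op == ADD:
            acc = (acc + arg) & 0xFF
        elif op == BEQ:      # assumed meaning: branch if accumulator is zero
            if acc == 0:
                pc = arg
        elif op == BNE:      # assumed meaning: branch if accumulator is non-zero
            if acc != 0:
                pc = arg
        else:
            raise ValueError(f"unknown token {op} at offset {pc - 2}")


# offset: 0        2       4        6     7        9
PROGRAM = [LOAD, 0, BEQ, 7, LOAD, 1, HALT, LOAD, 2, HALT]

print(run(PROGRAM))                      # first VM's reading of the bytes
print(run(PROGRAM, swap_branches=True))  # second VM's reading of the same bytes
```

An emulator built for the first machine follows the branch at offset 2 and ends with 2 in the accumulator, while the second machine's rules fall through and end with 1.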
I spent some time figuring out how both virtual machines worked, and wrote a disassembler for them to see what the p-code was doing. It's exactly the same thing that some people have done for VMProtect and Themida, though both VMProtect and Themida work very hard to obfuscate the interpreter, too.
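A disassembler for this kind of p-code can be little more than a table lookup. The sketch below is a guess at the general shape, not the tool I actually wrote: the token values, mnemonics, and operand sizes are invented, and the second table simply swaps the two branch entries to reflect the twist in the second virtual machine.

```python
# Hypothetical token-to-mnemonic tables; the real token values are not given.
MNEMONICS_VM1 = {0: "load", 1: "add", 2: "beq", 3: "bne", 4: "ret"}
MNEMONICS_VM2 = {**MNEMONICS_VM1, 2: "bne", 3: "beq"}  # branches reversed

# Mnemonics assumed to be followed by a one-byte operand.
HAS_OPERAND = {"load", "add", "beq", "bne"}


def disassemble(code, mnemonics):
    """Return one listing line per instruction in the p-code bytes."""
    lines = []
    pc = 0
    while pc < len(code):
        name = mnemonics.get(code[pc])
        if name in HAS_OPERAND and pc + 1 < len(code):
            lines.append(f"{pc:04x}: {name} ${code[pc + 1]:02x}")
            pc += 2
        elif name is not None and name not in HAS_OPERAND:
            lines.append(f"{pc:04x}: {name}")
            pc += 1
        else:
            # Unknown token, or an operand cut off by the end of the data.
            lines.append(f"{pc:04x}: .byte ${code[pc]:02x}")
            pc += 1
    return lines


SAMPLE = bytes([0, 5, 2, 0, 4])
for line in disassemble(SAMPLE, MNEMONICS_VM1):
    print(line)
```

Passing MNEMONICS_VM2 instead changes only the branch line, which is exactly the kind of difference that is easy to miss when reusing a parser written for the first machine.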
The game was "The Last Gladiator" from Electronic Arts. The copy protection probably took longer to write than the game did, but I converted it to "disk" format anyway, so that I won't rediscover it years from now and do it all again.
Seeing that 27-year-old virtual machine made me think of an anthropologist finding 40,000-year-old paintings in a cave that was thought to have been inhabited for only a couple of thousand years. It's probably not as exciting, though. :-)
* as opposed to "nybble", which is half of a byte. Byte, nybble. Ha ha. Computer people are funny.
Peter finding an ancient virtual machine