A BIT OF ARCHAEOLOGY (July 2011)
This entry has nothing to do with malware. Just so you know.
Some people know that I like the demo scene. I've been following it for more than 20 years now, but it's even older than that. I like the size-optimisation competitions best, and I've even participated in a few. My most recent entries were the smallest downloader on 32-bit Windows XP (233 bytes, or 255 bytes on Vista and later) and the smallest program to print the EICAR test string (56 bytes). Of particular interest to me are the demos in 512 bytes or less. They are so small that, in order to have cool effects, a structured file format is unusable, so only a .com file works here. As a result, they only run in DOS or a 32-bit console window (or via an emulator). No 64-bit systems here. Even in 2011 there has already been a 128-byte competition, and the year is not over yet.
How do you make a file that small? Mostly it's just amazing code, but to save a few bytes it's also quite common to rely on the initial register values instead of initialising them manually.
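To put a number on "a few bytes", here is a rough sketch in Python (not part of the original entry). The byte sequences are the standard 16-bit encodings of each instruction, and the register values are the ones reported further down for a .com file. Which registers a demo actually wants initialised is an assumption made purely for illustration.

```python
# Explicit initialisations that a demo can drop if it trusts the loader.
# The selection of registers is hypothetical, not taken from a real demo.
EXPLICIT_INIT = {
    "mov si, 0100h": bytes([0xBE, 0x00, 0x01]),
    "mov cx, 00ffh": bytes([0xB9, 0xFF, 0x00]),
    "xor bx, bx": bytes([0x31, 0xDB]),
    "xor ax, ax": bytes([0x31, 0xC0]),
}


def bytes_saved(instructions):
    """Total size of the instructions that no longer need to be emitted."""
    return sum(len(code) for code in instructions.values())


if __name__ == "__main__":
    total = bytes_saved(EXPLICIT_INIT)
    print(f"{total} bytes saved, {total / 128:.1%} of a 128-byte entry")
```

In a 128-byte competition, every one of those bytes is space that can go towards the effect itself.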
The question, though, is which registers hold what values... and why? This is something that I have never seen written down. I suspect that it's just something that "everybody knows".
Let's take a look at a few versions of DOS, to see what I mean:
Note that these values are for real DOS. For certain versions of the Windows console, the bp register value is 091e.
So that's the which and the what. As for the why...
0019:000041DA MOV SP, 0920
0019:000041F9 CALL NEAR WORD PTR SS:[05EA]
Now the sp register value is 091e, because the near call pushed a two-byte return address.
0019:00009B6E PUSH BP
Now the sp register value is 091c.
0019:00009B6F MOV BP, SP
And now so is the bp register value.
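The stack arithmetic in those steps can be checked with a small Python model (a sketch of my own, not DOS code). It assumes only that a near call and a 16-bit push each move sp down by two; the function names are made up for the example.

```python
MASK16 = 0xFFFF


def push16(sp):
    """A near call or a 16-bit push decrements sp by 2, wrapping at 16 bits."""
    return (sp - 2) & MASK16


def trace_prologue():
    sp = 0x0920          # MOV SP, 0920
    sp = push16(sp)      # CALL NEAR pushes the return address
    after_call = sp
    sp = push16(sp)      # PUSH BP
    bp = sp              # MOV BP, SP
    return after_call, bp


if __name__ == "__main__":
    after_call, bp = trace_prologue()
    print(f"sp after call = {after_call:04x}, bp = {bp:04x}")
```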
0019:00009FA6 MOV DX, WORD PTR SS:[BP - 12]
The dx register value is the result of a memory allocation, and depends on the size and structure of the image being loaded.
0019:0000A02F REPE MOVS BYTE PTR ES:[DI], BYTE PTR DS:[SI]
Now the cx register value is 0000.
0019:0000A031 DEC CL
And now it's 00ff.
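The reason it is 00ff rather than ffff is that only the low byte is decremented, so the borrow never reaches ch. A minimal Python sketch of the difference (the helper names are mine, chosen for the example):

```python
def dec_low_byte(reg16):
    """Model DEC CL: decrement the low byte only, leaving the high byte alone."""
    low = (reg16 - 1) & 0xFF
    return (reg16 & 0xFF00) | low


def dec_word(reg16):
    """Model DEC CX for comparison: decrement the full 16-bit register."""
    return (reg16 - 1) & 0xFFFF


if __name__ == "__main__":
    cx = 0x0000  # value left behind by the completed string copy
    print(f"dec cl -> {dec_low_byte(cx):04x}")
    print(f"dec cx -> {dec_word(cx):04x}")
```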
0019:0000A035 XOR BH, BH
0019:0000A040 XOR BL, BL
Now the bx register value is 0000.
0019:0000A0AC LDS SI, DWORD PTR SS:[0FC4]
Now the si register value is assigned, and depends on the structure of the image being loaded (0100 for .com files).
0019:0000A0B1 LES DI, DWORD PTR SS:[0FC0]
Now the di register value is assigned, and depends on the structure of the image being loaded (fffe for .com files).
0019:0000A0B6 MOV AX, ES
0019:0000A0E1 MOV SS, AX
Here we see that the dx register is not the source of the ss register value, as is commonly assumed.
0019:0000A0E3 MOV SP, DI
Now the sp register is assigned, and we see that the di register is its source.
0019:0000A0E6 PUSH DS
0019:0000A0E7 PUSH SI
Aliases for the cs and ip registers are pushed onto the stack, and we see that the dx register is not the source of the cs register value, either.
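Here is a Python sketch of why those two pushes amount to setting cs and ip. It assumes that control is eventually transferred by a far return, which is not shown in the listing above, and it uses a made-up segment value of 1234 for the loaded image. The class and function names are hypothetical.

```python
MASK16 = 0xFFFF


class Stack:
    """A tiny model of a 16-bit stack within one segment."""

    def __init__(self, sp):
        self.sp = sp
        self.mem = {}

    def push(self, value):
        self.sp = (self.sp - 2) & MASK16
        self.mem[self.sp] = value & MASK16

    def pop(self):
        value = self.mem[self.sp]
        self.sp = (self.sp + 2) & MASK16
        return value


def enter_com(image_segment, si=0x0100, di=0xFFFE):
    stack = Stack(di)            # MOV SP, DI
    stack.push(image_segment)    # PUSH DS (alias for cs)
    stack.push(si)               # PUSH SI (alias for ip)
    ip = stack.pop()             # assumed far return: pops ip first...
    cs = stack.pop()             # ...then cs
    return cs, ip, stack.sp


if __name__ == "__main__":
    cs, ip, sp = enter_com(0x1234)  # 1234 is a hypothetical segment
    print(f"cs:ip = {cs:04x}:{ip:04x}, sp = {sp:04x}")
```

Under that assumption, the far return consumes both pushed words, so the .com file starts at offset 0100 with sp back at fffe.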
0019:0000A0E8 MOV ES, DX
0019:0000A0EA MOV DS, DX
0019:0000A0EC MOV AX, BX
Now the ax register value is 0000.
The file runs, and the mystery is solved.