FURTHER UNEXPECTED RESUTLS [sic] (May 2010)

It's been ten years since I first noticed the word "callback" in the Thread Local Storage (TLS) section of the Portable Executable format documentation. Since then, we've seen it used and abused by virus writers, packer vendors, and general mischief-makers (and me, too, of course, as part of my research). During that time, I thought that I had discovered everything that there was to know about it. Apart from the fact that it runs before the main entrypoint, there are other things that it can do:

The TLS callback array can be altered (later entries can be modified) and/or extended (new entries can be appended) at runtime. Newly added or modified callbacks will be called, using the new addresses. There is no limit to the number of callbacks that can be placed.
TLS callback addresses can point outside of the image, for example, to newly loaded DLLs. This can be done indirectly, by loading the DLL and placing the returned address into the TLS callback array. It can also be done directly, if the loading address of the DLL is known. The imagebase value can be used as the callback address, if the DLL is structured in such a way as to defeat DEP, in case it is enabled, or a valid export address can be retrieved and used. TLS callback array entries can also be filled with the addresses of functions imported from other DLLs, if the import address table is altered to point into the callback array. Imports are resolved before callbacks are called, so imported functions will be called normally when the callback array entry is reached.
TLS callbacks receive three stack parameters, which can be passed directly to APIs. The first parameter is the ImageBase of the host process. APIs such as the kernel32 LoadLibrary() or kernel32 WinExec() functions will interpret it as a pointer to the name of the file to load or execute. If a file called "MZ[some string]" is created, where "[some string]" matches the host file header contents, then the TLS callback will access that file without any explicit reference to its name. Of course, the "MZ" portion of the string can also be replaced manually at runtime, but many APIs rely on this signature, so the results of such a change are unpredictable.
TLS callbacks are called whenever a thread is created or destroyed (unless the process calls the kernel32 DisableThreadLibraryCalls() or the ntdll LdrDisableThreadCalloutsForDll() functions). That includes the thread that is created by Windows when a debugger attaches to a process. The debugger thread is special, in that its entrypoint does not point inside the image. Instead, it points inside kernel32.dll. Thus, a simple debugger detection method is to use a TLS callback to query the start address of each thread that is created.
Since TLS callbacks run before a debugger can gain control, the callback can make other changes, such as removing the breakpoint that is typically placed at the host entrypoint. When combined with the ntdll DbgBreakPoint() function patch, the result is a file that cannot be debugged by ordinary means. The debugger will attach to the debuggee, and then wait for the exception, which never occurs. Using Ctrl-C to break in will work well enough to look at the code, but breakpoints that are placed within the other threads will not activate.
The execution of TLS callbacks is also platform-specific. If the executable imports only from either ntdll.dll or kernel32.dll, then callbacks will not be called during the "on attach" event when run on Windows XP and later.
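
The first point above (a callback array that is extended while it is being walked) can be modelled in portable C. This is only a sketch of the behaviour as described here, not the loader's actual code: the names tls_callback_t, callbacks, run_callbacks, MAX_SLOTS and REASON_PROCESS_ATTACH are all hypothetical, and the walk is assumed to read each entry from memory only when it is reached.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real callback prototype, which takes
   (PVOID DllHandle, DWORD Reason, PVOID Reserved). */
typedef void (*tls_callback_t)(void *image_base, unsigned long reason,
                               void *reserved);

#define MAX_SLOTS 8             /* arbitrary size for this model */
#define REASON_PROCESS_ATTACH 1 /* same value as DLL_PROCESS_ATTACH */

tls_callback_t callbacks[MAX_SLOTS]; /* null-terminated, like the real array */
int second_ran;                      /* set when the appended callback runs */

void second_callback(void *image_base, unsigned long reason, void *reserved)
{
    (void)image_base; (void)reason; (void)reserved;
    second_ran = 1;
}

void first_callback(void *image_base, unsigned long reason, void *reserved)
{
    (void)image_base; (void)reason; (void)reserved;
    callbacks[1] = second_callback; /* append a new entry at runtime */
}

/* Assumed model of the walk: each entry is fetched only when it is reached,
   so entries written by an earlier callback are picked up. */
int run_callbacks(void *image_base, unsigned long reason)
{
    int count = 0;
    for (tls_callback_t *entry = callbacks; *entry != NULL; entry++) {
        (*entry)(image_base, reason, NULL);
        count++;
    }
    return count;
}

/* Starts with a single entry and reports how many callbacks were called. */
int demo_runtime_extension(void)
{
    for (size_t i = 0; i < MAX_SLOTS; i++)
        callbacks[i] = NULL;
    second_ran = 0;
    callbacks[0] = first_callback; /* the only entry present "on disk" */
    return run_callbacks(NULL, REASON_PROCESS_ATTACH);
}
```

Calling demo_runtime_extension() walks two entries even though only one was present at the start, which is why a static inspection of the array can miss callbacks that are added later.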

That should just about cover it, except for one thing. I was asked recently about some internal details regarding TLS callbacks in DLLs. Since I didn't have any test files easily accessible, I created a new one. It was essentially a do-nothing file that contained a TLS callback. Then I created an .exe file and statically linked my test DLL to it. I tried to run the .exe file, and saw a message that it failed to load. Okay, something wrong in my hand-crafted DLL. The most likely reason is that I put a byte in the wrong place. The simplest way to find out is to step through loading the DLL dynamically, to see where it's failing, so that's what I did. I created a new .exe file, which loaded the DLL dynamically, and then I stepped through the LoadLibrary() function code...

Imagine my surprise when I saw the loader examining the TLS data. Why surprise? Well, because it directly contradicts the existing Portable Executable format documentation. The documentation states that "Statically declared TLS data objects", that is to say, Thread Local Storage callbacks, "can be used only in statically loaded image files. This fact makes it unreliable to use static TLS data in a DLL unless you know that the DLL, or anything statically linked with it, will never be loaded dynamically with the LoadLibrary API function". However, in Windows Vista and later versions, the kernel32 LoadLibrary() function does call Thread Local Storage callbacks in DLLs. Further, the Thread Local Storage callbacks will be called, no matter what is present in the import table. Thus, the DLL can import from ntdll.dll or kernel32.dll or even no DLLs at all (unlike the .exe case described above), and the callbacks will be called!

This is a significant change in behaviour. It also leads to a very neat anti-emulator trick. This behaviour can be detected most easily by an .exe file that co-operates with the DLL. However, it is possible for a DLL to determine if it was loaded statically or dynamically by examining, for example, the value of the stack pointer or the Structured Exception Handler frame list, among other things. That would allow the .exe file to remain unaware of the behaviour, such as the case of an existing DLL being altered to provide this behaviour.

Example code for the DLL file TLS callback looks like this:

inc b [offset l1]
...
l1: db 0 ;set to 1 on attach
export l1

Example code for the .exe file looks like this:

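push offset l2 ;"mydll"
call LoadLibraryA ;callback runs here on Vista+
push offset l3 ;"l1"
push eax ;module handle
call GetProcAddress ;address of exported flag
xchg ebx, eax ;ebx -> l1
call GetVersion ;al = major version
cmp al, 6 ;Vista+
setnb al ;al = 1 if Vista+, else 0
cmp [ebx], al ;flag must match expectation
jne being_emulated
...
l2: db "mydll", 0
l3: db "l1", 0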

In this example, if the DLL's callback is called (presumably by the kernel32 LoadLibrary() function), then the value at l1 will be set to one. If the callback is not called, then the value at l1 will remain zero. Likewise, the setnb instruction sets the al register to one if the reported Windows version is Windows Vista or later, and to zero if it is Windows XP or earlier. If the two values match (either both set or both clear), then the expected behaviour has been demonstrated. Otherwise, the presence of the emulator is revealed.
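
The comparison at the end of the .exe code reduces to a small truth table, which can be written out in C for clarity. This is a sketch only: the function names expects_dynamic_tls_callback and looks_emulated are hypothetical, and the parameters stand in for the byte at l1 and the major version number returned in al by GetVersion().

```c
#include <stdbool.h>

/* Mirrors "cmp al, 6 / setnb al": true when the reported major version
   is 6 (Windows Vista) or later. */
bool expects_dynamic_tls_callback(unsigned major_version)
{
    return major_version >= 6;
}

/* Mirrors "cmp [ebx], al / jne being_emulated": the byte at l1 must be
   exactly 1 when the callback is expected to have run, and exactly 0 when
   it is not. Any other combination suggests an emulator. */
bool looks_emulated(unsigned char l1_flag, unsigned major_version)
{
    unsigned char expected = expects_dynamic_tls_callback(major_version) ? 1 : 0;
    return l1_flag != expected;
}
```

For instance, looks_emulated(0, 6) is true: the environment claims to be Windows Vista or later, yet the callback never ran.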

Oh yes, the problem with my DLL was that the TLS data pointer was off by one byte, resulting in a crash. That's why the DLL didn't load.

Oops.