x86 FETCH-DECODE ANOMALIES (February 2007)

A colleague of mine came to see me one morning recently with an unusual result. For reasons that he didn't explain to me (he called it "a secret project"), he had intentionally placed a particular encoding of an invalid instruction near the end of a valid page, next to an unallocated page, then executed that instruction. However, instead of seeing the expected invalid opcode exception, he was seeing a page fault. Initially, I thought that it was related to the unexpected LOCK exception bug in Windows that I documented here, but it turned out to be something else entirely.

It turns out that the CPU performs a complete fetch, including parsing the ModR/M byte, before it performs any kind of decoding. Thus, because of the instruction encoding that he had used, the CPU was attempting to retrieve all of the bytes that the encoding called for, before it knew that the instruction was invalid. I do mean "all": the CPU will fetch every one of those bytes, even when the instruction would then exceed the maximum instruction length of 15 bytes. As a result, 14 prefix bytes followed by a multi-byte opcode at the end of a page will trigger a page fault instead of a general protection exception. The CPU also knows when the valid encodings of the same opcode take immediate bytes (e.g., C6/C7), and will fetch those bytes, too.
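
To make the byte counting concrete, here is a minimal Python sketch of how many bytes a ModR/M byte calls for under the documented 32-bit addressing rules. It is only a model of the encoding rules, not of the CPU's internal fetch logic, and the names (modrm_tail_length_32, fetch_crosses_page, PAGE_SIZE) are mine, not anything from the tests described here. It also assumes ordinary 4 KB pages.

```python
from typing import Optional

PAGE_SIZE = 4096  # assumption: ordinary 4 KB pages


def modrm_tail_length_32(modrm: int, sib: Optional[int] = None) -> int:
    """Bytes that follow the ModR/M byte (SIB + displacement), 32-bit addressing."""
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 3:            # register form: nothing follows
        return 0
    length = 0
    if rm == 4:             # SIB byte present
        length += 1
    if mod == 1:            # disp8
        length += 1
    elif mod == 2:          # disp32
        length += 4
    elif rm == 5:           # mod == 0, rm == 5: bare disp32
        length += 4
    elif rm == 4 and sib is not None and (sib & 7) == 5:
        length += 4         # mod == 0, SIB base == 5: disp32 with no base register
    return length


def fetch_crosses_page(start_offset: int, total_length: int) -> bool:
    """True if an instruction starting at start_offset (within its page)
    needs bytes from the following page."""
    return start_offset + total_length > PAGE_SIZE


# FF B8 placed in the last two bytes of a page: opcode + ModR/M + disp32.
total = 2 + modrm_tail_length_32(0xB8)
print(total, fetch_crosses_page(PAGE_SIZE - 2, total))
```

For FF B8 this prints 6 and True: the ModR/M byte B8 asks for a 32-bit displacement, so four of the six bytes lie in the next page, which is where the page fault comes from.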

For his results, my colleague used the FF B8 opcode, but I ran some tests and found quite a large list of opcodes that behave in the same way. The list follows, and assumes an Intel CPU with SSE3 technology (i.e., Pentium 4 CPUs since early 2004) using 32-bit encoding to produce the SIB effects.

8c 34
8c 35
8c 3c
8c 3d
8c 70-7f
8c b0-bf
8e 0c
8e 0d
8e 34
8e 35
8e 3c
8e 3d
8e 48-4f
8e 70-7f
8e 88-8f
8e b0-bf
8f 0c
8f 0d
8f 14
8f 15
8f 1c
8f 1d
8f 24
8f 25
8f 2c
8f 2d
8f 34
8f 35
8f 3c
8f 3d
8f 48-7f
8f 88-bf
c6 08-3f
c6 48-7f
c6 88-bf
c6 c8-ff
c7 08-3f
c7 48-7f
c7 88-bf
c7 c8-ff
d9 0c
d9 0d
d9 48-4f
d9 88-8f
db 24
db 25
db 34
db 35
db 60-67
db 70-77
db a0-a7
db b0-b7
dd 2c
dd 2d
dd 68-6f
dd a8-af
fe 14
fe 15
fe 1c
fe 1d
fe 24
fe 25
fe 2c
fe 2d
fe 34
fe 35
fe 3c
fe 3d
fe 50-7f
fe 90-bf
ff 3c
ff 3d
ff 78-7f
ff b8-bf
0f 00 2c
0f 00 2d
0f 00 70-7f
0f 00 b0-bf
0f 01 2c
0f 01 2d
0f 01 68-6f
0f 01 a8-af
0f 50 04
0f 50 05
0f 50 0c
0f 50 0d
0f 50 14
0f 50 15
0f 50 1c
0f 50 1d
0f 50 24
0f 50 25
0f 50 2c
0f 50 2d
0f 50 34
0f 50 35
0f 50 3c
0f 50 3d
0f 50 40-bf
0f 5d 04
0f 5d 05
0f 5d 0c
0f 5d 0d
0f 5d 14
0f 5d 15
0f 5d 1c
0f 5d 1d
0f 5d 24
0f 5d 25
0f 5d 2c
0f 5d 2d
0f 5d 34
0f 5d 35
0f 5d 3c
0f 5d 3d
0f 5d 40-bf
0f 71 00-cf
0f 71 d8-df
0f 71 e8-ef
0f 71 f8-ff
0f 72 00-cf
0f 72 d8-df
0f 72 e8-ef
0f 72 f8-ff
0f 73 00-cf
0f 73 d8-df
0f 73 e0-ef
0f 73 f8-ff
0f ae 24
0f ae 25
0f ae 2c
0f ae 2d
0f ae 34
0f ae 35
0f ae 60-77
0f ae a0-b7
0f ba 00-1f
0f ba 40-5f
0f ba 80-9f
0f ba c0-df
0f c4 00-bf
0f c5 00-bf
0f c7 04
0f c7 05
0f c7 14
0f c7 15
0f c7 1c
0f c7 1d
0f c7 24
0f c7 25
0f c7 2c
0f c7 2d
0f c7 34
0f c7 35
0f c7 3c
0f c7 3d
0f c7 40-47
0f c7 50-87
0f c7 90-bf
0f d6 04
0f d6 05
0f d6 0c
0f d6 0d
0f d6 14
0f d6 15
0f d6 1c
0f d6 1d
0f d6 24
0f d6 25
0f d6 2c
0f d6 2d
0f d6 34
0f d6 35
0f d6 3c
0f d6 3d
0f d6 40-bf
0f d7 04
0f d7 05
0f d7 0c
0f d7 0d
0f d7 14
0f d7 15
0f d7 1c
0f d7 1d
0f d7 24
0f d7 25
0f d7 2c
0f d7 2d
0f d7 34
0f d7 35
0f d7 3c
0f d7 3d
0f d7 40-bf
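
For anyone who wants to feed the list to a test harness, the notation above (hex bytes, with "lo-hi" meaning an inclusive range for that byte) can be expanded mechanically. The following is a small Python sketch; expand_entry is a hypothetical helper of my own naming, and it assumes every entry follows exactly the notation used in the list.

```python
from itertools import product
from typing import List


def expand_entry(entry: str) -> List[bytes]:
    """Expand one list entry such as '8c 70-7f' into explicit byte strings."""
    choices = []
    for field in entry.split():
        if "-" in field:
            # 'lo-hi' is an inclusive range for this byte position
            lo, hi = (int(part, 16) for part in field.split("-"))
            choices.append(range(lo, hi + 1))
        else:
            value = int(field, 16)
            choices.append(range(value, value + 1))
    # Every combination of the per-byte choices is one encoding to test.
    return [bytes(combo) for combo in product(*choices)]


encodings = expand_entry("8c 70-7f")
print(len(encodings), encodings[0].hex(), encodings[-1].hex())
```

Each resulting byte string is the opcode plus ModR/M byte only; any SIB, displacement, or immediate bytes are whatever happens to follow in memory.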

The anomalies can be demonstrated in 16-bit mode, too. The only change is that the x4/x5/xc/xd entries are replaced with x6/xe. 64-bit mode should behave in the same way, since it's only a matter of a prefix to switch from 32 to 64 bits. For earlier Intel CPUs, there are additional opcodes that behave in the same way. CPUs with VT-x technology actually reduce the gap in the 0f c7 range. CPUs with SSSE3 technology (i.e., Core 2 and later CPUs) potentially introduce gaps in the 0f 38 range. Presumably AMD CPUs will behave in the same way.
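
The x6/xe substitution follows directly from the documented 16-bit addressing rules, where there is no SIB byte and the only mod=00 form that carries extra bytes is rm=110 (a bare 16-bit displacement). Here is a minimal Python sketch of that rule; the function name is my own, and it models the encoding only, not the CPU.

```python
def modrm_tail_length_16(modrm: int) -> int:
    """Displacement bytes that follow the ModR/M byte, 16-bit addressing."""
    mod = modrm >> 6
    rm = modrm & 7
    if mod == 1:                # disp8
        return 1
    if mod == 2:                # disp16
        return 2
    if mod == 0 and rm == 6:    # bare disp16, no base or index register
        return 2
    return 0                    # register form, or memory form with no displacement


# Which mod=00 ModR/M values pull in extra bytes in 16-bit mode?
mod0_with_tail = [f"{b:02x}" for b in range(0x40) if modrm_tail_length_16(b)]
print(mod0_with_tail)
```

This prints 06, 0e, 16, 1e, 26, 2e, 36 and 3e, which are exactly the x6/xe values.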

An interesting bug was also revealed in Windows NT - an invalid opcode that causes a page fault actually triggers a blue screen crash instead. However, since Microsoft no longer supports Windows NT, this will probably never be fixed.

Some people might point to the Intel documentation, which says that the page fault has a higher priority than the invalid opcode exception, so of course it would happen that way. Yes, that's what the documentation says, but no, that's not what it means. "Priority" applies to servicing exceptions, not to raising them. Anyway, I'm not saying that it's a bug, I'm saying that it's an anomaly. Intel, on the other hand, apparently considers it a bug, at least for the Core Duo processor, whose specification update mentions this behaviour (vaguely), though even the Pentium 3 behaves this way.

There's another anomaly, when we play with these opcodes:

0f 20
0f 21
0f 22
0f 23

They are documented as accepting only register encodings ("The 2 bits in the mod field are always 11B"). Therefore, anything else should cause an exception, but that's not what happens. In fact, they support the full range of encodings, but in a special way. The quote should actually say, "The 2 bits in the mod field are always interpreted as 11B". That is, no matter what value is in the mod field, the instruction always decodes to a register access, not a memory access. So, for those opcodes, there is no ModR/M parsing, no fetch of additional bytes, and no page fault.
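
The observed behaviour can be modelled as a decoder that never looks at the mod field. The Python sketch below is my own illustration of that reading, not a description of the actual hardware; the function name and the output format are hypothetical.

```python
GPR32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_mov_crdr(opcode2: int, modrm: int) -> str:
    """Decode 0f 20..23 as described above: mod is ignored and treated as 11B."""
    if opcode2 not in (0x20, 0x21, 0x22, 0x23):
        raise ValueError("not a MOV to/from control or debug register")
    reg = (modrm >> 3) & 7      # selects the control/debug register
    rm = modrm & 7              # always a general register, whatever mod says
    special = ("cr" if opcode2 in (0x20, 0x22) else "dr") + str(reg)
    if opcode2 in (0x20, 0x21):
        return f"mov {GPR32[rm]}, {special}"    # read from CRn/DRn
    return f"mov {special}, {GPR32[rm]}"        # write to CRn/DRn


# mod=11 and mod=00 produce the same decoding under this model.
print(decode_mov_crdr(0x20, 0xC0), "|", decode_mov_crdr(0x20, 0x00))
```

Because rm is always taken as a register number, no SIB or displacement bytes are ever requested, which matches the absence of a page fault for these opcodes.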

Here's another anomaly: 0f 18 (prefetch) is fully allocated, though this is undocumented. Only the first four entries are documented, but the other four also execute without raising an exception. I don't know how to test what they are doing, though.

Finally, 0f 1f (multi-byte NOP) is also fully allocated, again without being documented as such. Interestingly, despite its name, it does access memory if the ModR/M byte tells it to, so this "No OPeration" can cause page faults. Not quite a NOP after all.
