
April 16, 2014

You May Not Need to Debug SSE Instructions

Some binaries contain two implementations of the same algorithm. The first is written to run on every x86 processor, so it consists of i386 instructions only. The second is optimized for speed and therefore uses SSE instructions. At run time, the application checks the processor's capabilities to decide which implementation to execute.

It is common for binaries to contain several implementations of the same algorithm. One example is the Microsoft Visual C++ runtime.
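To make the idea of run-time dispatch concrete, here is a minimal sketch that models it in Python. It is not how any particular runtime does it: the function names are made up for illustration, and both "implementations" are plain Python stand-ins for the i386 and SSE2 code paths. The only fact taken from the processor documentation is that SSE2 support is reported in bit 26 of EDX after CPUID with EAX set to 1.

```python
# Hypothetical model of run-time dispatch. All names are illustrative and
# not taken from any real runtime library.

SSE2_BIT_EDX = 1 << 26  # CPUID with EAX=1 reports SSE2 support in EDX bit 26


def sum_generic(values):
    """Stand-in for the portable, i386-only code path."""
    total = 0
    for value in values:
        total += value
    return total


def sum_sse2(values):
    """Stand-in for the faster SSE2 code path (same result, different code)."""
    return sum(values)


def select_sum(edx_features):
    """Pick an implementation based on the reported feature bits."""
    if edx_features & SSE2_BIT_EDX:
        return sum_sse2
    return sum_generic
```

If the feature bits passed to select_sum have bit 26 cleared, the generic path is chosen even on a processor that supports SSE2. That is the behavior the rest of this post relies on.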

You may not need to debug SSE instructions, though. All you need to do is tell your application that SSE support is not available, which is most likely a lie in 2014.

Recently, while debugging a Windows application, I noticed that it executed SSE instructions. Here is how I got the application to believe that no SSE support was available.

I knew about the CPUID instruction. It returns plenty of information about the processor. If CPUID is executed with EAX set to 1, feature information is returned in ECX and EDX.

We only need the SSE-related bits of the feature information. Here they are (source: Intel Developer Manual). The first four bits are reported in ECX and the last two in EDX.

    Bit 0  SSE3 Extensions
    Bit 9  SSSE3 Extensions
    Bit 19 SSE4.1
    Bit 20 SSE4.2

    Bit 25 SSE Extensions
    Bit 26 SSE2 Extensions
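As a rough illustration of how these bits are read, the sketch below decodes a pair of ECX and EDX values into feature names. The bit positions come from the list above, with the first group belonging to ECX and the second to EDX. The function and dictionary names are my own and exist only for this example.

```python
# Bit positions from the list above. The names of the dictionaries and the
# function are made up for this sketch.
ECX_SSE_BITS = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19, "SSE4.2": 20}
EDX_SSE_BITS = {"SSE": 25, "SSE2": 26}


def decode_sse_features(ecx, edx):
    """Return the names of the SSE-related features whose bits are set."""
    found = [name for name, bit in EDX_SSE_BITS.items() if (edx >> bit) & 1]
    found += [name for name, bit in ECX_SSE_BITS.items() if (ecx >> bit) & 1]
    return found
```

Calling decode_sse_features with the ECX and EDX values seen in the debugger right after CPUID gives a quick view of what the application is about to learn about the processor.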

The idea is that when CPUID is executed with EAX set to 1, we clear the SSE bits in ECX and EDX. To clear them, we AND ECX with 0xFFE7FDFE and EDX with 0xF9FFFFFF; these are the same masks that appear in the breakpoint command later in this post.
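The two constants used in the breakpoint command later on (0ffe7fdfe for ECX and 0f9ffffff for EDX) can be derived from the bit positions instead of being typed by hand. Here is a small sketch of that arithmetic; the helper names are hypothetical.

```python
# Sketch: build the AND masks from the bit positions.
# Helper names are illustrative only.

MASK32 = 0xFFFFFFFF


def clear_mask(bits):
    """Return a 32-bit mask with every bit set except the given positions."""
    selected = 0
    for bit in bits:
        selected |= 1 << bit
    return ~selected & MASK32


ECX_MASK = clear_mask([0, 9, 19, 20])  # SSE3, SSSE3, SSE4.1, SSE4.2
EDX_MASK = clear_mask([25, 26])        # SSE, SSE2


def hide_sse(ecx, edx):
    """Apply both masks to a pair of CPUID feature register values."""
    return ecx & ECX_MASK, edx & EDX_MASK
```

ECX_MASK works out to 0xFFE7FDFE and EDX_MASK to 0xF9FFFFFF. All other feature bits pass through unchanged, so the application still sees the rest of the processor's capabilities.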


I used the following WinDbg command to search for CPUID instructions in the code section of the virtual image.

# cpuid <address> L?<size>
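The command above matches against the disassembly. If you only have a dump of the code section, a cruder alternative is to look for the CPUID opcode bytes (0F A2) directly. The sketch below does that; it is not what WinDbg does internally, and a raw byte search can report false positives when the two bytes happen to occur inside another instruction or in data. The function name and the base_address parameter are assumptions made for the example.

```python
CPUID_OPCODE = b"\x0f\xa2"  # machine encoding of the CPUID instruction


def find_cpuid_candidates(code, base_address=0):
    """Return the address of every 0F A2 byte pair in a code buffer."""
    hits = []
    offset = code.find(CPUID_OPCODE)
    while offset != -1:
        hits.append(base_address + offset)
        offset = code.find(CPUID_OPCODE, offset + 1)
    return hits
```

Each address returned is only a candidate. It still has to be checked in the disassembler to confirm that it is a real CPUID instruction.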

I found CPUID in a few places. I checked all of them to find the ones executed with EAX set to 1, and found a few fragments like this one, which sets EAX to 1.

xor eax,eax
inc eax
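Checking every hit by eye is fine for a handful of locations. For a larger binary, a rough heuristic is to test whether the three bytes right before a CPUID encode the fragment above. In 32-bit code, xor eax,eax is assembled as either 31 C0 or 33 C0, and inc eax is 40. The sketch below assumes 32-bit code and uses a hypothetical function name. It misses other ways of setting EAX to 1, such as mov eax,1, and it misses cases where other instructions sit between the fragment and CPUID.

```python
# Encodings of "xor eax,eax" followed by "inc eax" in 32-bit code.
# xor eax,eax can be assembled as 31 C0 or 33 C0; inc eax is 40.
EAX_ONE_PREFIXES = (b"\x31\xc0\x40", b"\x33\xc0\x40")


def sets_eax_to_one(code, cpuid_offset):
    """Check whether the three bytes before a CPUID set EAX to 1 this way."""
    if cpuid_offset < 3:
        return False
    return code[cpuid_offset - 3:cpuid_offset] in EAX_ONE_PREFIXES
```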

I put a breakpoint just after each of the relevant CPUID instructions. When a breakpoint is hit, the SSE flags are cleared and execution resumes.

bp <address> "r eip; r ecx=ecx&0ffe7fdfe; r edx=edx&0f9ffffff; gc"

It worked as expected in my experiment: the application took the alternate, slower code path of i386 instructions.

A final note: this technique can be used to avoid debugging SSE instructions, but it can also help increase code coverage during security testing.
This blog is written and maintained by Attila Suszter.