It is common for a binary to contain several implementations of the same algorithm. One example is the Microsoft Visual C++ runtime.
You may not want to debug SSE instructions, though. In that case, you need to tell your application that SSE support is not available - which is most likely a lie in 2014.
Recently, while debugging a Windows application, I noticed that it executes SSE instructions. Here is how I got the application to believe that there is no SSE support available.
I knew about the CPUID instruction, which returns plenty of information about the processor. When CPUID is executed with EAX set to 1, feature information is returned in ECX and EDX.
We only need the SSE-related bits of the feature information. Here they are (source: Intel Developer Manual); the first four are reported in ECX and the last two in EDX.
Bit 0 SSE3 Extensions
Bit 9 SSSE3 Extensions
Bit 19 SSE4.1
Bit 20 SSE4.2
Bit 25 SSE Extensions
Bit 26 SSE2 Extensions
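To make the bit layout concrete, here is a minimal Python sketch that decodes the six flags from a pair of ECX/EDX values. The function name and the sample register values are hypothetical and chosen only for illustration; the bit positions are the ones listed above.

```python
# SSE-related feature bits returned by CPUID with EAX = 1.
# ECX carries SSE3, SSSE3, SSE4.1 and SSE4.2; EDX carries SSE and SSE2.
ECX_SSE_BITS = {"SSE3": 0, "SSSE3": 9, "SSE4.1": 19, "SSE4.2": 20}
EDX_SSE_BITS = {"SSE": 25, "SSE2": 26}


def decode_sse_features(ecx, edx):
    """Return a dict mapping each SSE feature name to True or False."""
    features = {}
    for name, bit in ECX_SSE_BITS.items():
        features[name] = bool((ecx >> bit) & 1)
    for name, bit in EDX_SSE_BITS.items():
        features[name] = bool((edx >> bit) & 1)
    return features


# Hypothetical register values with every SSE-related bit set.
sample = decode_sse_features(ecx=0x00180201, edx=0x06000000)
for name, present in sample.items():
    print(name, "yes" if present else "no")
```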
The idea is that when CPUID is executed with EAX set to 1, we clear the SSE bits in ECX and EDX. To clear them, we mask the registers: ECX with 0xFFE7FDFE and EDX with 0xF9FFFFFF.
I used the following WinDbg command to search for CPUID instructions in the code section of the virtual image.
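The two masks follow directly from the bit list. As a quick check, this short Python sketch builds a 32-bit mask with the listed bits cleared; the helper name `clear_mask` is my own and not part of any tool.

```python
# Bits to clear, taken from the feature list: ECX holds SSE3, SSSE3,
# SSE4.1 and SSE4.2; EDX holds SSE and SSE2.
ECX_CLEAR_BITS = [0, 9, 19, 20]
EDX_CLEAR_BITS = [25, 26]


def clear_mask(bits, width=32):
    """Build a mask of `width` bits with every bit in `bits` set to 0."""
    mask = (1 << width) - 1
    for bit in bits:
        mask &= ~(1 << bit)
    return mask


ecx_mask = clear_mask(ECX_CLEAR_BITS)
edx_mask = clear_mask(EDX_CLEAR_BITS)
print(f"ECX mask: {ecx_mask:#010x}")  # 0xffe7fdfe
print(f"EDX mask: {edx_mask:#010x}")  # 0xf9ffffff
```

ANDing a register with its mask leaves every other feature bit untouched, so only the SSE-related flags change.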
# cpuid <address> L?<size>
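For comparison, outside the debugger a similar search can be approximated by scanning raw code bytes for the CPUID opcode, 0F A2. The Python sketch below is only an approximation: unlike a disassembly-based search, a byte scan can report false positives where 0F A2 happens to appear inside another instruction or in data. The function name and the sample bytes are hypothetical.

```python
# CPUID is encoded as the two-byte opcode 0F A2.
CPUID_OPCODE = b"\x0f\xa2"


def find_cpuid_offsets(code):
    """Return every offset in `code` where the bytes 0F A2 occur."""
    offsets = []
    start = 0
    while True:
        pos = code.find(CPUID_OPCODE, start)
        if pos == -1:
            return offsets
        offsets.append(pos)
        start = pos + 1


# Hypothetical code bytes: mov eax, 1 ; cpuid ; ret
sample = b"\xb8\x01\x00\x00\x00\x0f\xa2\xc3"
print(find_cpuid_offsets(sample))  # [5]
```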
I saw CPUID in a few places. I checked all of them to find the ones that are executed with EAX set to 1, and found a few such fragments.
I put a breakpoint just after each of the relevant CPUID instructions. When a breakpoint is hit, the SSE flags are cleared and execution resumes.
bp <address> "r eip; r ecx=ecx&0ffe7fdfe; r edx=edx&0f9ffffff; gc"
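The effect of the breakpoint command on the two registers can be modeled in a few lines of Python. This is a sketch rather than a trace of the real session: the input values are hypothetical, and the function name is my own. It checks that the SSE bits end up cleared and that no other bit is disturbed.

```python
# Masks used in the breakpoint command.
ECX_MASK = 0xFFE7FDFE
EDX_MASK = 0xF9FFFFFF

# The SSE-related bits that the masks are meant to clear.
SSE_BITS_ECX = 0x00180201  # bits 0, 9, 19, 20
SSE_BITS_EDX = 0x06000000  # bits 25, 26


def patch_cpuid_result(ecx, edx):
    """Apply the same AND masks the breakpoint applies to ECX and EDX."""
    return ecx & ECX_MASK, edx & EDX_MASK


# Hypothetical CPUID output, used only for illustration.
ecx_in, edx_in = 0x7FFAFBBF, 0xBFEBFBFF
ecx_out, edx_out = patch_cpuid_result(ecx_in, edx_in)

# All SSE-related bits are now zero.
assert (ecx_out & SSE_BITS_ECX) == 0
assert (edx_out & SSE_BITS_EDX) == 0

# Every other bit is unchanged.
assert (ecx_out | (ecx_in & SSE_BITS_ECX)) == ecx_in
assert (edx_out | (edx_in & SSE_BITS_EDX)) == edx_in
```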
It worked as expected in my experiment. The application took the alternate, but slower, code path of i386 instructions.
A final note: this technique may be used to avoid debugging SSE instructions, but it can also be useful for increasing code coverage during security testing.