Reversing on Windows: 2014

December 22, 2014

Using Refactor can crash Visual C# 2013

For a change, I'm working on a C# project that has little to do with security. I was happy to rediscover the Refactor feature in Visual C# 2013, until it crashed. Here are the simplified steps to reproduce the crash.

Create an empty Visual C# project, or open any C# project.
Add a class like below.
Right-click hello in hello = true, select Refactor, then select Remove Parameters...

Visual C# crashes.

Note that I could also trigger the crash with the hotkey Ctrl+R, Ctrl+V while the cursor is within hello in hello = true.

The bug reproduces with the up-to-date version of Visual C# 2013 Community Edition.

December 19, 2014

Variable-length permutation with repetition using backtracking

Recently, I needed an implementation of a search algorithm that meets the following requirements.

It must produce variable-length permutations with repetition of the given set.
It must be based on backtracking.
It must be a state machine.
It must use basic programming elements only.
It must be clean, with no irrelevant parts.

Even though I needed the final implementation in ActionScript, I searched for the algorithm in C, planning to convert it later; I thought my odds would be better that way. Unluckily, I didn't find anything useful, so I wrote the code myself. Here is the sample implementation in C.

// Features:
// - Variable-length permutation with repetition
// - Backtracking
// - State machine

#include <stdbool.h> // bool in C99 and later
#include <stdio.h>

#define ITEMS 3 // number of items
#define LASTITEM_IDX (ITEMS-1) // index of the last item
#define INIT -1 // initial value
#define EMPTY 0 // empty

static char set[ITEMS] = {1,2,3}; // set

// State
static char idx[ITEMS] = {INIT,INIT,INIT}; // indices into set
static char out[ITEMS] = {EMPTY,EMPTY,EMPTY}; // output

static int index = INIT;
static bool dir = 1; // direction: deep(1) / wide(0)
// State - END

bool backtrack(void);

// item to try
void add_item(void) {
    idx[index]++;
    out[index] = set[idx[index]];
}

bool step_wide(void) {
    if (index == INIT) {
        // reached initial level
        // no more items to take
        return false;
    }
    // way to go wider?
    if (idx[index] < LASTITEM_IDX) {
        add_item(); // take item
        return true;
    }
    else {
        return backtrack(); // backtrack a level higher
    }
}

bool backtrack(void) {
    // backtrack a level higher
    idx[index] = INIT;
    out[index] = EMPTY;
    index--;

    return step_wide();
}

bool step_deep(void) {
    // way to go deeper?
    if (index < LASTITEM_IDX) {
        index++; // step a level deeper
        add_item(); // take item
        return true;
    }
    else {
        return step_wide(); // step wide
    }
}

// returns true if step is successful
// returns false if no more items to take
bool step(void) {
    if (dir) {
        return step_deep();
    } else {
        return step_wide();
    }
}

int main(int argc, char* argv[])
{
    while (step()) {
        for (int i=0; i<ITEMS && out[i]!=EMPTY; i++) {
            printf("%02x ", out[i]);
        }
        printf("\n");
    }

    return 0;
}

The source can be downloaded from here.

The output of the run looks like this.

01
01 01
01 01 01
01 01 02
01 01 03
01 02
01 02 01
01 02 02
01 02 03
01 03
01 03 01
01 03 02
01 03 03
02
02 01
02 01 01
02 01 02
02 01 03
02 02
02 02 01
02 02 02
02 02 03
02 03
02 03 01
02 03 02
02 03 03
03
03 01
03 01 01
03 01 02
03 01 03
03 02
03 02 01
03 02 02
03 02 03
03 03
03 03 01
03 03 02
03 03 03

August 9, 2014

Instrumenting Flash Player to Inspect JITted Pages for Integer Errors

In this blog post I describe the method I'm experimenting with to discover areas in Flash Player that may be prone to integer errors.

I have 26k flash files that are used as a corpus to generate the test samples for Flash Player. The test samples have an element of 0x41424241 in the integer pool. I have a total of 344k files generated to test Flash Player with.

During the test, I use a pintool to instrument the JITted code. The pintool is based on this implementation. Since the elements in the integer pool are dereferenced by ActionScript, it makes sense to restrict the instrumentation to JITted code.

I use instruction-level instrumentation, which allows checking the register values at every executed instruction. If any general-purpose register holds the value 0x41424241 and the instruction references that register, the instruction information is logged along with the general-purpose registers.

The pintool pre-allocates the memory at address 0x41424241 so that Flash Player can't use that address, which reduces the number of irrelevant lines in the log.

There is no need to wait for the test to finish to get partial results. A log file is generated for each test file rendered by Flash Player.

The size of the log files varies. Some are close to 0 bytes, especially if the value above makes the code fail early. Many log files are about 4k. When execution runs longer, the size is about 16k. If the value is involved in a loop, the log can reach 100k, but that's rare.

Logs can be grouped and many can be thrown away as they don't contain instructions associated with vulnerabilities.

What I look for are instructions such as signed shift, addition, subtraction, or multiplication. If the value is used as a displacement in an lea instruction, that counts as suspicious too.

Once an address in the log is chosen for closer inspection, I reproduce the log in isolation with an option to dump the JITted pages, so I can manually analyze the area surrounding that address in a disassembler. Knowing the state information, it's also possible to debug the code.

If the results are positive, a certain level of automation can be added.

July 30, 2014

Practical Suggestions for Writing a Pintool

This is my list of practical suggestions for people developing a pintool. Since I've dealt with these issues before, I thought I'd jot them down to help others. Applying them should bring you closer to a pintool that doesn't terminate unexpectedly.

Start from scratch. Say you use a sample pintool as the basis for your own. Rather than modifying the sample, start with an empty project and gradually build it up by taking elements from the sample.

Simplicity. Keep the code-base small and easy to understand.

Testing. As part of development, aim to test that all blocks have been exercised. Refrain from adding unreachable blocks.

Errors. Check for errors as early as possible, especially when returning from a Pin API.

Safe memory dereference. Whenever you have to dereference the target's memory use PIN_SafeCopy. If you want to read an integer you should use this function, too, rather than the dereference "*" operator.

Thread safety. Be aware the target may be running with multiple threads. Possibly, you want your pintool to be thread safe.

Multi-threading. Sometimes you want your variables to be stored in the thread context to have the ability to distinguish the analysis between threads. In that case looking at the sample inscount_tls.cpp is a good start.

Probe mode. Use of probe mode is always preferred as it gives better performance. However, only limited Pin facilities are available in probe mode.

Limit instrumentation. Consider restricting the instrumentation to particular routines or libraries; you can even skip the instrumentation of shared libraries to get better performance.

Standard library. It's a good idea to use the C++ standard library in a pintool as it provides the most frequently needed data structures.

Visual Studio. A Visual C++ project file is available with the Pin framework in the MyPinTool folder. Alternatively, you can create one yourself after looking at an earlier post.

Trace vs Ins. Instruction instrumentation is practically the same as trace instrumentation. You can do instruction analysis from the trace by iterating through the instructions.

Output. Having output routines in Fini makes the application run faster than having them in analysis functions. However, if the application terminates unexpectedly and Fini is never called, no results are shown. Having output routines in analysis functions makes the application run slower, but if the application terminates unexpectedly, partial results may still be shown.

July 26, 2014

Inspection of SAR Instructions

SAR stands for Shift Arithmetic Right and the instruction performs arithmetic shift. The instruction preserves the sign of the value to be shifted and so the vacant bits are filled according to the sign-bit.

Compilers generate the SAR instruction when the right shift operator ">>" is used on a signed integer.

The use of the SAR instruction can lead to a signedness bug if the code assumes the shift is unsigned.

Consider the following simplified example.

char retItem(char* arr, int value)
{
return arr[value>>24];
}

If value is non-negative, the code works as expected. However, if value is negative, the shift yields a negative index and the program can read outside the bounds of arr.

Another example is comparing the signed value after the shift to an unsigned value, which causes an implicit conversion that may trigger a bug.

In my experiments, I saw several cases where memory is dereferenced using a value produced by a SAR instruction. These places may be worth examining for bugs, especially if the value to be shifted is user input or otherwise controlled.

If an unsigned jump is followed by a signed shift, that could be a place to look for bugs as well.

Regular expressions or scripts can be used to search for patterns of occurrences of SAR instructions. When it's not feasible to review all occurrences of SAR, a pintool may be used to highlight which SAR instructions have been executed, so you can focus only on those.

July 22, 2014

Examining Native Code by Looking for Patterns

Earlier this year a post was published about examining a data format without using the program that reads it. That post discusses patterns to look for in order to identify certain constructs. This post focuses on static methods of examining code, which can be the complete code section of a file, a memory dump, or just a fragment. It also describes selected ideas about which patterns to look for when examining a given piece of code.

One may look for patterns in code to locate certain functionality or to get a high-level understanding of what the code does. Others may look for a certain construct that may be a key part of the program from a security point of view.

It's fair to say one can expect this to be a rapid method compared to other methods such as line-by-line instruction analysis.

But it's always good to read the documentation, if any is available, to get an overview of what to expect.

Some methods are more effective when performed on a small region. Narrowing the scope of the search is therefore a good thing to do at the beginning of the analysis, although it's not always feasible with enough certainty. In any case, one can always widen the search region at a later stage if required.

Compilers tend to produce executable files with a particular layout. Some have the library code at the beginning of the code section, while others have it at the end of the code section.

If there is no information about the compiler or the layout, there are other ways to locate the library code.

You may look for library function calls, which can be visible in the disassembler. Library code may have a distinct color in the disassembler.

Library/runtime code often has several implementations of a function to take advantage of the latest hardware. MSVC is an example. So SSE instructions/functions may indicate the presence of library/runtime code.

Library code can be spotted by looking for strings that can be associated with particular libraries.

Library/runtime code can also be spotted by looking for constant values that can be associated with particular libraries, such as cryptographic libraries, which tend to have many constants.

It's possible to guess the compiler that generated the code by analyzing the library/runtime code.

If the code is just a fragment of user code, you may consider examining how the instructions are encoded. Intel encodings are redundant, and one instruction can have multiple encodings. This can be used to guess which compiler was used.

If multiple encodings of an instruction are found in a binary, the code could have been generated with a polymorphic encoder.

Also, code has other characteristics that may differ between compilers such as padding and stack allocation.

Imports and exports as well as strings can tell a lot. You may check where they are referenced in the code.

Debugging symbols can help an awful lot if the disassembler can handle them. Sometimes they're available, sometimes they're not.

No matter what code you're looking at, it most likely deals with input data. It may get the data from a file or from the network via standard API calls. These are valuable areas to audit for security problems, and it's possible to follow how the data returned by these APIs is used. This may require analyzing the caller functions, as these APIs are usually wrapped in many calls before the input is used.

Just as it reads data, the code may write or send data via standard API calls. These areas may be security-sensitive.

Programs have centralized, well-established functions. These functions, for example, read dword values, read data into structures, and populate other internal storage. Discovering these functions is not considered hard: they are normally small and contain memory read and write instructions. By looking at where they are referenced from, we can find good attack surfaces.

Keep in mind that code sections can contain data besides code, although data is normally stored in the data section. In the disassembler it's convenient to see how the data is referenced and to decide whether there is an attack surface nearby.

CRC and hash constants may indicate that some data is being CRC'd or hashed. You may figure out where that data comes from and how you can perform security testing around it.

When a library uses a hardcoded parameter, it's often encoded as part of the instruction rather than stored in the data section of the executable. An example encoding looks like mov eax, <param> or mov al, <param>.

When a data format is parsed, a magic value is often tested. Looking for instructions like cmp reg, <magic> or cmp dword ptr [addr], <magic> can help to locate attack surfaces.

Longer strings may be broken into immediate values and compared with multiple cmp instructions.

Looking for strcmp function calls is a good idea if you want to find code that tests for a data format, as strcmp functions are often used for this purpose.

There are many ways to confirm whether the code is optimized for speed. Normally the readability of optimized code is bad, for example when the code performs division or uses the same memory address for multiple variables. If the EBP register is used in arithmetic, or for anything other than storing the stack base address, that could indicate the code is optimized.

There may be circumstances when looking at the frequency of instructions, or looking for undocumented instructions, rare instructions, or instructions that are absent, can give us valuable clues that help the examination.

Going through the code intuitively and looking for undefined patterns can be a good idea once the scientific ways have been exhausted.

July 16, 2014

251 Potential NULL Pointer Dereferences in Flash Player

251 potential NULL pointer dereference issues have been identified in Flash Player 14 using a pattern-matching approach. The file examined is NPSWF32_14_0_0_145.dll (17,029,808 bytes).

The issues are classified as CWE-690: Unchecked Return Value to NULL Pointer Dereference.

I won't copy and paste all the issues into this blog post, but here are a few examples.

First Example

0:012> uf 5438a1d0
NPSWF32_14_0_0_145!BrokerMainW+0xf6f6b:
5438a1d0 f6410810 test byte ptr [ecx+8],10h
5438a1d4 8b4104 mov eax,dword ptr [ecx+4]
5438a1d7 7411 je NPSWF32_14_0_0_145!BrokerMainW+0xf6f85 (5438a1ea)

NPSWF32_14_0_0_145!BrokerMainW+0xf6f74:
5438a1d9 85c0 test eax,eax
5438a1db 740b je NPSWF32_14_0_0_145!BrokerMainW+0xf6f83 (5438a1e8)

NPSWF32_14_0_0_145!BrokerMainW+0xf6f78:
5438a1dd 8b4c2404 mov ecx,dword ptr [esp+4]
5438a1e1 8b448808 mov eax,dword ptr [eax+ecx*4+8]
5438a1e5 c20400 ret 4

NPSWF32_14_0_0_145!BrokerMainW+0xf6f83:
5438a1e8 33c0 xor eax,eax <--Set return value to NULL

NPSWF32_14_0_0_145!BrokerMainW+0xf6f85:
5438a1ea c20400 ret 4 <--Return with NULL
0:012> u 5438a47b L2
NPSWF32_14_0_0_145!BrokerMainW+0xf7216:
5438a47b e850fdffff call NPSWF32_14_0_0_145!BrokerMainW+0xf6f6b (5438a1d0)
5438a480 8a580c mov bl,byte ptr [eax+0Ch] <--Dereference NULL

Second Example

0:012> uf 54362e60
NPSWF32_14_0_0_145!BrokerMainW+0xcfbfb:
54362e60 8b4128 mov eax,dword ptr [ecx+28h]
54362e63 8b4c2404 mov ecx,dword ptr [esp+4]
54362e67 3b4804 cmp ecx,dword ptr [eax+4]
54362e6a 7205 jb NPSWF32_14_0_0_145!BrokerMainW+0xcfc0c (54362e71)

NPSWF32_14_0_0_145!BrokerMainW+0xcfc07:
54362e6c 33c0 xor eax,eax <--Set return value to NULL
54362e6e c20400 ret 4 <--Return with NULL

NPSWF32_14_0_0_145!BrokerMainW+0xcfc0c:
54362e71 56 push esi
54362e72 8b748808 mov esi,dword ptr [eax+ecx*4+8]
54362e76 56 push esi
54362e77 e8e4b0faff call NPSWF32_14_0_0_145!BrokerMainW+0x7acfb (5430df60)
54362e7c 83c404 add esp,4
54362e7f 85c0 test eax,eax
54362e81 7407 je NPSWF32_14_0_0_145!BrokerMainW+0xcfc25 (54362e8a)

NPSWF32_14_0_0_145!BrokerMainW+0xcfc1e:
54362e83 8b4010 mov eax,dword ptr [eax+10h]
54362e86 5e pop esi
54362e87 c20400 ret 4

NPSWF32_14_0_0_145!BrokerMainW+0xcfc25:
54362e8a 8bc6 mov eax,esi
54362e8c 83e0f8 and eax,0FFFFFFF8h
54362e8f 5e pop esi
54362e90 c20400 ret 4
0:012> u NPSWF32_14_0_0_145+006b4eb2 L2
NPSWF32_14_0_0_145!BrokerMainW+0xd1c4d:
54364eb2 e8a9dfffff call NPSWF32_14_0_0_145!BrokerMainW+0xcfbfb (54362e60)
54364eb7 8b7004 mov esi,dword ptr [eax+4] <--Dereference NULL

Third Example

0:012> uf 5429979a
NPSWF32_14_0_0_145!BrokerMainW+0x6535:
5429979a 0fb74108 movzx eax,word ptr [ecx+8]
5429979e 48 dec eax
5429979f 48 dec eax
542997a0 740c je NPSWF32_14_0_0_145!BrokerMainW+0x6549 (542997ae)

NPSWF32_14_0_0_145!BrokerMainW+0x653d:
542997a2 83e815 sub eax,15h
542997a5 7403 je NPSWF32_14_0_0_145!BrokerMainW+0x6545 (542997aa)

NPSWF32_14_0_0_145!BrokerMainW+0x6542:
542997a7 33c0 xor eax,eax <--Set return value to NULL
542997a9 c3 ret <--Return with NULL

NPSWF32_14_0_0_145!BrokerMainW+0x6545:
542997aa 8d4110 lea eax,[ecx+10h]
542997ad c3 ret

NPSWF32_14_0_0_145!BrokerMainW+0x6549:
542997ae 8d410c lea eax,[ecx+0Ch]
542997b1 c3 ret
0:012> u NPSWF32_14_0_0_145+005f3423 L2
NPSWF32_14_0_0_145!BrokerMainW+0x101be:
542a3423 e87263ffff call NPSWF32_14_0_0_145!BrokerMainW+0x6535 (5429979a)
542a3428 8038fe cmp byte ptr [eax],0FEh <--Dereference NULL

You can find a list of 251 potential NULL pointer dereferences in Flash Player here.

July 14, 2014

Issues with Flash Player & Firefox in Non-default Configurations

A few months ago I encountered a bug triggered when a fuzzed flash file is rendered by Flash Player in Firefox. The bug can be reached only in the non-default configuration described below, so it's very unlikely that you are affected by it.

To trigger the bug, the Flash Player module has to be loaded into Firefox's virtual address space. This can be achieved if Flash Player protected mode is disabled and the Firefox plugin-container process is disabled too.

The bug involves dereferencing an arbitrary memory address via a CALL instruction in the vtable dispatcher. Here you can see the bug in the exception state.

0:048> g
Implementation limit exceeded: attempting to allocate too-large object
error: out of memory
(170fc.16998): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=00000001 ebx=00000000 ecx=0034f670 edx=00000000 esi=1600f2c8 edi=0000001c
eip=5996bd5f esp=0034f638 ebp=0034f668 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\SysWOW64\Macromed\Flash\NPSWF32_14_0_0_145.dll -
NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5c2:
5996bd5f 8b461c mov eax,dword ptr [esi+1Ch] ds:002b:1600f2e4=????????
0:000> u eip L10
NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5c2:
5996bd5f 8b461c mov eax,dword ptr [esi+1Ch] <--Read unmapped address
5996bd62 a801 test al,1
5996bd64 7420 je NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5e9 (5996bd86)
5996bd66 33d2 xor edx,edx
5996bd68 39550c cmp dword ptr [ebp+0Ch],edx
5996bd6b 7519 jne NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5e9 (5996bd86)
5996bd6d 8b4e04 mov ecx,dword ptr [esi+4]
5996bd70 83e0fe and eax,0FFFFFFFEh
5996bd73 89461c mov dword ptr [esi+1Ch],eax
5996bd76 8b06 mov eax,dword ptr [esi] <--Read unmapped address
5996bd78 51 push ecx
5996bd79 8bce mov ecx,esi
5996bd7b 895604 mov dword ptr [esi+4],edx
5996bd7e 895618 mov dword ptr [esi+18h],edx
5996bd81 ff500c call dword ptr [eax+0Ch] <--Dereference arbitrary memory content
5996bd84 eb06 jmp NPSWF32_14_0_0_145!unuse_netscape_plugin_Plugin+0x5ef (5996bd8c)

I reported this bug to Adobe and they opened case PSIRT-2707 on 14/April/2014, but so far Adobe hasn't confirmed whether or not it was able to reproduce the bug or the exception state reported.

Again, the bug doesn't affect the default configuration, so it's very unlikely you're affected by it. However, users running Firefox with plugin-container disabled and the Flash Player plugin with protected mode disabled are affected by this issue.

The original report is about Flash Player 13_0_0_182 and Firefox 28.0, but the testcase also fails with Flash Player 14_0_0_145 and Firefox 30.0 (the latest available as of today).

These are the steps to reproduce the bug.

Edit mms.cfg to have ProtectedMode=0 to disable protected mode in Flash Player
Start cmd.exe and type "set MOZ_DISABLE_OOP_PLUGINS=1" to disable plugin-container in Firefox

The settings above are required to get the Flash Player plugin loaded into firefox.exe's address space.

Start Firefox from the command prompt opened previously
Open fuzzed.swf in Firefox (drag and drop should work)
Attach Windbg to the firefox.exe process when you notice that Firefox is hanging
An exception should occur in a few seconds. If you see the out-of-memory error in the debugger log without an exception, you may restart the browser and try again.

The fuzzed flash file has the following changes compared to the template file. The value of the first item in the integer pool has been changed to a large value. The TagLength of the DoAbc tag and the FileSize of the main header have therefore been updated to maintain the integrity of the flash file.

Drop me an email if you think you need the testcase.

May 13, 2014

Security Implications of IsBad*Ptr Calls in Binaries

IsBad*Ptr [1] functions test whether the memory range specified in the argument list is accessible. Despite the fact that they have been banned, they are still referenced in many binaries shipped with popular applications.

In this post I describe the inner workings of IsBad*Ptr and the steps an attacker may follow to abuse them, and mention a few examples of binaries that reference these banned functions.

Inner Working

When IsBad*Ptr is executed, it first registers an exception handler. Then it attempts to access the memory specified in the argument list.

For example, IsBadReadPtr has the following instruction to read memory. ECX is the memory address specified in the argument list.


mov al,byte ptr [ecx]

If the instruction raises an exception, execution is transferred to the exception-handler code, and IsBad*Ptr returns TRUE, meaning it is a "bad" pointer because the data it points to is inaccessible.

If the instruction executes without an exception being raised IsBad*Ptr returns FALSE.

Steps of Attacking

The attack against IsBad*Ptr looks like this.

The attacker attempts to supply an invalid pointer parameter to IsBad*Ptr, which returns TRUE.
~~The attacker refines step #1 in a way that the supplied invalid pointer becomes valid due to a forced allocation for the location pointed by the invalid pointer.~~ The attacker refines step #1 so that the supplied invalid pointer points to a valid memory location allocated by heap spray. As a result, IsBad*Ptr returns FALSE and the program enters an inconsistent state that may or may not be exploitable.
If the attacker can perform step #2 with IsBadWritePtr, then when the call returns, the program is expected to reach code that writes to the location pointed to by the pointer -- and that location holds attacker-controlled data. The attacker thus reaches a presumably exploitable condition.

Whether a binary references IsBad*Ptr can easily be checked during binary analysis, and it is worth doing.

Examples

This code snippet below can be found in msvbvm60.dll in Windows folder.

.text:72A0FEE5 push 38h ; ucb
.text:72A0FEE7 push edi ; attacker supplies pointer was invalid before
.text:72A0FEE8 call ds:IsBadReadPtr ; and now it's valid because he's filled memory up
.text:72A0FEEE test eax, eax
.text:72A0FEF0 jnz loc_72A0FF80 ; fall through
.text:72A0FEF6 mov eax, [edi+4]
.text:72A0FEF9 mov eax, [eax+4]
.text:72A0FEFC mov esi, [eax+8] ; ESI is attacker controlled
.text:72A0FEFF and [ebp+arg_0], 0
.text:72A0FF03 mov ax, [edi+2]
.text:72A0FF07 test esi, esi
.text:72A0FF09 jz short loc_72A0FF42 ; fall through
.text:72A0FF0B movzx ebx, ax
.text:72A0FF0E mov eax, [esi] ; EAX is attacker controlled
.text:72A0FF10 push esi
.text:72A0FF11 call dword ptr [eax+0Ch] ; EIP is attacker controlled

This one below can be found in dxtrans.dll in Windows folder.

.text:35C6142C push 4 ; ucb
.text:35C6142E push esi ; attacker supplies pointer was invalid before
.text:35C6142F call ds:__imp__IsBadWritePtr@8 ; and now it's valid because he's filled memory up
.text:35C61435 test eax, eax
.text:35C61437 jz short loc_35C61440 ; jump is taken
.text:35C61439 mov eax, 80004003h
.text:35C6143E jmp short loc_35C6144A
.text:35C61440 loc_35C61440: ; CODE XREF: CDXBaseSurface::GetAppData(ulong *)+14
.text:35C61440 mov eax, [ebp+this]
.text:35C61443 mov eax, [eax+24h]
.text:35C61446 mov [esi], eax ; ESI is attacker controlled

The next code snippet is taken from v2.0.50727\mscorwks.dll in Windows folder. IsBadReadPtr is used to test the pointer that is passed to MultiByteToWideChar.

.text:7A0D17FB push eax ; ucb
.text:7A0D17FC push ebx ; attacker supplies pointer was invalid before
.text:7A0D17FD call ds:__imp__IsBadReadPtr@8 ; and now it's valid because he's filled memory up
.text:7A0D1803 test eax, eax
.text:7A0D1805 jz short loc_7A0D17A7 ; jump is taken
[...]
.text:7A0D17A7 cmp edi, esi
.text:7A0D17A9 jle short loc_7A0D17BF
.text:7A0D17AB push esi ; cchWideChar
.text:7A0D17AC push esi ; lpWideCharStr
.text:7A0D17AD push edi ; cbMultiByte
.text:7A0D17AE push ebx ; lpMultiByteStr - attacker's data
.text:7A0D17AF push 1 ; dwFlags
.text:7A0D17B1 push esi ; CodePage
.text:7A0D17B2 call ?WszMultiByteToWideChar@@YGHIKPBDHPAGH@Z ; WszMultiByteToWideChar(uint,ulong,char const *,int,ushort *,int)

I have been collecting files that reference IsBad*Ptr, and have found plenty of others, including, but not limited to, MSCOMCTL.OCX, EXCEL.EXE, and products from Lenovo, Corel, Nokia, and AVerMedia.

UPDATE 13/May/2014 Adding IsBad*Ptr to a program doesn't automatically create bugs. However, if IsBad*Ptr is present, we have reason to believe that the function expects a pointer that might be invalid in certain circumstances. In that case IsBad*Ptr may be used to attack the program. That's why it's important to conduct the audit with this in mind.

UPDATE 16/May/2014 The term "valid pointer" had an ambiguous meaning in the part Steps of Attacking. This is now reworded. Reference.

[1] IsBad*Ptr functions are IsBadReadPtr, IsBadCodePtr, IsBadWritePtr, and IsBadStringPtr. All of them are exports of kernel32.dll.

April 28, 2014

Order of Memory Reads of Intel's String Instructions

Neither the Intel Manual nor Kip R. Irvine's assembly book discusses the behavior I'm describing about x86 string instructions in this post.

Consider the following instruction, which compares the byte at ESI to the byte at EDI.


cmps    byte ptr [esi],byte ptr es:[edi]

To perform the comparison, the instruction must first read the bytes. The question is: which is read first, the byte at ESI or the byte at EDI?

Intel Manual says:

Compares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, doubleword, or quadword specified with the second source operand

Kip R. Irvine's book titled Assembly Language for Intel-Based Computers (5th edition) says:

The CMPSB, CMPSW, and CMPSD instructions each compare a memory operand pointed to by ESI to a memory operand pointed to by EDI

Both descriptions explain what the instructions do, but neither says how. So I needed to do some experiments in Windbg to find the answer.

The first experiment was not a good one. Initially, I thought I'd put a processor breakpoint (aka memory breakpoint) at ESI and another one at EDI, then execute CMPS and let the debugger break in on one of the processor breakpoints. Here is why it was a bad idea: the execution of CMPS has to complete before the debugger breaks in, and by the time CMPS completes it has hit both breakpoints.

The other experiment I came up with goes like this. Set ESI and EDI to point to two distinct memory addresses that are unmapped. The assumption is that when CMPS is executed, it raises an exception when trying to read memory. By looking at the exception record we can tell which address the instruction tried to read from. Given that, we can tell whether that value was assigned to ESI or to EDI, and so whether the byte at ESI or the byte at EDI is read first.

Here is how I did the experiment in Windbg.

I opened an executable test file in Windbg. I assembled CMPS to be placed in the memory at EIP.

0:000> a
011113be cmpsb
cmpsb

I changed ESI to point to invalid memory. And I did the same with EDI.

0:000> resi=51515151
0:000> redi=d1d1d1d1

Here is the disassembly of CMPS. The =?? markers at the end of the output show that both ESI and EDI point to unmapped memory addresses.

0:000> r
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
test!wmain+0x1e:
011113be a6 cmps byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=??

I executed the process leading to an expected access violation triggered by CMPS instruction.

0:000> g
(5468.1eec8): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
eax=cccccccc ebx=7efde000 ecx=00000000 edx=00000001 esi=51515151 edi=d1d1d1d1
eip=011113be esp=0022fb70 ebp=0022fc3c iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00010202
test!wmain+0x1e:
011113be a6 cmps byte ptr [esi],byte ptr es:[edi] ds:002b:51515151=?? es:002b:d1d1d1d1=??

I got the details of the exception like below.

0:000> .exr -1
ExceptionAddress: 011113be (test!wmain+0x0000001e)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 00000000
Parameter[1]: d1d1d1d1
Attempt to read from address d1d1d1d1

As you can see, the access violation occurred due to an attempt to read from d1d1d1d1, which is the value of EDI. Therefore, to answer the question at the beginning of the article, the byte at EDI is read first.

For more fun, you may test how emulators handle this, that is, the order of memory reads of CMPS instructions.

If you like this you may click to read more debugging posts.

UPDATE 28/April/2014 Of course I don't encourage people to rely on undocumented behavior when developing software. Stephen Canon says Intel may optimize the microcode for operands from time to time, and so we don't have a good reason to believe this behavior to be stable.

UPDATE 29/April/2014 Here is a Windows C++ program to test this behavior on your architecture: cmps-probe.cpp

April 23, 2014

Inspection of Division & Multiplication

Division and multiplication can trigger bugs and potentially pose security risks. Here are a few things that I believe to be helpful for those who do binary inspection.

Division

Production-quality binaries are normally built with optimization enabled, which makes the binary run fast. One of the optimization techniques available to the compiler is to emit a series of fast instructions instead of a single slow instruction.

DIV and IDIV instructions are known to be slow. As a part of optimization the compiler emits a series of fast instructions that are functionally equivalent to DIVs. The fast instructions are shift, multiplication, and addition instructions that take magic (constant) values depending on the divisor. Therefore the divisor has to be known at compile-time to apply optimization.
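As an illustration of such a sequence, here is a minimal sketch in C of unsigned division by the constant 10 done with a multiplication and a shift. The function name is made up for this example; the constant 0CCCCCCCDh with a total shift of 35 is the well-known magic pair for this divisor, and a compiler would pick a different pair for a different divisor.

```c
#include <stdint.h>

/* Unsigned 32-bit division by the constant 10 without DIV.
   0xCCCCCCCD is ceil(2^35 / 10); multiplying by it and shifting the
   64-bit product right by 35 gives x / 10 for every 32-bit x. */
uint32_t div10_magic(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}
```

When you see a MUL by an odd-looking constant followed by a shift of EDX in a binary, it is worth checking whether it is a division in disguise.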

If the optimized binary has any DIVs, that most likely means the divisor was not known at compile time. It is only known at run-time, and so it could be a user-controlled value or a user input taken as it is.

Division can cause an exception if the divisor is 0, or if the result is too large to store.
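The two conditions can be sketched in C as below. This assumes the usual code generated for a signed 32-bit division, where the dividend is sign-extended into EDX:EAX before IDIV, so the only quotient that doesn't fit is the one from INT32_MIN / -1. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a / b can be executed without a divide error,
   assuming a sign-extended 32-bit dividend. */
bool idiv32_is_safe(int32_t a, int32_t b)
{
    if (b == 0)
        return false;               /* division by zero */
    if (a == INT32_MIN && b == -1)
        return false;               /* quotient 2^31 does not fit in 32 bits */
    return true;
}
```

If neither the binary nor the surrounding code performs an equivalent test before a DIV or IDIV, the instruction deserves a closer look.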

Division by Zero in CLR's Native Code

As an interesting experiment I looked at what happens when an integer is divided by zero in C#.

The CLR generates native code with a division instruction in it. When the division by zero is executed, an exception is raised and handled by the CLR's exception handler.

So the generated code doesn't test the divisor before the division. It's left to the exception handler to deal with division by zero.

Multiplication

Like division, multiplication can be optimized, too, by using a sequence of fast instructions (sometimes a single instruction). Whether or not it's worth optimizing depends on the value of the multiplier (or multiplicand).

A multiplication you see in the binary might not be visible at source-code level. And some multiplications cannot be easily spotted in binary code due to optimization. Either way, multiplications can trigger bugs.

Overflow in Multiplication

Multiplication can lead to integer overflow. Multiplying two values is more likely to overflow than adding them. For example, the product of two unsigned 16-bit words can exceed the signed 32-bit range (0xFFFF * 0xFFFF = 0xFFFE0001) but their sum can't.

Some forms of the IMUL instruction take an immediate value, which is the multiplier. It's easy to calculate what multiplicand overflows the multiplication. The challenging part is to determine how that value could be assigned to the multiplicand to trigger the overflow.
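The calculation can be sketched in C as follows. This is a simplified example that only handles a positive immediate multiplier and a positive multiplicand in signed 32-bit arithmetic; the function name is made up for illustration.

```c
#include <stdint.h>

/* Smallest positive multiplicand whose signed 32-bit product with a
   positive constant multiplier exceeds INT32_MAX.
   Returns 0 when the multiplier is not positive, or when it is 1 and
   therefore no 32-bit multiplicand can overflow. */
int32_t smallest_overflowing_multiplicand(int32_t multiplier)
{
    if (multiplier <= 1)
        return 0;
    return INT32_MAX / multiplier + 1;
}
```

For IMUL with an immediate of 4, for instance, this gives 20000000h: any multiplicand at or above that value overflows.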

It's worth searching for MUL and IMUL instructions in the binary using the # (hash mark) Windbg command.

Overflow in Multiplication by Scale Factor

Scale factor is a constant value by which the register in the instruction is multiplied. The scale factor is either 1, 2, 4 or 8.

Use of a scale factor could be the result of optimizing a multiplication. The example below multiplies a value by 4.


lea eax,[edi*4]

Another common use case is accessing an element of an array. In the example below the array consists of elements of size 8. At source level there is no multiplication but at low level there is.


mov [edi+edx*8],eax

If the value of the register that is multiplied by the scale factor is large enough, an integer overflow can occur.

Look at the instruction below. Even if the multiplication doesn't overflow, the result can, due to the base (esp) and the displacement (8000).


mov [esp+ebx*4+8000],eax
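To experiment with values, the address calculation can be modelled in C with 32-bit unsigned arithmetic, which wraps around the same way. This is only a sketch of the arithmetic, not of any particular binary; the function name and the values in the usage note are made up.

```c
#include <stdint.h>

/* 32-bit effective address: base + index*scale + disp, modulo 2^32.
   Unsigned arithmetic in C wraps exactly like the address calculation. */
uint32_t effective_address(uint32_t base, uint32_t index,
                           uint32_t scale, uint32_t disp)
{
    return base + index * scale + disp;
}
```

With an index of 40000000h and a scale of 4 the multiplication alone wraps to 0. With a base of 0FFFFF000h, an index of 100h, a scale of 4 and a displacement of 8000h the multiplication is fine but the sum wraps to 7400h.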

Method of Inspection

Generally, it's not feasible to review all the occurrences of certain instructions, but in critical areas it might be reasonable to do so. Instruction tracing, and similar tracing techniques, can be a good start to narrow down the area that can be inspected closer.

April 16, 2014

You May Not Need to Debug SSE Instructions

There are binaries that contain two implementations of the same algorithm. The first one is meant to run on all architectures and so it consists of i386 instructions only. The second one is optimized to run fast and therefore it has SSE instructions. When the application runs it checks the architecture to decide which implementation to execute.

It is common for binaries to contain various implementations of the same algorithm. One example is the Microsoft Visual C++ runtime.

You may not need to debug SSE instructions though. What you need to do is to tell your application that SSE support is not available - which is most likely a lie in 2014.

Recently, when I debugged a Windows application I noticed it executes SSE instructions. Here is how I got my application to believe that there is no SSE support available.

I knew about the CPUID instruction. It can return plenty of information about the processor. If CPUID is executed with EAX set to 1, feature information is returned in ECX and EDX.

We only need the SSE-related bits of the feature information. Here they are (source: Intel Developer Manual).

In ECX:

    Bit 0  SSE3 Extensions
    Bit 9  SSSE3 Extensions
    Bit 19 SSE4.1
    Bit 20 SSE4.2

In EDX:

    Bit 25 SSE Extensions
    Bit 26 SSE2 Extensions

The idea is that when CPUID is executed with EAX set to 1, we need to clear the SSE bits in ECX and EDX. To clear the SSE bits we have to mask the registers like below.


ECX<-ECX&FFE7FDFE

EDX<-EDX&F9FFFFFF
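The two masks follow directly from the bit numbers listed above. A small C sketch that derives them (the macro and function names are made up for this example):

```c
#include <stdint.h>

#define BIT(n) (UINT32_C(1) << (n))

/* AND-mask that clears SSE3, SSSE3, SSE4.1 and SSE4.2 in ECX. */
uint32_t sse_clear_mask_ecx(void)
{
    return ~(BIT(0) | BIT(9) | BIT(19) | BIT(20));
}

/* AND-mask that clears SSE and SSE2 in EDX. */
uint32_t sse_clear_mask_edx(void)
{
    return ~(BIT(25) | BIT(26));
}
```

If you want to hide more features from the application, adding further bits here gives you the new masks.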

I used the following Windbg command to search for CPUID instructions in the code section of the virtual image.

# cpuid <address> L?<size>

I saw CPUID in a few places. I checked all of them to find the ones that have EAX set to 1 on input. I found a few fragments like this.

xor eax,eax
inc eax
cpuid

I put breakpoints just after each of the right CPUID instructions. When a breakpoint is hit, the SSE flags are cleared and the execution resumes.

bp <address> "reip; recx=ecx&0ffe7fdfe; redx=edx&0f9ffffff; gc"

And it worked as expected in my experiment. The application took the alternate, but slower, code path of i386 instructions.

As a final note, this technique may be used to avoid debugging SSE instructions, but it can also be useful to increase code coverage during security testing.

April 8, 2014

Examining Unknown Binary Formats

This post discusses methods for examining unknown binary formats, which can be a file, a file fragment, or a memory dump.

Before discussing the methods I describe a few scenarios in which examining an unknown format is appropriate.

Imagine you deal with an application that handles a certain file type that you want to fuzz. You plan to carry out some dumb fuzzing with a bit of improvement. Before doing so, you may examine the format to create an approximate map of the layout. This gives you an idea of which parts of the file are worth fuzzing, and which fuzzing method is reasonable to apply to each part.

In another scenario you might have the binary file but not the program that parses it. You want to know as much as possible about the format of the binary file to understand its layout.

If the application that reads the file format is available you can use a debugger to watch how the data is parsed. This scenario is not discussed here.

If the application that writes the format is available you can try the following idea. Produce an output file using the save, export, or convert options available in the application. Then change something minor in the settings and produce another, similar output file. Comparing the two output files you may see what changed.

Entropy analysis is very useful to locate compressed, encrypted, and otherwise encoded data. Higher entropy can indicate encoding of some kind. Lower entropy is likely anything else including text, code, headers, and data structures. Redundancy analysis is analogous to entropy analysis; the lower the redundancy, the more likely the data is encoded.

Encoded data could be anything, even multimedia data. Compressed streams can have headers and/or magic bytes that identify the compression type.

The byte distribution of the file can tell us a lot. Creating a byte frequency map is very straightforward in modern programming languages. It tells us which bytes are the most and the least frequent. We can also easily find out which bytes are not present at all.

Strings can be discovered even with popular tools like a hex editor. The most common encodings are ASCII and Unicode. If there is no terminating zero, the length of the string is likely stored somewhere in the binary, often in the byte(s) preceding the first letter of the string.
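A byte frequency map takes only a few lines. Here is a minimal sketch in C; the function name is made up, and reading the file into the buffer is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills freq[256] with the number of occurrences of each byte value and
   returns how many byte values never occur in the buffer. */
unsigned byte_histogram(const uint8_t *buf, size_t len, size_t freq[256])
{
    unsigned absent = 0;

    for (int i = 0; i < 256; i++)
        freq[i] = 0;
    for (size_t i = 0; i < len; i++)
        freq[buf[i]]++;
    for (int i = 0; i < 256; i++)
        if (freq[i] == 0)
            absent++;
    return absent;
}
```

The same table is the input for an entropy calculation, and running it on a region instead of the whole file shows how the distribution changes across the file.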

Repeated patterns or byte sequences are often used for padding, for alignment, or to fill slack space.

Random-looking printable characters can indicate some kind of encoding of any data in plain text.

Scattered bytes, scattered zeros, scattered 0FFh bytes can indicate a sequence of encoded integers. Integers can be offsets and lengths. Scattered zeros might indicate text in Unicode format.

It could be useful to analyze the density of zeros, printable characters, or of other patterns. This could be applied on the whole file or on a particular region of the file.

Consecutive integer values might indicate an array of pointers. It might be useful to know whether the values are increasing, decreasing, or random.

Also, it is good to know in what endianness the integers are stored.

x86 code can be detected by running a disassembler on the binary. If you see a sequence of meaningful instructions, that might be a code area.

There is a simpler way to look for x86 code though. You write a small program in some high-level language that searches for E8 (CALL) / E9 (JMP) patterns and calculates the absolute offset the instruction transfers control to. If the same absolute offset is referenced from different places, that might be the entry point of a real function. The more functions are identified, the better the chance that you have found code.
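Such a program could look like the sketch below, written in C. It assumes 32-bit code with rel32 operands and treats offsets in the buffer as addresses; the function name is made up. It tries every byte position, so some hits will be data bytes that only happen to be E8 or E9, which is why the reference count matters.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Scans buf for E8 (CALL rel32) and E9 (JMP rel32), resolves each rel32
   to a target offset inside the buffer, and returns the target that is
   referenced most often. *refs receives its reference count.
   Returns (size_t)-1 if no in-range target was found or if memory
   allocation failed. */
size_t most_referenced_target(const uint8_t *buf, size_t len, unsigned *refs)
{
    size_t best = (size_t)-1;
    unsigned *count = calloc(len ? len : 1, sizeof *count);

    *refs = 0;
    if (count == NULL)
        return best;
    for (size_t i = 0; i + 5 <= len; i++) {
        if (buf[i] != 0xE8 && buf[i] != 0xE9)
            continue;
        /* rel32 is little-endian and relative to the next instruction */
        uint32_t rel = (uint32_t)buf[i + 1] | (uint32_t)buf[i + 2] << 8 |
                       (uint32_t)buf[i + 3] << 16 | (uint32_t)buf[i + 4] << 24;
        int64_t target = (int64_t)i + 5 + (int32_t)rel;
        if (target < 0 || target >= (int64_t)len)
            continue;
        if (++count[target] > *refs) {
            *refs = count[target];
            best = (size_t)target;
        }
    }
    free(count);
    return best;
}
```

A real tool would rather print every target with its count, but even the single most referenced offset is a good first candidate for a function entry point.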

If you know what native code to look for you can search for a sequence of common instructions, like bytes at function entry point.

Meaningful text fragment in high-entropy area might indicate run-length encoding which is also known as RLE compression.

Some data formats consist of a sequence of structures, or chunks. The size of each structure is sometimes encoded as the first value in the structure. It's common to see a sequence of compressed data stored like that.

If it's known that the binary is associated with a certain time stamp or version number, those constants might be worth searching for.
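A quick way to test the chunk hypothesis on a region is to walk it and see whether the sizes line up with the end of the data. The C sketch below assumes a 32-bit little-endian size that counts only the payload following it; real formats vary (big-endian sizes, sizes that include the header, extra tag fields), so treat the layout and the function name as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

/* Tests the hypothesis that buf is a sequence of chunks, each starting
   with a 32-bit little-endian size followed by that many payload bytes.
   Returns the number of chunks if they tile the buffer exactly, else -1. */
long count_size_prefixed_chunks(const uint8_t *buf, size_t len)
{
    size_t pos = 0;
    long chunks = 0;

    while (pos < len) {
        if (len - pos < 4)
            return -1;                      /* truncated size field */
        uint32_t size = (uint32_t)buf[pos] | (uint32_t)buf[pos + 1] << 8 |
                        (uint32_t)buf[pos + 2] << 16 | (uint32_t)buf[pos + 3] << 24;
        pos += 4;
        if (size > len - pos)
            return -1;                      /* payload runs past the end */
        pos += size;
        chunks++;
    }
    return chunks;
}
```

If the walk ends exactly at the end of the region for several candidate start offsets or files, the guess about the layout is probably right.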

Some methods described here can be combined with xor-search, and with other simple decoding techniques to discover the structure of the file.

April 5, 2014

Thoughts About Finding Race Condition Bugs

Race condition bugs can exist in multi-threaded applications. Improper synchronization can be the root cause of race condition bugs.

Stress testing is a good start to find bugs. It might not be an ideal black-box testing method though, as it is mostly used by developers to test their own software. Injecting delays at various points into the target could help to find bugs, but we need to know the right locations to inject the delays. Cuzz is a Microsoft tool for finding concurrency bugs by injecting random delays; it looks promising.

Using DBI (Dynamic Binary Instrumentation) it's possible to tell if an EIP is executed, and if so by what thread(s). Therefore it's possible to tell what code is executed by what thread(s).

Using DBI it is also possible to tell where (value of EIP) the thread context switch happens.

By having the above information we can make educated guesses where to inject the delays.

If a bug is found it might not be reachable from outside. That's always a possibility. However, it's good to see if you can provide input that makes the application run longer near the location of the intended delay. There might be a ReadFile that can take longer to complete if the file is large enough. Or there might be a loop where the iteration count can be controlled by the user...

April 3, 2014

Change of Execution Flow in Debugger

When debugging, we sometimes need to force the execution to either take or not take a conditional jump.

There are several ways to achieve this. One possibility is to overwrite the conditional jump with either a JMP or NOP instructions to force the execution onto the desired path.

The next trick is to simply change the instruction pointer. The example below increments the instruction pointer by 2 in Windbg.
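For the two-byte short form of the conditional jump (opcode 70h to 7Fh followed by an 8-bit displacement) the patch is simple, as this C sketch on a byte buffer shows. The function name is made up; the near form (0F 8x with a 32-bit displacement) needs a different patch and is not handled here.

```c
#include <stdbool.h>
#include <stdint.h>

/* code points at a short conditional jump (opcode 70h..7Fh, then rel8).
   take == true:  replace the opcode with EB (JMP rel8), always taken.
   take == false: replace both bytes with 90 90 (NOP NOP), never taken.
   Returns false and changes nothing if code is not a short Jcc. */
bool force_short_jcc(uint8_t *code, bool take)
{
    if ((code[0] & 0xF0) != 0x70)
        return false;
    if (take) {
        code[0] = 0xEB;
    } else {
        code[0] = 0x90;
        code[1] = 0x90;
    }
    return true;
}
```

In the debugger the same bytes can be written directly to memory; keeping the displacement byte untouched is what makes the EB patch land on the original jump target.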


reip=eip+2

Another idea is to look at the conditions under which the conditional jump is taken or not taken. Knowing the conditions, you can change the register or the data at the right memory location to influence the execution flow.

My favorite is to change the x86 flags when the instruction pointer points to the conditional jump. Below is how to set the zero flag in Windbg.


rzf=1

To see more info about flags check out MSDN or Windbg's help.

April 1, 2014

Tracking Down by Pin

Recently, I faced a challenging situation. At first sight it looked like a common debugging problem that can be solved with some experimenting, but the more time I spent on it the more difficult it looked.

The situation was the following. The below instruction reads memory.

00400000+006026de mov eax,dword ptr [ebp+4]

What is the EIP of the instruction that writes [ebp+4]? This is all I wanted to know at that stage.

Note, the instruction looks like it reads a stack address via ebp+4, but it actually reads a heap address.

First I looked at the function to see if I could find the instruction in it that writes [ebp+4]. It wasn't there, so I investigated the caller functions, and their callers, and so on. Again, it wasn't there, but I noticed something. The functions passed a pointer to a context as a parameter, and the context contained many variables including [ebp+4].

At this point I had a good reason to believe the situation was difficult, because the context is likely set by an initializer that may be on a completely different code path from the one I was investigating.

You may ask why I didn't use a processor breakpoint to see what instruction writes [ebp+4]. It was a heap address that kept changing on every execution, so the address to put the breakpoint on was not known in advance.

I could have gone back to the point when the structure is allocated, and I could have set a breakpoint relative to the base address and see what code writes [ebp+4]. That sounded good and I would have gone to that direction if I hadn't had a better idea.

I thought I could write a PinTool that tracks memory write and read accesses. It adds all instructions writing memory to a list. When the instruction that reads memory is reached, the program searches the list for the instructions that wrote that address. Of course this has to be thread safe.
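The bookkeeping behind this idea can be sketched in plain C as below. This is not the PinTool's code and it uses no Pin API: the two functions stand in for the callbacks an instrumentation tool would invoke, the names and the fixed-size list are made up, and the sketch ignores access sizes and thread safety.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_WRITES 1024   /* arbitrary capacity for this sketch */

struct write_rec { uint32_t addr; uint32_t ip; };

static struct write_rec writes[MAX_WRITES];
static size_t nwrites;

/* Called for every instruction that writes memory. */
void on_write(uint32_t addr, uint32_t ip)
{
    if (nwrites < MAX_WRITES) {
        writes[nwrites].addr = addr;
        writes[nwrites].ip = ip;
        nwrites++;
    }
}

/* Called when the watched read executes. Returns the IP of the most
   recent recorded write to addr, or 0 if there was none. */
uint32_t last_writer(uint32_t addr)
{
    for (size_t i = nwrites; i-- > 0; )
        if (writes[i].addr == addr)
            return writes[i].ip;
    return 0;
}
```

Searching the list backwards returns the latest writer first, which is usually the one that explains the value being read.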

It took me a day to develop the PinTool and find the EIP that writes [ebp+4].

This is how I executed the PinTool from command line.

pin.exe -t TrackDownWrite.dll -ip 0x6026de -s 4 -- c:\work\<redacted>.exe

-ip is the instruction pointer where [ebp+4] is read
-s is the size of read/write to track

The result looked like this.


0bab05dc is read by 006026de

0bab05dc was written by 005ee358 before read

The wanted instruction is below.


00400000+005ee358 mov     dword ptr [eax+4],edx

The prototype is available for download. After finishing my debugging task I also tested it a bit on Windows x86, and I think it looks useful for similar problems that might arise in the future.

March 10, 2014

On-the-fly Switching Between Debuggers

Sometimes it's useful to switch between debuggers without restarting the target application. An example is when you want to use a capability of another debugger that the current one doesn't have. Here is how to do it using the well-known EB FE trick.

Instruct the debugger to break-in, and memorize the two bytes at EIP.
Replace the two bytes at EIP with EB FE that is JMP EIP.
Detach the debugger leaving the application in an endless loop.
Attach the other debugger to the running process.
Locate the thread of the endless loop by switching between threads, and when found, restore the two bytes you memorized.
Carry on with the debugging using the other debugger.

Note, the patched thread could interfere with a watchdog thread, if any; however, I haven't experienced it yet.
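Why EB FE loops forever is easy to check: EB is a short JMP whose signed 8-bit displacement is relative to the end of the two-byte instruction, and FE is -2. A small C sketch of the target calculation (the function name is made up):

```c
#include <stdint.h>

/* Target of a short jump (EB rel8, or a short Jcc) located at addr.
   The displacement is signed and relative to the end of the two-byte
   instruction. */
uint32_t short_jmp_target(uint32_t addr, uint8_t rel8)
{
    return addr + 2 + (uint32_t)(int32_t)(int8_t)rel8;
}
```

For any address the target of EB FE is the address itself, so the thread spins on that instruction until the original two bytes are restored.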

March 5, 2014

Trace And Watch

This is how I recently performed dynamic integer analysis on a 32-bit binary application that reads DWORD values from the file.

The file format contains many fields of type DWORD. I was given a sample file. I made as many copies of the sample file as it had DWORD fields. I crafted each copy to have 0x41414141 in one DWORD field. Only one DWORD field was changed per copy, so together the copies covered all DWORD fields.
I wrote a PinTool for this occasion, called TraceAndWatch, that checks the value of the general registers before every instruction is executed. When a register value matches 0x41414141, it shows the memory state including the disassembly of the instruction.
I executed the application under TraceAndWatch and let it parse the first sample containing 0x41414141. TraceAndWatch produced a log and I saw which instructions used 0x41414141.
In the static disassembly, I located the instructions using 0x41414141 and saw arithmetic and comparison operations on that value.
In some cases I realized I could enter another code path by changing 0x41414141 in the sample to another value, e.g. to a value that is negative when treated as signed, like 0x88888888. I then re-ran the test, telling TraceAndWatch to trace and watch instructions using 0x88888888.
I executed this manual test on all the samples produced earlier.
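Producing the crafted copies is easy to automate. Below is a minimal sketch in C that works on buffers; the function name is made up, and the offsets of the DWORD fields are assumed to be known from examining the format. The marker is written little-endian, which makes no difference for 0x41414141 but does for other values.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copies the sample into out and overwrites the DWORD at offset with
   marker. Returns 0 on success, -1 if the DWORD would not fit. */
int make_variant(const uint8_t *sample, size_t len, size_t offset,
                 uint32_t marker, uint8_t *out)
{
    if (len < 4 || offset > len - 4)
        return -1;
    memcpy(out, sample, len);
    for (int i = 0; i < 4; i++)
        out[offset + i] = (uint8_t)(marker >> (8 * i));   /* little-endian */
    return 0;
}
```

Calling it once per field offset and writing each result to its own file gives the set of samples described in the first step.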

The following weaknesses can be audited by this approach.

CWE-839: Numeric Range Comparison Without Minimum Check
CWE-195: Signed to Unsigned Conversion Error
CWE-682: Incorrect Calculation
CWE-190: Integer Overflow or Wraparound
CWE-680: Integer Overflow to Buffer Overflow
CWE-191: Integer Underflow (Wrap or Wraparound)

Final Notes

This is a generic, and quick way to locate comparison and arithmetic of integers.

TraceAndWatch tracks only the general registers, so you can lose track of an integer when its value is copied to, say, an SSE register.

When arithmetic is performed on the value, e.g. 0x41414141 is multiplied by 2, you need to set TraceAndWatch to look for 0x82828282 in order not to lose track of it.

TraceAndWatch is available for download on my OneDrive space. If you use it you may contact me with your experience.

February 19, 2014

Bug in Flash Player when processing PNG format

The Bug

The PNG file consists of a sequence of data structures called chunks. A chunk has a Length field that is a DWORD value. A specially crafted Length field can cause an integer overflow in Flash Player, leading to a read outside the designated buffer. Here is the disassembly snippet explaining the bug.

015344a0 e8f7feffff call FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c) ;Read CHUNK.Length from attacker controlled buffer
015344a5 8bd8 mov ebx,eax ;CHUNK.Length = 0ffffffd3h
015344a7 6a04 push 4
015344a9 8d45fc lea eax,[ebp-4]
015344ac 50 push eax
015344ad 8bce mov ecx,esi
015344bb e8dcfeffff call FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c)
015344c0 8b4d08 mov ecx,dword ptr [ebp+8]
015344c3 8901 mov dword ptr [ecx],eax
015344c5 8b560c mov edx,dword ptr [esi+0Ch] ;Current Position in buffer = 29h
015344c8 8945fc mov dword ptr [ebp-4],eax
015344cb 8d441a04 lea eax,[edx+ebx+4] ;<-First integer overflow
;TotalValue = Position + CHUNK.Length + 4
;TotalValue = 29h + 0ffffffd3h + 4 = 0
015344cf 3b4610 cmp eax,dword ptr [esi+10h] ;Compare TotalValue (0) to FileSize (3d0h)
015344d2 7351 jae FlashPlayer!WinMainSandboxed+0x1f12ab (01534525) ;Unsigned evaluation. Jump is not taken
015344d4 57 push edi
015344d5 6afc push 0FFFFFFFCh
015344d7 58 pop eax
015344d8 83cfff or edi,0FFFFFFFFh
015344db 3bd8 cmp ebx,eax ;Compare CHUNK.Length (0ffffffd3h) to hardcoded 0FFFFFFFCh
015344dd 7e26 jle FlashPlayer!WinMainSandboxed+0x1f128b (01534505) ;Signed evaluation. Jump is taken.
[...]
01534505 8b4e14 mov ecx,dword ptr [esi+14h] ;Set pointer to Buffer
01534508 03ca add ecx,edx ;Set Current Position in Buffer
0153450a 03cb add ecx,ebx ;<-Second integer overflow
;Increment by CHUNK.Length leading to position out of the buffer backward
0153450c e88bfeffff call FlashPlayer!WinMainSandboxed+0x1f1122 (0153439c)
[...]
0153439c 0fb601 movzx eax,byte ptr [ecx] ;<-Can read out of designated buffer
0153439f 0fb65101 movzx edx,byte ptr [ecx+1] ;<-Can read out of designated buffer
015343a3 c1e008 shl eax,8
015343a6 0bc2 or eax,edx
015343a8 0fb65102 movzx edx,byte ptr [ecx+2] ;<-Can read out of designated buffer
015343ac 0fb64903 movzx ecx,byte ptr [ecx+3] ;<-Can read out of designated buffer
015343b0 c1e008 shl eax,8
015343b3 0bc2 or eax,edx
015343b5 c1e008 shl eax,8
015343b8 0bc1 or eax,ecx
015343ba c3 ret

The state in the erroneous code path looks like below. The designated buffer containing the content of the PNG file starts at 00e4c810, where the PNG signature is seen. Due to the bug the instruction reads the memory 4 bytes before the start of the buffer, at 00e4c80c. Note, the instruction doesn't cause an access violation because the illegally accessed memory address is mapped.

0:000> t
eax=fffffffc ebx=ffffffd3 ecx=00e4c80c edx=00000029 esi=0019e134 edi=ffffffff
eip=0153439c esp=0019dbf4 ebp=0019dc08 iopl=0 nv up ei pl nz na pe cy
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000207
FlashPlayer!WinMainSandboxed+0x1f1122:
0153439c 0fb601 movzx eax,byte ptr [ecx] ds:002b:00e4c80c=00
0:000> db ecx
00e4c80c 00 00 00 00 89 50 4e 47-0d 0a 1a 0a 00 00 00 0d .....PNG........
00e4c81c 49 48 44 52 00 00 01 2c-00 00 01 2c 08 02 00 00 IHDR...,...,....
00e4c82c 00 f6 1f 19 22 ff ff ff-d3 49 44 41 54 78 9c ed ...."....IDATx..
00e4c83c d9 31 8a c3 40 14 44 c1-1e e3 fb 5f 59 8a 9d 09 .1..@.D...._Y...
00e4c84c 1c bc 40 55 6c b4 20 70-f2 68 98 7f b6 6b bb ce ..@Ul. p.h...k..
00e4c85c ef df b6 f3 e8 9f f3 ad-6f 7d fb e7 b7 9f 01 a9 ........o}......
00e4c86c ef 4e fd 13 e0 dd 44 08-31 11 42 4c 84 10 13 21 .N....D.1.BL...!
00e4c87c c4 bc 8e 42 cc 12 42 4c-84 10 13 21 c4 44 08 31 ...B..BL...!.D.1

Root Cause

Two incorrect sanity checks were identified.

Incorrect sanity check (015344cf) because it happens after the overflow (015344cb).
Incorrect sanity check (015344db) because signed comparison is performed on CHUNK.Length that is unsigned.
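The arithmetic of the two checks can be replayed in C. The first function below models the path through 015344cb to 015344dd as shown in the disassembly and tells whether the out-of-bounds read at 01534505 is reached. The second one is only my sketch of a check that cannot overflow; it is not Adobe's fix, and both function names are made up.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models the two checks from the disassembly. Returns true when
   execution reaches the read at 01534505. */
bool reaches_read_path(uint32_t position, uint32_t length, uint32_t file_size)
{
    uint32_t total = position + length + 4;   /* lea: wraps modulo 2^32 */
    if (total >= file_size)                   /* jae: unsigned compare */
        return false;
    return (int32_t)length <= -4;             /* jle vs 0FFFFFFFCh: signed */
}

/* Overflow-free version of "position + length + 4 < file_size". */
bool chunk_fits(uint32_t position, uint32_t length, uint32_t file_size)
{
    if (position >= file_size || file_size - position <= 4)
        return false;
    return length < file_size - position - 4;
}
```

With the values from the trace (position 29h, length 0ffffffd3h, file size 3d0h) the first function reports that the read is reached, while the second one rejects the chunk.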

Severity

The technical severity of this bug is low because diverting the execution flow is not possible. Further analysis suggests that address disclosure is not possible either, because the memory region that can be accessed outside the designated buffer doesn't contain addresses.

Reproduction

Open Flash Player 12.0.0.38 (flashplayer_12_sa.exe has a size of 10,339,208) in Windbg. Then execute the following commands.
0:006> bp flashplayer + 001f44a0 2
0:006> g
Open the PoC in Flash Player (send me an e-mail for a copy). The debugger breaks in so you can step through the disassembly and see the data flow as explained above.

I'm aware there is a new version of Flash Player 12.0.0.44. I verified and it's affected by this bug, too.

UPDATE On 26th February an Adobe engineer confirmed via e-mail that he could reproduce the bug.

February 13, 2014

Data Flow Tracking in Flash Player: Undocumented Bytecodes and JIT

Undocumented Bytecodes

I did some analysis of how the bytecodes in the DoABC tag are parsed, and compared the result against the AVM2 documentation (May 2007). I found that Flash Player can parse certain bytecodes that are not mentioned in the documentation.

Bytecode	Note	Bytecode	Note	Bytecode	Note	Bytecode	Note
0x00	RESERVED	0x40	newfunction	0x80	coerce	0xc0	increment_i
0x01	UNDOCUMENTED	0x41	call	0x81	UNDOCUMENTED	0xc1	decrement_i
0x02	nop	0x42	construct	0x82	coerce_a	0xc2	inclocal_i
0x03	throw	0x43	callmethod	0x83	UNDOCUMENTED	0xc3	declocal_i
0x04	getsuper	0x44	callstatic	0x84	UNDOCUMENTED	0xc4	negate_i
0x05	setsuper	0x45	callsuper	0x85	coerce_s	0xc5	add_i
0x06	dxns	0x46	callproperty	0x86	astype	0xc6	subtract_i
0x07	dxnslate	0x47	returnvoid	0x87	astypelate	0xc7	multiply_i
0x08	kill	0x48	returnvalue	0x88	UNDOCUMENTED	0xc8	RESERVED
0x09	label	0x49	constructsuper	0x89	UNDOCUMENTED	0xc9	RESERVED
0x0a	RESERVED	0x4a	constructprop	0x8a	RESERVED	0xca	RESERVED
0x0b	RESERVED	0x4b	RESERVED	0x8b	RESERVED	0xcb	RESERVED
0x0c	ifnlt	0x4c	callproplex	0x8c	RESERVED	0xcc	RESERVED
0x0d	ifnle	0x4d	RESERVED	0x8d	RESERVED	0xcd	RESERVED
0x0e	ifngt	0x4e	callsupervoid	0x8e	RESERVED	0xce	RESERVED
0x0f	ifnge	0x4f	callpropvoid	0x8f	RESERVED	0xcf	RESERVED
0x10	jump	0x50	UNDOCUMENTED	0x90	negate	0xd0	getlocal_0
0x11	iftrue	0x51	UNDOCUMENTED	0x91	increment	0xd1	getlocal_1
0x12	iffalse	0x52	UNDOCUMENTED	0x92	inclocal	0xd2	getlocal_2
0x13	ifeq	0x53	UNDOCUMENTED	0x93	decrement	0xd3	getlocal_3
0x14	ifne	0x54	RESERVED	0x94	declocal	0xd4	setlocal_0
0x15	iflt	0x55	newobject	0x95	typeof	0xd5	setlocal_1
0x16	ifle	0x56	newarray	0x96	not	0xd6	setlocal_2
0x17	ifgt	0x57	newactivation	0x97	bitnot	0xd7	setlocal_3
0x18	ifge	0x58	newclass	0x98	RESERVED	0xd8	RESERVED
0x19	ifstricteq	0x59	getdescendants	0x99	RESERVED	0xd9	RESERVED
0x1a	ifstrictne	0x5a	newcatch	0x9a	RESERVED	0xda	RESERVED
0x1b	lookupswitch	0x5b	RESERVED	0x9b	RESERVED	0xdb	RESERVED
0x1c	pushwith	0x5c	RESERVED	0x9c	RESERVED	0xdc	RESERVED
0x1d	popscope	0x5d	findpropstrict	0x9d	RESERVED	0xdd	RESERVED
0x1e	nextname	0x5e	findproperty	0x9e	RESERVED	0xde	RESERVED
0x1f	hasnext	0x5f	UNDOCUMENTED	0x9f	RESERVED	0xdf	RESERVED
0x20	pushnull	0x60	getlex	0xa0	add	0xe0	RESERVED
0x21	pushundefined	0x61	setproperty	0xa1	subtract	0xe1	RESERVED
0x22	RESERVED	0x62	getlocal	0xa2	multiply	0xe2	RESERVED
0x23	nextvalue	0x63	setlocal	0xa3	divide	0xe3	RESERVED
0x24	pushbyte	0x64	getglobalscope	0xa4	modulo	0xe4	RESERVED
0x25	pushshort	0x65	getscopeobject	0xa5	lshift	0xe5	RESERVED
0x26	pushtrue	0x66	getproperty	0xa6	rshift	0xe6	RESERVED
0x27	pushfalse	0x67	UNDOCUMENTED	0xa7	urshift	0xe7	RESERVED
0x28	pushnan	0x68	initproperty	0xa8	bitand	0xe8	RESERVED
0x29	pop	0x69	RESERVED	0xa9	bitor	0xe9	RESERVED
0x2a	dup	0x6a	deleteproperty	0xaa	bitxor	0xea	RESERVED
0x2b	swap	0x6b	RESERVED	0xab	equals	0xeb	RESERVED
0x2c	pushstring	0x6c	getslot	0xac	strictequals	0xec	RESERVED
0x2d	pushint	0x6d	setslot	0xad	lessthan	0xed	RESERVED
0x2e	pushuint	0x6e	getglobalslot	0xae	lessequals	0xee	RESERVED
0x2f	pushdouble	0x6f	setglobalslot	0xaf	greaterequals	0xef	debug
0x30	pushscope	0x70	convert_s	0xb0	UNDOCUMENTED	0xf0	debugline
0x31	pushnamespace	0x71	esc_xelem	0xb1	instanceof	0xf1	debugfile
0x32	hasnext2	0x72	esc_xattr	0xb2	istype	0xf2	UNDOCUMENTED
0x33	RESERVED	0x73	convert_i	0xb3	istypelate	0xf3	RESERVED
0x34	RESERVED	0x74	convert_u	0xb4	in	0xf4	RESERVED
0x35	UNDOCUMENTED	0x75	convert_d	0xb5	RESERVED	0xf5	RESERVED
0x36	UNDOCUMENTED	0x76	convert_b	0xb6	RESERVED	0xf6	RESERVED
0x37	UNDOCUMENTED	0x77	convert_o	0xb7	RESERVED	0xf7	RESERVED
0x38	UNDOCUMENTED	0x78	checkfilter	0xb8	RESERVED	0xf8	RESERVED
0x39	UNDOCUMENTED	0x79	RESERVED	0xb9	RESERVED	0xf9	RESERVED
0x3a	UNDOCUMENTED	0x7a	RESERVED	0xba	RESERVED	0xfa	RESERVED
0x3b	UNDOCUMENTED	0x7b	RESERVED	0xbb	RESERVED	0xfb	RESERVED
0x3c	UNDOCUMENTED	0x7c	RESERVED	0xbc	RESERVED	0xfc	RESERVED
0x3d	UNDOCUMENTED	0x7d	RESERVED	0xbd	RESERVED	0xfd	RESERVED
0x3e	UNDOCUMENTED	0x7e	RESERVED	0xbe	RESERVED	0xfe	RESERVED
0x3f	RESERVED	0x7f	RESERVED	0xbf	RESERVED	0xff	RESERVED

The loop and the big switch statement parsing DoABC bytecode are near 0x6087e9. An instruction near 0x58f25d also reads bytecode. The documentation certainly needs an update on Adobe's side so developers can add the currently undocumented bytecodes to their decompiler/disassembler.

JIT

After adding new functionality to my pintool I ran it against Flash Player. Here are my observations.

When executing a flash file containing a DoAction tag in Flash Player, no memory page was allocated with or set to an *EXECUTE* flag. Thus no dynamically generated code was executed, at least not by the most common method. Therefore I think DoAction works with interpreted execution, meaning every single bytecode is run in isolation rather than a set of bytecodes being compiled and run (JIT).

When executing a flash file containing a DoABC tag in Flash Player I observed increased usage of VirtualAlloc. The page was allocated with the PAGE_READWRITE flag. Later in the execution the page was set to PAGE_EXECUTE_READ and the execution flow was transferred to the page. When the execution returned to the caller the page was set back to PAGE_READWRITE. I knew this was a part of how JIT works. Changing the memory protection flags is needed because of DEP.

0x5205a6 is a VirtualProtect call to change the memory protection flags. When it's called with PAGE_READWRITE it's called via 0x5fc39c. When it's called with PAGE_EXECUTE_READ it's called via 0x5fc2e9.

During my experiment I figured out that the instruction at 0x5d20ef calls into the JIT-compiled code, though this might not be the only address JIT-compiled code is called from. I observed many callbacks in the JIT-compiled code. One of the callbacks might be there to give continuous feedback to the caller, for example if a long loop is being executed. I observed that constants are encrypted with xor instructions to make memory spraying more difficult. This is not new, but it was the first time I saw it. This is how 0x41414141 looks when it's encrypted.

03af1f67 b83a7c1959 mov eax,59197C3Ah
03af1f6c 357b3d5818 xor eax,18583D7Bh
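The two immediates are easy to verify: 59197C3Ah xor 18583D7Bh gives 41414141h. The C sketch below shows the split, which is commonly called constant blinding; the function name is made up, and how Flash Player picks the key is not something I looked into.

```c
#include <stdint.h>

/* Splits value into the two immediates of a blinded load:
   mov eax, value ^ key ; xor eax, key */
void blind_constant(uint32_t value, uint32_t key,
                    uint32_t *mov_imm, uint32_t *xor_imm)
{
    *mov_imm = value ^ key;
    *xor_imm = key;
}
```

Because the key changes what ends up in the instruction stream, the attacker-chosen constant never appears as raw bytes in the executable page.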

All offsets in this post are RVAs, that is, relative to Flash Player's image base. The offsets are valid for Flash Player 12.0.0.38 (flashplayer_12_sa.exe has a size of 10,339,208).

The blog continues at suszter.com/ReversingOnWindows