December 21, 2014

Use of refactor can make Visual C# 2013 to crash

For the change I'm working on a C# project not much to do with security. I was happy to re-discover and use the refactor feature in Visual C# 2013 until the point it crashed. Here are the simplified steps to reproduce the crash.
  • Create an empty Visual C# project, or open any C# project.
  • Add a class like below.
  • Do right-click on hello in hello = true, select Refactor, then select Remove Parameters...

  • Visual C# crashes.
Note, I was able to trigger the crash by an alternate way by simply using the hotkey Ctrl+R, Ctrl+V when the cursor is within hello in hello = true.

The bug reproduces with the up-to-date version of Visual C# 2013 Community Edition.

December 19, 2014

Variable-length permutation with repetition using backtracking

Recently, I needed the implementation of the search algorithm that qualifies for the followings.
  1. It must be able to produce the variable-length permutation with repetition of the given set.
  2. It must be based on backtracking.
  3. It must be a state machine.
  4. It must have basic programming elements only.
  5. It must be clean without irrelevant part.
Even I needed the final implementation in ActionScript, I was searching for the algorithm in C that I would convert later. Thought my odds would be better if I did so. Unluckily I didn't find anything useful, and so I wrote the code myself. Here is the sample implementation in C.

// Features:
// - Variable-length permutation with repetition
// - Backtracking
// - State machine

#include <stdio.h>

#define ITEMS            3                       // number of items
#define LASTITEM_IDX     (ITEMS-1)               // index of the last item
#define INIT             -1                      // initial value
#define EMPTY            0                       // empty

static char set[ITEMS]  = {1,2,3};               // set

// State
static char idx[ITEMS]  = {INIT,INIT,INIT};      // indices into set
static char out[ITEMS]  = {EMPTY,EMPTY,EMPTY};   // output

static int index = INIT;
static bool dir = 1;                             // direction: deep(1) / wide(0)
// State - END

bool backtrack();

// item to try
void add_item() {
    idx[index]++;
    out[index] = set[idx[index]];
}

bool step_wide() {
    if (index == INIT) {
        // reached initial level
        // no more items to take
        return false;
    }
    // way to go wider?
    if (idx[index] < LASTITEM_IDX) {
        add_item();          // take item
        return true;
    }
    else {
        return backtrack();  // backtrack a level higher
    }
}

bool backtrack() {
    // backtrack a level higher
    idx[index] = INIT;
    out[index] = EMPTY;
    index--;

    return step_wide();
}

bool step_deep() {
    // way to go deeper?
    if (index < LASTITEM_IDX) {
        index++;             // step a level deeper
        add_item();          // take item
        return true;
    }
    else {
        return step_wide();  // step wide
    }
}

// returns true if step is successful
// returns false if no more items to take
bool step() {
    if (dir) {
        return step_deep();
    } else {
        return step_wide();
    }
}

int main(int argc, char* argv[])
{
    while (step()) {
        for (int i=0; i<ITEMS && out[i]!=EMPTY; i++) {
            printf("%02x ", out[i]);
        }
        printf("\n");
    }

    return 0;
}

The source can be downloaded from here.

The output of the run looks like this.

01
01 01
01 01 01
01 01 02
01 01 03
01 02
01 02 01
01 02 02
01 02 03
01 03
01 03 01
01 03 02
01 03 03
02
02 01
02 01 01
02 01 02
02 01 03
02 02
02 02 01
02 02 02
02 02 03
02 03
02 03 01
02 03 02
02 03 03
03
03 01
03 01 01
03 01 02
03 01 03
03 02
03 02 01
03 02 02
03 02 03
03 03
03 03 01
03 03 02
03 03 03

August 9, 2014

Instrumenting Flash Player to Inspect JITted Pages for Integer Errors

In this blog post I'm writing about the method I experiment with to discover potential areas, that may or may not be prone to integer errors, in Flash Player.

I have 26k flash files that are used as a corpus to generate the test samples for Flash Player. The test samples have an element of 0x41424241 in the integer pool. I have a total of 344k files generated to test Flash Player with.

During the test, I use a pintool to instrument the JITted code. The pintool is based on this implementation. Since the elements in the integer pool being dereferenced by action script it makes sense to restrict the instrumentation to JITted code.

I use instruction-level instrumentation that allows to check the register values at every single instruction being executed. If any of the general registers have the value of 0x41424241 and the instruction is referencing to that register, the instruction information along with general registers are logged.

The pintool pre-allocates the address of 0x41424241 so Flash Player won't use that memory address and so reducing the irrelevant lines in the log.

There is no need to wait for the test to finish to get partial results. A log file is generated for each test file rendered by Flash Player.

The size of log files are vary. Some are close to size 0 specially if the value above makes the code to fail early. There are many log files with size 4k. When the execution keeps going long the size is about 16k. If the value has to do something with a loop the log can reach 100k but that's rare.

Logs can be grouped and many can be thrown away as they don't contain instructions associated with vulnerabilities.

What I look for is like signed shift, addition, subtraction, or multiplication instructions. If the value is used in displacement with lea instruction that counts suspicious too.

Once an address in the log is chosen for closer inspection, I reproduce the log on isolation with an option to dump the JITted pages so I can manually analyze the surrounding area of that address in disassembler. Knowing the state information it's also possible to debug the code.

If results positive, certain level of automation can be added.

July 30, 2014

Practical Suggestions for Writing a Pintool

This is my list of practical suggestions to people developing a pintool. Since I dealt with these previously I thought to jot them down to help others. By applying this you should be somewhat closer to avoid your pintool from unexpected termination.

Start from scratch. So you use a sample pintool to develop your own. Rather than to modify the sample, start with an empty project and gradually build it up by taking elements from the sample.

Simplicity. Keep the code-base small and easy to understand.

Testing. As a part of development, aim to test if all blocks have been exercised. Refrain from adding unreachable blocks.

Errors. Check for errors as early as possible, specially when returning from a Pin API.

Safe memory dereference. Whenever you have to dereference the target's memory use PIN_SafeCopy. If you want to read an integer you should use this function, too, rather than the dereference "*" operator.

Thread safety. Be aware the target may be running with multiple threads. Possibly, you want your pintool to be thread safe.

Multi-threading. Sometimes you want your variables to be stored in the thread context to have the ability to distinguish the analysis between threads. In that case looking at the sample inscount_tls.cpp is a good start.

Probe mode. Use of probe mode is always preferred as it gives better performance. However, only limited Pin facilities available in probe mode.

Limit instrumentation. Consider restricting the instrumentation to routines or libraries and even can avoid the instrumentation of shared libraries to get better performance.

Standard library. It's good idea to use C++ standard library in a pintool as it provides the most frequently needed data structures.

Visual Studio. Visual C++ project file is available with Pin framework in MyPinTool folder. Alternatively, you can create one for yourself after looking at an earlier post.

Trace vs Ins. Instruction instrumentation is practically the same as trace instrumentation. You can do instruction analysis from the trace by iterating through the instructions.

Output. Having output routines in Fini makes the application to run faster than having them in analysis functions. However if the application terminates unexpectedly and so Fini is not called there will be no results shown. By having output routines in analysis functions makes the application to run slower but if the application terminates unexpectedly partial results may be shown.

July 26, 2014

Inspection of SAR Instructions

SAR stands for Shift Arithmetic Right and the instruction performs arithmetic shift. The instruction preserves the sign of the value to be shifted and so the vacant bits are filled according to the sign-bit.

Compilers generate SAR instruction when right shift operator ">>" is used on a signed integer.

The use of SAR instruction can potentially lead to create a signedness bug if it's assumed the shift is unsigned.

Given the following simplified example.

char retItem(char* arr, int value)
{
    return arr[value>>24];
}

If value is positive the code is working as expected. However if value is negative the program can read out of the bounds of arr.

Other example would be to compare the signed value after the shift to an unsigned value leading to implicit conversion that may lead to trigger bug.

In my experiment, in several cases, it is seen that memory is being dereferenced involving SAR instruction. These places may be worthy to look for bugs, specially if the value to be shifted is a user input or is a controlled one.

If an unsigned jump is followed by a signed shift that could be a potential to look for bugs as well.

Regular expressions or scripts can be used to search for patterns of occurrences of SAR instructions. When it's not feasible to review all occurrences of SAR, a pintool may be used to highlight what SAR instructions have been executed, and only focus on those executed.

July 22, 2014

Examining Native Code by Looking for Patterns

Earlier this year a post was published of examining data format without using the program that reads the format. That post discusses patterns to look for, in order to identify certain constructs. This post focuses on static methods of examining code that can be either the complete code section of the file, memory dump, or just fragment. It also describes selected ideas what patterns to look for when examining a given code.

The reason one may look for patterns in code is to locate certain functionalities or to get high-level understanding of what the code does. Others may look for certain construct that may be the key part of the program in security point of view.

It's true to say one can expect this to be a rapid method compared to other methods such as line-by-line instruction analysis.

But, it's always good to read documentation, if possible at all, to get an overview of the expectations.

There are methods that more effective if performed on small region. Therefore to narrow the scope of the search wherein to look for pattern is something good to do at the beginning of analysis albeit it's not always feasible to do with enough certainty. Anyway, one can always widen the search region if required at a later stage.

Compilers tend to produce executable files with particular layout. Some have the library code at the beginning of the code section, while others have it at the end of the code section.

If there is no information about the compiler or no information about the layout there are other ways to locate the library in the code.

You may look for library function calls that can be visible in disassembler. Library code may have distinct color in disassembler.

Library/runtime code often have many implementations of functions to use the advantage of latest hardware. An example is MSVC. And so SSE instructions/functions may indicate the presence of library/runtime code.

Library code can be spotted by looking for strings can be associated with particular libraries.

Library/runtime code can be spotted by looking for constant values that can be associated with particular libraries such as cryptographic libraries that tend to have many constants.

To guess the compiler that was used to generate the code is possible by analyzing the library/runtime code.

In case the code is just a fragment of user code you may consider examining the instructions how they are encoded. Intel encodings are redundant and one instruction can have multiple encodings. This is something to make guess on what compiler was used.

If multiple encodings of an instruction is found in a binary the code that could be generated with a polymorphic encoder.

Also, code has other characteristics that may differ between compilers such as padding and stack allocation.

Imports and exports as well as strings can tell a lot. You may check where they are referenced in the code.

Debugging symbols can help awfully lot if the disassembler can handle that. Sometimes it's available sometimes it's not.

No matter what code you're looking at it most likely deals with input data. That case it may get the data from file, from network, via standard API calls. These are valuable areas to audit for security problems, and it's possible to follow how the data returned by these APIs. It may require to analyze caller functions as usually these APIs are wrapped around many calls before using the input.

Just like when reading the data the code may write data, or send data via standard API calls. These areas may be security-sensitive.

Programs have centralized, well-established functions. These functions, for example, read dword values, read data into structures and propagate any other internal storage. Discovery of these functions not considered hard, they are normally small, and have instructions of memory read and write. By looking where they referenced from we can find good attack surfaces.

Good to keep in mind that code sections can contain data besides code. But normally data is stored in data section. In the disassembler it's convenient to see how the data is referenced, and may decide if there is an attack surface nearby.

CRC and hash constants may indicate there is some data which is being CRC'd or hashed. You may figure out where is that data from and how can you perform security testing around.

When a library is using a parameter hardcoded it's often encoded as a part of the instruction rather than stored in data section of the executable. Example encoding looks like mov eax, <param> or mov al, <param>.

When a data format is parsed often a magic value is tested. Looking for instructions like cmp reg, <magic> or cmp dword ptr [addr], <magic> or similar instructions can help to locate attack surfaces.

Longer strings may be broken into immediate values and compared with multiple cmp instructions.

Looking for strcmp function calls is good idea to look for if you want to find code that test for data format as often strcmp functions are used for this purpose.

If the code is optimized for speed there are many ways to confirm. Normally the readability of code bad, for example when the code performs division or use the same memory address for multiple variables. If EBP register is used in arithmetic or other than to store stack base address that could indicate the code is optimized.

Perhaps there are circumstances when looking at the frequency of instructions, looking for undocumented instructions, or rare instruction, or instructions that not present can give us valuable clues that help the examination.

Intuitively going through the code and looking for undefined patterns can be good idea if the scientific ways have been exhausted.
  This blog is written and maintained by Attila Suszter. Read in Feed Reader. Advert is experimentally shown.