Showing posts with label Debugging. Show all posts
Showing posts with label Debugging. Show all posts

Friday, September 3, 2010

My Thanks to the Profiler

I recently had an issue at work with one of our kernel mode file system filters for Windows.   The CPU was spending way too much time executing our drivers code.  I ran the kernrate sampling profiler tool from the Windows Driver Kit on our driver to see what code was taking so long to execute.  The profiler revealed that over 50% of the time that our driver was executing in a function named RtlUnicodeStStri.  This is a function that I wrote that is similar to the strstr function in the C standard library with the exception that it is case-insensitive and it works on UNICODE_STRING structures. In the Windows kernel, UNICODE_STRING structures are used in place of character arrays.  The structure is defined like this:

typedef struct _UNICODE_STRING
{
    USHORT Length;
    USHORT MaximumLength;
    PWCH Buffer;
} UNICODE_STRING, *PUNICODE_STRING;   


Windows includes type definitions of USHORT as unsigned short and PWCH as wchar_t*. When using UNICODE_STRING structures you have the length of the string (in bytes) and the maximum length (i.e. the buffer size) of the string (in bytes).  This provides many benefits: you can safely detect if any operation will overrun your string buffer, you can compare lengths before comparing strings, etc…  The strings stored in the Buffer member of the UNICODE_STRING structure are not guaranteed to be NULL terminated and many times they are not.  Thus you can not use the standard C library functions on the Buffer member.
Although the Windows kernel internally uses the UNICODE_STRING, there is not a very comprehensive library for handling UNICODE_STRINGS (The string handling functions are in the Rtl library).  This causes many developers like myself to write functions like RtlUnicodeStrStri.  Below is the original version of RtlUnicodeStrStri that I had written. 
 1 __drv_maxIRQL(APC_LEVEL)
 2 NTSTATUS
 3 RtlUnicodeStrStri(
 4     __in PCUNICODE_STRING  Str,
 5     __in PCUNICODE_STRING  SubStr,
 6     __out_opt PUNICODE_STRING  Result,
 7     __out_opt int* MatchStartIndex
 8     )
 9 {
10     USHORT l1=0, l2=0;
11     USHORT start = 0;
12     
13     //  Translate all counts to character counts.
14     const USHORT StrCharLen = Str->Length / sizeof(WCHAR);
15     const USHORT SubStrCharLen = SubStr->Length / sizeof(WCHAR);
16 
17     ASSERT_VALID_STRING(Str);
18     ASSERT_VALID_STRING(SubStr);
19     
20     while (start < StrCharLen)
21     {
22         for (l1=start,l2=0; l1<StrCharLen && l2<SubStrCharLen; ++l1, ++l2)
23         {
24             if (RtlUpcaseUnicodeChar(Str->Buffer[l1]) != 
25                 RtlUpcaseUnicodeChar(SubStr->Buffer[l2]))
26             {
27                 break;
28             }
29         }
30     }
31 
32     //    other code removed...
33 
34 } // RtlUnicodeStrStri

At first glance, I thought that the function was fairly straightforward.  Then I noticed one particular optimization that should have been made.  In the worst case, when the sub-string is not present, the algorithm continues to look through the entire source string even when the remaining length of the source string is too short to hold the sub-string.  This is an optimization that can not be made in the C version of strstr because you have no idea in advance how long the source string.  The full code is given below.


 1 __drv_maxIRQL(APC_LEVEL)
 2 NTSTATUS
 3 RtlUnicodeStrStri(
 4     __in PCUNICODE_STRING  Str,
 5     __in PCUNICODE_STRING  SubStr,
 6     __out_opt PUNICODE_STRING  Result,
 7     __out_opt int* MatchStartIndex
 8     )
 9 /*++
10 
11 Routine Description:
12 
13     This routine search Str for the first occurrence of the string SubStr
14     in a case insensitive manner.  If Result is provided and the search is
15     successful, it will be filled in with the string starting at the matched 
16     sub-string.
17 
18 Arguments:
19 
20     Str - The string to search.
21     
22     SubStr - The string to search for.
23 
24     Result - Result of the search starting at the sub-string and 
25              continuing through the rest of Str. This structure 
26              justs points to the string in Str.
27 
28     MatchStartIndex - The starting index within Str at which SubStr
29                       appears.                                   
30 
31 Return Value:
32 
33     STATUS_SUCCESS if SubStr was found in Str, 
34     STATUS_OBJECT_NAME_NOT_FOUND otherwise.
35 
36 Unit test:
37     
38     UnitTest4() via RtlWStrStri
39 
40 --*/
41 {
42     USHORT l1=0, l2=0;
43     USHORT start = 0;
44     
45     //  Translate all counts to character counts.
46     const USHORT StrCharLen = Str->Length / sizeof(WCHAR);
47     const USHORT SubStrCharLen = SubStr->Length / sizeof(WCHAR);
48 
49     ASSERT_VALID_STRING(Str);
50     ASSERT_VALID_STRING(SubStr);
51     
52     // Run the loop while the length of the sub-string (SubStr) is less
53     // than the remaining length of the input string being searched (Str). 
54     while (start < StrCharLen && (SubStrCharLen <= (StrCharLen - start)))
55     {
56         for (l1=start,l2=0; l1<StrCharLen && l2<SubStrCharLen; ++l1, ++l2)
57         {
58             if (RtlUpcaseUnicodeChar(Str->Buffer[l1]) != 
59                 RtlUpcaseUnicodeChar(SubStr->Buffer[l2]))
60             {
61                 break;
62             }
63         }
64 
65         if (l2 == SubStrCharLen)
66         {
67             if (ARGUMENT_PRESENT(Result))
68             {
69                 //  Translate start back to byte count.
70                 l1 = start * sizeof(WCHAR);
71 
72                 Result->Length = Str->Length - l1;
73                 Result->MaximumLength = Str->MaximumLength - l1;
74                 Result->Buffer = &Str->Buffer[start];
75             }
76 
77             if (ARGUMENT_PRESENT(MatchStartIndex))
78             {
79                 *MatchStartIndex = start;
80             }
81 
82             return STATUS_SUCCESS;
83         }
84 
85         ++start;
86     }
87 
88     return STATUS_OBJECT_NAME_NOT_FOUND;
89 
90 } // RtlUnicodeStrStri

This optimization dropped our drivers execution time dramatically, to an acceptable level, though RtlUnicodeStrStri was still responsible for about %30 of our drivers execution time.  I tried to optimize the calls to RtlUpcaseUnicodeChar (A Windows API) by doing quick translations for English ASCII codes (including ignoring chars that were already up-cased).  It turns out that RtlUpcaseUnicodeChar is very very good and any optimization attempt I made just made the driver’s execution time worse.

It turns out that our programmer’s (including myself) just got lazy and overused RtlUnicodeStrStri.  I mean, why parse a string and do a compare on the part of interest when you can just call RtlUnicodeStrStri?  So I wrote many more UNICODE_STRING handling functions like RtlStringEndsWithSuffix that can replace many calls to RtlUnicodeStrStri and more clearly capture the programmer’s intention in the code.

Thursday, January 28, 2010

Useful Debugging/Reversing Web Sites and Tools

IDA Pro Disassembler and Debugger:
http://www.hex-rays.com/idapro

Olly Debugger:
http://www.ollydbg.de

Immunity Debugger (based on Olly but with bug-fixes and Python scripting!):
http://www.immunityinc.com/products-immdbg.shtml
http://forum.immunityinc.com/

PaiMei/PyDbg a reverse engineering framework:
http://paimei.googlecode.com

OpenRCE (Open Reverse Code Engineering):
http://www.openrce.org

Dump Analysis:
http://www.dumpanalysis.org

Using Breakpoints to Skip a Function in WinDbg

The following is an example of how to skip a function (even if you don't have the source) in a device driver. Look at the assembly below. This is the function start/end for a 3rd party device driver function that I wish to fail without executing. First I let the stack frame get set up so I must also tear it down. I set the breakpoint at instruction f7ed0511 so that before this line executes, the breakpoint will go off. Then I give a command to the break point. The command 'reax = c0000001' sets the eax register to STATUS_UNSUCCESSFUL. The command 'reip = f7ed0735' sets the instruction pointer to the line 'pop edi' to begin the stack frame teardown. The final command 'g' tells the debugger to go. So each time this breakpoint is executed, the function is skipped with a failure code and the debugger continues execution.  You can even use the .echo command to display a message every time the breakpoint goes off.

1: kd> u fslx+7506
fslx+0x7506:
f7ed0506 8bff             mov edi,edi
f7ed0508 55               push ebp
f7ed0509 8bec            mov ebp,esp
f7ed050b 53               push ebx
f7ed050c 8b5d08        mov ebx,dword ptr [ebp+8]
f7ed050f 56                push esi
f7ed0510 57               push edi
f7ed0511 6846583430 push 30345846h
1: kd> u fslx+7735
fslx+0x7735:
f7ed0735 5f       pop edi
f7ed0736 5e      pop esi
f7ed0737 5b      pop ebx
f7ed0738 5d      pop ebp
f7ed0739 c21800     ret 18h

1: kd> bp f7ed0511 "reax = c0000001; reip = f7ed0735; g"
1: kd> bl
1 e f7ed0511 0001 (0001) fslx+0x7511 "reax = c0000001; reip = f7ed0735; g"

With the .echo command:
1: kd> bp f7ed0511 "reax = c0000001; reip = f7ed0735; .echo \"skipping function...\"; g"