On this page you will learn to redirect the flow of a vulnerable program to make it execute any code that it contains. This tutorial is quite a bit more technical than the last one.
The vulnerable program source code - Download a C compiler such as Dev-C++ if you want to experiment further.
The vulnerable program - Use Start->Run "cmd" and then "cd [folder where this file is]", then type "vuln2" to run.
The attack program source code
The attack program
OllyDbg - This is a debugger which we'll use to gather information to attack the program
OllyDbg is a type of application called a debugger. It allows us to peer inside a program while it's running; this is very useful during the development of exploits. We can look at memory and see values and how they are organized. We can even modify the program flow and modify areas of memory. Using a debugger as a final attack on a program, however, is sort of like cheating. Also, debuggers cannot be used in many situations (especially remote exploits). Instead, we'll use it to gather information that will be used to attack the program via user input.
The top-left pane on the screen is where we can follow the code as it executes. Simply open an .exe using the File menu and you can start analyzing the code. This window shows not only the assembly representation of the code, but also the bytes that are actually executed in memory and their location. You can step through the code one or several lines at a time using the arrow buttons in the toolbar (F7 and F8 are particularly useful). The EIP (Extended Instruction Pointer) is a special register (a place where the CPU stores often-used info) that holds the location of the code currently being executed. Each time the program runs a line of code, the EIP increases so that it points to the location of the next code to be executed. Jumps and calls further modify the EIP and redirect it to different parts of the program. As you step through the code, watch the EIP change in the right-hand pane.
The bottom-right pane shows the stack. You'll remember from the last tutorial that the stack is where temporary variables are stored, among other things. We'll explore the stack later, but for now just know where it is in Olly. The debugger is a complex piece of software with a ton of functionality; it can be intimidating, but you'll master it by playing and experimenting.
Here is our target vulnerable application. You can see there are two functions: main() and cantSitHere(). You should also notice that cantSitHere() never gets called by main() and thus is never executed. In addition to this, main() asks us for a useless piece of data called password. Remember the problem with the last program: the gets function. We can type as much data as we want, and gets will faithfully insert it into the password buffer. In this example, though, there is no handy variable to overwrite and redirect the flow. Our goal is to make the program execute the cantSitHere() function and proudly tell us that we "are the winner!".
Before we pwn this program, we need to go over some theory. The stack is not only used to store variables (like password); it's also used to keep track of the context. This is a somewhat difficult concept, but hear me out. When Windows runs a program like vuln2, it needs to make sure that the execution goes back into the Windows code when it's done running vuln2. Now threads and processes make this statement not exactly true, but it illustrates the point. Another example is the gets function itself. When you call gets (declared in stdio.h), the computer needs to know where to send the EIP back to after gets finishes getting user input. The computer uses the stack to keep track of this. It does so well enough that we can call a function inside a function inside a function and it won't get confused.
When a function is called, a stack frame is generated. This is just some information on the stack that is filled in and organized. The stack is like a deck of cards; data is pushed onto it and popped off it. Another register called ESP (Extended Stack Pointer) holds the location of the top of the stack and changes when things are pushed onto it and popped off it. So when FunctionA calls FunctionB, first a stack frame is pushed onto the stack. The stack frame basically consists of an SFP (Saved Frame Pointer) and a return address. The return address is easy to understand: it's the location of the next instruction in FunctionA after the call to FunctionB. The SFP holds the location of FunctionA's SFP. This seems strange, but think of it as a long chain of pointers from the most current stack frame back to the stack frame that spawned it. The location of the current Saved Frame Pointer is stored in a register called EBP (Extended Base Pointer).
So let's say that FunctionB has a variable declared inside it: char crap[10]. When FunctionA calls FunctionB, the call instruction first pushes the location of the instruction after the call. This becomes FunctionB's return address. The EIP then jumps to the location (address) of FunctionB. The first thing FunctionB does is push the caller's frame pointer (the current value of EBP) onto the stack; this is the Saved Frame Pointer, and with it the stack frame is complete. FunctionB then loads the location of the SFP into the Extended Base Pointer (EBP). The hard part is over; take a deep breath. Next it reserves 10 bytes on the stack by moving ESP; this is how the function makes room for the crap buffer. Now it executes whatever code is inside FunctionB. When it is ready to go back to FunctionA, it loads the SFP back into EBP and jumps to the address stored in the return address. As it performs these actions, the data is popped off the stack. The end result of all of this is that the code inside FunctionB executed and control was returned to FunctionA without screwing anything up.
This is a very difficult process to grasp at first. I recommend reading a lot of tutorials and having a look at the images in these paragraphs. Also play with OllyDbg and observe the stack, EIP, and EBP. Once it clicks, you'll be glad you took the time out to learn about it.
In order to take control of how this program executes and ultimately make it run the cantSitHere() function, we'll use a technique similar to the one from the last tutorial. We'll overflow the vulnerable buffer via automated keyboard input and overwrite main()'s return address. First, we'll stuff a bunch of letters into the password buffer until we reach the location of the return address on the stack. When we've reached it, it's time to feed it the actual address in memory where cantSitHere() begins. The main() function will execute as normal until it reaches the return 0; statement. At that time, it will pop a Windows frame pointer off the stack (main's SFP) and then jump back to what it thinks is Windows code. Because we have written over that return address, main() will actually "return" into the cantSitHere() function instead and execute the code that is there. Because we have corrupted the delicate stack frame structure, the program will crash when cantSitHere() returns. But the good news is that our goal has been accomplished in the process.
Providing Input Programmatically
Typing the hostile input into these vulnerable apps via the keyboard is about to become unrealistic. It's difficult or impossible to type some of the non-printable characters we'll be feeding into these programs. What is needed is a way to make a program feed attack data into the target. This can be accomplished using the piping feature that is, surprisingly, built into the Windows command prompt. To use it, simply type the name of the program that outputs the hostile bytes (via printf, etc.) and then the pipe character '|' followed by the name of the target program. Whenever the vulnerable app asks the user for input, it will get it from the output of the attack program instead. In this case, we're calling the attack program "input.exe". We can write this program to output almost any character or byte that we want and it will be fed into vuln2.exe. Here is the final source code for input.exe for reference:
As you can see, it's very easy to edit the string of characters that will be fed into the vulnerable program this way. We've already established that our hostile input will be a long string of characters ('A's) followed by the address of cantSitHere(). We need to find out two pieces of information to make this work: "How many As?" and "What the hell is cantSitHere()'s address?". We will use OllyDbg to find both of these answers.
How Many 'A's?
First we'll fire up OllyDbg and load vuln2.exe. Next press the play button to start the program; it should automagically break (stop execution so we can play with it) at the beginning of the actual program code. When it does so, the screen should look something like the image on the right. Next scroll down a bit until you see the call to gets (like in the image to the right). We want to make the program call gets, return to main(), and then break. To do this, we can place a breakpoint on the call to gets. Simply double-click the little dot to the left of the bytecode for that instruction. If you set the breakpoint correctly, it will highlight the address in red. This will actually make the program stop executing right before it calls the gets function. Once you've set the breakpoint, hit the play button again and watch it stop there. Now press the Step Over button (keyboard shortcut F8). This will call the function and then break again right after it returns to main().
So you pressed it and it's just sitting there; what's going on?! The program is waiting for user input from the keyboard. Find the app's window in the taskbar and open it up. You can see the text cursor blinking away, waiting for you to attack it. Type something easy to spot in memory (I use "AAAAAAAA") and hit enter. Go back to Olly and you'll see that it has stopped at the instruction after the call. Now hit F7 or F8 two more times. The current instruction (EIP) should be on an instruction called RET, which has not been executed yet. When it runs, RET will make the program pop main()'s return address off the stack and jump to it (back to Windows normally). Don't hit it yet; let's check out the stack.
Scroll the stack pane up a little so we can see what comes before the return address. The return address is highlighted because it is at the top of the stack waiting to be popped off by the RET instruction. Find your recognizable user input ("AAAAAAAA" in my case). Look at the address where it starts (the one higher in the stack pane). The addresses are in the column on the left. As you can see, they are written in hexadecimal and are 8 digits (4 bytes) long each. Write down the address where the 'A's begin; this is the address of our password buffer. Next look back down at the highlighted return address and write down its location on the stack. Now get out a calculator, or use the calculator program in the Start menu, and subtract the buffer's location from the return address's location. This tells us the number of 'A's to fill the buffer with before tacking our desired return address on the end. Give it a think and you'll find that we'll be needing 28 'A's.
What is cantSitHere()'s Address?
This one is quite a bit easier than finding out the number of 'A's. Scroll back up in the code window of Olly until you see the string "You are the winner!". This and the brackets that Olly helpfully adds on the left side when it guesses what bits of code are functions should tell you that the address of cantSitHere() is 0x00401290. Armed with this tidbit of info, we are almost ready to redirect the flow of this program to our will.
We can't just plug that address into our little attack program as \x00\x40\x12\x90. Check out the stack and look at an address; the correct Windows return address is a good example. Our keyboard input lands in memory in the order we type it, but the bytes of an address are stored in reverse order. This is because x86 stores addresses in little-endian order: least significant byte first. This is no problem; we'll just reverse the order of our address bytes and tack \x90\x12\x40\x00 on the end of our string of 'A's. Here is where we'll go over one last detail that I have ignored so far: null bytes. A byte with the value of \x00 signals the end of a string. That is, once any function that plays with character buffers (also called strings) sees a byte with the value zero, it stops. Similarly, when we input our attack string via the keyboard or the automated program with a pipe, gets automatically adds a \x00 to the end of our string. Now check out the last byte that our counterfeit return address needs to be: \x00. This means that we can shorten our tack-on bytes to \x90\x12\x40 and gets will shove the correct last byte on the end for us; how thoughtful.
Now that we have the correct amount of padding and the address of our target function, we can redirect the application and get it to run cantSitHere(). Refer to the input.c code for our attack program above and simply edit the string that it prints out to reflect this information. Tell it to print 28 'A's followed by the last 3 bytes of the target address in reverse order (\x90\x12\x40). Compile it and head over to the directory with input.exe and vuln2.exe. Tell Windows to pipe the output of input.exe into the input of vuln2.exe using this command: input.exe | vuln2.exe. In the words of Dan Kaminsky: "And then you win."