Can Registers Hold Binary, Decimal Or Hexadecimal Values

1 Basic Assembly
- 1.1 Preface
  - 1.1.1 From assembly to machine code
- 1.2 Binary & Hexadecimal
  - 1.2.1 Counting
    - 1.2.1.1 Introduction to Binary
  - 1.2.2 Bitwise Math
    - 1.2.2.1 Basic Addition
    - 1.2.2.2 Binary to Hexadecimal
    - 1.2.2.3 NOT, AND, OR and XOR
      - 1.2.2.3.1 NOT
      - 1.2.2.3.2 AND
      - 1.2.2.3.3 OR
      - 1.2.2.3.4 XOR
  - 1.2.3 Analyzing Code
- 1.3 Instructions & Concepts
  - 1.3.1 Registers
  - 1.3.2 Special Registers
    - 1.3.2.1 ESP
    - 1.3.2.2 EIP
    - 1.3.2.3 EBP
    - 1.3.2.4 ESI
  - 1.3.3 Instructions
    - 1.3.3.1 Basic Arithmetic
    - 1.3.3.2 Bitwise Operations
    - 1.3.3.3 Control Flow & Stack Instructions
  - 1.3.4 Kernel Interrupts
  - 1.3.5 A simple exit program in AT&T Syntax
- 1.4 The Stack
  - 1.4.1 Overflows
    - 1.4.1.1 Push & Pop
  - 1.4.2 The C Calling Convention
    - 1.4.2.1 Linking shared libraries into a program

Basic Assembly

NOTE: Heavy editing is going on with this page at the moment. Don't be surprised if some content is missing, or seems to be raw/unedited. Just check back in a bit, we're adding a lot of content.

-m4

Preface

An understanding of assembly and stack growth is important for the machine coder. In assembly, there is a limited number of temporary variables, also called registers, with only the stack or heap to contain additional application data. These registers are of a fixed size and fulfil various functions; the general purpose registers work as small variables for use by the machine coder, whereas special registers such as the instruction pointer fulfil specific functions.

There is a limited number of instructions that can be issued to a processor. While assembly instructions can be compared to the functions of higher-level languages, it is important to keep in mind that they are processor-specific operations, often performing alterations on the registers mentioned in the previous paragraph. It is possible to write cross-operating-system shellcoded applications, also known as flat binaries, or flat binary applications.

Because of the limitations of most shellcoding and overflow tutorials, the code examples and notations in this article will be given in both AT&T and Intel syntax. Intel syntax and AT&T System V syntax are syntax standards for coding assembly. Both syntaxes are translated into the same machine code; they differ only in how that machine code is written out as human-readable text. On Windows XP SP2 systems, we will be demonstrating our code in Intel 32-bit standard syntax. On Linux and UNIX systems, we will be demonstrating our code in AT&T System V syntax, again for the Intel 32-bit processor.

Again, there is no functional difference between the two syntaxes. They are simply methods of displaying assembly (as much as assembly itself is a method of displaying machine code); opinions differ on which of the two is more readable.

On the low level, a machine is really very simple. A processor works on raw numeric values (which we conventionally write in hexadecimal) and performs mathematical operations (add, subtract, multiply, divide) upon them as it is asked to. Other than that, the functionality of a processor is mostly limited to memory operations and execution procedures, such as the management of functions and stack growth.

The C programming language is more or less assembly that is arranged so that it is more easily written by humans, including the C libraries that provide a framework with which to work.

From assembly to machine code

Assembly is a user-friendly method of writing machine code. There are two stages to the conversion of the code into an executable program:

1. An assembler is used to translate the assembly instructions into their machine code equivalent (opcodes). Numeric constants in the source may be written in binary, octal, decimal or hexadecimal; whichever notation is used, the assembler encodes the same raw value.

2. Linking occurs, which combines various object files into a single executable and also creates references to dynamically linked libraries.

On Linux platforms, assembly is done using GNU as, or the `as' command:

          ~ $ as test.s -o test.o

The file generated (test.o) is called an object file. The object file has an ELF format header and needs to be passed through the linker in order to become a valid executable:

          ~ $ ld test.o -o test

Binary & Hexadecimal

Although this section is not directly related to assembly coding itself, bitwise math and binary/hexadecimal operations are unavoidable when coding in assembly. It is vital to understand the concepts that drive the processor.

Counting

Introduction to Binary

Binary math and hex math have slight differences to decimal math, but the same principles apply. For example, the binary number 1111 has a '1' in the "eights" placeholder, a '1' in the "fours" placeholder, a '1' in the "twos" placeholder and a '1' in the "ones" placeholder. Add these together and you get 15 in decimal.

Detailed Example
	Eights (1x2^3)	Fours (1x2^2)	Twos (1x2^1)	Ones (1x2^0)
Binary Values	1	1	1	1
Decimal Values	8	4	2	1
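
The place values in the table can be checked with a few lines of Python. This is only a sketch for verifying the arithmetic; Python is not part of this article's toolchain, and the variable names are made up for illustration.

```python
# Sum the place values of the binary number 1111 (illustrative names only).
bits = "1111"

# Pair each bit with its place value: 8, 4, 2, 1 for a four-bit number.
place_values = [2 ** power for power in range(len(bits) - 1, -1, -1)]
total = sum(int(bit) * value for bit, value in zip(bits, place_values))

print(place_values)   # [8, 4, 2, 1]
print(total)          # 15
print(int(bits, 2))   # 15, Python's built-in base-2 parser agrees
```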

A nybble consists of 4 bits, a half-byte, or one hexadecimal digit - all three of these are the same thing. Hex works in conjunction with binary based on this fact. Hexadecimal is base 16, using the characters 0-9 and a-f; each nybble is able to represent values 0-15. Binary is base two, using only the values 0 and 1, corresponding to the on or off state. Binary is the same as our own numeral system, the key difference being that there are only two 'numbers' (0-1 instead of 0-9) that a place value can hold.

Hex Nybble	Binary	Decimal
0	0000	0
1	0001	1
2	0010	2
3	0011	3
4	0100	4
5	0101	5
6	0110	6
7	0111	7
8	1000	8
9	1001	9
a	1010	10
b	1011	11
c	1100	12
d	1101	13
e	1110	14
f	1111	15
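
The same table can be regenerated rather than memorised. The sketch below uses Python's built-in `format` to print every nybble value in hex, binary and decimal; the name `rows` is purely illustrative.

```python
# Rebuild the hex/binary/decimal table for every possible nybble value.
rows = [(format(n, "x"), format(n, "04b"), n) for n in range(16)]

for hex_digit, binary, decimal in rows:
    print(hex_digit, binary, decimal)
```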

Bitwise Math

Basic Addition

Much like in the decimal system, binary numbers can be added together. For this example, we are going to take the binary numbers 0110 and 0010 and add them.

	Eights	Fours	Twos	Ones	Total
Binary	0	1	1	0	6
Decimal	0	4	2	0	6

	Eights	Fours	Twos	Ones	Total
Binary	0	0	1	0	2
Decimal	0	0	2	0	2

Thus by using decimal addition (4 + 2 + 2 = 8), we can determine that the value of the two binary numbers together is 8, or 1000 in binary.
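
As a quick check of that sum, here is a minimal Python sketch using binary literals. The variable names are illustrative only.

```python
a = 0b0110   # 6
b = 0b0010   # 2
total = a + b

# Show the result both as a four-bit binary string and as a decimal number.
binary_total = format(total, "04b")
print(binary_total, total)   # 1000 8
```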

Binary to Hexadecimal

For the past exercises, we have been working with 4 bits at once (4 values ranging from 0-1, e.g. 0001). This is a nybble, which is a single hexadecimal digit. A byte is made of two nybbles, as eight bits make a byte. In a hexadecimal byte, we have a 1's placeholder and a 16's placeholder. Hexadecimal digits are 0 through 9 and A through F. A nybble can hold 16 unique values, but the highest value is 15 because one of the values is 0. So, A = 10, B = 11, and so on and so forth up to F = 15.

So in hex, say we've got AF as a byte. AF = 175 in decimal because A is in the 16's placeholder, A = 10, 10*16=160, plus F which is in the 1's placeholder, 15*1=15. Therefore 160(A)+15(F) = 175.
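
The AF example can be reproduced by splitting the byte into its two nybbles with a shift and a mask. This is a sketch for checking the arithmetic; the names `high_nybble` and `low_nybble` are our own.

```python
byte = 0xAF

high_nybble = byte >> 4     # the 16's placeholder (A = 10)
low_nybble = byte & 0x0F    # the 1's placeholder (F = 15)

value = high_nybble * 16 + low_nybble * 1
print(high_nybble, low_nybble, value)   # 10 15 175
```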

NOT, AND, OR and XOR

NOT

NOT is an instruction that takes only one operand. The best way to describe it is that it inverts the binary value.

Example:

          A = 1010 in binary or 10 in decimal. NOT A results in the inversion of 1010 which is 0101. Therefore NOT A = 5.
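
One caveat when trying this outside of assembly: the result of NOT depends on how wide the value is. The Python sketch below masks the result to four bits to match the example, and then to 32 bits as an assumption about what a full register would hold. The mask names are illustrative.

```python
NYBBLE_MASK = 0b1111        # keep only the low four bits
DWORD_MASK = 0xFFFFFFFF     # keep 32 bits, the size of a register such as eax

a = 0b1010                  # A, or 10 in decimal

not_a_nybble = ~a & NYBBLE_MASK
not_a_dword = ~a & DWORD_MASK

print(format(not_a_nybble, "04b"), not_a_nybble)   # 0101 5
print(hex(not_a_dword))                            # 0xfffffff5
```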

AND

Our next instruction is AND, which takes two operands. True is 1 and false is 0. It operates bit by bit, just like NOT. AND compares each bit, and if both bits per placeholder are true, then it returns true for that placeholder; all else gets turned into 0.

Example:

          -----------
          6 = 0110 so...> 0|1|1|0 < - The second and third bits are true.
          -----------
          5 = 0101 so...> 0|1|0|1 < - The second and fourth bits are true.
          -----------
          0100          > 0|1|0|0 < - The second bit is the only true value in both 6 and 5.
          -----------
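
The same result can be checked with Python's `&` operator, which performs the same bit-by-bit comparison. A minimal sketch with illustrative names:

```python
six = 0b0110
five = 0b0101

# Only the "fours" bit is set in both operands.
result = six & five
print(format(result, "04b"), result)   # 0100 4
```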

OR

OR will return true for each placeholder if ANY of the bits are true.

Example: 5 OR C

          (0101 = 5) OR (1100 = C) = 1101 which equals D.
          -----------
          5 = 0101 so...> 0|1|0|1 < - The second and fourth bits are true.
          -----------
          C = 1100 so...> 1|1|0|0 < - The first and second bits are true.
          -----------
          D = 1101      > 1|1|0|1 < - The first, second and fourth bits are true in at least one instance.
          -----------

Table of OR:

	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
0	0	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F
1	1	1	3	3	5	5	7	7	9	9	B	B	D	D	F	F
2	2	3	2	3	6	7	6	7	A	B	A	B	E	F	E	F
3	3	3	3	3	7	7	7	7	B	B	B	B	F	F	F	F
4	4	5	6	7	4	5	6	7	C	D	E	F	C	D	E	F
5	5	5	7	7	5	5	7	7	D	D	F	F	D	D	F	F
6	6	7	6	7	6	7	6	7	E	F	E	F	E	F	E	F
7	7	7	7	7	7	7	7	7	F	F	F	F	F	F	F	F
8	8	9	A	B	C	D	E	F	8	9	A	B	C	D	E	F
9	9	9	B	B	D	D	F	F	9	9	B	B	D	D	F	F
A	A	B	A	B	E	F	E	F	A	B	A	B	E	F	E	F
B	B	B	B	B	F	F	F	F	B	B	B	B	F	F	F	F
C	C	D	E	F	C	D	E	F	C	D	E	F	C	D	E	F
D	D	D	F	F	D	D	F	F	D	D	F	F	D	D	F	F
E	E	F	E	F	E	F	E	F	E	F	E	F	E	F	E	F
F	F	F	F	F	F	F	F	F	F	F	F	F	F	F	F	F
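
A table like this does not need to be typed by hand. The sketch below rebuilds it with Python's `|` operator; `DIGITS` and `or_table` are names chosen for this example only.

```python
DIGITS = "0123456789ABCDEF"

# or_table[row][col] holds the hex digit for (row OR col).
or_table = [[DIGITS[row | col] for col in range(16)] for row in range(16)]

print("  " + " ".join(DIGITS))
for label, cells in zip(DIGITS, or_table):
    print(label, " ".join(cells))
```

Looking up row 5, column C gives D, matching the worked example.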

XOR

XOR is kind of a strange command: if the placeholder bits are different, it returns a true bit. If they are the same, it returns a false bit. Thus, anything XOR'd with itself results in 0.

General Examples:

          1 xor 1 = 0
          0 xor 0 = 0
          1 xor 0 = 1

Detailed Example: A xor F = 5

          -----------
          A = 1010 so...> 1|0|1|0 < - The first and third bits are true.
          -----------
          F = 1111 so...> 1|1|1|1 < - All four bits are true.
          -----------
          0101 = 5      > 0|1|0|1 < - The second and fourth bits are true in ONLY one instance, as opposed
                                      to two found in bits 1 and 3.
          -----------

The 8's placeholders are the same so they return 0, same with the 2's placeholders. The 4's and 1's are different, and therefore return true. XOR really asks WHICH BITS ARE NOT THE SAME.

Analyzing Code

Notice: to be replaced

Time to go to the expanded view and show you WHY the addresses work the way they do. Notice that the machine code in the center contains a certain number of bytes. Now we'll take a look at the expanded view of the execution stack, or what is referred to as the .text segment. Instead of viewing it one line of assembly instructions at a time, we'll view it one byte at a time.

Notice: to be replaced

Memory Address	Machine Code Byte
0x101c879d	\xb8
0x101c879e	\x01
0x101c879f	\x00
0x101c87a0	\x00
0x101c87a1	\x00
0x101c87a2	\x83
0x101c87a3	\xc0
0x101c87a4	\x03

Now if we look at the expanded view, we can tell how 0x101c87a2 is the beginning of the next line of code, and even get a feel for certain raw instructions. How can we do this? Well, we can see the code for each instruction:

So let's rip this apart a little bit. We know that the eax register can contain a DWORD value, so let's extrapolate on the mov eax, 1 instruction. Because eax can hold a DWORD value, we can safely assume that the machine code instruction for moving or setting a value to the eax register must be done in proper DWORD format, that is, it must take 4 bytes. So we notice that the value \x01 is followed by \x00\x00\x00. These are three "null bytes". Seeing these null bytes and knowing the way that assembly operates, we can safely say that the value of eax is assigned backwards in DWORD format, with the least significant byte first. Following all of these logical trains of thought, we can draw the following conclusions:

The machine code instruction "\xb8" is equivalent to mov eax, [dword].

All of these instructions are synonymous. Now let's analyze the add eax, 3 instruction:

<syntaxhighlight lang="asm">
add eax, 3
\x83\xc0 \x03

add eax, \x03
\x83\xc0, 3
</syntaxhighlight>

These can also be seen to be all synonymous. Therefore we can draw the conclusion that the machine code \x83\xc0 is equivalent to the assembly instruction add eax, [byte].
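
The reasoning above can be replayed on the eight bytes from the table. The Python sketch below is not a disassembler; it hard-codes only the two encodings discussed here (\xb8 followed by a DWORD, and \x83\xc0 followed by a byte) and uses `struct` to read the DWORD least significant byte first. The start address is the one used in this article's example.

```python
import struct

START = 0x101C879D
code = bytes.fromhex("b801000000" "83c003")

# Opcode 0xb8 ("mov eax, imm32") is followed by a 4-byte little-endian DWORD.
assert code[0] == 0xB8
(mov_value,) = struct.unpack("<I", code[1:5])

# The mov instruction is 5 bytes long, so the next one starts 5 bytes later.
next_address = START + 5

# Bytes 0x83 0xc0 ("add eax, imm8") are followed by a single byte.
assert code[5:7] == b"\x83\xc0"
add_value = code[7]

print(f"{START:#x}: mov eax, {mov_value}")          # 0x101c879d: mov eax, 1
print(f"{next_address:#x}: add eax, {add_value}")   # 0x101c87a2: add eax, 3
```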

Instructions & Concepts

Registers

So, now we have a good framework of the concepts of assembly language, but how can we see them be utilized to actually do something? Well, we'll start with explaining a bit about registers. General purpose registers on an Intel 32-bit processor are 4 bytes in length, also referred to as a "double word" or "DWORD". A byte consists of eight bits. It takes four bits to compose a hexadecimal digit, or a nybble.

There are other forms of "words" in assembly - these are naming conventions that refer to "chunks" of data. The computer can handle all this data running into each other, but we use these naming conventions to avoid confusion - for example, attempting to copy 4 bytes of data into a 2-byte register will end in disaster.

A DWORD is, as previously noted, 4 bytes in length. All the general purpose registers on an Intel 32-bit processor are DWORDs. There are also WORDs, which are a meager two bytes in length. There are also bytes and nybbles, which are composed of bits, the 'indivisible unit' of a computer.

Note: Some naming conventions choose to call a 4-byte chunk a WORD and a 2-byte chunk a HALFWORD. It doesn't matter which convention you use, so long as you are consistent.

Although a register is 4 bytes in length, it is not always necessary to use the entirety of the register. For operations requiring an increment smaller than 4 bytes, each register is split into subregisters. Take %eax, for example. The least significant 2 bytes of %eax form their own subregister, which is referred to as %ax. These are further divided into the 1-byte subregisters %ah and %al. %ah is the most significant byte of %ax, and %al is the least significant byte.

This naming persists with other registers - for example, instead of %ax, %ah and %al, the register %ebx has %bx, %bh and %bl. Special purpose registers also have their subregisters. %ebp, %esp, %esi and %edi are split into %bp, %sp, %si and %di. However, the special purpose registers are not subdivided beyond their WORD-sized subregisters.

The WORD-sized subregisters are sometimes known as 16-bit subregisters, while the 1-byte subregisters are sometimes known as 8-bit subregisters. Likewise, the normal DWORD registers such as %eax are known as 32-bit registers, hence the name "32-bit processor".
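
To make the subregister layout concrete, the sketch below models %eax as a plain 32-bit integer and carves out %ax, %ah and %al with shifts and masks. This is only a model of how the bytes overlap, not real register access; the starting value is arbitrary.

```python
eax = 0x1BADC0DE            # arbitrary example value

ax = eax & 0xFFFF           # least significant 2 bytes of eax
ah = (ax >> 8) & 0xFF       # most significant byte of ax
al = ax & 0xFF              # least significant byte of ax

# Zeroing al leaves the upper three bytes of eax untouched.
eax_al_cleared = eax & 0xFFFFFF00

print(hex(ax), hex(ah), hex(al))   # 0xc0de 0xc0 0xde
print(hex(eax_al_cleared))         # 0x1badc000
```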

There is another class of register beyond the 32-bit registers that is only found on 64-bit processors. These registers can hold a single QWORD, which is equal to 8 bytes. These are prefixed with the letter 'r' - for example, %eax is the least significant 4 bytes of the larger register %rax, which is 8 bytes. Likewise, there are %rbx, %rcx, %rdx, %rbp etc.

Subregisters can be used not only for size optimisation, but also to perform operations on one part of a register without disturbing the rest. If you wished to zero out only the least significant byte of %eax while leaving the rest untouched, you could simply write movb $0, %al.

A register can be considered to be a fundamental variable; registers can store variable data of a fixed size (up to 4 bytes on Intel 32-bit processors) and are limited in number. You cannot create new registers; there is a set of registers hardwired into the processor, and you must work with these. Some of the registers can simply be used as variables, whereas others have a specific role - like the instruction pointer, which holds the memory address of the next instruction to be executed. When an instruction is executed, one of the available gates (which determine the possible instructions that a processor can carry out) is used. The function of the gates is simply to receive a value, perform an operation on it and return it.

Because of the shortage of available registers, some registers - particularly the general purpose ones - will serve many functions. For example, eax can serve as a general variable. When making a kernel interrupt however, the value stored in eax is taken as the number of the system call to be executed. Likewise, the values in ebx, ecx and edx are used to represent arguments to the syscall made.

The general-purpose registers include eax, ecx, ebx, edi, and edx. Other, non general purpose registers are esi, esp, ebp, and eip.

Special Registers

These registers are not intended to be used for general data processing, but have some specific role that the processor uses them for.

ESP

Stack Pointer - This register remembers the current index of the stack. In the small push example in the Push & Pop section, esp starts at the bottom of the stack (0x100010ff) and then moves down to 0x100010fc, 0x100010f8, 0x100010f4 and so on as DWORD values are pushed onto the stack. The program uses this to keep track of where it is pushing and popping data from inside of the stack.

Whenever a new value is added to the stack or removed from it, the value stored in esp is changed. For example, if the top of the stack is located at address 0xbffffb1c, then the value held in esp will be 0xbffffb1c. However, if I push a 4-byte value onto the stack, then 4 will be subtracted from the value in esp (remember, the stack grows downward, so by subtracting we are moving the top of the stack upward) so that it now points 4 bytes higher in the stack than it did before, to the new top of the stack.
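
The arithmetic on esp is simple enough to trace by hand, but a short sketch makes the direction of movement explicit. Here esp is just a Python integer standing in for the register:

```python
esp = 0xBFFFFB1C            # current top of the stack

esp_after_push = esp - 4    # pushing a 4-byte value moves esp down by 4
esp_after_pop = esp_after_push + 4   # popping it moves esp back up by 4

print(hex(esp_after_push))  # 0xbffffb18
print(hex(esp_after_pop))   # 0xbffffb1c
```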

EIP

Instruction Pointer - This register represents the location of the actual code executing. For example, when the instruction 'add eax,1' is to be executed, the memory address in eip will be the address where the opcode \x83 (which represents the instruction) is located in our program. EIP can be dangerous if not secured properly, as being in control of eip will allow you to execute whatever memory it points to as if it was executable code - in buffer overflows, this is exploited to overwrite eip with the address of injected code in the buffer, thus executing arbitrary code.

EBP

Base Pointer - This points to the bottom of the current stack frame, however it can be used to hold addresses of other bases. It is common practice when using the C Calling Convention to push the value of ebp onto the stack, and then use ebp to store the current value of esp so that when esp changes, ebp can still be used as a pointer to the top of the stack at the point that ebp was stored.

ESI

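Source Index - this register holds the source address for string and memory-copy instructions such as movs and lods, with edi holding the destination. Outside of those instructions it can be used like a general purpose register. Note that the call instruction does not place the address of the called function into esi.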

Needless to say, our goal when attempting a buffer overflow is going to be manipulating the value of the eip register to make the computer execute our code as opposed to the original code. There are several ways to do this, but first we need to look at a few of our "special instructions" in assembly:

Instructions

So let's test out a few small instructions to get the hang of this. In Intel syntax, to put the value of 1 into eax, we will say mov eax, 1.

In AT&T System V syntax, we would say movl $1, %eax.

Intel syntax uses the format:

          [instruction] [destination],[source]

Whereas AT&T System V syntax uses the format:

          [instruction] [source],[destination]

Now in our small amount of code, eax = 1. If we want to add 3 to eax, we would say:

Intel:

<syntaxhighlight lang="asm">add eax, 3</syntaxhighlight>

AT&T:

<syntaxhighlight lang="asm">addl $3, %eax</syntaxhighlight>

This is the simple explanation of some instructions. A fuller list of instructions and a table of their purpose is as follows:

Basic Arithmetic

General Purpose Instructions
Instruction	Purpose
ADD	Adds value to register
SUB	Subtracts value from register
IMUL	Multiplies register by value
IDIV	Divides register by value
MOV	Places value into register
CMP	Compares register with value
INC	Increments specified register
DEC	Decrements specified register

Bitwise Operations

xor
or
and
not
test
shl
shr
ror
rol

Control Flow & Stack Instructions

Special Instructions
Instruction	Description
CALL	Calls a function - the location of the instruction immediately after the call is pushed to the stack for return purposes, then execution jumps to the location of the function that is being called.
JMP	Jumps to a different segment of code - location is defined by byte offset, but can also be defined with a DWORD memory address.
RET	Returns from function - jumps back to the address that call pushed onto the stack, thus returning to "normal" execution flow. This is what is being exploited during a buffer overflow attack.
PUSH	Push something onto the stack
POP	Pop the value off the stack and place it in a register
NOP	No operation occurs, equivalent of saying do nothing.

Special Jumps
Instruction	Description
JNE	Jump if not equal
JE	Jump if equal
JGE	Jump if greater than or equal to (>=)
JLE	Jump if less than or equal to (<=)
JL	Jump if less than
JG	Jump if greater than
JZ	Jump if ZFLAG set
JNZ	Jump if ZFLAG not set
JP	Jump if parity
JNP	Jump if no parity

Kernel Interrupts

The kernel interrupt is a special instruction that does exactly what its name says - interrupts the kernel. The kernel of an OS can be seen as an endless loop that exists in a standby state, waiting for programs to interrupt it with a request for it to execute. It can be likened to a while loop in a simple program that endlessly waits for connections and accepts them.

When we perform a kernel interrupt, we must provide arguments; by making a kernel interrupt and providing the kernel with an argument, we can make what is called a system call, or simply a 'syscall'. The most important argument to the kernel interrupt is the one in which we decide which system call it is that we're going to call upon. Every operating system has a wide range of system calls (that are rarely consistent from OS to OS) and each is referenced by a number. For example, on 32-bit x86 Linux, exit is system call 1. In order to signify that we intend to make the exit system call, we would move the value 1 into eax before making the kernel interrupt.

Depending on the system call, up to three additional arguments can be provided, which are stored in ebx, ecx and edx - the other three general purpose registers. In the exit example given before, there is only one additional argument - the value in ebx will be the exit code returned by the program when it exits.

Here is some sample code for the exit system call: (AT&T Syntax)

movl $1, %eax # exit is system call 1
movl $3, %ebx # the exit code is 3
int $0x80 # kernel interrupt

This would have the same effect as opening a Linux shell and entering the command 'exit 3'. After the program has completed, we can check the exit code by typing "echo $?". The output will be 3 if the program ran successfully.
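
The shell comparison can be checked from a script as well. The sketch below does not run the assembly; it assumes a POSIX `sh` is available, asks it to `exit 3`, and reads the exit code back through Python's `subprocess` module, which is the same value that "echo $?" would print.

```python
import subprocess

# Run a POSIX shell that exits with status 3, mirroring the 'exit 3' comparison.
result = subprocess.run(["sh", "-c", "exit 3"])
exit_code = result.returncode

print(exit_code)   # 3
```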

A simple exit program in AT&T Syntax

This is an example of a very simple program that merely makes the exit system call in Linux. In Linux, the exit system call is called by applying a kernel interrupt with the int instruction in the form "int $0x80". When it is called, a specific system call (which is an integral function of the operating system) will be executed - we select which system call to execute with a value stored in the eax register. In this case, we are calling exit - system call 1, and exiting with the exit code 03.

.section .text
.globl _start #_start is a global symbol that signifies the start of executable code

_start:
movl $1, %eax #exit is system call 1
movl $3, %ebx #exit code is 3
int $0x80

Note that this is a very simple example that does nothing but exit. It also employs a GNU/Linux system call; system calls are rarely uniform between operating systems, and in particular between Linux and Windows.

If this program was saved to exit.s, you would go through the following steps to assemble and link your program:

as exit.s -o exit.o # convert our code into an object file
ld exit.o -o exit.out # link our object file into a prepared ELF executable
./exit.out #execute our program

If the program exits successfully, you should be able to type echo $? immediately after the program finishes and see the code 3 returned.

The Stack

In order to understand how special instructions and special jumps are used, it is necessary to have a solid understanding of the nature of functions and stack-based operations. The execution stack and the data stack are two wholly separate things and must be understood as such.

When an application loads into memory, it begins at what is referred to as the "entry point" of the binary. Program execution then occurs in increasing memory addresses. Memory addresses are stored in DWORD values. For instance, suppose that the entry point to app.exe is at the hexadecimal address value 0x101c879d in memory. Suppose that the code there is 'mov eax, 1' (or movl $1, %eax in AT&T syntax), and the next segment of instruction is 'add eax, 3'. The program would look something like the following in memory:

Memory Address	Machine Code	Assembly Instructions
0x101c879d	\xb8\x01\x00\x00\x00	<syntaxhighlight lang="asm">mov eax, 1</syntaxhighlight>
0x101c87a2	\x83\xc0\x03	<syntaxhighlight lang="asm">add eax, 3</syntaxhighlight>

The stack is a construct used to make procedural execution and storage easier - it takes its name from the order in which it is altered; it is possible to "push" a value onto the "top" of the stack, or "pop" a value from the "top" of the stack. Keep in mind that the stack starts at a relatively high address and grows downwards in memory, so the "top" of the stack is really located at a lower memory address than the "bottom".

The execution stack refers to the "stack" of instructions to be executed in a binary. When a program loads into memory, the entry point of the binary will be at the top of the execution stack. The data stack, on the other hand, is used to store data for later use - such as when a function is needed. The use of the data stack is vital to the C calling convention, which is a technique of executing modular code in a program that takes its name from the method that the C programming language uses to execute functions.

The main reason for usage of the stack is the fact that you can arbitrarily push and pop elements onto or off of the stack without having to worry about the actual memory address that they are located at. Furthermore, the stack pointer (esp or %esp) is altered every time an element is added to or removed from the stack. It is possible to use this to refer not only to the element at the top of the stack, but to other elements within the stack. It is also possible to allocate or deallocate space on the stack simply by decrementing or incrementing the stack pointer, respectively.

Overflows

The stack is a linear region of memory with a specially allocated size, which cannot exceed 16 megabytes on 32-bit x86, for various reasons. Think of the stack like a stack of paper. While the location of each piece of paper is stored in a hexadecimal DWORD address, the stack grows backwards. This is to say that a stack is assigned a certain memory range and that range of memory is written to from the highest value to the lowest, not the lowest value to the highest. The stack is considered a part of the .data segment of an application, or the .bss segment. This is important because of something called DEP, or Data Execution Prevention. The stack region of an application is the region that can be "overflowed" to allow us to execute arbitrary machine code; however, because it is a data region of the program, certain safeguards have been put in place to prevent code located within the stack from executing.

Let's make a fake stack, for the sake of example and explanation, so that the reader may better grasp this concept. Let's say that our stack exists between addresses 0x100010ff and 0x10001000. The stack would look like this, in memory:

Memory Address/Range	Location	Contents
0x10001000	[Top of the Stack]
0x10001001-0x100010fe	[Data]
0x100010ff	[Bottom of the Stack]

Now we can see that the top of the stack has a lower memory address than the bottom of the stack. Now we're going to need a bit more information about our non-general purpose registers and a bit more information about our stack operation special instructions.

Push & Pop

Instruction	Purpose
PUSH	"Pushes" a value on top of the stack
POP	"Pops" a value off of the top of the stack

Obviously push applies to both registers as well as static data. For instance, we can push direct data in hexadecimal format like push 0x1badc0de, or we can push a byte like push 0x1b, or we can push a register's dword value like push eax. Pushing a nybble is not supported. Let's go over the push and pop concept a bit more with our "fake stack" that we're creating for the sake of education. My fake machine code is as follows:

Assembly	Machine Code
<syntaxhighlight lang="asm">push 0x1badc0de</syntaxhighlight>	\x68\xde\xc0\xad\x1b
<syntaxhighlight lang="asm">push 0xabad1dea</syntaxhighlight>	\x68\xea\x1d\xad\xab
<syntaxhighlight lang="asm">push 0xcafebabe</syntaxhighlight>	\x68\xbe\xba\xfe\xca
<syntaxhighlight lang="asm">push 0xdeadbeef</syntaxhighlight>	\x68\xef\xbe\xad\xde

Alright, so say we execute that code starting with the top line first. The data stack will now look something like the following:

Memory Address	Location	Value
0x10001000	[Top of Stack]	???
0x100010f0-0x100010f3		"deadbeef"
0x100010f4-0x100010f7		"cafebabe"
0x100010f8-0x100010fb		"abad1dea"
0x100010fc-0x100010ff	[Bottom of Stack]	"1badc0de"

A detailed view of the bottom of the stack (x86 stores each DWORD with its least significant byte at the lowest address):

Memory Address	Value
0x100010f8	"ea"
0x100010f9	"1d"
0x100010fa	"ad"
0x100010fb	"ab"
0x100010fc	"de"
0x100010fd	"c0"
0x100010fe	"ad"
0x100010ff	"1b"

So we can sort of understand how the stack grows backwards now, as 1badc0de was the first thing to get pushed onto the stack and is at the bottom of the stack. Think of the stack like a stack of papers. When you push something onto it, it's like putting a piece of paper on it. When you push something else on it, you're putting another piece of paper on the first piece of paper. This is where these "special registers" come into play.
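
The fake stack can also be simulated. The sketch below is a hypothetical Python model, not real memory: the class name `FakeStack` and its methods are invented for this example, and it assumes x86's little-endian byte order, so the bytes of each DWORD land in memory least significant byte first.

```python
import struct

class FakeStack:
    """Hypothetical model of the fake stack at 0x10001000-0x100010ff."""
    LOW, HIGH = 0x10001000, 0x100010FF

    def __init__(self):
        self.memory = bytearray(self.HIGH - self.LOW + 1)
        self.esp = self.HIGH + 1          # empty: esp sits just past the bottom

    def push(self, value):
        # Move the top of the stack down 4 bytes, then store the DWORD there.
        self.esp -= 4
        offset = self.esp - self.LOW
        self.memory[offset:offset + 4] = struct.pack("<I", value)

    def pop(self):
        # Read the DWORD at the top of the stack, then move esp back up.
        offset = self.esp - self.LOW
        (value,) = struct.unpack("<I", self.memory[offset:offset + 4])
        self.esp += 4
        return value

stack = FakeStack()
for value in (0x1BADC0DE, 0xABAD1DEA, 0xCAFEBABE, 0xDEADBEEF):
    stack.push(value)

esp_after_pushes = stack.esp
bottom_bytes = stack.memory[-4:].hex()
popped = stack.pop()

print(hex(esp_after_pushes))   # 0x100010f0
print(bottom_bytes)            # dec0ad1b
print(hex(popped))             # 0xdeadbeef
```

The last value pushed is the first one popped, and the first value pushed stays at the highest addresses, at the bottom of the stack.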

The C Calling Convention

As mentioned previously, one of the data stack's most useful applications is as a placeholder for important data when executing a function or subprocess. Although there are multiple methods of doing this, one very widely used technique is known as the C Calling Convention, after the language that originally employed it.

As mentioned above, there are several ways in which the stack can be used by an executable, and the C calling convention applies all of them. Firstly, it is possible to use the push and pop instructions to add to or remove from the top of the stack.

pushl %ebx - pushes the value contained in the ebx register to the stack
popl %ebx - pops the 4 bytes on the top of the stack into the ebx register
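
As a rough illustration of the stack-of-papers behaviour described above, the following sketch pushes three values and pops one back off. It assumes a 32-bit x86 Linux target and GNU as, like the other examples on this page; the values are arbitrary, and on a 64-bit host the assembler and linker would likely need the --32 and -m elf_i386 options respectively.

<syntaxhighlight lang="asm">
.section .text
.globl _start

_start:
            pushl $1            # pushed first, ends up deepest in the stack
            pushl $2
            pushl $3            # pushed last, sits on top of the stack
            popl %ebx           # pops the most recently pushed value: ebx = 3
            addl $8, %esp       # discard the two remaining values (2 * 4 bytes)
            movl $1, %eax       # syscall 1 is exit
            int $0x80           # exit status is the value in ebx
</syntaxhighlight>

Checking the exit status after running it (for example with echo $? in a shell) should show 3, the last value pushed.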

It is possible to reference a value contained in the stack using the stack pointer register, esp. The stack pointer always holds the memory address of the top of the stack. This allows you to create space on the stack by decrementing esp or remove it by incrementing esp, and it allows you to reference items stored on the stack by using indirect addressing with esp.

%esp = the memory address of the top of the stack
(%esp) = the value stored at the top of the stack
4(%esp) = the value stored 4 bytes into the stack
-4(%esp) = the value stored 4 bytes above the top of the stack
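
The addressing forms above can be sketched in a short program. This is only an illustration under the same assumptions as the rest of the page (32-bit x86 Linux, GNU as); the values 10 and 20 are arbitrary.

<syntaxhighlight lang="asm">
.section .text
.globl _start

_start:
            pushl $10           # first value pushed
            pushl $20           # second value pushed, now at (%esp)
            movl 4(%esp), %ebx  # 4 bytes into the stack: the first value, 10
            addl (%esp), %ebx   # add the value at the top of the stack: 10 + 20
            subl $4, %esp       # decrement esp to reserve 4 bytes of scratch space
            movl %ebx, (%esp)   # store the sum in the new slot
            addl $12, %esp      # increment esp to release the slot and both pushed values
            movl $1, %eax       # syscall 1 is exit
            int $0x80           # exit status is the sum held in ebx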

Before a function is called, its label is conventionally declared as a function symbol in the source. In AT&T syntax, this is done with:

.type function_name, @function

In order to call this function, first push each parameter of the function onto the stack in reverse order, so that the first parameter is pushed last.

Once this is done, call the function with the call special instruction.

The call instruction pushes the address of the instruction after the call onto the top of the stack. This is the address that would ordinarily be stored in %eip, the instruction pointer. The reason this address, known as the return address, is pushed to the stack is that the function is called by changing the instruction pointer; pushing the return address ensures that once our function has completed, we can return to the program's normal execution. After it has pushed the return address onto the stack, it modifies the instruction pointer so that it points to the first line of our function.

Example code:

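.section .text
.type somename,@function
.globl _start

_start:

            pushl $5            #second argument, pushed first
            pushl $3            #first argument, pushed last
            call somename       #pushes the return address and jumps to somename

somename:

            <function code>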

At the point the call instruction has executed, our stack looks like this:

Argument 2
Argument 1
Return Address [%esp points here]

The function code can contain whatever is needed to achieve the desired effect, but once the function has completed it must be handled properly. If any changes have been made to the stack, they must be undone so that the return address is once more at the top of the stack. We then use the ret instruction to pop the return address back into eip, returning execution to the main body of the program.

Example of an add1 function [AT&T syntax]:

.section .text

.type add1,@function
.globl _start

_start:

            pushl $5            #this is our argument
            call add1           #saves the return address and transfers execution to the add1 label
            #add1 happens here

            movl %eax, %ebx     #move the return value from add1 into %ebx for the exit call
            movl $1, %eax       #exit is syscall 1
            int $0x80           #kernel interrupt, exiting (syscall 1) with exit code equal to the return value from add1

add1:

            pushl %ebp          #push the value of ebp so we can use it to store values
            movl %esp, %ebp     #store the value of esp into ebp so we can use it as a static pointer to the top of the stack at this point
            movl 8(%ebp), %eax  #move the argument into eax
            incl %eax           #add 1 to eax
            #now we restore everything so the return address is at the top of the stack
            movl %ebp, %esp     #restore the stack pointer so it points to the stored value of ebp
            popl %ebp           #pop the value of ebp back into ebp, and now the return address is at the top of the stack
            ret                 #pop the return address into eip and resume execution
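
The same pattern extends to functions with more than one argument. The sketch below uses a hypothetical add2 function, which is not part of the original examples, that adds its two arguments. It also shows a detail the add1 example can skip because it exits straight away: under the C calling convention, the caller is responsible for removing the arguments it pushed once the function has returned.

<syntaxhighlight lang="asm">
.section .text
.type add2,@function
.globl _start

_start:
            pushl $3            # second argument, pushed first
            pushl $5            # first argument, pushed last
            call add2           # return value comes back in eax
            addl $8, %esp       # caller removes its two arguments (2 * 4 bytes)
            movl %eax, %ebx     # use the return value as the exit code
            movl $1, %eax       # syscall 1 is exit
            int $0x80

add2:                           # hypothetical helper: returns arg1 + arg2
            pushl %ebp          # save the caller's ebp
            movl %esp, %ebp     # ebp now marks this function's stack frame
            movl 8(%ebp), %eax  # first argument sits just past the saved ebp and return address
            addl 12(%ebp), %eax # second argument is 4 bytes further in
            movl %ebp, %esp     # undo any stack changes made in the function
            popl %ebp           # restore the caller's ebp
            ret                 # pop the return address into eip
</syntaxhighlight>

With the arguments 5 and 3 used here, the program should exit with status 8.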

Linking shared libraries into a program

Although assembly is a very powerful tool in the right hands, it is often counterproductive to write your own code when libraries exist to perform the functions you require. This is more true in assembly than any other language, because assembly operates on such a low level; a task as simple as printing to STDOUT can require a large amount of effort and coding in assembly.

The libraries that a program can directly reference are known as shared objects (or simply SOs) under a UNIX architecture or Dynamic Linked Libraries (DLLs) under Windows. They can be called with the C Calling Convention, but they first must be linked into the program binary. After the assembler has translated the code of a program into object files, these objects are then linked together so that when the executable file is called, any necessary libraries or objects will be dynamically loaded. In order for a specific shared object file - for instance, the libc libraries - to be called by the program, the shared object needs to be passed to the linker so that it knows to include it.

It is from this process that ELF format for executables takes its name - "Executable and Linkable Format".

An example of assembling and linking a program that references libc under GNU/Linux may be as follows:

as file.s -o file.o :: this is where we assemble our code into an object file, file.o
ld -dynamic-linker /lib/ld-linux.so.2 -o file file.o -lc

Looking at the link command, we can see that we have called the linker, ld, to link our object file into an output executable, and used the '-dynamic-linker' option to name the dynamic linker that will load shared objects when the program runs. Furthermore, we have passed ld the '-lc' option, which indicates that the executable should also be linked against the libc libraries. For a program that is properly linked with the libc shared object, it would be possible to call on functions from that library using the C Calling Convention. For instance, to call on the printf function:

.section .data
str:
.ascii "hello, world!\n\0"

.section .text
.globl _start

_start:
push $str #push the string to the stack in accordance with the C Calling Convention
call printf #use call to push the return address and transfer execution to libc's printf function

movl $1, %eax
movl $1, %ebx
int $0x80 #execute a kernel interrupt; eax is equal to 1 and ebx is also equal to 1, so we are exiting (syscall 1) with exit code 1
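
As a further sketch, a library function that takes several arguments is called the same way, with the arguments pushed in reverse order. The variant below is an illustration rather than part of the original example: the fmt label and the value 42 are made up, and it ends by calling libc's exit function instead of the kernel interrupt so that libc can flush any buffered output first. The extra subl instructions are padding that keeps esp 16-byte aligned at each call, which newer revisions of the i386 ABI expect; this assumes esp is already 16-byte aligned when _start begins.

<syntaxhighlight lang="asm">
.section .data
fmt:                            # hypothetical label for the format string
            .asciz "result: %d\n"   # .asciz appends the terminating zero byte

.section .text
.globl _start

_start:
            subl $8, %esp       # padding: 8 + 8 bytes of arguments = 16
            pushl $42           # second printf argument, pushed first
            pushl $fmt          # first printf argument: address of the format string
            call printf
            addl $16, %esp      # caller removes both arguments and the padding

            subl $12, %esp      # padding: 12 + 4 bytes of argument = 16
            pushl $0            # argument to exit: status 0
            call exit           # libc's exit flushes stdio buffers and ends the process
</syntaxhighlight>

It can be assembled and linked with the same as and ld commands shown earlier, since exit also lives in libc.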

Assembly Basics is part of a series on programming.