Notes on <Assembly Language step by step>
By brant-ruan
Yeah, I feel very happy
When you want to give up, think why you have held on so long.
Just fight.
Somebody may ask you: Why would you want to do that?
Yeah, because I want to know how it works.
Assembly language programming is about memory addressing.
Be audacious!
Take notes.
You have to decide who is to be the master, you or the machine,
and make it so.
You are in it to figure out how it works.
Let's go.
Even if you've learned every single instruction in an
instruction set, you haven't learned assembly language.
The skill of assembly language consists of a deep
comprehension of memory addressing.
Real mode flat model.
Real mode segmented model.
Protected mode flat model.(now)
Note: there is no necessary relation between the number
of address lines in a memory system and the size of the
data stored at each location.
CS code segment
2015/12/21
Chapter 5 The Right to Assemble
[nasm -f elf -g -F stabs eatsyscall.asm]
If your machine is x86_64, use the commands below instead:
[nasm -f elf64 -g -F stabs eatsyscall.asm]
[ld -o eatsyscall eatsyscall.o]
See "man nasm" for more details.
My first program in asm_64:
SECTION .data ; initialized data segment
EatMsg: db "Hello,world!", 10
EatLen: equ $-EatMsg
SECTION .bss ; uninitialized data segment
SECTION .text ; code segment
global _start ; entry point for linker
_start:
nop ; this nop instruction will benefit gdb
mov eax, 4 ; specify the sys_write system call
mov ebx, 1 ; specify file descriptor 1: stdout
mov ecx, EatMsg ; address of the string
mov edx, EatLen ; length of the string
int 80h ; make the system call
mov eax, 1 ; specify the exit system call
mov ebx, 0 ; return code 0
int 80h ; make the system call
Something about GDB
-list
-info registers
-print $rbp
-bt (show stack frames)
-frame [n]
-stepi
-nexti
-info breakpoints
-disassemble
-x /nfu # examine memory
Details about x /nfu:
n is the number of memory units to show
f controls the format:
x-hex d-dec u-dec-unsigned o-oct t-bin c-char f-float
u is the size of one unit (b/h/w/g = 1/2/4/8 bytes)
e.g. x /5c &msg (section .data msg db 'hello,world')
The author uses the Kate editor (apt-get install kate).
Kate needs quite a lot of configuration before it is comfortable to use,
so it is best to look those settings up in the book...
A session bundles one project's current state and configuration.
Using too many tools, I just want to say fuck...
2015/12/23
Chapter 7
# A minimal NASM program:
section .data
section .text
global _start
_start:
nop
nop
section .bss
makefile
sandbox: sandbox.o
[tab] ld -o sandbox sandbox.o
sandbox.o: sandbox.asm
[tab] nasm -f elf64 -g -F stabs sandbox.asm
.PHONY: clean
clean:
[tab] rm sandbox sandbox.o
mov eax,'WXYZ'
In gdb: eax = 0x5a595857
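Why 0x5a595857? x86 is little-endian, so the first character 'W' (0x57) lands in the lowest byte of eax. A quick check of my own in Python (not from the book) reproduces the value gdb shows:

```python
# Lay the four characters out in memory order, as NASM does for 'WXYZ',
# then read them back as one little-endian 32-bit integer.
text = b"WXYZ"
eax = int.from_bytes(text, "little")
print(hex(eax))  # 0x5a595857
```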
xchg cl, ch
mov [eatmsg], byte 'G'
In x86_64 asm you can use any of the general-purpose registers for addressing.
DEC/INC affect some of the EFLAGS bits (but they leave CF alone).
2015/12/24
neg ax
move with sign extension:
movsx reg16, reg8
movsx reg32, reg16
movsx reg32, reg8
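To convince myself what sign extension does to the bits, here is a small Python model. The helper name movsx and its parameters are my own invention for illustration, it is only a sketch of the idea, not how the CPU is built:

```python
def movsx(value, from_bits, to_bits):
    """Sign-extend a from_bits-wide value to to_bits (hypothetical helper)."""
    sign_bit = 1 << (from_bits - 1)
    value &= (1 << from_bits) - 1
    signed = (value ^ sign_bit) - sign_bit      # reinterpret as a signed number
    return signed & ((1 << to_bits) - 1)        # wrap into the wider register

print(hex(movsx(0xFE, 8, 32)))   # 0xfffffffe  (-2 stays -2)
print(hex(movsx(0x7F, 8, 16)))   # 0x7f        (positive values are unchanged)
```

The upper bits are filled with copies of the old sign bit, which is why a negative byte still reads as the same negative number in the wider register.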
multiply just like the 8086:
mul reg8
mul reg16
mul reg32
mul reg64
div:
div reg8
div reg16
div reg32
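The lists above only show the explicit operand. MUL and DIV also use implicit registers: for the 32-bit forms the product goes to EDX:EAX, and DIV divides EDX:EAX, leaving the quotient in EAX and the remainder in EDX. A rough Python model (mul32 and div32 are names I made up, the notes do not contain them):

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, operand):
    """Unsigned 32-bit MUL: returns (edx, eax) holding the 64-bit product."""
    product = (eax & MASK32) * (operand & MASK32)
    return (product >> 32) & MASK32, product & MASK32

def div32(edx, eax, operand):
    """Unsigned 32-bit DIV of EDX:EAX: returns (quotient, remainder)."""
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand & MASK32)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX (the CPU raises #DE)")
    return quotient, remainder

edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))        # 0x1 0xfffffffe
print(div32(edx, eax, 2))        # (4294967295, 0)
```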
remember:
MUL and DIV are among the slowest instructions in x86.
Chapter 8 Our Object All Sublime
comments are important!!!
one comment per line!
Comment content:
source-code name
executable-file name
creation date
modification date
author name
assembler name and build commands
.bss segment:
It takes up no room in the executable file. The data in it is uninitialized.
.text segment:
Besides your instruction mnemonics, it contains only labels: symbols that identify locations in the program code for jumps and calls.
A Linux executable must have a '_start' label (the default entry point for ld).
In Linux, EOL (End Of Line) is ASCII 10. So when you define a string,
it is better to write it like ["hello,world",10]
Calculate the length of a string:
eatmsg: db 'hello,world',10
eatlen: equ $-eatmsg ; associates a value with a label (much like a macro define)
The '$' above means the current location, so this is an assembly-time
calculation.
Stack
Push
push reg16/reg32/mem
pushf ; push 16-bit flags
pushfd ; push eflags (32 bits)
pusha ; push the 8 16-bit general registers
above: ax/cx/dx/bx/sp/bp/si/di (pushed in this order)
pushad ; push the 8 32-bit general registers
above: eax/ecx/edx/ebx/esp/ebp/esi/edi (pushed in this order)
Pop
pop reg16/reg32/mem
popf
popfd
popa (however, the saved sp value is discarded, not loaded into sp!!!)
popad (however, the saved esp value is discarded, not loaded into esp!!!)
pusha/pushad and popa/popad are invalid in x86_64 mode
iret
INT 80h
mov eax, 4 ; specify the sys_write call (write data to a file)
mov ebx, 1 ; specify the file descriptor: stdout
mov ecx, eatmsg ; address of the string
mov edx, eatlen ; length of the string
int 80h
(I want to know where these arguments come from. "man 2 write" lists fd, buffer and count, in the same order as ebx, ecx, edx.)
mov eax,1 ; specify the exit call
mov ebx,0 ; use 0 as the return value
int 80h
program-1 UPPERCASE-1
version-1
section .bss
Buff resb 1 ; reserve byte
section .data
section .text
global _start
_start:
nop
Read:
mov eax, 3 ; specify the sys_read call
mov ebx, 0 ; specify the file descriptor: stdin
mov ecx, Buff ; address of the buffer to read into
mov edx, 1 ; read 1 char
int 80h
cmp eax, 0 ; eax = number of chars read; 0 means EOF
je Exit
cmp byte [Buff], 'a'
jb Write
cmp byte [Buff], 'z'
ja Write
sub byte [Buff], 20h
Write:
mov eax, 4
mov ebx, 1
mov ecx, Buff
mov edx, 1
int 80h
jmp Read
Exit:
mov eax, 1
mov ebx, 0
int 80h
version-2
section .bss
BUFFLEN equ 1024
Buff: resb BUFFLEN ; (reserve byte)
section .data
section .text
global _start
_start:
nop
read:
mov eax, 3
mov ebx, 0
mov ecx, Buff
mov edx, BUFFLEN
int 80h
mov esi, eax
cmp eax, 0 ; 0 bytes read means EOF
je Done
mov ecx, esi
mov ebp, Buff
dec ebp ; so [ebp+ecx] hits the last byte when ecx = count (I think it is a terrible way...)
Scan:
cmp byte [ebp+ecx], 'a'
jb Next
cmp byte [ebp+ecx], 'z'
ja Next
and byte [ebp+ecx], 11011111b
Next:
dec ecx
jnz Scan
Write:
mov eax, 4
mov ebx, 1
mov ecx, Buff
mov edx, esi
int 80h
jmp read
Done:
mov eax, 1
mov ebx, 0
int 80h
nop
Remember to use pseudocode!!!
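Speaking of pseudocode, the Scan loop of version-2 can be written down in Python first. This is only my own sketch of the logic (upcase_buffer is a made-up name); it walks the buffer from the end the way ecx counts down, and clears bit 5 with the same mask 11011111b:

```python
def upcase_buffer(buff):
    """Uppercase ASCII a..z in place by clearing bit 5 (AND 11011111b)."""
    for i in range(len(buff) - 1, -1, -1):   # last byte first, like ecx counting down
        if ord("a") <= buff[i] <= ord("z"):
            buff[i] &= 0b11011111
    return buff

data = bytearray(b"Hello, world 42!\n")
print(upcase_buffer(data))   # bytearray(b'HELLO, WORLD 42!\n')
```

Clearing bit 5 is the same as subtracting 20h for lowercase letters, because 'a'..'z' all have that bit set and 'A'..'Z' do not.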
Chapter 9 - Bits, Flags, Branches, and Tables
Top-down is a very effective learning method.
Bit mapping!
Instructions:
AND/OR/NOT/XOR/ROL/ROR/RCL/RCR/SHL/SHR
AND: Masking out bits
XOR: xor eax,eax
SHL: shl reg/mem, count ; (count: 1~255)
RCL/RCR (rotate through the carry bit)
CLC/STC (CLear/SeT the CF bit)
program-hexdump1.asm
TEST opd1, mask ; computes opd1 AND mask to set the flags, the result is discarded
application: to check whether a specific bit is 1
Bit Test: BT eax, 4
it copies bit 4 of eax (counting from 0) into the Carry Flag
application:
bt eax, 4;
jnc quit;
This instruction is very slow
Protected Mode Memory Addressing in Detail:
[BASE+(INDEX*SCALE)+DISP]
BASE: any general register
INDEX: any general register except ESP
SCALE: 1, 2, 4 or 8
DISP: any 32-bit constant.
16-bit and 8-bit registers cannot be used for addressing
If you want to use other scales, you must do it yourself:
mov edx,ecx
shl edx,1
add edx, ecx
above is ecx*3
Or you can make a table:
scalevalues: dd 0,25,50,75,100,125,150,175,200,225
mov ecx, 6
mov eax, [scalevalues+ecx*4]
then eax holds the value of element 6 counting from 0 (150), not its address
section .data
Sums: dd 15,12,6,0,21,14,4,0,0,19
[Sums+ECX*4]:
15 : [Sums+0*4]
12 : [Sums+1*4]
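The address arithmetic can be checked with a short Python sketch. The table is the scalevalues one from above, packed as little-endian dwords; effective_address and the table start address of 0 are assumptions of mine for illustration. The offset is what the CPU computes from the brackets, and the value is what a MOV from that address would load:

```python
import struct

def effective_address(base=0, index=0, scale=1, disp=0):
    """BASE + INDEX*SCALE + DISP, truncated to 32 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    return (base + index * scale + disp) & 0xFFFFFFFF

# The table from the notes, laid out as little-endian dwords.
scalevalues = [0, 25, 50, 75, 100, 125, 150, 175, 200, 225]
memory = struct.pack("<10I", *scalevalues)
table_start = 0                       # assumed address of the label

offset = effective_address(index=6, scale=4, disp=table_start)
(value,) = struct.unpack_from("<I", memory, offset)
print(offset, value)                  # 24 150
```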
LEA:
lea ebx, [scalevalues+ecx*4]
Here's another application: multiplying. Instead of:
mov edx, ecx
shl edx, 1
add edx, ecx
You can also use:
mov edx, ecx
lea edx, [edx*2+edx]
What a nice instruction!
LEA is virtually always faster than the shift-and-add version.
Remember: if you can use a 32-bit reg, don't use a 16/8-bit reg
XLAT:
table address in EBX
char to be converted in AL
new char in AL
xlat
mov ebx, UpCase
mov al,byte [edx+ecx]
xlat
Remember Table Translation is a very important skill!
XLAT indexes the table with AL, so it only works for tables of up to 256 entries (8-bit indexes).
But still remember: Table Translation is a very important skill!
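Table translation is easy to try out in Python before writing the table in assembly. In this sketch of mine, upcase_table stands in for the UpCase table and xlat is a hypothetical helper that does what the instruction does: AL becomes the byte at [EBX + AL].

```python
# Build a 256-entry table: entry i is the replacement for byte value i.
upcase_table = bytearray(range(256))
for code in range(ord("a"), ord("z") + 1):
    upcase_table[code] = code - 0x20

def xlat(table, al):
    """What XLAT does: AL = byte at [EBX + AL]."""
    return table[al & 0xFF]

print(chr(xlat(upcase_table, ord("q"))))                 # Q
print(b"table lookup".translate(bytes(upcase_table)))    # b'TABLE LOOKUP'
```

Python's bytes.translate is the same idea applied to a whole buffer at once.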
Chapter 10 Dividing and Conquering
When procedures have local data, it's almost always data that is placed on the
stack when a procedure is called. The procedure can then access these pushed
data items on the stack. However, the procedure can't just pop that data
because the return address is in the way.
Stack Frame (important)
Local label: a label with a period in front of it.
E.g. .poke
Local labels are local to the first nonlocal label preceding them in the code.
A local label cannot be referenced from above the global label that owns it.
As long as a global label exists between two local labels with the same name,
NASM has no trouble distinguishing them.
E.g.
Scan:
xor eax, eax
.modTest:
test esi, 0000000fh
jnz Scan
.modTest belongs to Scan
You can force access to a local label from elsewhere by qualifying it with its owner:
jne Scan.modTest
jne Scan ; Short jump, to within 127 bytes
jne near Scan ; Near jump, to within 2 GB
EXTERN ClearLine ; a label in another file that you need
GLOBAL DumpLin ; a label of yours that another file needs
Then you can put the ClearLine procedure in its own file and assemble it to its own .o
Use Comment Headers!!!
E.g.
; -----------------------------------------------------
; LoadBuff: Fills a buffer with data from stdin via INT 80h sys_read
; UPDATED: 4/15/2009
; IN: Nothing
; RETURNS: # of bytes read in EBP
; MODIFIES: ECX, EBP, Buff
; CALLS: Kernel sys_read
; DESCRIPTION:
; Loads a buffer full of data (BUFFLEN bytes) from stdin using INT 80h sys_read
; and places it in Buff. Buffer offset counter ECX is zeroed, because we're
; starting in on a new buffer full of data. Caller must test value in EBP: If
; EBP contains zero on return, we hit EOF on stdin. Less than 0 in EBP on return
; indicates some kind of error.
Simple Cursor Control in the Linux Console
The Linux console can be controlled by sending it escape sequences embedded in
the stream of text traveling from your program to stdout.
mov eax, 4
mov ebx, 1 ; stdout
mov ecx, ADDRESS
mov edx, LENGTH
int 80h
ClearTerm:
db 27, "[2J" ; <ESC>[2J
The escape sequence is 4 characters long.
The command above clears the display.
db 27, "[01;01H" ; <ESC>[<Y>;<X>H
The command above moves the cursor.
db 27, "[42m"
The command above changes the background color to green.
You can use "man console_codes" to find more details.
(There are many interesting things.)
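The three sequences above can be built and inspected in Python before hard-coding them as db lines. The helper names here are my own, and the two-digit zero padding just mirrors the "[01;01H" form used in these notes:

```python
ESC = "\x1b"                      # ASCII 27

def clear_screen():
    return ESC + "[2J"

def goto_xy(x, y):
    """<ESC>[<Y>;<X>H, row first, both 1-based and zero-padded to 2 digits."""
    return f"{ESC}[{y:02d};{x:02d}H"

def green_background():
    return ESC + "[42m"

print(repr(clear_screen()), len(clear_screen()))   # '\x1b[2J' 4
print(repr(goto_xy(1, 1)))                         # '\x1b[01;01H'
```

Writing one of these strings to a terminal with sys.stdout.write has the same effect as the sys_write call above.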
A library for NASM assembly language, called LinuxAsmTools:
http://linuxasmtools.net/
Macros:
Macros are expanded by the assembler in memory only; the source file itself is not changed.
E.g.
%macro WriteStr 2 ; 2 is the number of parameters
push rax
push rbx
mov ecx, %1
mov edx, %2
mov eax, 4
mov ebx, 1
int 80h
pop rbx
pop rax
%endmacro
All the usual rules governing instruction operands apply.
E.g.
%macro GotoXY 2
...
mov dl, %1 ; dl is an 8-bit register.
...
%endmacro
Then the parameter %1 can only hold an 8-bit argument.
invoke:
GotoXY bl, al
Local labels in macro:
%macro UpCase 2
mov edx, %1
mov ecx, %2
%%IsLC:
cmp byte [edx+ecx-1], 'a'
jb %%Bump
cmp byte [edx+ecx-1], 'z'
ja %%Bump
sub byte [edx+ecx-1], 20h
%%Bump:
dec ecx
jnz %%IsLC
%endmacro
Local labels in a macro can only be used inside that macro.
To include a macro library:
%include "mylib.mac"
Chapter 11 Strings and Things
ESI points to source string
EDI points to destination string
ECX stores the length
Data coming from a source string or going to a destination string must begin
the trip from, end the trip at, or pass through register EAX
mov al, ' '
mov edi, VidBuff
mov ecx, COLS*ROWS
rep stosb
aaa ; ASCII Adjust After Addition.
64-bit asm doesn't support this instruction.
REP is a prefix
STOSB -- STOre String by Byte.
MOVSB
LOOPNZ watches ECX and ZF
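What REP STOSB does in the screen-clearing snippet above, spelled out as a Python loop. This is a sketch under my own assumptions: the direction flag is clear (EDI counts up), and the COLS/ROWS values are tiny made-up numbers, not the ones from the book.

```python
def rep_stosb(memory, edi, al, ecx):
    """Store AL at [EDI], bump EDI, decrement ECX, until ECX is 0 (DF clear)."""
    while ecx != 0:
        memory[edi] = al
        edi += 1
        ecx -= 1
    return edi, ecx

COLS, ROWS = 8, 2                       # tiny made-up screen size
vidbuff = bytearray(b"x" * (COLS * ROWS))
edi, ecx = rep_stosb(vidbuff, 0, ord(" "), COLS * ROWS)
print(vidbuff == b" " * 16, edi, ecx)   # True 16 0
```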
-----------------------------------------------------------------------------
Command-Line Arguments and Examining the Stack
./showargs1 time for tacos
When Linux loads your program, it places much information on the stack before
letting the program's code begin execution.
This includes the full pathname of the executable, any command-line arguments and the current
state of the Linux environment.
ESP points at the top of the stack.
At [ESP] is a 32-bit number giving you the count of the command-line args.
-----------------------------------------------------------------------------
Stack simulation
32-bit null pointer                     | High address
Full pathname of executable             |
Actual environment variables            |
Actual command-line arguments           |
Actual executable invocation text       |
(System oddments and empty space)       |
32-bit null pointer                     |
Address of last environment variable    | Stack
Address of environment variable 3       |
Address of environment variable 2       |
Address of environment variable 1       |
32-bit null pointer                     |
Address of last argument                |
Address of argument 2                   |
Address of argument 1                   |
Address of executable invocation text   |
Count of arguments (always at least 1)  | <-- ESP, low address
-----------------------------------------------------------------------------
You use the addresses found on the stack to access items up-memory.
Beginning with the 2.6 version of the Linux kernel, the kernel randomizes the
boundaries of the stack. Each time your program runs, its stack addresses will
be different.
The only tricky part is determining how many bytes belong to each argument, so
you can copy that argument data somewhere. Because each argument ends with a
single 0-byte, the challenge is to search for the 0.
So you need this instruction: SCASB (SCAn String by Byte)
SCASB: Look up in the Intel manual.
REPNE SCASB ; Repeat SCASB while ECX is not 0 and the byte at [EDI] does not equal AL.
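To see how the argument length falls out of the registers, here is a Python model of the scan. It is only a sketch: repne_scasb is a hypothetical helper, the sample bytes mimic the "time for tacos" arguments, and MAXLEN is an assumed search limit loaded into ECX before the scan.

```python
def repne_scasb(memory, edi, al, ecx):
    """Scan forward while ECX != 0 and the byte at [EDI] != AL.
    Returns (edi, ecx, zf) as they would be left after the instruction."""
    zf = False
    while ecx != 0:
        zf = memory[edi] == al
        edi += 1                     # EDI always moves past the byte just compared
        ecx -= 1
        if zf:
            break
    return edi, ecx, zf

args = b"time\x00for\x00tacos\x00"
MAXLEN = 0xFFFF                      # assumed search limit
edi, ecx, zf = repne_scasb(args, 0, 0, MAXLEN)
length = MAXLEN - ecx - 1            # bytes before the terminating 0
print(length, edi, zf)               # 4 5 True
```

Note that EDI ends up one past the 0 byte, which is exactly the start of the next argument.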
Chapter 12 Heading Out to C
The main function in a C program really is just a function.
The standard C library code calls it with a CALL instruction, and it returns
control to the standard C library code by executing a RET instruction.
Stack frame:
push ebp
mov ebp, esp
EBP is considered the anchor of your new stack frame.
Once the stack frame is set up, the first thing to do is save EBX, ESI and
EDI on the stack (the C calling convention expects them to be preserved).
E.g.
*********************************
[SECTION .data]
Msg db "hello,world!", 0
[SECTION .bss]
[SECTION .text]
extern puts
global main
main:
push ebp
mov ebp, esp
push ebx
push esi
push edi
; Your code goes here....
push Msg
call puts
add esp, 4
pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret
*********************************
If your 64-bit gcc does not have the 32-bit libraries, install:
apt-get install gcc-multilib g++-multilib
Then:
nasm -f elf hello.asm
gcc -o hello -m32 hello.o