By brant-ruan

Yeah, I feel very happy

When you want to give up, think why you have held on so long.

Just fight.

Somebody may ask you: Why would you want to do that?

Yeah, because I want to know how it works.

Assembly language programming is about memory addressing.

Be audacious!

Take notes.

You have to decide who is to be the master, you or the machine,

and make it so.

You are in it to figure out how it works.

Let's go.

Even if you've learned every single instruction in an

instruction set, you haven't learned assembly language.

The skill of assembly language consists of a deep

comprehension of memory addressing.

Real mode flat model.

Real mode segmented model.

Protected mode flat model (the one used now).

Note: there is no necessary relation between the number

of address lines in a memory system and the size of the

data stored at each location.

CS code segment

2015/12/21

Chapter 5 The right to Asm

[nasm -f elf -g -F stabs eatsyscall.asm]

If your machine is x86_64, use the commands below instead:

[nasm -f elf64 -g -F stabs eatsyscall.asm]

[ld -o eatsyscall eatsyscall.o]

man nasm for more details

My first program in asm_64:

SECTION .data ; initialized data segment
EatMsg: db "Hello,world!", 10
EatLen: equ $-EatMsg

SECTION .bss ; uninitialized data segment
SECTION .text ; code segment

global _start ; entry point for linker

_start:
nop ; this nop keeps gdb happy
mov eax, 4 ; specify the sys_write system call
mov ebx, 1 ; specify file descriptor 1: stdout
mov ecx, EatMsg ; address of the string
mov edx, EatLen ; length of the string
int 80h ; make the system call
mov eax, 1 ; specify the sys_exit system call
mov ebx, 0 ; return code 0
int 80h ; make the system call

Something about GDB

-list
-info registers
-print $rbp
-bt (show stack frames)
-frame [n]
-stepi
-nexti
-info breakpoints
-disassemble
-x /nfu # examine the memory

details about x /nfu :

n is the number of memory units to show

f controls the format:

x-hex d-dec u-dec-unsigned o-oct t-bin c-char f-float

u is the size of one unit: b (1 byte), h (2 bytes), w (4 bytes), g (8 bytes)

e.g. x /5c &msg (section .data msg db 'hello,world')

The author uses the Kate editor (apt-get install kate)

And there are lots of configurations of Kate before you can use it comfortably

so I think you'd better look for those in the book...

A session bundles one project's current state and configuration.

Using too many tools, I just want to say fuck...

2015/12/23

Chapter 7

# A minimum NASM program:

section .data

section .text
global _start
_start:
nop

nop

section .bss


makefile

sandbox: sandbox.o
[tab] ld -o sandbox sandbox.o
sandbox.o: sandbox.asm
[tab] nasm -f elf64 -g -F stabs sandbox.asm
.PHONY: clean
clean:
[tab] rm sandbox sandbox.o

mov eax,'WXYZ'

In gdb: eax = 0x5a595857
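
The value gdb shows follows from x86 being little-endian: 'W' (0x57) sits at the lowest address and ends up in the low byte of eax. A small Python sketch to double-check it (Python is only used here as a calculator, it is not part of the book's toolchain, and the function name is made up for this note):

```python
# Model of `mov eax, 'WXYZ'`: the characters are laid out in memory order,
# and x86 reads a 32-bit value with the lowest-addressed byte as the low byte.
def pack_chars_little_endian(text: str) -> int:
    return int.from_bytes(text.encode("ascii"), "little")

eax = pack_chars_little_endian("WXYZ")
print(hex(eax))  # 0x5a595857
```

So al holds 'W', and 'Z' is in the highest byte.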

xchg cl, ch
mov [eatmsg], byte 'G'

In x86_64 asm you can use any of the general registers for addressing.

DEC/INC affect some of the EFLAGS bits (but not CF).

2015/12/24

neg ax

move with sign extension:
movsx reg16, reg8
movsx reg32, reg16
movsx reg32, reg8
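
To see what sign extension does to the bits, here is a rough Python model of MOVSX. It is only a sketch: the function name and its width arguments are my own, not anything NASM provides.

```python
def movsx(value: int, src_bits: int, dst_bits: int) -> int:
    """Sign-extend a src_bits-wide value to dst_bits, like MOVSX does."""
    value &= (1 << src_bits) - 1
    sign_bit = 1 << (src_bits - 1)
    if value & sign_bit:
        value -= 1 << src_bits      # reinterpret as a negative number
    return value & ((1 << dst_bits) - 1)

print(hex(movsx(0xFE, 8, 16)))   # 0xfffe  (-2 stays -2)
print(hex(movsx(0x7F, 8, 32)))   # 0x7f    (positive values are unchanged)
```

The upper bits are filled with copies of the source's top bit, which is why a negative number keeps its value in the wider register.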

multiply just like the 8086:
mul reg8
mul reg16
mul reg32
mul reg64

div:
div reg8
div reg16
div reg32
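
The operands of MUL and DIV are mostly implicit, which is easy to forget. For the 32-bit forms, MUL leaves the product in EDX:EAX and DIV divides EDX:EAX, leaving the quotient in EAX and the remainder in EDX. A Python sketch of just that register bookkeeping (function names are hypothetical, and flags are ignored):

```python
MASK32 = 0xFFFFFFFF

def mul32(eax: int, operand: int):
    """mul reg32: EDX:EAX = EAX * reg32. Returns (eax, edx)."""
    product = (eax & MASK32) * (operand & MASK32)
    return product & MASK32, product >> 32

def div32(eax: int, edx: int, operand: int):
    """div reg32: divides EDX:EAX by reg32. Returns (quotient in eax, remainder in edx)."""
    dividend = (edx << 32) | eax
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX (the CPU raises a divide error)")
    return quotient, remainder

print(mul32(0xFFFFFFFF, 2))      # low half in eax, high half in edx
print(div32(7, 0, 2))            # (3, 1)
```

This is why EDX normally has to be cleared before an unsigned DIV on a value that lives only in EAX.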

remember:
MUL and DIV are among the slowest instructions in x86.

Chapter 8 Our Object All Sublime

comments are important!!!

one comment per line!

Comment content:

source-code name

executable-file name

creation date

modification date

author name

compiler name and instructions

.bss segment:

It takes up no room in the executable file; the data in it is not initialized.

.text segment:

Contains the instruction mnemonics, plus symbols called labels that identify locations in the program code for jumps and calls.

A Linux executable must have a '_start' label; it is the default entry point the linker looks for.

In Linux, EOL (End Of Line) is 10 in ASCII. So if you define a string,

it is better to write it like ["hello,world",10]

Calculate the length of a string:

eatmsg: db 'hello,world',10
eatlen: equ $-eatmsg ; associate a value with a label (much like a macro define)

The '$' above means the current location, so this is an assembly-time

calculation.

Stack

Push

push reg16/reg32/mem
pushf ; push 16-bit flags
pushfd ; push eflags (32 bits)
pusha ; push 8 16-bit general registers

above: ax/cx/dx/bx/sp/bp/si/di (pushed in this order)

pushad ; push 8 32-bit general registers

above: eax/ecx/edx/ebx/esp/ebp/esi/edi (pushed in this order)

Pop

pop reg16/reg32/mem
popf
popfd
popa (however, the saved sp value is discarded; sp is not reloaded from it!!!)
popad (however, the saved esp value is discarded; esp is not reloaded from it!!!)

pusha/pushad and popa/popad are invalid in x86_64 mode

iret

INT 80h

mov eax, 4 ; specify the sys_write call (write data to a file)
mov ebx, 1 ; specify the file descriptor: stdout
mov ecx, eatmsg ; address of the string
mov edx, eatlen ; length of the string
int 80h

(I want to know where I can get these arguments)

mov eax,1 ; specify the exit call
mov ebx,0 ; specify 0 as the return value
int 80h

program-1 UPPERCASE-1

version-1

section .bss
Buff resb 1 ; reserve byte
section .data

section .text
global _start

_start:
nop
Read:
mov eax, 3 ; specify the sys_read call
mov ebx, 0 ; specify the file descriptor: stdin
mov ecx, Buff ; read into Buff
mov edx, 1 ; read 1 char
int 80h
cmp eax, 0 ; eax = number of bytes read; 0 means EOF
je Exit

cmp byte [Buff], 'a'
jb Write
cmp byte [Buff], 'z'
ja Write
sub byte [Buff], 20h

Write:
mov eax, 4
mov ebx, 1
mov ecx, Buff
mov edx, 1
int 80h
jmp Read

Exit:
mov eax, 1
mov ebx, 0
int 80h

version-2

section .bss
BUFFLEN equ 1024
Buff: resb BUFFLEN ; (reserve byte)

section .data

section .text
global _start
_start:
nop
read:
mov eax, 3
mov ebx, 0
mov ecx, Buff
mov edx, BUFFLEN
int 80h
mov esi, eax
cmp eax, 0 ; #
je Done ; #

mov ecx, esi
mov ebp, Buff
dec ebp ; Buff-1, so [ebp+ecx] with ecx = count..1 covers the buffer (I think it is a terrible way...)

Scan:
cmp byte [ebp+ecx], 'a'
jb Next
cmp byte [ebp+ecx], 'z'
ja Next

and byte [ebp+ecx], 11011111b

Next:
dec ecx
jnz Scan

Write:
mov eax, 4
mov ebx, 1
mov ecx, Buff
mov edx, esi
int 80h
jmp read

Done:
mov eax, 1
mov ebx, 0
int 80h
nop
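
Both versions rely on the same ASCII fact: a lowercase letter differs from its uppercase form only in bit 5 (value 20h), so "sub 20h" and "and 11011111b" give the same result for 'a'..'z'. A Python sketch of the scan loop in version-2 (it walks forward rather than backward, which does not change the result; the function name is made up):

```python
def upcase_buffer(buff: bytearray) -> bytearray:
    """Uppercase ASCII lowercase letters in place by clearing bit 5."""
    for i in range(len(buff)):
        if ord("a") <= buff[i] <= ord("z"):
            buff[i] &= 0b11011111      # same mask as the AND in version-2
    return buff

print(upcase_buffer(bytearray(b"hello, world 123\n")))
```

Bytes outside 'a'..'z' are left alone, which is what the jb/ja pair does in the assembly.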

Remember to use pseudocode!!!

Chapter 9 - Bits, Flags, Branches, and Tables

Top-down is a very effective learning method.

Bit mapping!

Instructions:

AND/OR/NOT/XOR/ROL/ROR/RCL/RCR/SHL/SHR

AND: Masking out bits

XOR: xor eax,eax

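SHL: shl reg/mem, count ; count is an immediate or CL (the book gives 1~255; 386 and later CPUs only use the low 5 bits for 32-bit operands)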

RCL/RCR (rotate with carry bit)

CLC/STC (CLear/SeT CF bit)

program-hexdump1.asm

TEST opd1, mask ; compute opd1 AND mask, set the flags, and throw the result away

application: to check whether a specific bit is 1

Bit Test: BT eax, 4

It copies bit 4 of eax (counting from bit 0, so the fifth bit from the right) into the Carry Flag.

application:

bt eax, 4;

jnc quit;

This instruction is very slow
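
A rough Python model of the two bit checks, to keep straight which flag each one sets. TEST reports through ZF, BT reports through CF. The function names are mine and only the one relevant flag is modeled.

```python
def zf_after_test(operand: int, mask: int) -> bool:
    """TEST: AND the operands and keep only the flags. Returns ZF."""
    return (operand & mask) == 0

def bt(operand: int, bit_index: int) -> int:
    """BT: copy the selected bit (bit 0 = least significant) into CF."""
    return (operand >> bit_index) & 1

eax = 0b0001_0000                        # only bit 4 is set
print(bt(eax, 4))                        # 1: CF is set, so `jnc quit` is not taken
print(zf_after_test(eax, 0b0001_0000))   # False: ZF is clear, so the bit is 1
```

With TEST you jump on jz/jnz, with BT you jump on jc/jnc.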

Protected Mode Memory Addressing in Detail:

[BASE+(INDEX*SCALE)+DISP]

BASE: any general register
INDEX: any general register except ESP
SCALE: 1, 2, 4 or 8
DISP: any 32-bit constant.

16 or 8 bit registers can not be used to address

If you want to use other scales, you must do it yourself:

mov edx,ecx

shl edx,1

add edx, ecx

above is ecx*3

Or you can make a table:

scalevalues: dd 0,25,50,75,100,125,150,175,200,225

mov ecx, 6

mov eax, [scalevalues+ecx*4]

then eax holds the value of element 6 (150, counting from 0), not its address

section .data
Sums: dd 15,12,6,0,21,14,4,0,0,19
[Sums+ECX*4]:
15 : [Sums+0*4]
12 : [Sums+1*4]
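
To convince myself how [Sums+ECX*4] picks an element, here is a Python sketch that lays the Sums table out as little-endian bytes and computes BASE+(INDEX*SCALE)+DISP by hand. The address 0x1000 is an arbitrary assumption (the real one is chosen at link time), and the helper names are made up.

```python
import struct

SUMS_ADDR = 0x1000   # hypothetical address of Sums
memory = {}          # byte-addressed memory: address -> byte value

# Sums: dd 15,12,6,0,21,14,4,0,0,19  (each dd is 4 bytes, little-endian)
values = [15, 12, 6, 0, 21, 14, 4, 0, 0, 19]
for offset, byte in enumerate(struct.pack("<10I", *values)):
    memory[SUMS_ADDR + offset] = byte

def effective_address(base=0, index=0, scale=1, disp=0):
    assert scale in (1, 2, 4, 8)
    return (base + index * scale + disp) & 0xFFFFFFFF

def load_dword(addr):
    return int.from_bytes(bytes(memory[addr + i] for i in range(4)), "little")

ecx = 4
addr = effective_address(index=ecx, scale=4, disp=SUMS_ADDR)
print(hex(addr), load_dword(addr))   # 0x1010 21
```

The scale of 4 is simply the element size in bytes, so ECX can count elements instead of bytes.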

LEA:
lea ebx, [scalevalues+ecx*4] ; ebx = address of element ecx, nothing is read from memory

Here's another application: multiply:

mov edx, ecx
shl edx, 1
add edx, ecx

You can also use:

mov edx, ecx
lea edx, [edx*2+edx]

What a nice instruction!

The LEA form is faster than the SHL/ADD pair.

Remember if you can use 32-bit reg, don't use 16/8-bit reg

XLAT:

table address in EBX

char to be converted in AL

new char in AL

xlat

mov ebx, UpCase
mov al,byte [edx+ecx]
xlat

Remember Table Translation is a very important skill!

If the table needs more than 256 entries, you can't use XLAT (the index in AL is only 8 bits).

But still remember:Table Translation is a very important skill!
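
XLAT boils down to AL = table[AL]. A Python sketch of the idea with an uppercase table; the table contents are my own guess at what an UpCase table would hold, since these notes do not list it.

```python
# 256-entry table: every byte maps to itself, except a-z map to A-Z.
UPCASE = bytes(b - 0x20 if ord("a") <= b <= ord("z") else b for b in range(256))

def xlat(table: bytes, al: int) -> int:
    """AL = table[AL], which is all XLAT does (the table address is in EBX)."""
    return table[al & 0xFF]

print(chr(xlat(UPCASE, ord("q"))))          # Q
print(b"table lookup".translate(UPCASE))    # bytes.translate applies the table to every byte
```

The nice part is that there are no compares and no jumps: changing the translation means changing the table, not the code.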

Chapter 10 Dividing and Conquering

When procedures have local data, it's almost always data that is placed on the

stack when a procedure is called. The procedure can then access these pushed

data items on the stack. However, the procedure can't just pop those data

items, because the return address is in the way.

Stack Frame (important)

Local label: label has a period in front of it.

E.g. .poke

Local labels are local to the first nonlocal label preceding them in the code.

A local label cannot be referenced higher up in the source file than the global label that owns it.

As long as a global label exists between two local labels with the same name,

NASM has no trouble distinguishing them.

E.g.

Scan:
xor eax, eax
.modTest:
test esi, 0000000fh
jnz Scan

.modTest belongs to Scan

You can still reach a local label from elsewhere by using its full name:

jne Scan.modTest

jne Scan ; Short jump, to within 127 bytes
jne near Scan ; Near jump, to within 2 GB

EXTERN ClearLine ; external file's label needed by you
GLOBAL DumpLin ; your label needed by external file

Then you can put the ClearLine procedure in its own file and assemble it to its own .o file.

Use Comment Headers!!!

E.g.
; -----------------------------------------------------
; LoadBuff: Fills a buffer with data from stdin via INT 80h sys_read
; UPDATED: 4/15/2009
; IN: Nothing
; RETURNS: # of bytes read in EBP
; MODIFIES: ECX, EBP, Buff
; CALLS: Kernel sys_read
; DESCRIPTION:
; Loads a buffer full of data (BUFFLEN bytes) from stdin using INT 80h sys_read
; and places it in Buff. Buffer offset counter ECX is zeroed, because we're
; starting in on a new buffer full of data. Caller must test value in EBP: If
; EBP contains zero on return, we hit EOF on stdin. Less than 0 in EBP on return
; indicates some kind of error.

Simple Cursor Control in the Linux Console

The Linux console can be controlled by sending it escape sequences embedded in

the stream of text traveling from your program to stdout.

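mov eax, 4 ; sys_write
mov ebx, 1 ; file descriptor 1: stdout (0 would be stdin)
mov ecx, ADDRESS ; address of the escape sequence
mov edx, LENGTH ; its length in bytes
int 80h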

ClearTerm:
db 27, "[2J" ; <ESC>[2J

The escape sequence is 4 characters long.

The command above clears the display.

db 27, "[01;01H" ; <ESC>[<Y>;<X>H

The command above moves the cursor.

db 27, "[42m"

The command above changes the background color to green.

You can use "man console_codes" to find more details.

(There are many interesting things.)
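
The sequences above are just bytes on stdout, so they are easy to try from any language. A small Python sketch that builds the same three sequences (the helper names are hypothetical; byte 27 is written as "\x1b"):

```python
import sys

ESC = "\x1b"                      # byte 27, same as `db 27`

def clear_screen() -> str:
    return ESC + "[2J"

def goto_xy(x: int, y: int) -> str:
    return f"{ESC}[{y:02d};{x:02d}H"   # <ESC>[<Y>;<X>H, positions are 1-based

def green_background() -> str:
    return ESC + "[42m"

if __name__ == "__main__":
    sys.stdout.write(clear_screen() + goto_xy(1, 1) + "top-left corner\n")
```

Note the order in the cursor sequence: row (Y) comes first, then column (X).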

A library for NASM assembly language, called LinuxAsmTools:

http://linuxasmtools.net/

Macros:

Macros are expanded only in memory, while NASM assembles; the source file is not changed.

E.g.

%macro WriteStr 2 ; 2 is the number of parameters
push rax
push rbx
mov ecx, %1
mov edx, %2
mov eax, 4
mov ebx, 1
int 80h
pop rbx
pop rax
%endmacro

All the usual rules governing instruction operands apply.

E.g.
%macro GotoXY 2
...
mov dl, %1 ; dl is a 8-bit register.
...
%endmacro

Then the parameter %1 can only hold an 8-bit argument.

invoke:

GotoXY bl, al

Local labels in macro:

%macro UpCase 2
mov edx, %1
mov ecx, %2
%%IsLC:
cmp byte [edx+ecx-1], 'a'
jb %%Bump
cmp byte [edx+ecx-1], 'z'
ja %%Bump
sub byte [edx+ecx-1], 20h
%%Bump:
dec ecx
jnz %%IsLC
%endmacro

Local labels in a macro can only be used inside that macro.

To include a macro library:

%include "mylib.mac"

Chapter 11 Strings and Things

ESI points to source string

EDI points to destination string

ECX stores the length

Data coming from a source string or going to a destination string must begin

the trip from, end the trip at, or pass through register EAX

mov al, ' '
mov edi, VidBuff
mov ecx, COLS*ROWS
rep stosb
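
What REP STOSB does with AL, EDI and ECX can be modeled in a few lines of Python. This is only a sketch: it assumes the direction flag is clear (addresses go up), uses a bytearray as "memory", and the tiny COLS/ROWS values are made up.

```python
def rep_stosb(memory: bytearray, edi: int, ecx: int, al: int):
    """Store AL at [EDI], bump EDI, decrement ECX, repeat until ECX is 0."""
    while ecx != 0:
        memory[edi] = al & 0xFF
        edi += 1          # assumes DF = 0 (addresses go up)
        ecx -= 1
    return edi, ecx

COLS, ROWS = 8, 2                       # hypothetical tiny display buffer
vid_buff = bytearray(b"x" * (COLS * ROWS))
edi, ecx = rep_stosb(vid_buff, 0, COLS * ROWS, ord(" "))
print(vid_buff, edi, ecx)
```

After the loop ECX is 0 and EDI points one byte past the filled area, which matters if more string instructions follow.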

aaa ; ASCII Adjust After Addition.

64-bit asm doesn't support this instruction.

REP is a prefix

STOSB -- STOre String by Byte.

MOVSB

LOOPNZ watches ECX and ZF

-----------------------------------------------------------------------------

Command-Line Arguments and Examining the Stack

./showargs1 time for tacos

When Linux loads your program, it places much information on the stack before

letting the program's code begin execution.

This includes the full pathname of the executable, any command-line arguments and the current

state of the Linux environment.

ESP points at the top of the stack.

At [ESP] is a 32-bit number giving you the count of the command-line args.

-----------------------------------------------------------------------------

Stack simulation

32-bit null pointer                       | High address
Full pathname of executable               |
Actual environment variables              |
Actual command-line arguments             |
Actual executable invocation text         |
(System oddments and empty space)         |
32-bit null pointer                       |
Address of last environment variable      | Stack
Address of environment variable 3         |
Address of environment variable 2         |
Address of environment variable 1         |
32-bit null pointer                       |
Address of last argument                  |
Address of argument 2                     |
Address of argument 1                     |
Address of executable invocation text     |
Count of arguments (always at least 1)    | <-- ESP, low address

-----------------------------------------------------------------------------

You use the addresses found on the stack to access items up-memory.

Beginning with the 2.6 version of the Linux kernel, the kernel randomizes the

boundaries of the stack. Each time your program runs, its stack addresses will

be different.

The only tricky part is determining how many bytes belong to each argument, so

you can copy that argument data somewhere. Because each argument ends with a

single 0-byte, the challenge is to search for the 0.

So you need this instruction: SCASB (SCAn String by Byte)

SCASB: Look up in the Intel manual.

REPNE SCASB ; Repeat SCASB while ECX is not 0 and the byte at [EDI] does not equal AL.
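
Here is how I picture the search for the terminating 0, as a Python sketch of REPNE SCASB. It assumes the direction flag is clear, the argument bytes are made up from the showargs1 example, and 0xFFFF is just a "large enough" count I picked for ECX.

```python
def repne_scasb(memory: bytes, edi: int, ecx: int, al: int):
    """Compare AL with [EDI], advance EDI, decrement ECX; stop on match or ECX == 0."""
    zf = False
    while ecx != 0:
        zf = memory[edi] == al
        edi += 1          # EDI moves past the byte even when it matches
        ecx -= 1
        if zf:
            break
    return edi, ecx, zf

args = b"time\0for\0tacos\0"     # arguments as they sit in memory, 0-terminated
start = 0
edi, ecx, found = repne_scasb(args, start, 0xFFFF, 0)
length = edi - start - 1          # do not count the terminating 0
print(args[start:start + length], length)   # b'time' 4
```

Because EDI ends up one past the 0 byte, it already points at the start of the next argument.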

Chapter 12 Heading Out to C

The main function in a C program really is just a function.

The standard C library code calls it with a CALL instruction, and it returns

control to the standard C library code by executing a RET instruction.

Stack frame:

push ebp
mov ebp, esp

EBP is considered the anchor of your new stack frame.

Once the stack frame is set up, the first thing to do is to push EBX, ESI and

EDI on the stack (the C calling convention expects them to be preserved).

E.g.

*********************************

[SECTION .data]
Msg db "hello,world!", 0
[SECTION .bss]

[SECTION .text]
extern puts
global main

main:
push ebp
mov ebp, esp
push ebx
push esi
push edi

; Your code...
push Msg
call puts
add esp, 4

pop edi
pop esi
pop ebx
mov esp, ebp
pop ebp
ret

*********************************

If your 64-bit gcc does not have the 32-bit libraries, install:

apt-get install gcc-multilib g++-multilib

Then:

nasm -f elf hello.asm

gcc -o hello -m32 hello.o