I. x86 disassembly
   A. x86 architecture
------------------------------------------
         ARCHITECTURAL OVERVIEW

  +--------------------+     +---------+
  |        CPU         |     |         |
  | +----------------+ |     |         |
  | |   Registers    | |     |         |
  | +----^-----------+ |     |   RAM   |
  |      |             |     |         |
  |  +---v--+  +-----+ |     |         |
  |  | ALU <-->Control<----->|         |
  |  +---^--+  +-----+ |     |         |
  +------|-------------+     |         |
         |                   |         |
  +------v---------------+   |         |
  |     I/O Devices      |   |         |
  |                      |   +---------+
  +----------------------+
------------------------------------------
        What does the RAM do?
        What does the ALU do?
        What does the control unit do?
      1. memory
         a. real mode memory addressing
------------------------------------------
       REAL MODE MEMORY ADDRESSING

For 16 bit 8086 and 8088 (from 1978!)
   also 80286 and above, for compatibility
   with older programs

Address is sum of:
   Segment address + offset
   where segment address =
      segment register value * 16
   e.g., 1000H * 16 + F000H = 1F000H

 FFFFFH +----------------+
        |                |
        |                |
        +----------------+
 1F000H |                |<-+ offset=F000
        +----------------+  |
        | 64K segment    |  |
        |                |  |  Seg. Reg.
 10000H |                |<\| +---------+
        +----------------+  +-+  1000   |
        |                |    +---------+
        |                |
        |                |
 00000H +----------------+
------------------------------------------
        Why use this segment register + offset scheme?
------------------------------------------
       MEMORY LAYOUT FOR A PROCESS

Approximate, the sections may be
   - in different orders
   - not contiguous

 high
 addresses +----------------------------+
           | environment ptr            |
           | cmd line arguments         |
     BP -> +----------------------------+
           | stack                      |
           |   |                        |
           |   |                        |
     SP -> |   v                        |
           +----------------------------+
           |                            |
     SS -> |                            |
           +----------------------------+
           +----------------------------+
           |   ^                        |
           |   |                        |
           |   |                        |
     ES -> | heap                       |
           +----------------------------+
           |                            |
           | .data or .bss              |
     DS -> |                            |
           +----------------------------+
           | .text                      |
     CS -> |                            |
           +----------------------------+
 low       | shared libraries           |
 addresses +----------------------------+
------------------------------------------
        What's in the .text section?
         b.
Protected mode addressing
------------------------------------------
        PROTECTED MODE ADDRESSING

80286 (16 bit, from 1982) and above

Doesn't use segment + offset to directly
form address

Segment register contains a selector into
   a global descriptor table,
   the descriptor has a base address,
   address is base address + offset

So instructions are the same!

                              memory
                            +------------+ FFFFFF
                            |            |
                            |            |
                            |            |
                            |            |
              global        |            |
              descriptor    |            |
              table         |            |
            +-----------+   |            |
            |           |   |            |
            |           |   |            |
            |           |   |            |
            |           |   +------------+ 1000FF
    DS      +-----------+   | data       |
 +-------+  | ...       | />| segment    | 100000
 | 0008  |->| 100000    +-  +------------+
 +-------+  | 00FF      |   |            |
            +-----------+   |            |
            |           |   |            |
            +-----------+   +------------+ 000000
------------------------------------------
         c. virtual memory
------------------------------------------
            VIRTUAL MEMORY

80386 (32 bit, from 1985) and above

Program generates a "linear address"

Memory paging unit translates it to
   a "physical address"
------------------------------------------
         d. x64, flat memory addressing + virtual memory
------------------------------------------
           X64 ARCHITECTURE

64 bit architecture (x86-64, also called AMD64),
   designed by AMD and first shipped in
   the AMD Opteron (2003);
   Intel followed in 2004 with later
   Pentium 4 and Xeon models

linear addressing + virtual memory

of the segment registers, only FS and GS
   are still sensible to use
------------------------------------------
      2. instructions, opcodes, endianness
------------------------------------------
          ASSEMBLY INSTRUCTIONS

Intel assembler (NASM) conventions:

        mov      ecx,       0x42
         |        |           |
     Mnemonic destination  source

         B9      42 00 00 00
         |
       opcode    constant in bytes

Little-endian = least significant bytes first (left)
Big-endian = most significant bytes first (left)
------------------------------------------
        Which is Intel: big or little endian?
      3. operands
------------------------------------------
           TYPES OF OPERANDS

Immediate:

Register:

Memory address:
------------------------------------------
      4.
registers
------------------------------------------
              REGISTERS

General:
   64 bit: RAX, RBX, RCX, RDX
   32 bit: EAX, EBX, ECX, EDX
   16 bit: AX, BX, CX, DX
   8 bit:  AH & AL, BH & BL, CH & CL, DH & DL

   64 bit mode also has registers r8-r15
      64 bit: r8
      32 bit: r8d
      16 bit: r8w
      8 bit:  r8b

   ECX is used as a

Source/Destination registers:
   64 bit: RSI RDI
   32 bit: ESI EDI
   16 bit: SI DI

Stack-manipulation registers:
   64 bit: RBP RSP
   32 bit: EBP ESP
   16 bit: BP SP

Flags: EFLAGS

Instruction Pointer:
   RIP (64 bit)
   EIP (32 bit)
   IP (16 bit)
------------------------------------------
------------------------------------------
    BACKWARDS COMPATIBILITY REGISTERS

                         +--------+---------+
                         |   AH   |   AL    |
                         |        |         |
                         +--------+---------+
                           8 bits   8 bits

                         +------------------+
                         |        AX        |
                         |                  |
                         +------------------+
                               16 bits

        +-----------------------------------+
        |                EAX                |
        |                                   |
        +-----------------------------------+
                       32 bits

    +-//------------------------------------+
    |                  RAX                  |
    |                                       |
    +-//------------------------------------+
                    64 bits
------------------------------------------
------------------------------------------
          DATA TYPES AND BITS

               7   0
              +-----+
 byte         |     |
              +-----+

              15  8 7   0
              +-----+-----+
 word         | high| low |
              | byte| byte|
              +-----+-----+
                N+1    N

              31    16 15        0
              +--------+-----------+
 doubleword   |        |           |
              |        |           |
              +--------+-----------+
                 N+2         N

              63              32 31                 0
              +------------------+--------------------+
 quadword     | high             | low                |
              | doubleword       | doubleword         |
              +------------------+--------------------+
                     N+4                   N
------------------------------------------
------------------------------------------
             SEGMENTATION

For protected mode (32 bit) in x86,
3 kinds of address:
   1. segmentation-based: segment + offset
   2. linear/virtual address (32/64 bit address)
   3. physical address (32/64 bit address)

Segment registers (16 bit)
   CS ~ code
   SS ~ stack
   DS ~ data
   ES ~ extra
   FS ~ exception handling chain
   GS

Segmentation is disabled in 64 bit mode
   (except for the FS and GS base addresses)
   - uses flat 64 bit address space
------------------------------------------
        How much memory can be addressed with a 16 bit address?
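A minimal Python sketch for checking answers to the address-width questions in this part of the outline. The helper names addressable_bytes and real_mode_physical are hypothetical, introduced only for illustration. The real-mode check reuses the segment 1000H and offset F000H from the real mode diagram, and assumes the 8086's 20 bit address bus, so addresses wrap around at 1 MB.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address of this width can name."""
    return 2 ** address_bits


def real_mode_physical(segment: int, offset: int) -> int:
    """8086 real-mode address: segment register shifted left 4 bits plus offset.

    Assumption: the result is wrapped to 20 bits, as on the original 8086.
    """
    return ((segment << 4) + offset) & 0xFFFFF


if __name__ == "__main__":
    # 16 bit offset, 20 bit real mode, 24 bit 80286, 32 bit 80386 addresses
    for bits in (16, 20, 24, 32):
        print(f"{bits} bit address: {addressable_bytes(bits):,} bytes")

    # Segment 1000H with offset F000H, as in the real mode diagram
    print(hex(real_mode_physical(0x1000, 0xF000)))
```

The same two functions can be reused to check other segment and offset pairs by hand.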
        How much memory can be addressed with a 32 bit address?
------------------------------------------
            EFLAGS REGISTER

 Bit    Name  Description
 ====================================
 0      CF    Carry flag
 2      PF    Parity flag
 4      AF    Auxiliary carry flag
 6      ZF    Zero flag
 7      SF    Sign flag
 8      TF    Trap flag
 9      IF    Interrupt enable flag
 10     DF    Direction flag
 11     OF    Overflow flag
 12-13  IOPL  I/O Privilege level
 14     NT    Nested task flag
 16     RF    Resume flag
 17     VM    Virtual 8086 mode flag
 18     AC    Alignment check flag (486+)
 19     VIF   Virtual interrupt flag
 20     VIP   Virtual interrupt pending flag
 21     ID    ID flag
------------------------------------------
      5. instructions
         a. overview
------------------------------------------
          INSTRUCTION FORMAT

NASM syntax (Intel variant)

Instruction format

  LABEL: OPCODE destop [, sourceop] [; comment]

Example:

  HERE: cmp ebx, 0BEEFh  ; does ebx have secret?

        push ebx
        push eax
        xor ebx, ebx     ; ebx == 0
        xor eax, eax     ; eax == 0
------------------------------------------
------------------------------------------
       ACCESSING MEMORY CONTENTS

   mov eax, 0xBEEF     ; eax gets

   mov eax, [0xBEEF]   ; eax gets
------------------------------------------
------------------------------------------
        ACCESSING VIA REGISTERS

   mov eax, ebx     ; eax gets

   mov eax, [ebx]   ; eax gets
------------------------------------------
         b. details
------------------------------------------
          GRAMMAR CONVENTIONS

 ::= means "can be" or "produces"
 | means "or"
 <x> is the nonterminal named "x",
     a syntactic category
 other characters are literal
 [<x>] is an optional <x>
 [<x>]... means 0 or more <x>s
 '[' is a left square bracket (char)
 ']' is a right square bracket (char)
------------------------------------------
------------------------------------------
          MASM SYNTAX DETAILS

 ::= [