X86 assembly programming in real mode



         


x86 assembly programming in real mode involves manipulating a set of 16-bit processor registers and working directly with physical memory addresses, without the memory protection or virtual addressing available in protected mode.


Registers

Each register is specialized for a certain task, and operations that deal with that task are often run more efficiently if the right register is used.

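Registers in real mode include the data registers AX (accumulator), BX (base), CX (count) and DX (data), and the address registers SI (source index), DI (destination index), BP (base pointer) and SP (stack pointer).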

Each data register can be split into two eight-bit registers: the upper and lower eight bits of the 16-bit register can each be addressed as a register in its own right. For example, AH addresses the upper eight bits of the AX register and AL addresses the lower eight bits. The other data registers are addressed in the same way by changing the suffix: "X" for extended, "H" for high, and "L" for low.
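The relationship between a 16-bit data register and its two halves can be modelled with plain integer arithmetic. The following Python sketch is only an illustration; the names `split_register` and `join_register` are invented for this example and do not belong to any assembler or library.

```python
def split_register(value16):
    """Return (high, low) bytes of a 16-bit value, e.g. (AH, AL) for AX."""
    if not 0 <= value16 <= 0xFFFF:
        raise ValueError("value does not fit in 16 bits")
    high = (value16 >> 8) & 0xFF   # upper eight bits (the "H" register)
    low = value16 & 0xFF           # lower eight bits (the "L" register)
    return high, low


def join_register(high, low):
    """Rebuild the 16-bit value from its high and low bytes."""
    return ((high & 0xFF) << 8) | (low & 0xFF)


ah, al = split_register(0x1234)    # pretend AX = 0x1234
print(hex(ah), hex(al))            # 0x12 0x34
```

Writing to AL or AH on a real processor changes only that half of AX, which is what `join_register` imitates when one of its arguments is replaced.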

Collectively the data and address registers are called the general registers.

In addition to the general registers, there are the segment registers (CS, DS, ES and SS), the instruction pointer (IP) and the FLAGS register.

The IP register holds the offset, within the code segment, of the next instruction the processor will execute. The programmer cannot read or write IP directly; it changes only as a side effect of instructions such as jmp, call and ret.

The FLAGS register contains the current state of the processor. Each bit in this register is called a flag, and each flag is either 1 (set) or 0 (not set). The flags in the FLAGS register include carry, overflow, zero and single step (trap).

Flags are notably used in the x86 architecture for comparisons. A comparison instruction subtracts one operand from the other, discards the result, and sets the flags according to it. A conditional jump instruction then tests the relevant flag and jumps if the condition holds: for example

cmp ax, bx
jne do_something

first compares the AX and BX registers, and if they are unequal, the code branches off to the do_something label.
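As a rough model of what happens, cmp performs a subtraction, throws away the result and keeps only the flags. The Python sketch below imitates this for 16-bit operands and covers only the zero, carry and sign flags; `cmp16` is a hypothetical helper written for this illustration, not a real API.

```python
def cmp16(a, b):
    """Model CMP a, b for 16-bit operands: compute a - b, keep only flags."""
    a &= 0xFFFF
    b &= 0xFFFF
    result = (a - b) & 0xFFFF
    return {
        "ZF": int(result == 0),       # zero flag: the operands were equal
        "CF": int(a < b),             # carry flag: an unsigned borrow occurred
        "SF": (result >> 15) & 1,     # sign flag: top bit of the result
    }


flags = cmp16(5, 7)                   # like: cmp ax, bx with AX=5, BX=7
jne_taken = flags["ZF"] == 0          # JNE jumps when ZF is clear
print(flags, jne_taken)
```

The other conditional jumps test different flags in the same way; jc, for instance, would be taken here because the carry flag is set.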


Mnemonics for opcodes

In real mode, the following mnemonics from the original 8086 instruction set are available: aaa, aad, aam, aas, adc, add, and, call, cbw, clc, cld, cli, cmc, cmp, cmpsb, cmpsw, cwd, daa, das, dec, div, esc, hlt, idiv, imul, in, inc, int, into, iret, ja, jae, jb, jbe, jc, jcxz, je, jg, jge, jl, jle, jmp, jna, jnae, jnb, jnbe, jnc, jne, jng, jnge, jnl, jnle, jno, jnp, jns, jnz, jo, jp, jpe, jpo, js, jz, lahf, lds, lea, les, lock, lodsb, lodsw, loop, loope, loopne, loopnz, loopz, mov, movsb, movsw, mul, neg, nop, not, or, out, pop, popf, push, pushf, rcl, rcr, rep, repe, repne, repnz, repz, ret, rol, ror, sahf, sal, sar, sbb, scasb, scasw, shl, shr, stc, std, sti, stosb, stosw, sub, test, wait, xchg, xlat, xor

There are also some undocumented opcodes that have no mnemonics of their own. For example, 0x0F is executed by most 8086 processors as "POP CS". Other processors in the x86 family may not interpret undocumented opcodes the way earlier processors do, so a program that relies on them may not work on later x86 processors.


The real mode addressing model

The real mode addressing model is simple, though it has long been controversial among programmers. The x86 architecture addresses memory through a process known as segmentation rather than the linear method used in other architectures. Segmentation decomposes a linear address into two parts: a segment and an offset. The segment selects the beginning of a 64 KB group of addresses, and the offset is the distance from that base address. To translate back into a linear address, the segment is shifted 4 bits left and then added to the offset. The formula looks like this: segment*0x10+offset.

In real mode, two registers are used for a memory address: one to hold the segment, and one to hold the offset.

For example, if DS contains the hexadecimal number 0xDEAD and DX contains the number 0xCAFE, together they point to the memory address 0xDEAD * 0x10 + 0xCAFE = 0xEB5CE. A quick way to do this without a hexadecimal calculator is to append a zero to the hexadecimal number in the segment register and then add the contents of the offset register to that number. The above would be 0xDEAD0 + 0xCAFE.

An address given as a segment and an offset is written in the notation segment:offset. In the above example, the linear address 0xEB5CE can be written as 0xDEAD:0xCAFE or, in terms of the segment and offset register pair, as DS:DX.
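The translation can be checked with a few lines of Python. This is a minimal sketch of the formula segment*0x10+offset only; the function name `linear_address` is made up for the example, and the wrap-around at 1 MB that occurs on some machines is deliberately not modelled.

```python
def linear_address(segment, offset):
    """Translate a real-mode segment:offset pair into a linear address."""
    return ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)


print(hex(linear_address(0xDEAD, 0xCAFE)))   # 0xeb5ce

# Different segment:offset pairs can name the same linear address.
print(linear_address(0xDEAD, 0xCAFE) == linear_address(0xEB5C, 0x000E))   # True
```

Because segments start every 16 bytes but span 64 KB, they overlap, which is why one linear address has many segment:offset spellings.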

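Some special combinations of segment registers and general registers point to important addresses: CS:IP points to the next instruction to be executed, SS:SP points to the top of the stack, and DS:SI and ES:DI point to the source and destination of the string instructions.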


The PC memory layout in real mode

0-3FF        IVT (Interrupt Vector Table)
400-5FF      BDA (BIOS Data Area)
600-9FFFF    Ordinary application RAM
A0000-BFFFF  Video memory
C0000-EFFFF  Optional ROMs (The VGA ROM is usually located at C0000)
F0000-FFFFF  BIOS ROM

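That means that we have just under 640 KB of application RAM: the region below A0000 is 640 KB in total, and the IVT and BDA occupy its first 1.5 KB.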

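Memory above 0xFFFFF lies beyond the 1 MB that a 20-bit address can reach. Its first 65,520 bytes (0x100000-0x10FFEF) can still be addressed from real mode through segment 0xFFFF when the A20 address line is enabled, and are called the "high memory area".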
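The layout table above can be expressed as a small lookup. The Python sketch below copies the region boundaries from the table as given; `REGIONS` and `region_of` are names invented for this illustration, and real machines may differ in detail.

```python
# Region boundaries copied from the layout table (inclusive, hexadecimal).
REGIONS = [
    (0x00000, 0x003FF, "IVT"),
    (0x00400, 0x005FF, "BDA"),
    (0x00600, 0x9FFFF, "Application RAM"),
    (0xA0000, 0xBFFFF, "Video memory"),
    (0xC0000, 0xEFFFF, "Optional ROMs"),
    (0xF0000, 0xFFFFF, "BIOS ROM"),
]


def region_of(address):
    """Return the name of the region containing a linear address, or None."""
    for start, end, name in REGIONS:
        if start <= address <= end:
            return name
    return None   # above 0xFFFFF: outside the table


print(region_of(0xB8000))   # Video memory
print(region_of(0x00084))   # IVT
```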


Interrupts in real mode

The x86 architecture is an interrupt-driven architecture. This means that hardware or software can signal the processor when it needs attention, for example when requested data is ready, instead of the processor waiting for a device to respond.

There are two kinds of interrupts: software and hardware interrupts. Software interrupts are often used to talk to the operating system. Typical software interrupts are interrupt 0x21 (the DOS interrupt, through which nearly all DOS system functions are accessed) and int3 (the breakpoint interrupt, which is often used to enter a software debugger). A typical hardware interrupt occurs when some external circuit decides that it needs attention from the CPU, such as when the system clock ticks. The 8259 chip is used to map the different IRQs onto ordinary interrupts. There are two 8259 chips in an AT-compatible PC, referred to here as 8259A (the master) and 8259B (the slave). If the 8259A chip is mapped to interrupts 0x20 to 0x27, then interrupt 0x20 goes off every time the system clock ticks. (The BIOS default in real mode maps it to interrupts 0x08 to 0x0F instead.)

At the very beginning of memory lies the Interrupt Vector Table (IVT). The IVT contains pointers to all the Interrupt Service Routines (ISRs).

The pointers to the ISRs attached to the interrupts are stored in this format:

[offset_0][segment_0][offset_1][segment_1][... ...][offset_255][segment_255]

Each offset and each segment is 16 bits wide, so every entry occupies four bytes.

There are 256 different interrupts, each with its own pointer.
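Since each entry is four bytes long, the entry for interrupt n starts at linear address n*4. The Python sketch below illustrates reading one entry from a memory image; the helper names and the vector value 0x1234:0x5678 are made up for the example, and the little-endian byte order is that of the x86 itself rather than something stated in the table above.

```python
import struct


def ivt_entry_address(interrupt):
    """Linear address of the 4-byte IVT entry for an interrupt number."""
    if not 0 <= interrupt <= 255:
        raise ValueError("interrupt number must be in 0..255")
    return interrupt * 4


def read_vector(memory, interrupt):
    """Return (segment, offset) of the ISR from a bytes-like memory image."""
    base = ivt_entry_address(interrupt)
    # The offset is stored first, then the segment, both little-endian.
    offset, segment = struct.unpack_from("<HH", memory, base)
    return segment, offset


# Build a fake 1 KB IVT whose vector 0x21 points at the made-up 0x1234:0x5678.
ivt = bytearray(256 * 4)
struct.pack_into("<HH", ivt, ivt_entry_address(0x21), 0x5678, 0x1234)

print(hex(ivt_entry_address(0x21)))   # 0x84
segment, offset = read_vector(ivt, 0x21)
print(hex(segment), hex(offset))      # 0x1234 0x5678
```

The last entry starts at 0x3FC, so the whole table ends at 0x3FF.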


Example

This NASM assembler program is an example of real mode code that prints "Hello world!" to the screen by writing directly to video memory.

[org 0x100]
[bits 16]
[section .text]

        mov ax, cs          ; cs = code segment
        mov ds, ax          ; ds = cs
                            ; (this way, we don't have to care much about where our data is located)
        mov ax, 0xB800      ; 0xB8000 is the base of the text video memory
        mov es, ax          ; Remember the memory model!
        mov si, text        ; Remember that ds:si -> es:di
        xor di, di          ; a xor a is always zero. (di is given the value 0)

around: mov al, [ds:si]     ; give al the value of what ds:si points to
        cmp al, 0           ; compare if al contains zero ("Hello world!",0)
        je stop             ; if so, stop writing to the screen
        mov [es:di], al     ; move the content of al to es:di (text video memory)
        inc si              ; select the next byte in the Hello world! string
        add di, 2           ; and go to the next position on the screen
        jmp around          ; and go back to the beginning of the loop
stop:   ret                 ; and return to the caller

text    db "Hello world!",0

This program can be assembled into a DOS-compatible .com file. It can also be assembled for any other operating system running in real mode, or for no operating system at all, although minor changes may be needed in such cases. Because it does not use the screen functions provided by DOS or the BIOS, the text it prints will disappear once the program has terminated and other programs write to video memory.
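The loop in the program advances DI by two for every character because each screen position in text mode occupies two bytes, the character followed by its attribute. The Python sketch below imitates the copy loop on an ordinary byte array; the 80x25 screen size and the name `write_text` are assumptions made for this illustration, not something the program itself states.

```python
def write_text(video, text, start=0):
    """Copy text into a text-mode buffer, one character per two-byte cell."""
    di = start
    for byte in text.encode("ascii"):
        if byte == 0:            # stop at the terminating zero, like "je stop"
            break
        video[di] = byte         # character byte; attribute byte left untouched
        di += 2                  # like "add di, 2"
    return di


video = bytearray(80 * 25 * 2)   # assumed 80x25 text mode, 2 bytes per cell
end = write_text(video, "Hello world!\0")

print(video[0:end:2].decode("ascii"))   # Hello world!
print(end)                              # 24
```

Because the attribute bytes are never written, the text on a real screen appears in whatever colours were already present at those positions.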


This article is from Wikipedia. All text is available under the terms of the GNU Free Documentation License.