
While learning ASM, I found many tutorials to be very confusing, and they did not cover assembly in the detail that's necessary for such a complicated programming language as this one. So, I wrote this rudimentary tutorial in order to ease the pain others may have learning ASM. The problem with most beginner-level tutorials is that they assume the reader has previous programming knowledge in one language or another. While I'll make comments that draw connections between programming in BASIC and ASM, I hope to write this in such a way that you can skip these remarks without affecting your learning, therefore making this a completely newbie-level tutorial. First off, I believe it is very difficult to learn programming without programming as you learn. So, I suggest you get a copy of TASM, a necessary utility for writing assembly programs. Also, before you start, it's important that you understand hexadecimal and binary.

2.1 - Introduction to programming


[Those with programming experience in any other language may want to ignore this section] So what is programming anyway? Well, the basic idea is that a computer program is made up of a bunch of "instructions" that a computer follows. For the most part, a program is made by typing in a bunch of instructions that make much more sense to us than they do to the computer. Then, they are translated, "compiled" or "assembled" into a program that the computer can understand. This is why you need to download and install the software mentioned earlier (TASM). For our purposes, we can type these commands into a simple, standard text editor such as Notepad. Actually, this is preferred - if you use a more advanced program like Microsoft Word, you'll have to make sure that you save it as "text only". So, if you can, use Notepad. It's standard with all versions of Windows.

2.2 - Your first program


Open up notepad, or whatever you happen to have decided to type with. For a start, your programs should always have this skeleton
.MODEL SMALL
.STACK 200H
.CODE
START:

END START

That is, all your programs should include these lines. Your whole program will go in lines between "start" and "end start".

It's very important, if you type these lines into your file instead of using copy and paste, that you notice the periods at the beginning of the first few lines. And notice the colon after START. Even the smallest dot is a very important piece in programming, so never overlook them. Now, START and END START don't really mean much to a computer. But to us, START is the beginning of something. END START doesn't make a lot of logical sense to us, but that's how it goes, so just grin and bear it: it tells where the end of the main part of the program is. But right now, our program does absolutely nothing! So, we may want to learn about the different commands we can use in assembly.

2.3 - Interrupts
We can write a very simple program that puts just one character of text on the screen using just "interrupts". If you're familiar with any higher-level languages, you can think of interrupts as essentially commands. Interrupts each have some complicated operation(s) they perform, and all they require is that you give them a small amount of information. In this case, we'll be using an interrupt that can put text characters on the screen. Because one interrupt may have many different functions it can perform, we must tell it which one to do. Then, we give it the required information, and tell it to do whatever it may do. We can therefore do very complicated operations while being totally oblivious to how they work. Here is our example program:
.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ah, 2
Mov dl, 1
Int 21h

mov ah, 4ch
mov al, 00h
int 21h
END START

It may seem like a collection of completely arbitrary words and numbers - only at first. We soon realize that it is a very concrete concept. Every part along the way does its important part, and the tiny pieces of code result in one big program that does exactly what we expected. Here's a breakdown, line by line, of what the program does:

1. We put the number 2 in a specific location in the computer's memory. Later, the computer will look at this number and, in this case, this number tells which "function number" the interrupt should do. As mentioned before, most interrupts can do a variety of functions, so we must tell it which one to do. In this case, we want the DISPLAY OUTPUT function. This is function number 2. So, we put the number 2 in a specific place, just waiting for the computer to look it up later.

2. We put the number 1 in a different specific place in memory. We've already specified that we want to use function 2 of the specific interrupt, which is DISPLAY OUTPUT. But what should it display? Well, different characters of text have different number codes assigned to them (this is unrelated to the base-whatever numbering stuff we talked about earlier, just to let you know). This code is called "ASCII". So, if we're going to be displaying text, we should specify what text. The number 1 in the ASCII code happens to correspond to a little smiley face. After all of this, we've so far established that we want the computer to do some text displaying; the text we want to display is a smiley face.

3. The two pieces of information we gave the computer would be worthless if we didn't do something with them. In this third line, we tell the computer to use interrupt #21. As soon as this happens, it looks at the place in memory called "ah" and sees what's there, because it must know which of its numerous functions it should do. It ends up figuring out that it should display text, and ultimately, it should display a smiley face. Note that there's an "h" after the number 21. If we put an h after a number, it means that the number is not 21 in decimal - it's 21 in hexadecimal. Remember, don't think that this means 21 in both cases. Think of this hexadecimal number as "two one"; and if we convert it to decimal, we find that it's 33. But, it's common programming practice to use hexadecimal when referring to interrupts, rather than their decimal equivalents. So, unless you're a devout non-conformist, make it easy on yourself and think of this as "int twenty-one h", not "int thirty-three". It'll make it much easier for you to communicate about assembly, as everyone else calls interrupts by their hexadecimal numbers.

4. The final commands end the program. This is necessary at the end of all your programs, unless you want awful things to happen. If you forget this, random effects, which will more than likely freeze up the computer, will result.

There we have it! We've effectively written a program doing exactly what we expected from the outset. A couple of things you should note. Firstly, the blank lines are just my style of separating code to make it easier to read. The assembler, which we'll explain how to use in just a second, doesn't care one way or the other if there are blank lines, as long as they don't actually hurt the code in some way (they generally don't). You can take them out if you don't like them, or add more at will - it doesn't matter, because the only important part is the code: the commands involved in our program. Secondly, in "Mov ah", ah is not "10" in hexadecimal. In this instance, it's the name of a place in memory. It's just a coincidence. More is explained in the next section.
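Before moving on, here's a tiny variation you can try yourself. Only the number in dl changes - it's just a sketch swapping in the ASCII code for a capital 'A' (which is 65):

.MODEL SMALL
.STACK 200H
.CODE
START:

mov ah, 2       ; DISPLAY OUTPUT function again
mov dl, 65      ; 65 is the ASCII code for a capital 'A'
int 21h

mov ah, 4ch     ; end the program, as before
mov al, 00h
int 21h
END START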

2.4 - The Registers


In the last section there were a couple of strange, unexplained lines. Namely, these two:
Mov ah, 2
Mov dl, 1

First, let's explain "Mov". It's shorthand for the word "Move", which makes a lot of sense. This instruction, unlike an interrupt, does one tiny, simple thing. However, it's a very important instruction in ASM. The first line takes the number 2 and "moves" it into a place the computer explicitly calls "ah". So we can deduce that the next command moves the number 1 into a place called "dl". It does. The MOV instruction can be used in other ways. For example, we could say:
Mov ah, dl

The computer would take whatever is in dl and move it into ah. Well, to say "move" is misleading, because it's not actually moved. Whatever is in dl stays there. But now, it's also in ah. Likewise, this would work
Mov dl, ah

So what are ah and dl anyway? We know from many previous mentions that they're specific places in memory. They're called registers. The ones we're mainly concerned with right now are AX, BX, CX, and DX. They're made up of two 'pieces' each - hence, smaller registers. AH, which we've already encountered, is one of the parts of AX. AX also has another part called AL. The h and l in ah and al mean "High" and "Low". They make up the higher and lower parts of the register AX. For example, if we did this:
Mov ah, 1
Mov al, 0FFh

Then, if we looked at what is in AX, we would see it contained 01FF. Why? Because the "high" part contains 1, or 01, and the "low" part contains FF. So, combined into the bigger register, they make 01FF. So, we conclude that many registers, or at least the ones we care about, are made of 2 smaller parts. And, to find their values, we combine them (don't add them, though: 01 + FF = 100, not 01FF).
ah + al = ax
bh + bl = bx
ch + cl = cx
dh + dl = dx

One final thing to mention - al, bl, ah, bh, and so on, can each have a value of 0 to 255. So, when combined to make ax, bx, and so on, the total value possible for those is 0 to 65,535
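Another way to look at the combination (just arithmetic, nothing new): the value of ax is always ah*256 + al. Using the example from above:

ah = 01h = 1, al = FFh = 255
ax = 1 * 256 + 255 = 511 = 01FFh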

2.5 - Compiling our programs


We left off part 2.3 with a finished program. But, we never actually made a program out of it. Well, making a program is really quite simple. First, save your file as something like "First.asm". Then, go to the folder where you have TASM and type this at the command prompt:

>Tasm.exe First First.obj

As long as your program has no problems, this will make a file in the same directory called "first.obj". Then, type this at the command prompt:

>Tlink First.obj

Finally, this will make a program called "First.exe"! Hoorah! Our first successful compile! (hopefully). Now, click on it to run it. If you have problems seeing it run because it opens and closes itself too fast... well.... enjoy!

3.1 - Memory
True, MOV, interrupts, and registers are very important, as you just read. However, there's not a whole lot that can be done using only them. To move on, we'll need to understand a little bit about the computer's memory. And to do this, we also need to know about memory in general. We'll first start with how memory is divided up. This can become quite complex, so just read through slowly, and go back over it if something confuses you. Basically, a computer's memory is a piece of circuitry; most of the time, many pieces. It has small points in the circuits called "transistors" that can either have an electric charge of 5V, or no charge. The millions of these that the computer has are where it stores everything. Taking into account our previous knowledge of binary, we remember that in binary a digit can only be either 0 or 1. So, we could think of a transistor with a charge, or one with no charge, as the same as 1 and 0 in binary. This turns out to be true: 1 and 0 represent each transistor of memory. Each transistor is called a "bit". This is short for "BInary digiT". Well, hexadecimal is also important in our discussion. You see, if we wanted to look at memory and it was all in binary form, it would be very cryptic: 1010111010000111001100011100011100111110.... and so on. So, to make memory easier to read, we can read it in hexadecimal numbers. Recall that in hexadecimal the highest digit is F - which has a decimal equivalent of 15. In binary, that would take up four digits to show:
1111

is the same as F in hexadecimal.

Since our previous unit was called a "Bit", to keep in the same naming theme, 4 bits are called a "Nibble". Then, it just goes up from there.
8 bits = Byte
2 Bytes = Word
2 Words = Double Word (DWORD for short)

Then, for really big numbers, there's these:


1024 bytes = kilobyte (KB)
1024 KB = megabyte (MB)
1024 MB = gigabyte (GB)
1024 GB = terabyte
1024 terabytes = petabyte
1024 petabytes = exabyte

Terabyte can be TB, petabyte PB, etc, but these are not in common use. As of 2007, terabyte is only starting to come into common use as harddrives get larger. In any case, we just need to deal with the terms "bit", "byte", and "word". They're the ones that'll come up most often in low-level programming As I mentioned briefly in the last section, the registers ah, al, and so on could only have a maximum value of 255. This may seem arbitrary at first - why not 999? Because, they can only hold one byte. One byte is 8 bits, and the highest number we can make in binary with 8 bits is this
11111111

So when we put together ah and al, the highest number is 65535. Why? Well, each register can hold 1 byte, or 2 nibbles, or 8 bits - it's all the same amount. So, with two registers we have 2 bytes, or 4 nibbles, or 16 bits. Assuming we made the highest possible hexadecimal number with 4 nibbles, it would look like this:
FFFF

Punch that into your computer's calculator and convert it to decimal, and, surprise surprise: it equals 65535.
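If you'd rather see it worked out by hand (using the hexadecimal-to-decimal method covered later in this tutorial):

FFFFh = 15*16^3 + 15*16^2 + 15*16^1 + 15*16^0
      = 61440 + 3840 + 240 + 15
      = 65535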

3.2 - Addressing
In order to use the computer's memory - e.g. store numbers, text, etc - we have to understand how the computer goes about organizing it. It does this with things called Segments and Offsets. These are used to communicate between ourselves and the computer: where things should be put and where they should be gotten from. Whenever we want to read or write to memory, we must use numbers pointing to the exact location of the BYTE we want to read; specifically, 2 numbers, called the Segment and the Offset. Usually these two numbers are WORDs (16 bits each). One points to the general area of memory, the "Segment". And the other tells how many bytes into that segment, known as the "Offset". This way, we can use up to 64KB of memory at once (65536 bytes). For example, say these numbers (hexadecimal) were stored somewhere in memory:
00 AB D2 AC 98 4E 67

and so on. Now, say we wanted to read those numbers. Well, the computer has millions of bytes of memory - so we must have some way of specifying what part of memory they're in. This is called their "Address". Think of a real-life address, with the street name and the house number. Essentially, the street name is what "part" of the city you live in. We do the same with computer memory, but both parts are numbers. So, that data above may be located at:
FE00:0000

FE00 would be the "segment", or part. And 0000 would be how far in the data starts. So the address FE00:0000 would "point" to the hexadecimal number 00. In that case, FE00:0001 would point to the hexadecimal number AB. FE00:0002 points to D2, and so on. Bear this in mind as we cover just one more section before making use of what we now know about memory.
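In case you're wondering how those two numbers turn into one actual location: in this (real) mode the processor multiplies the segment by 16 (10h) and adds the offset. For example, the byte D2 above lives at:

FE00h * 10h + 0002h = FE000h + 0002h = FE002h

You won't need to do this math yourself very often, but it explains why a segment is a "general area" and an offset is a distance into it.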

3.3 - The Register DS


The registers we've covered - AX, BX, CX, DX, and their smaller parts - are all called General Purpose Registers. There is another kind of register called Segment Registers. In this case we're discussing DS - not to be confused with DX - which stands for "Data Segment". Segment registers are used, not surprisingly, to point to segments of memory. They aren't usually used for holding data like the general purpose registers are. Going back to the previous sections: say we wanted to print some text to the screen. We learned one method, but that would require that we print each individual character one after another! There's a better way. There is another function of Int 21 that will print an entire string (a string is a bunch of text characters one after another). For the sake of simplicity, say that the text we want is stored at FE00:0000. This program would allow us to print it out to the screen:
.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ax, 0fe00h
Mov ds, ax
Mov dx, 0
Mov ah, 09
Int 21h

mov ah, 4ch
mov al, 00h
int 21h
END START

So, what does this program do? Well, first, it puts fe00, the segment of the text, into AX. We use the MOV instruction to do this, which in this case 'moves' (or rather copies) fe00 into the ax register. But, we wanted ds to have the segment. Well, that's one quirk of the segment registers - you're not allowed to change them directly. So, you can't just put a number right into DS. You can, however, put another register into them. So, we put the segment number first into ax, then we move it into ds. Then we put 0 into dx. This interrupt requires that we have the segment of the text in DS and the offset in DX. Since the offset is 0, we put 0 into dx. Next, we put 9 into ah. Since int 21 has a lot of different functions it can do, we must specify which one we want. The one to print text, by specifying a segment and offset, is #9. Finally, we use int 21 again, but this time to end the program. In theory, this is just great. But, memory doesn't work like that. We generally don't just put whatever we want, wherever we want. At least not at this stage. For example, when you run a program like this one (if you were to compile and run it, which I don't recommend), the computer picks out a free space in memory to load the program itself. You don't specify this. So, what have we accomplished then with segments and offsets if we can't use them? We can, as you will see.

3.4 - Variables
DS was important to introduce in the previous section, because when you write a program, you can have things called "variables". And whatever you put in these variables is usually put in the "Data Segment", which is what DS points to. When the compiler/assembler is done changing your program into something the computer can actually read, it doesn't actually use variables, but they make life a lot easier for programmers. So, what are variables and how do we use them? A variable is where you can store data: strings (text), numbers, etc. They're called variables because, well, they can vary. Not only can they contain a number or something like that, but you can change them as much as you need during your program. This makes programs much more versatile and useful. For example, let's rewrite that last program so that it does actually work:
.MODEL SMALL
.STACK 200H

.DATA                              ; This is a new part! Make sure to include it

Textstring db "I'm a string$"

.CODE
START:

Mov ax, SEG Textstring
Mov ds, ax
Mov dx, OFFSET Textstring
Mov ah, 09
Int 21h

mov ah, 4ch
mov al, 00h
int 21h
END START

Wow. A lot of things to explain here. Let's start from the top downward. You'll notice there's a new part that should be included in the beginning. The part called .DATA declares what variables we have. As always, the period in front of DATA is very important. Also, make sure that .DATA comes before .CODE, because .CODE says that everything after it is part of the code. Again, we put the segment into ax first, since we can't move it straight into ds. One very convenient feature of the assembler is that we don't have to figure out the segment and offset that our variable is at; which is good, because as we said, the computer decides quite randomly - it would make it tough to find where our variables are in memory. So, by saying SEG Textstring, we move the segment of that variable into ax instead of what's actually in the variable. The same goes for OFFSET Textstring: it puts the offset of the variable Textstring into the register, instead of the actual variable. One more unexplained part - what's with that line after .DATA?
Textstring db "I'm a string$"

Well, Textstring is the name of the variable - we must specify the name we want to call the variable first. Next, db stands for "Declare Byte(s)". It can be used whether we want our variable to be one byte long or multiple bytes. In this case, it's multiple bytes, because each character of text takes up one byte. Finally, we tell the compiler what we want to be in the variable. This can be changed by your program, but we just tell it what we want it to start at. One more little detail of int 21, function 9, is that the text you're printing must have a dollar sign at the end. It doesn't actually print a dollar sign on the screen, it just indicates where the text ends.

Go ahead and compile and run this program. Unlike the last one, it should work.
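If you feel like experimenting, here's a rough sketch along the same lines (the names Line1 and Line2 are just made up for this example) showing that you can declare more than one variable and print each of them with the same function:

.MODEL SMALL
.STACK 200H
.DATA
Line1 db "First line",13,10,"$"    ; 13 and 10 are the codes for a new line
Line2 db "Second line$"
.CODE
START:

Mov ax, SEG Line1                  ; both variables live in the same data segment
Mov ds, ax
Mov dx, OFFSET Line1
Mov ah, 09
Int 21h

Mov dx, OFFSET Line2               ; ds is already set, only the offset changes
Mov ah, 09
Int 21h

mov ah, 4ch
mov al, 00h
int 21h
END START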

3.5 - Special Segments


We know that all your variables, and all the code that makes up your final program, reside in memory. Variables usually go in the Data Segment and code, surprisingly enough, in the Code Segment [by the way - just like the register DS points to the Data Segment, CS points to the Code Segment. But let's not worry about that right now]. Well, there are various other specific segments of memory, though they don't have registers associated with them. One of these segments of memory holds all the data for what's stored on the screen. Depending on whether you're using only text, or using graphics, the location may change around. For simplicity right now, we'll talk about text. Our last program printed text on the screen using an interrupt. In all our previous programs, we didn't specify what screen mode we wanted to use. So by default, it uses a text screen mode. This means that the screen is set up so you can only put text onto it. So how do we specify the screen mode? There's an interrupt that'll change the screen mode for us. Take a look at this program:
.MODEL SMALL
.STACK 200H
.CODE
START:

Mov ax, 0003h
int 10h

mov ax, 4c00h
int 21h
END START

You may notice this is a bit different from how the previous programs used interrupts. Here, we put a value only in ax. The change-screen-mode function is function 0 of interrupt 10. So, to use it, we must put 0 in ah. And this function requires that you put the screen mode you want in al. Since ax consists of ah and al, I just moved a value straight into ax. Now ah should contain 00, and al should contain 03. Therefore, we'll call the screen mode function, and change to screen mode 3. Screen mode 3, however, is the default screen mode, so this doesn't accomplish much. Now that we're absolutely sure we're using the screen mode we want, we can write stuff to the segment where what's on screen is stored. This segment has an Absolute Address. This means that, unlike variables, which may change their address every single time you run a program, an absolute address is always in the same place. The only catch is that it's a different absolute address for different screen modes. For the screen mode we're using, the segment is B800 [that's hexadecimal, of course]. As an example, say we had run that previous program that prints "I'm a string" on the screen. The letter "I" would be stored at B800:0000, or offset 0 in segment B800. Actually, it would be a number code for the letter I. Every letter on the keyboard, along with numerous other things, has a

numeric code assigned to it. We saw this in an earlier example that put a smiley face on the screen - its code was the number 1. Well, the code for the letter I is 73. Not to be confused with a lowercase i, which has code number 105. It would be difficult to remember everything in this code - all 256 of them - so just keep an ASCII chart handy.

Anyway, the code for "I" would be at B800:0000 - 73 [49 hexadecimal]. The code for the apostrophe would be at B800:0002 - 39 [27h]. But why isn't the code for the apostrophe at B800:0001? It is only one byte long, after all, and being the second character on the screen it should be the second byte. The answer is that text can have different colors. After each byte containing a numeric code for a character, there's a byte with the numeric code of what color that character should be. Since by default the print-string function of int 21 prints with the color grey, B800:0001 should have the number 7 stored at it - 7 is the numeric color code for grey. Now let's put all this information to use in this next program:
.MODEL SMALL
.STACK 200H
.CODE
START:

mov ax, 0003h
int 10h

mov bx, 0b800h
mov es, bx
mov bx, 0
mov ah, 1
mov es:[bx], ah

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h
END START

The first two lines are to change to the text screen mode. We covered this further up on this page. From here, it gets a little complicated. We want to put text on the screen, so we want to put things into the segment B800. And the top left corner of the screen, which is essentially the 'beginning' of the screen, is stored at offset 0. We're gonna need to get a Segment Register to point to the segment we want. We're going to use ES, which is the 'extra segment'. I'm not sure exactly, but I don't think it's used for anything specific - I think it's just an extra segment register used to point to whatever segment you want, unlike DS and CS which point to your data and your code. But since you can't move numbers directly into a segment register, we must put the number into a general register first. So, we put it into bx with:
mov bx, 0b800h

You have to put that first 0 on there because a hexadecimal number that begins with a letter would otherwise look like a name (a label or variable) to the assembler. Rest assured, that does actually mean B800 - the leading 0 doesn't change the value, but it is required. After the segment's in bx, we put it into ES. Now it gets a little bit tricky:
mov es:[bx], ah

What exactly does this mean? Well, this is another way that we can move things around in memory. Firstly, we know that a segment and its offset are separated by a colon, so it must have something to do with a segment and offset. ES, since it's on the left side of the colon, is the segment. So instead of saying B800, we can actually put B800 in a register, and tell the computer to look at what's in that register to find what segment we want. So far, then, we've deduced that we're trying to put a number at the segment represented by whatever is in es. Since [bx] is after the colon, it must be the offset that we're moving to. This is similar to what we did with ES. But for registers that aren't segment registers, we must put them in brackets - [] - to specify that we want to use them as an offset. This is a "Mov" instruction, so if we just left bx without

brackets, we'd be saying we actually want to put something in bx. So, two lines before this was this line:
mov bx,0

The offset of the upper-left corner of the screen is 0. And if bx is acting as our offset, we should make bx equal 0. So all in all, we now know what this command means:
mov es:[bx], ah

It means 'move' what's in ah to the segment and offset that es and bx point to. And ah contains 1 - the numeric code for the smiley face character of text. These next 2 lines are also new:
mov ax, 0100h
int 21h

This interrupt waits for you to hit a key. That way you can have a chance to see what happens when you run the program.
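One thing the program above doesn't touch is the color byte next to the character; it's left at whatever happened to be there already. If you wanted to set it yourself, a small sketch of that part would look something like this (the value 4 is just an example - it should come out red):

mov ah, 1
mov es:[bx], ah        ; the character code (the smiley face)
mov ah, 4
mov es:[bx+1], ah      ; the color byte that follows it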

4.1 - Loops & Line Labels


Line Labels are a very simple idea. When you use a label, you give a name to a specific part of the program. As you'll see later, you can use this name to jump around in your program - in fact, it'll also come in useful right now. What if, in the last program, you didn't want to print just 1 smiley face on the screen? Say you wanted to print 100. Well, it seems like it would be a very long program, because you would constantly have to change bx - you'd have to add 2 to it every time you wanted to print the smiley face in a different place. This isn't necessarily true. By using what's called a "loop", we can print those 100 smiley faces by adding only a couple of lines of code. Let's see what this new program would look like:
.MODEL MEDIUM
.STACK 200H
.CODE
START:

mov ax, 0003h
int 10h

mov bx, 0b800h
mov es, bx
mov bx, 0
mov ah, 1
mov cx, 100

startloop:
mov es:[bx], ah
add bx, 2
loop startloop

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h
END START

This new program makes use of the 2 new things we're learning here. Firstly, startloop: means that that line in the program is called startloop. Like a variable, you can call a label almost anything you want. Make sure that you're aware of the colon after it - this is what clarifies for the compiler that it's a label. What a loop basically does is run a set of instructions over and over again. To make a loop we start with a label. This will be the start of the loop. (By the way: a label doesn't always have to start a loop, it can just be a label for a part in your program. But in this case, it does start the loop.) Then we put the instructions to be repeated on the lines after the label. When we've typed all the lines that should go in the loop, we need one more line to close the loop. This is the command LOOP. Notice that here it says loop startloop. We must tell it where we want to loop back to. Since startloop is the beginning of the loop in this case, we use loop startloop. We probably don't want the loop going on forever, so there must be some way to specify how long the loop lasts - there is. CX is used as the loop counter. Before the start of the loop, you must put a number in cx. Then, every time your program runs across the command LOOP, it subtracts one from cx before looping. If cx is 0, the loop ends. That's really all there is to the loop command. Now, just one more thing we added to this program. Inside the loop is this:
add bx, 2

This pretty much explains itself - it adds the number 2 to bx. Recall that bx will point to offset 0 at the beginning of the program. And this loop is going to be done 100 times. We don't want it to put the smiley face character at offset 0 a hundred times; by adding 2, bx points to the offset of the next character on the screen.

4.2 - Doing something useful: Graphics


Up until this point, we've used only the text mode for output. This is all well and fine for learning purposes but not particularly useful. So now it's time that we used one of the screen modes suited towards graphics. This is mode 13h. It has a resolution of 320x200x256. That means 320 pixels wide, 200 pixels tall, and 256 colors on screen at once. Though not great, it can do some pretty nice graphics. It's a start anyway. DOS has some interrupts for dealing with graphics, but there's no point in using them because drawing pixels to the screen is very easy. It's very much like the last section.

In the previous section, the screen started at offset 0 of segment B800. Likewise, the screen for mode 13 starts at offset 0 of segment A000. Also, in the previous section we potentially had to write 2 bytes per character. In this mode, each byte of data written to the screen draws only one pixel [a dot]. So, all the data is only a color - because a dot always looks like a dot, there's nothing else to store but the color of the dot. Let's see an example program for drawing some pixels:
.MODEL MEDIUM
.STACK 200H
.CODE
START:

mov ax, 0013h
int 10h

mov bx, 0A000h
mov es, bx
mov bx, 0
mov ah, 1
mov cx, 64000

startloop:
mov es:[bx], ah
inc bx
loop startloop

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h
END START

Surprised? It's almost the exact same as before but with minor changes for mode 13. Now we use
mov bx, 0A000h
mov es, bx

because the screen starts at segment A000. Also, the loop counter has been changed to
mov cx, 64000

That's because this program is intended to fill the whole screen with dots. Since there's 320x200 pixels, do the math: 320 * 200 = 64000 [note that * is used as a symbol for multiplication. FYI, when typing * usually denotes multiplication, / for division, and ^ for exponents: 2^3=8, and so on...]. Then, the loop itself is much the same. Move a byte to A000, add to bx to go to the next offset, loop again. Notice that
inc bx

is used in place of
add bx, 2

because we want to put a byte in every single offset, since every single offset corresponds to a pixel. Inc bx, then, adds only one to bx. INC can be thought of as INCrement, or even INCrease, if that helps.
add bx, 1

would have been valid here too, but I believe inc is faster, and it's just good programming technique to do the more logical thing. Likewise, this would work:
inc bx inc bx

in place of the add bx, 2 in the previous example, but why, when you can just use add!? Well, I know that that blue screen is ultra exciting. Still, let's try drawing the whole screen, but with all 256 colors at once to make it more interesting. Simply add this after inc bx:
inc ah

So, the first time through the loop we draw a blue pixel. Then we move to the next pixel and draw one with color 2, the next with color 3, and so on. I'm fairly sure that when you use inc on a register that's already at its max value - that is, using inc when ah = 255 - it wraps back around to 0. This is how it turned out for me, anyway. You should see a colorful pattern on your screen when you run this.
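For reference, the loop from the previous program with that one line added should look something like this:

startloop:
mov es:[bx], ah     ; draw one pixel using the color in ah
inc bx              ; move to the next pixel
inc ah              ; move to the next color (wraps from 255 back to 0)
loop startloop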

4.3 - A faster way


Now we'll look at a quicker, cleaner way of filling the screen with pixels. We do this with the command STOSB. Stosb is used for exactly what we did in the last program - storing bytes at a location in memory. To use it, we must first set where to store the bytes. This location is given in ES:DI. Then it's just a matter of putting a value in al and calling STOSB.
START:
mov ax, 0013h
int 10h

mov bx, 0A000h
mov es, bx
xor di, di
xor al, al
mov cx, 64000

startloop:
stosb
inc al
loop startloop

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h
END START

Well, it's the same exact thing, but faster (I believe). It's not too noticeably faster here, but when used over and over as part of a program it would pay off. This tutorial, as you can maybe tell, is leading towards the parts of ASM programming that will help you design games, which is mostly what I use it for, so it's the easiest for me to write about. Next, we'll get into drawing "sprites". A sprite is basically just a little image - a person, enemy, ship, etc., depending on what kind of game it's in. Our sprites will be stored in files, so we'll need to load them in. This means learning how to open and read files.

4.4 - Graphics from file


Before we go on to getting graphics from a file, let's see how files work in the first place. Files aren't too hard to use. First we must open the file. Then, of course, we'll need the filename to do that. From there, opening the file is just a matter of passing a few things to an interrupt. So to start, this should be in the DATA part of the program:
Filename db "spryte1.grh",0

Recall that db can be thought of as "declare byte(s)". Each character is a byte, and so is the number zero at the end. That 0 is used as a terminator for the string. Much like printing text needs a $ at the end of the string, a filename must end in a 0, or 'null', character - it's therefore referred to as 'null terminated'. Notice that that line is NOT
Filename db "spryte1.grh0"

When the 0 is inside the quotes it becomes text. It's no longer a terminator, because instead of being the byte 00h, it's the ASCII code for the character '0' (30h). Now it's part of the filename instead of a terminator for the string. For simplicity, I have a sprite made for our program to load onto the screen. Download it before moving on. Our program so far should look like this:
.MODEL MEDIUM
.STACK 200H
.DATA
Filename db "spryte1.grh",0
.CODE
START:

mov ax, 0013h
int 10h

mov ax, @data
mov ds, ax

Next, let's open the file. First, add these two lines to the data part:
Filehandle dw ?
Filebuffer db 256 dup (?)

When we open a file, it'll give us a number called a 'handle'. This way, whenever we want to read, write, etc. with the file, we just use the number associated with the open file instead of giving it the entire filename again. We can 'open' many files at once, meaning we have access to them and no other programs do. So, we technically could have many different handles, one for each file. For now though, we only need this one. Filebuffer is where the contents are stored. We have this variable in order to actually load the file, making it quicker and easier to look at its contents, rather than reading the file every time we need data from it. You may wonder, though, what 'dup' is. Dup can be thought of as DUPlicate. Since we want a 256 byte long chunk of memory, we would normally have to write: Filebuffer db 0,0,0,0,0,...... and so on, 256 times. Well, dup says duplicate the byte (in this case we don't specify exactly what value, we just put a ? to say that it doesn't matter, and the assembler, I think, will reserve the space leaving whatever used to be there) 256 times. Notice that the 256 comes before DUP, and the value to DUP comes after it in parentheses (). So, we have a place to put the handle and the data - let's get to it and open the file. The interrupt for opening a file is again int 21h, and it takes a few parameters:

AH = 3Dh specifies that we want to open a file
AL = the mode to open it in: read only, write only, or both read and write. We'll make AL = 0, meaning read only. We can't change the file while it's open for read only, but that's okay because we only want to load it.
DS:DX = segment and offset of the string holding the filename, an idea we're familiar with already.

So, just a few simple lines to open it:
mov ax, 3d00h
mov dx, OFFSET filename
int 21h
mov filehandle, ax

And it 'returns' the file handle in ax, meaning that it puts it in a register after it's done. We don't get to pick the handle; it decides for us. So, we just move ax, which now contains the handle, into our variable Filehandle. Since we'll obviously need to use ax many more times throughout our program, the handle can't stay there. Next, it's just another interrupt to read from our newly opened file:

AH = 3Fh specifies we want to read from the file
BX = handle. We must put the handle here to tell it which file to read from.
CX = how many bytes to read. In this case, 256 (16 pixels across, 16 down, 1 byte per pixel).
DS:DX = segment + offset of the place to load to.

So, just another little segment of code:
mov ax, 3f00h
mov bx, filehandle
mov cx, 256
mov dx, OFFSET filebuffer
int 21h

Now that it's loaded, let me introduce you to something very similar to what we just covered. It's a command called MOVSB. It's like STOSB, but it MOVeS Bytes around in memory. So, we'll move from the 'buffer' to the screen. To use this, we give an address to move to, and one to move from. DS:SI points to the source, or where to move from. ES:DI points to the destination, or where to move to. So, we'll point DS:SI to Filebuffer, and ES:DI to the screen:
mov si, dx
mov ax, 0a000h
mov es, ax
xor di, di

The first line is just a little shortcut, because DX still contains the offset of Filebuffer from before. DS already points to the right segment, so there's no need to change it. xor di, di makes di 0, which points ES:DI to the first pixel of segment 0a000h - the screen. This is also a shortcut for
mov di, 0

I know I haven't introduced the stack, PUSH, and POP, but just bear with it because they're necessary for this. Sometimes it's better to omit a few smaller details until later in order to move to the bigger stuff quicker. For now, just think of PUSH as saving a register temporarily, and POP as getting that value back. Things that are PUSHed go onto the 'stack', and they're 'POPped' off of there as well. I'll go into detail of this later... Anyway, this bit of code is a bit tricky:
mov cx, 16
startloop:
push cx
mov cx, 16
rep movsb
add di, 304
pop cx
loop startloop

Our loop draws one line of the sprite each time through, so we start by setting the loop counter in cx to 16. Then, inside the loop, we need to use cx again as a different counter, but it's already the loop counter! Well, we use PUSH CX to save it on the stack. Now we can get back the value it had before the end of the loop, so the loop will still work. We want to move 16 bytes (one line) from the buffer to the screen. We set CX again to 16, and use REP MOVSB. What is rep? It means, "repeat the next instruction however many times CX says to". Now this is a little tricky as well: each time we do MOVSB, it moves a byte, and automatically increases DI and SI by one. So, after 16 times, DI has changed by 16. There are 320 pixels in a row, and we want DI to point to the next row before we draw it, so let's do the math: 320 - 16 = 304. By adding 304 to di, we point it to the first pixel of the next row down. The final command POPs CX back, making it have whatever value it had the last time it was PUSHed. You'll notice, though, that it's POPped, then we loop and immediately PUSH it again! What good does this accomplish? Well, be sure to remember that when we LOOP, CX is decreased by one. The first time through, then, it's pushed as 16; at the end we pop it, still 16; LOOP decreases it to 15 and jumps back; it gets pushed as 15, and so on and so on. REP also decreases CX by one every time it REPeats the instruction - that's why we must store and retrieve CX: it needs to be used as two separate counters. If we didn't PUSH it, the first time through CX would be 0 at the end of the loop, because REP brought it down to 0. So, finally, we have just these lines to wait for a key, allowing us to see the sprite, and then to exit:
finish:
mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h
END START

And that's it! When you run it, you should see a little Mario sprite in the top left corner of the screen. However, there's something wrong with him. His colors aren't right. Well, that's another topic VERY VERY important to graphics called the 'Palette'. And that's what we'll discuss next.

4.5 - The palette


Mode 13h is called a 'paletted' graphics mode. That is, you can use 256 colors at once, but those 256 are picked from a much larger set of possible colors (262,144 of them, since each color is mixed from 64 levels each of red, green, and blue). When you write a byte to the screen, you're only telling which number of the palette that pixel is, not what the pixel actually looks like. What the color looks like is something saved in the palette. So, how do you tell what a pixel/color looks like? Well, you mix 3 basic colors: red, green, and blue. Things which emit their own light, such as the electron gun in a TV or monitor, have base colors red, green and blue, while reflected light (paint, ink) mixes from red, blue, and yellow. That's just how it happens. So, you can think of defining the look of a color as mixing colors, much like you probably did early in school with paint or something. So, it's quite easy. But you must be familiar with yet another command: OUT. This command puts a byte out to a 'port' in your computer. We need this because your video card, which has control over the palette, has many different ports you must interface with from time to time. You can't just write bytes to them since they're ports; you have to use OUT (and to get stuff from a port, use IN). But why must we even use the palette at all? Well, sometimes the default palette doesn't have the exact shade of color you want, which there's a good chance of, since there are only 256 colors in it. So, you just mix your own values of red, blue, and green and make the exact color you want. The sprite in our last program was saved as an image file through Windows, then I saved it in the file we used, which has no junk in it like a "bitmap" does - just the pure bytes that make up the image. The problem is, the colors in that image didn't look the same as the original bitmap, since MS Paint in Windows uses one palette, and our DOS based program uses another. However, I did happen to save the palette and put it in a file. This way, we can make the colors in our program look the same as the original image, so that it's loaded correctly. Before diving into the big main program, let's go over some simple palette stuff. For simplicity's sake, let's just change the look of one color. We'll change color 0 (by default, completely black) to white. This is just a couple of OUT commands:
.MODEL MEDIUM
.STACK 200H
.CODE
START:

mov ax, 0013h
int 10h

xor al, al
mov dx, 3c8h
out dx, al

inc dx
mov al, 63
out dx, al
out dx, al
out dx, al

mov ax, 0100h
int 21h

mov ax, 4c00h
int 21h

end start

To use OUT we give it a port and a value. The port goes in DX and the value in al (the port number can also be written directly into the instruction if it's small enough, but the value for this kind of byte-sized OUT always goes through al). To write to the palette, first we send a byte to port 3c8h telling it which color we want to change. We did xor al, al because we want to change black, color 0. Then, the card takes the next 3 values written to port 3c9h. So we just INCed dx. The 3 values you give it are the amounts of red, green, and blue your new color will contain. Each one has a value from 0 to 63. Giving it three 0's would result in black and, at the opposite extreme, three 63's results in pure white. So, we just do mov al, 63 and put it out to port 3c9h 3 times. The result should be the black screen changing to white. Don't be confused though - every byte in the screen's memory was 0 before, and it still is. We just changed what color 0 looks like. So, we can now load the palette in our earlier sprite program, making all the colors look right.
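As a rough sketch of what loading a whole palette might look like (Palbuffer here is a made-up variable name, assumed to already hold the 768 bytes - 3 per color - read in from the palette file, with DS pointing at the data segment):

mov si, OFFSET Palbuffer   ; DS:SI = the palette data we loaded
mov dx, 3c8h
xor al, al
out dx, al                 ; start writing at color 0
inc dx                     ; dx = 3c9h, the data port

mov cx, 768                ; 256 colors * 3 bytes each
palloop:
lodsb                      ; grab the next red/green/blue value into al
out dx, al                 ; hand it to the palette
loop palloop

After every third byte the card automatically moves on to the next color, so one long loop like this covers all 256 of them.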

Why do people learn Assembly Language?


Speed
It is possible to hand-optimize your code by using assembly language instead of relying on the generic optimization techniques of a High Level Language compiler. A well crafted assembly language routine can usually beat the one generated by a compiler (Even the better optimizing compilers such as Intel C++ or Visual C++ .NET). Optimizing for speed is a skill that is difficult to master. Such optimization usually requires writing code that is specifically geared towards the target architecture. A good place to start learning how to optimize code is Agner Fog's website [1]

Size
As a result of generic optimization methods, High Level Languages tend to bloat the executable file with useless code. When coding in Assembly, you have total control over what ends-up in the actual executable file, thus the capability to optimize and eliminate all unnecessary instructions.

Understanding
Learning Assembly Language is the stepping stone to learning general Computer Architecture. This learning process is accelerated when you approach subjects such as Operating System Development.

Necessity
Certain code cannot be conceived when using a High Level Language. Such "Low-Level" Code is usually available only through the use of Assembly Language. A big example is during Operating System Development, where you need to design Interrupt Handlers that have no other dependencies.

Assemblers?
You can take your pick from the List of Assemblers. In general, Intel-based Assembly Language syntax will be used. If you learn the syntax of one Assembler, it is pretty easy to learn another.

Debuggers
DEBUG is a DOS/Windows command-line utility program that you can use to debug MS-DOS (16-bit) programs. We will use DEBUG initially for our 16-bit examples, but for the 32-bit examples, we will use an open-source command-line debugger called GRDB (Get Real Debugger), which is available for download at [2].

Disassemblers
A disassembler is a program designed to create an assembly listing from a compiled executable. They tend to be used by hackers and such for any reason from bypassing software protection to getting rid of pesky bugs that they've tracked down themselves. These can be powerful resources in the hands of someone who knows how to use them, but they are certainly not a beginner's choice to learn assembly from, though they can help if you know enough assembly to wade through everything that a higher level language gets compiled into.

Emulators
Microsoft Virtual PC
VMWare
PC Emulator

The need
"Eek! Why should I know this stuff?", you might ask. Well, here are some reasons: It's extremely simple This is what you will need in order to learn and program in assembly language.

Introduction
A program consists of two fundamental things: data and instructions. Loosely speaking, a computer represents these data and instructions in the form of numbers. Therefore, it is apparent that a programmer should have a good understanding of the underlying number systems being used by the computer system. Several number systems, including binary, octal, decimal, and hexadecimal, are used by different computer systems. Before we dive into the other number systems, we would like to cover the most common of them all: the decimal number system. But first, some terminology.

Number Systems
A number system is a way of representing a number. Every number system has a base (the number of digits available). A number system does NOT change the value of the number, but only the manner in which it is represented. What we mean to say is that the value of the number remains the same, but the digits we use and how we use them decides the representation of that number. (You will understand what we mean as we progress along the chapter. For now, just remember, we are only playing with the representation of the number, not its value.)

Base

The base, also called radix or scale, is the fundamental building block of a number system. The base of a number system represents the number of digits it makes available for use. The decimal number system, for example, has 10 digits and is called a base-10 number system. For a number system with base b, the digits 0, ..., b - 1 are used. A table of common bases with the digits they provide follows:

Base  Name         Digits                                           Last Digit  Letter Suffix
2     Binary       0, 1                                             1           b
8     Octal        0, 1, 2, 3, 4, 5, 6, 7                           7           o
10    Decimal      0, 1, 2, 3, 4, 5, 6, 7, 8, 9                     9           (none)
16    Hexadecimal  0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F   F           h

Base-0 does not exist and you cannot do much with base-1. A discussion about them is irrelevant to us, so we will avoid them.

Base representation
When you are using several number systems together, it is easy to confuse one number system for another. This is precisely the reason why a subscript suffix is added to a number representation. This makes the number system being used for that particular sequence of digits clear. There are two common ways of writing the suffix. A decimal number representing the base (10 for the decimal number system) or a letter of the alphabet representing that base (for example, d for the decimal number system). The letter is usually the initial character of the name of the base in use. If a base is not specified, it is usually safe to assume that the number uses the base-10 or decimal number system. Please refer to the above table for information on Bases and Letter Suffixes

A number with base b and the sequence of digits a(n) a(n-1) ... a(1) a(0) . a(-1) ... is represented as:

( a(n) a(n-1) ... a(1) a(0) . a(-1) ... )b

Example (the decimal number 44934 in different number systems):

1010111110000110b  -- binary
127606o            -- octal
44934              -- decimal
af86h              -- hexadecimal

If you look carefully, you will notice the number representation "shrinking" in width as we use a higher-base number system. Hexadecimal numbers are commonly used in code for precisely this property and because it makes representing numbers simpler (as we shall see later). The fewer digits a number system has, the wider the representation becomes, since we have to express the number using fewer distinct digits. The more digits it has, the narrower the representation becomes, since we have plenty of digits to express the number with.

The Decimal Number System


The decimal number system is the most commonly used number system. You use it every day and you have been taught to work with this number system since your childhood. Also called the base-10 number system, the decimal number system offers 10 digits (0 through 9) that one can use to represent numbers. A number consists of a sequence of one or more digits, with each digit having a weight and a place value. The decimal number 65535, for example, has 5 digits and 5 corresponding weights, each associated with one digit. Starting from the right, the digit 5 has the least weight (i.e. 10^0 = 1), the weight of the digit 3 is 10^1 = 10, that of the next digit 5 is 10^2 = 100, and so on toward the left. Each digit occupies a place in the number. The place value is the weight of the digit times the digit. For the digit 3 in 65535, the place value is 3 * 10^1 = 30. For any number with base b and the sequence of digits a(n) ... a(1) a(0), where n is the place of the digit in the number, the weight w(n) of each digit is given by w(n) = b^n. The place value p(n) of each digit a(n) is given by p(n) = a(n) * w(n) = a(n) * b^n.

weight of a digit = radix raised to the power of the position of the digit
place value = value of digit * weight of the digit
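Working this out for 65535 itself, the place values add back up to the number:

65535 = 6*10^4 + 5*10^3 + 5*10^2 + 3*10^1 + 5*10^0
      = 60000 + 5000 + 500 + 30 + 5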

The Decimal Odometer


If you peek into the dashboard of a vehicle, you will notice a distance measurement device showing the number of miles (or kilometers, in metric units) traveled by that vehicle. That measurement indicator is called an odometer. The next time you go out for a drive, notice how the digits change with each mile you cover. Consider, for example, a vehicle that has traveled 49,748 miles. Every time the vehicle covers a new mile, that number is incremented by one. Try visualizing it using this graphical illustration as a guide.

4 9 7 4 8 -- starting
4 9 7 4 9 -- one more mile covered; the rightmost digit is incremented by one
4 9 7 4 0 -- one more mile covered; the gears move and the digit 0 is brought into position
4 9 7 5 0 -- then, for the same mile, the gears controlling the second-to-rightmost digit move and that digit is incremented by 1, from 4 to 5

If you want to see it for yourself on your computer, try building and running this C program. You will need GCC installed on your system to try this example (for both UNIX and Windows). (If you have the MinGW compiler system, you don't really need the WIN32-specific part, but we have included it just in case you use a different compiler. Don't forget to define the appropriate preprocessor macro when compiling.)

/* odometer.c */
#include <stdio.h>

/* choose platform */
#if defined(__WIN32__)
#include <windows.h>
#define sleep(_x) Sleep ((_x)*1000)
#elif defined(__UNIX__)
#include <unistd.h>
#endif

int main (void)
{
    register int i = 0;

    for (i = 9985; i <= 10000; ++i) {
        /* display a number */
        printf ("%05d\r", i);
        fflush (stdout);

        /* sleep for 1 second */
        sleep (1);
    }

    return 0;
}

Steps to making and running the above program:
1. Put this code in a file named odometer.c
2. To build the executable and run it, do this

For a UNIX system:
% gcc -g -pedantic -Wall -std=c89 -D__UNIX__ -o odometer odometer.c
% ./odometer

For a Windows system:
> gcc -g -pedantic -Wall -std=c89 -D__WIN32__ -o odometer odometer.c
> odometer

Now, see how the digits change. To stop the running program before it finishes, press Ctrl+C.

You can also use the following Makefile to build this program.

# -------------------------------------------------------------------------
# GNU Makefile.
# You need the following software to use this Makefile:
# 1. GNU Compiler Collection and GNU Make
#    Windows - www.mingw.org / www.cygwin.com
#    UNIX - gcc.gnu.org
# 2. rm
#    Windows - unxutils.sourceforge.net
#    UNIX - Your UNIX distribution should come with this.
# -------------------------------------------------------------------------

name=odometer
platform=-D__WIN32__
CC=gcc
CFLAGS=-g -pedantic -Wall -std=c89
RM=rm

# -------------------------------------------------------------------------

.PHONY: all clean

all: $(name)

$(name): $(name).c
	@ echo "\n>>> Building program\n"
	$(CC) $(CFLAGS) $(platform) -o $(name) $(name).c

clean:
	@ echo "\n>>> Cleaning build\n"
	- $(RM) -f $(name) $(name).exe *.o

Note: The tabs in front of the commands in the Makefile are important!

Put the above text in a file called Makefile in the same directory as the odometer.c file and at the command prompt type...

on UNIX:
% cd source_directory_that_contains_the_code_and_the_makefile
% make platform=-D__UNIX__
% ./odometer

on Windows:
> cd source_directory_that_contains_the_code_and_the_makefile
> mingw32-make
> odometer

Binary and hexadecimal


Binary and hexadecimal are two different but similar number systems which are extremely important to any programmer. Binary is the base-2 number system, while hexadecimal is the base-16 number system. The reason binary is important to a programmer: in the electronic world, the only practical way to store data is as on and off (there is no reliable way to distinguish exactly how high or how low a voltage is). That is where binary comes in. Data is stored as on or off (1 or 0), so in a sense, learning binary is learning machine code. (As some people say, "Real men code in binary.") Oh yes, there are 10 types of people in the world: those who can read binary and those who cannot. The reason hexadecimal is important to a programmer: it makes no sense to type in long strings of 1s and 0s, and for this purpose it is more practical to adopt a base-16 number system than to stick with base 10. By the way, hexadecimal used to be called sexadecimal, but for various reasons the people at IBM decided to call it hexadecimal instead. (Some programmers say, "Real men code in hex.")

Binary -> Decimal


Think of reading binary as reading a normal number, but where the digit positions mean something else. The last digit is times 2^0, the second-to-last digit is times 2^1, and so on and so forth. Example:

0000b = 0
0001b = 1
0010b = 2
0011b = 3
0100b = 4
0101b = 5
0110b = 6
0111b = 7
1000b = 8
1001b = 9
1010b = 10

11011010b = 1*2^7 + 1*2^6 + 1*2^4 + 1*2^3 + 1*2^1 = 128 + 64 + 16 + 8 + 2 = 218 01010111b = 1*2^6 + 1*2^4 + 1*2^2 + 1*2^1 + 1*2^0 = 64 + 16 + 4 + 2 + 1 = 87

(*Note: the b which ends every binary number is used to inform people that the number is in binary, it is just a notation.)
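If you want to see the equivalence in assembler source, here is a tiny MASM/TASM-style sketch (not part of the original examples); both instructions load exactly the same bit pattern:

mov al, 11011010b    ; AL = 218, as computed above
mov bl, 218          ; the same value written in decimal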

Hexadecimal -> Decimal


By this point you should be able to understand binary, and since you understand how base 2 works, you should be able to understand how hexadecimal works. Converting hexadecimal to decimal is almost the same as converting binary to decimal; we simply apply the same concept with base 16.

Example:
01h = 1        05h = 5        09h = 9         0Dh = 13
02h = 2        06h = 6        0Ah = 10        0Eh = 14
03h = 3        07h = 7        0Bh = 11        0Fh = 15
04h = 4        08h = 8        0Ch = 12        10h = 16

F4h = 15*16^1 + 4*16^0 = 240 + 4 = 244

F34Ah = 15*16^3 + 3*16^2 + 4*16^1 + A*16^0
      = 15*4096 + 3*256  + 4*16   + 10
      = 61440   + 768    + 64     + 10
      = 62282

(*Note: the h which ends every hexadecimal number is used to inform people that the number is in hexadecimal; it is just a notation. Some HLL programmers prefer hexadecimal numbers to be prefixed with 0x, but I prefer to end them with h.)
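In assembler source this notation is what you will actually type. A small MASM/TASM-style sketch (note that these assemblers expect a leading 0 when a hex number starts with a letter, hence 0F34Ah):

mov ax, 0F34Ah       ; AX = 62282, as worked out above
mov bx, 62282        ; the same value written in decimal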

Binary -> Hexadecimal


This is one of the most important sections of this tutorial. What is the point of knowing that binary and hexadecimal exist if you do not know how to convert from one to the other? The key is that each hexadecimal digit corresponds to exactly one group of 4 binary digits (one nibble).

Example 1:
0111 1010b
  |    |
  7h   Ah

Therefore 01111010b -> 7Ah

Example 2:
0100 0111b
  |    |
  4h   7h

Therefore 01000111b -> 47h
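As a small sketch of why this matters in practice (the value is chosen arbitrarily), the assembler treats the binary and hexadecimal spellings of a byte as the same thing:

; 1100 0101b
;  |    |
;  Ch   5h      -> 0C5h
mov al, 11000101b    ; identical to writing: mov al, 0C5h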

Data Units
Humans can view data in a number of ways, such as the time of day, a friend's name, or a picture of a tree. A computer, on the other hand, has only three ways of interpreting data: as a quantity to be processed, as machine code to be executed, or as having no meaning at all. In addition, the computer relies upon you, the programmer, to tell it how to interpret that data.

Bit
The smallest "unit" of data in a binary machine is the bit . A bit has only two states, 0 and 1. The binary nature of the bit is inherited from the underlying electronics. The digital circuits that run a computer only have two levels, high and low. The meanings of high and low are tied to the hardware in use, and will not be covered in this book . If a single bit is used to represent a number, that number will only have two possible values. Usually, a single bit on its own in the 0 state will represent the number 0, and in the 1 state will represent the number 1.
(2) (1)

Setting, clearing, and toggling bits


To set a bit means changing its value to 1. To clear a bit means changing its value to 0. To toggle a bit means inverting its value (if it is currently 1, then toggling it will change it to 0).
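On the 80x86 these three operations map directly onto the OR, AND and XOR instructions. A minimal sketch, assuming the bits of interest live in AL and we want to affect bit 3:

or  al, 00001000b    ; set bit 3    (force it to 1, leave the other bits alone)
and al, 11110111b    ; clear bit 3  (force it to 0, leave the other bits alone)
xor al, 00001000b    ; toggle bit 3 (flip whatever it currently is)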

Grouping Bits
One bit has two possible states: 0 and 1. If you have two bits, Bit A and Bit B, the total number of combinations you can form is 4: you have the two states of Bit A while Bit B is 0, plus the two states of Bit A while Bit B is 1. If you add another bit, Bit C, you have 8 possible states: the 4 combinations of Bits A and B while Bit C is 0, plus the 4 combinations of Bits A and B while Bit C is 1. If you add a Bit D, there are 16 combinations; the same reasoning applies. Mathematically, the number of combinations formable by a group of 4 bits is

2*2*2*2, or 2^1 * 2^1 * 2^1 * 2^1 = 2^4 = 16

A group of n bits will have a total of 2^n possible states. Usually, the state of a group of bits is used to represent an integer. (An integer is a number that has no fractional portion--that is, if you write out the number on paper, there will only be zeroes after the decimal point.) Each possible state of the group of bits will correspond to one unique integer. That is, given an integer, you can figure out what state the bits must be in to represent that value (if it is possible to represent that value with those bits). Also, if you know the state of the bits, you can figure out what integer is represented. If you wanted to use the state of a group of n bits to remember the value of a nonnegative (that is, zero or positive) integer, without any gaps in the possible values, then the lowest number you can remember is 0, and the highest number you can remember is 2^n - 1. That's a total of 2^n possible numbers. If you want to be able to remember a higher number, you need to use more bits.

For ease of communication, special terms are used for the most commonly used bit groupings.

Byte
The byte is the smallest unit of data that a microprocessor can manipulate. The 80x86 byte is a group of 8 bits clubbed together to represent a total of 2^8 = 256 bit states. If a byte is used to represent a nonnegative integer, the smallest number that it can represent is 0 (00h). The highest number it can represent is 2^8 - 1 = 255 (FFh). If you wanted to represent 256 or any higher number, you would need to use a larger-sized unit of data.

Octet
On most existing microprocessors, a byte is 8 bits. But the size of a byte depends on the microprocessor in question. That is why you are bound to come across the term "octet", which means "a group of 8 bits"-regardless of whatever the size of a byte may be.

Nibble (or Nybble)


There are two nibbles in a byte. One nibble is comprised of the first half of the bits in the byte, and the other nibble is comprised of the last half of the bits in the byte. On an 80x86, a nibble is 4 bits in size.

Word
'Word' is probably the most confusing term in data representation. There are actually two meanings. The original definition refers to the machine word, which is the maximum number of bits that a processor can work with at a time. 32-bit microprocessors have machine words of 32 bits, whereas 16-bit microprocessors have words of 16 bits. The other definition of 'word', which is what the 80x86 assembly community and this book use, is a group of 16 bits.

Double-word
A group of 32 bits.

Wyde (Knuth)
D.Knuth (author of The Art of Computer Programming) has invented a new term for 16-bit data to get away from the ambiguity of the term 'word'. Because of the use of 'w' in functions handling 16-bit characters, he adopted a term that starts with 'w'. Also, 16-bit characters are currently known as "wide" characters in the C and C++ languages.

Tetra (Knuth)
Another D.Knuth term, which is short for tetrabyte. Four 8-bit bytes is 32 bits.

Octa (Knuth)
Another D.Knuth term, which is short for octabyte. Eight 8-bit bytes is 64 bits.

Use of groups of bits

What can you do with a word? For instance, you can represent the states of 16 lights, one bit per light. We can declare that if a bit is set, then its corresponding light is turned on. If the bit is cleared, then the light is off. And since each bit is independent of the others, some lights can be on while others are off. A word can also represent a quantity. In the above example, you can find out how many lights are turned on by counting the number of bits that are set. If none of the lights are on, then zero bits are set. If 5 lights are on, then 1+1+1+1+1 = 5 bits are set. Any quantity of 'on' lights, from 0 through 16, can be expressed with the word; 17 distinct values. Let's assume you only care about how many lights are on. Since you would be using 65536 states to represent just 17 values, that use of a word is a waste. The most efficient way to represent a value is by assigning weights to each bit position. The binary nature of bits suggests that we should use the base-2 numbering system.
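As a hedged sketch of the first use described above, assume the 16 lights are packed into AX, one bit per light, and that we care about light 5:

or   ax, 0000000000100000b   ; turn light 5 on  (set bit 5)
and  ax, 1111111111011111b   ; turn light 5 off (clear bit 5)
test ax, 0000000000100000b   ; is light 5 on?  ZF=0 if the bit is set, ZF=1 if clear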

Representing integers as binary numbers


You're already familiar with at least one numbering system: decimal. A decimal digit, the base-10 counterpart of the bit, can have a value from 0 through 9. What if you need to represent a number larger than 9? Add another digit. Fifteen could be represented as 78 = 7 + 8 = fifteen. Forty-three might be 99997! (Kind of resembles Roman numerals.) The problem with this system of simply adding the digits is that there are many redundant numbers: 69, 78, 87, 96 all represent the value 15. There is not one unique value represented by each number. Each digit in a sequence of digits that form a number can be assigned a unique number to identify that digit, called a "place". The rightmost digit is said to be in place "0", and each other digit has a place that is one higher than its neighbor to the right. For example

6502
||||
|||\-> Place 0
||\--> Place 1
|\---> Place 2
\----> Place 3

Next, we can assign place values to each place. This way, fewer digits are required to represent the same value. Let's assume that each digit has a place value that is 3 times greater than its neighbor to the right. For example, 11 would equal 1*(place value of digit in place 1) + 1*(place value of digit in place 0) = 1*3^1 + 1*3^0 = 3 + 1 = 4. To represent 43, we could write 421, which would equal 4*3^2 + 2*3^1 + 1*3^0 = thirty-six + six + one. We were able to use 3 digits instead of 5.

There is still room for improvement. It turns out that the best factor to increase place values by is the radix of the numbering system in use. The radix is the first value that cannot be represented by a digit. In decimal, the highest digit is 9, so the radix is 10. Conveniently, this also equals the number of possible digits. By using the radix as the factor to calculate place values, each digit can be used to represent a multiple of the first value that cannot be represented by the digits to the right of it. In this way, there are no duplicate representations of any value. The place value of a place is (r^p), and the value of a digit in a place is (r^p)*d where r=radix, p=place, d=digit. Because of this equation, the radix of a numbering system is also called the base of the numbering system. For example, decimal is referred to as base 10, binary is referred to as base 2. To illustrate the equation, we will calculate the value of 6502

6502
||||
|||\-> (10^0)*2 = 1*2    = 2
||\--> (10^1)*0 = 10*0   = 0
|\---> (10^2)*5 = 100*5  = 500
\----> (10^3)*6 = 1000*6 = 6000

6000+500+0+2 = 6502.

Now an example in which binary is translated to decimal (binary values end in "b" to show that they are binary)

1101b
||||
|||\-> (2^0)*1 = 1*1 = 1
||\--> (2^1)*0 = 2*0 = 0
|\---> (2^2)*1 = 4*1 = 4
\----> (2^3)*1 = 8*1 = 8

1+0+4+8 = 13.

Thus, 1101b is the way to write 13 in binary. To convert a decimal number to binary, just keep dividing the number by 2 -- the remainder equals the digit, and the quotient is the number to use to get the next digit to the left. For example, let's convert 6502 to binary

6502 / 2 = 3251 REM 0
3251 / 2 = 1625 REM 1
1625 / 2 =  812 REM 1
 812 / 2 =  406 REM 0
 406 / 2 =  203 REM 0
 203 / 2 =  101 REM 1
 101 / 2 =   50 REM 1
  50 / 2 =   25 REM 0
  25 / 2 =   12 REM 1
  12 / 2 =    6 REM 0
   6 / 2 =    3 REM 0
   3 / 2 =    1 REM 1
   1 / 2 =    0 REM 1

Reading the remainders from the last one back to the first gives the binary digits from left to right.

So, 6502 expressed as a binary number is 1100101100110b.
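The processor can carry out the same divide-by-2 procedure itself, because shifting a register right by one bit divides it by 2 and drops the remainder into the carry flag. A rough sketch, not a complete program:

        mov  ax, 6502
        mov  cx, 16          ; a 16-bit register holds at most 16 binary digits
next:   shr  ax, 1           ; AX = quotient, CF = remainder (the next digit, lowest first)
        ; ...record the carry flag here; it is the next binary digit...
        loop next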

Binary numbers and groups of bits


Until now, two separate topics have been discussed: groups of bits, and binary numbers. Now we want to know how to use a group of bits to represent binary numbers. Each bit can represent one binary digit--therefore, a binary number of n digits can be represented by a group of n or more bits. Bits in a group of bits are assigned unique numbers to identify them, in exactly the same way as places in a number are assigned place numbers.

0000 <- a bunch of bits set to 0
||||
|||\-> Bit 0
||\--> Bit 1
|\---> Bit 2
\----> Bit 3

For any two bits A and B, where bit A has a higher bit ID number than bit B (for example, if bit A is Bit 4 and bit B is Bit 2), we say that bit A is a "higher" bit than bit B. We also say that bit A is "to the left" of bit B. When you write out a sequence of bits, you write the bits in descending order--high bits to the left, low bits to the right. If we want a group of bits to represent a binary number, we usually use bit n to hold the digit in place n of the binary number. So, bit 0 holds the value in place 0, bit 1 holds the value in place 1, etc... For example

1101 <- A bunch of bits representing the value 1101b
||||
|||\-> Bit 0, holding the digit in place 0, which is 1
||\--> Bit 1, holding the digit in place 1, which is 0
|\---> Bit 2, holding the digit in place 2, which is 1
\----> Bit 3, holding the digit in place 3, which is 1

Groups of groups of bits


It may be that you are working with a system that only supports 8-bit bytes. How could you store a number higher than 255? You can combine bytes (or other groups of bits) in the same way as you combine digits. The radix of a byte is 256 (the first value that can't be represented by a byte). So, given a Byte 1 which stores the value 152 and a Byte 0 which stores the value 74, the combined value represented by these bytes can be considered to equal 152*256^1 + 74*256^0 = 152*256 + 74 = 38986. The maximum value you can store with 2 bytes is 255*256^1 + 255*256^0 = 255*256 + 255 = 65535, which is also the maximum value you can store in a group of 16 bits.
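The 80x86 registers make this concrete: the 16-bit BX register is simply its two 8-bit halves, BH and BL, glued together. A tiny sketch of the example above:

mov bh, 152          ; "Byte 1", the high half
mov bl, 74           ; "Byte 0", the low half
                     ; BX now holds 152*256 + 74 = 38986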

Representing negative numbers


The above method is only one possible way to represent a number with bits. While there are always 2^n possible values represented by a group of n bits, you can assign numeric values to each combination of bit values as you like, so the range of values isn't always from 0 to 2^n - 1. For example, with 8 bits you can represent values in the set {0, 2, 4, 6, ... 508, 510}. You can also represent values in the range -128 to 127. If a value cannot be negative, it is called an unsigned number. If a value can be negative or nonnegative, it is called a signed number. This is because you need to remember the sign associated with the number--that is, whether the value is negative or not. The method of storing signed numbers on an 80x86 is called the "two's complement" method. For such a signed number, the highest bit is not considered to be a binary digit, but is considered to be the "sign" of the number. If the bit is 1, the number is negative--otherwise it is nonnegative (zero or positive). When the number is signed, there is one sign bit, and the remaining bits represent an unsigned integer. When the sign bit is 0, the value is equal to the value of that unsigned integer. When the sign bit is 1, the value is equal to the value of the unsigned integer minus the radix of that unsigned integer. For example, with an 8-bit signed number, the highest (leftmost) bit is the sign bit, and the remaining 7 bits are the unsigned integer. This unsigned integer is in the range 0 to 127. The radix of that unsigned integer is 128. So, when the number is negative, the value 128 is subtracted from the value of the unsigned integer to get the value of the entire byte. Thus, when the value is negative, the byte has the range 0-128 to 127-128, or -128 to -1, and when the value is positive, the byte has the range 0 to 127. The total range of possible values is -128 to 127.

If all bits are 1, (that is, the sign bit is 1 indicating that the value is negative and the unsigned integer is equal to 127,) the value will equal -1. This is true regardless of how large the group of bits is. Also, if the sign bit is 1 and all other bits are 0, then the value is equal to the lowest representable value. Another way of describing two's complement signed integers is to say that the place value of the leftmost bit is negated. That is, instead of the place value of the leftmost bit of a byte equalling 128, it equals -128. So, if that bit is set, -128 is added to the value of the byte.
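A small MASM-style sketch of these facts (the assembler and the NEG instruction do the two's complement bookkeeping for you):

mov al, 0FFh         ; all bits set: as a signed byte this is -1
mov bl, 80h          ; sign bit set, all other bits clear: the lowest value, -128
mov cl, -5           ; stored as 0FBh (251 unsigned, -5 signed)
neg cl               ; two's complement negation: CL now holds +5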

80x86 conventions
On the 80x86, it's presumed by the instructions that a group of n bits represents either any unsigned integer in the range 0 to 2^n - 1 or any signed integer in the range -2^(n-1) to 2^(n-1) - 1, using the methods described above.

So, you will usually store integers in this way to make the most use of the instructions. The actual process of storing numbers in this way is automatically done by the assembler and the microprocessor.

Footnotes
1. The word "bit" originates from the term "Binary digIT".
2. In traditional digital circuits, a high is represented by a +5 volt signal, and a low by ground (0V) on the line. The device that is driving the signal is either sourcing +5V or sinking to ground. In RS-232, a serial communications protocol, 1's are represented by -12V (-7 to -15) and 0's by +12V (+7 to +15).

(I've removed note 3. Of course computers have a concept of "left" and "right". It's not literal spatial positioning, but the words "left" and "right" are clearly defined to indicate the relationship of bits in groupings of bits, so it is acceptable to use those terms. For example, look at the instructions SHL and SHR. It's not like you can't tell what direction "left" is.)

Boolean Logic
Boolean logic was developed by George Boole in the mid-1800s. At its core, Boolean logic is simple to master and will be useful later in programming. The following Boolean examples will be represented using truth tables, logic gates, and the binary operation applied to two numbers (represented as binary numbers, of course). In terms of electronics, a gate operates on on and off states (0 = off, 1 = on). The inputs to the gate (A, B) are on the left and the output (C) is on the right. Following along with the 'AND' example, and reading from its truth table, both inputs A and B are required to be 'on' in order for C to be on (1). It could be read as the following:

IF A=0 AND B=0 then C=0
IF A=0 AND B=1 then C=0
IF A=1 AND B=0 then C=0
IF A=1 AND B=1 then C=1

Notice that the results from NAND and NOR are the exact complements of the results from AND and OR, respectively.

Contents

1 Uses of the operations
2 Combinations of the logic
3 Coding the logic
4 Notes

Uses of the operations

Combinations of the logic

One or more of these logical operations can be grouped together to form complex logic. Combining an AND and an OR, for example, in pseudocode:

IF (A=1 AND B=1) OR G=1 then C=1

(A AND B must both be true for C to be true, unless the optional input G is true.) This creates a somewhat more complex logical gate on which a reaction can be based, be it from software or otherwise.
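Here is a hedged sketch of that pseudocode in 80x86 instructions, assuming AL, BL and DL each hold 0 or 1 for the inputs A, B and G:

mov cl, al           ; start with A
and cl, bl           ; CL = A AND B
or  cl, dl           ; CL = (A AND B) OR G  -- this is the output C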

Coding the logic


FOR MORE BIT OPERATIONS, SEE Bit Operations.
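In 80x86 assembly the four basic operations are the AND, OR, XOR and NOT instructions, which apply the truth tables above to every bit position at once. A small sketch with arbitrary values:

mov al, 10101010b
mov bl, 11001100b
and al, bl           ; AL = 10001000b  (1 only where both inputs were 1)
mov al, 10101010b
or  al, bl           ; AL = 11101110b  (1 where either input was 1)
mov al, 10101010b
xor al, bl           ; AL = 01100110b  (1 where the inputs differed)
not al               ; AL = 10011001b  (every bit inverted)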

Notes
Boolean logic is the basis for much of the internal processor logic. Addition and subtraction are actually implemented using Boolean logic! The gate symbols shown above are those used in the USA; in Sweden, the UK and other IEC-conforming countries another set of symbols is used, which some people (myself included) find more descriptive in Boolean terms (for example, the IEC OR symbol is labelled ">=1", matching A OR B <=> A+B). Here is a good page that shows both in comparison: http://chiptoxic.geekcoalition.co.uk/?q=logic_gates

Computer Memory
In a computer system, memory is where the central processor directly stores data, or directly retrieves data and instructions. Memory has also been known as main storage, or the main store.

NOTE: Memory was an American term for some types of computer storage. Internationally, manufacturers have adopted this term for semiconductor storage designed as arrays of addressable bits.

NOTE: "Core" is obsolete, as it actually refers to an old technology -- ferrite core storage. This is much like the modern use of RAM or DRAM to refer to primary memory (see Scientica's comment in RAM).

According to Webster, memory, or core, is a device or a component of a device in which information can be inserted and stored and from which it may be extracted when wanted. More generally, it is the capacity to store information. Computers need memory just as we do. Memory can be loosely understood as storage; however, storage and memory are considered distinct objects in computer terminology. The term memory is also used, as slang, interchangeably with RAM (Random Access Memory). In general, its definition changes according to context, and you should be able to work out the intended meaning from that context.

Computer memory is as vital for program execution as our own memory is for remembering things. Having a whopping 3 GHz microprocessor with a meagre 32 MB of memory (RAM) on board rather limits the capabilities of the computer system. Computer memory cannot store "ideas" or information the way we do, but it certainly can store electrical voltage states, either permanently or temporarily. Each bit-cell of memory can record either a high voltage state or a low voltage state. Computer memory can be either non-volatile or volatile. Non-volatile memory keeps its state even without a supply of electricity; for example, ROM (Read-Only Memory) chips store programs permanently. The most common form of volatile memory is a RAM (Random Access Memory) module.

NOTE: Electronic digital systems are built using three kinds of digital circuits: combinatorial circuits for implementing Boolean logic, storage circuits for maintaining state, and pulse generators for generating synchronizing signals. Sequential circuits are digital systems containing all three kinds of circuits. In addition, interface circuits allow connections to other systems, electronic (digital or non-digital) or non-electronic. A combinatorial circuit's output depends only on its current inputs: it produces a result for every possible combination of inputs (which is one reason it is called combinatorial), and when the inputs are removed, the state of the circuit is lost. A sequential circuit also depends on its inputs, but when the inputs are removed, its last state is not lost, unlike the combinatorial circuit. An example of a combinatorial circuit is a selector that chooses which register to use or which memory location to access; an example of a sequential circuit is RAM, which loses its last state when it loses its power supply, but while power is present each location keeps the last state that was written to it, even if you stop feeding it new information. Note that a CD or a floppy disk is not a circuit of either type; it can be considered a medium that preserves a state and can be modified by an external factor (magnetism or laser light), and it does not accept inputs or produce results on its own.

Contents

1 RAM
2 Like A Finite Set
3 Dividing the set Or Making partitions of the set
4 Computing the location address
5 The Memory Bus
6 Reading Memory
7 Writing Memory
8 For What Is Used
9 Advanced Memory Topics
10 How Semiconductor RAM Works

RAM
Short for Random Access Memory. The term is often used to mean primary memory (because this type of memory is used there; it's a mistake similar to the common use of "CMOS circuit", Complementary Metal-Oxide-Silicon). A longer, more detailed explanation of RAM can be found here: http://en2.wikipedia.org/wiki/RAM

RAM is a sequential circuit that can hold its state and can be read and written, but by itself it cannot decode the address or location.

Like A Finite Set


Why talk about a finite set? It is important to understand that you have a finite set of cells, and in each cell you can store a block of bits (binary units) called a byte. The memory in a computer is not infinite, so it is important to do the calculations needed to manage the space you will use correctly. Memory sizes are powers of 2; normally you have 8, 16, 32, 64, 128, 256, 512 or 1024 MB. A simple process may fill only a couple of hundred bytes, or none at all while it is not using memory. When an operating system loads your program into memory and creates a process, it gives it a specific space in memory, a subset of the whole memory. The memory that is not directly available to you belongs to other programs or is reserved for the OS itself. There is normally also a subspace that is not assigned to any process, so if you need more space, you can request a little more from the OS. But you must always stay within, and correctly manipulate, your own set of memory; if you access memory outside of your set, you will normally get a page fault or strange results in your computations.

Dividing the set Or Making partitions of the set

A partition of a set is a part of the set; the intersection of any two partitions is empty, that is, a partition does not overlap any other partition. In a general way, memory can be partitioned (each partition being a region of memory) into: a partition for the OS; a partition for each process; and a partition that does not belong to any process, that is, free space. The OS could be considered just another process, but it is worth treating its space separately from the per-process partitions. Why? The OS partition is normally always loaded at the same place in memory, whereas the other processes are not always loaded at the same position. The free-space partition is important because when a process needs to be executed, or needs more memory, space is taken from it; when that happens, process partitions grow while the free-space partition shrinks. Let's examine the partition of a specific process. Let p be a process in the union of all process partitions, that is, any p in the tree of runnable processes controlled by the operating system. The space of p will hold its data -- its instructions as well as its data proper (when a program is not being executed it can be considered plain data). The data in p can grow, or p can free some memory and shrink its space. The memory addressable by p lies inside this space, in its own partition.

Computing the location address


Memory is read from and written to only at a specific location. This location is calculated by the microprocessor; the x86 family of processors has several different modes of addressing a location.

The Memory Bus


The memory bus is the set of connections between the processor and memory. The memory bus can be divided into three parts:

1) The address bus, which carries the memory address provided by the processor.
2) The data bus, which carries the data being transmitted back and forth between the processor and memory.
3) The control bus, which carries the timing and direction signals for controlling the read and write transactions.

To keep things simple, most data buses have a fixed width, which is the number of data bits that can be transmitted simultaneously. It may or may not correspond with the data width associated with the architecture. For example: Intel designed the 8086 as a 16-bit architecture, and built a chip with a 16-bit data bus. However, nothing prevents the designer from building a software-compatible chip with an 8-bit data bus. And Intel did exactly that: it modified the 8086 to support an 8-bit data bus, calling the result the 8088.

Another example: the first 32-bit x86 architecture designed by Intel was the 80386, which had a 32-bit data bus. The Pentium supports the same instruction set. The Pentium has more instructions, but they are considered to be part of a 32-bit architecture. However, the Pentium data bus is 64 bits in width. The processor is responsible for handling any mismatches of data size and alignment between the data bus and what the processor instruction requests.

Reading Memory
OK, so a computer can read memory -- or rather, memory holds states that can be stored and then read back. How does a computer read memory? First we need the location, which is computed as described in the previous section. This address is sent through the address bus and received by the memory, which reads out the state stored at that specific location, with a specific size. A short description of the process:

1. The processor decodes an instruction that needs to read memory, and it computes the address using its addressing modes.
2. The processor determines how many reads will be required to retrieve the data.
3. The address is sent via the address bus, to all memory and address decoders, for a read.
4. The address decoders (part of the motherboard chipset) use part of the address to select (activate) a specific memory bank.
5. The selected memory bank uses the rest of the address to retrieve the data and send it to the processor.
6. If more reads are needed, the previous three steps are repeated with the next address.

Writing Memory
The process is nearly the same as reading a location of memory. Writing memory matters because it is what changes the stored states. Consider a loop that counts down from 10 to 0 and stops at 0. The data that holds the counter is in memory; what happens if you cannot change the stored value 10? Answer: an infinite loop. Note that reading and writing the counter implicitly uses the addressing modes of the processor (a sketch of such a countdown follows the list of steps below).

A short description of the process:

1. The processor decodes an instruction that needs to write to memory, and it computes the address using its addressing modes.
2. The processor determines how many writes will be needed to store the data.
3. The address is sent via the address bus, together with the data, to all memory and address decoders, for a write.
4. The address decoders (part of the motherboard chipset) use part of the address to select (activate) a specific memory bank.
5. The selected memory bank uses the rest of the address to write the data.
6. If more writes are needed, the previous three steps are repeated with the next address.
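Here is the countdown mentioned above as a rough MASM-style sketch (the variable name 'counter' is invented for the example); every pass through the loop reads the counter from memory and writes the new value back:

.DATA
counter DW 10
.CODE
again:  mov  ax, counter     ; read the current value from memory
        dec  ax              ; count down by one
        mov  counter, ax     ; write the new value back to memory
        jnz  again           ; MOV leaves the flags alone, so JNZ still sees DEC's result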

For What Is Used


The memory in a computer is used to hold states, which are later requested back so computations can be made with that data. In a computation such as the simple addition 234 + 8467, it is important to remember the two addends, 234 and 8467, and it is important to save (write) the result, 8701. Computer memory works in a similar way, but that will be looked at in Data Representation; here we have only talked about how the computer obtains a memory address, what is done in the process of reading and writing, and the states in memory that are read or written.
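As a tiny sketch of that addition (the word-sized variable 'result' is hypothetical, invented for the example):

mov  ax, 234         ; read the first addend into a register
add  ax, 8467        ; AX now holds 8701
mov  result, ax      ; write the result back to memory for later use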

Advanced Memory Topics


Hardware topics, and the segmentation and paging topics, are covered in The Microprocessor.

How Semiconductor RAM Works


A RAM chip has address pins and data pins. The address pins select one of many memory cells within the chip. The memory cell may be as small as 1 bit. It may be larger, for example, it may hold 8 or 16 bits. The data pins provide the means to put data into the cell, and to get the data back out.

There are two control pins. One pin tells the RAM if we want to access it or not -- it is called "chip select" or "chip enable". The other pin tells the RAM whether we wish to read (fetch) data from the RAM or to write (store) data into the RAM -- it is often just called the "write enable"; if it's not writing, it's reading. This is the minimum configuration. There may be an extra enable pin, "output enable", to further control a data read. This is sufficient for what manufacturers call static RAM. Static RAM uses a circuit called a flip-flop as its storage element. In the absence of stimuli, static RAM will retain its contents as long as it has power.

Dynamic RAM, or DRAM, uses a capacitor as its storage element. The capacitor is charged up to hold data. Because of "leakage", it will lose charge, leading to loss of data. The data must be "refreshed" before it is lost. Because of the high density of DRAM, it can require a lot of address bits to access every memory cell. To keep the number of pins low, modern DRAMs receive the address in two parts, called row and column. One more pin, Row Address Strobe (RAS), tells the DRAM to capture the row portion of the address. Yet another pin, Column Address Strobe (CAS), tells the DRAM to capture the column portion of the address, from the same set of address pins. When both strobes have been activated, the DRAM will store or output data, depending on the state of the read/write pin. And, lastly, DRAMs are designed to refresh a whole row of memory cells when RAS is strobed.

At this writing, SDRAMs seem to be the most popular form of DRAM, used in PC-100 and PC-133 boards. They have a few extra features which I will not go into. Check the Intel web site for more information about SDRAMs.

The Microprocessor
The microprocessor is the most central part of a computer. Everything a computer can do is determined by the capabilities of the microprocessor inside it. A microprocessor, generally, reads data from memory, works on it, and writes the result back to memory. It also performs many additional operations including arithmetic, logic, and input-output. Microprocessors have quite a history. The invention of the integrated circuit (IC), the containment of an entire CPU on a single chip, and the introduction of the IBM PC are events of particularly great importance from the historical viewpoint. Building an entire CPU on a single chip for the first time ever was a great achievement for Intel Corporation (founded by Dr. Robert Noyce and Gordon Moore) and also one of the main reasons why this CPU-on-a-chip was being trade-named the microprocessor.

A microprocessor is manufactured by placing extremely tiny transistors on extremely small semiconductor integrated circuits. Older CPUs were made of vacuum tubes and later of separate transistors, which resulted in very large computers. Chips used in more recent microprocessors are silicon dies with features of incredibly small size, on the order of 10^-9 m. Your microprocessor comes to be made from beach sand! Intel Corporation has had a major hand in the development of the microprocessor, and its efforts should be well applauded. The name "Intel" derives from Integrated Electronics, just in case you wanted to know. However, Intel is not the only company manufacturing microprocessors; there are other competent companies like AMD (Advanced Micro Devices), Motorola, and Cyrix that also manufacture microprocessors. Some of these companies also roll out Intel-compatible microprocessors. There are many families and generations of microprocessors, but we are going to study only those from the 80x86 family. When we use the term "80x86," it refers to both Intel-manufactured and Intel-compatible third-party microprocessors. Details about any particular company will otherwise be specifically noted.
Contents
1 Basic Architecture
  1.1 The von Neumann Machine
2 Inside the Microprocessor
  2.1 A simple analogy
  2.2 Bus Interface Unit
    2.2.1 Address Bus
    2.2.2 Data Bus
    2.2.3 Control Bus
  2.3 Execution Unit
    2.3.1 Registers
3 Basic operation
  3.1 The processor interface
4 Memory management features
  4.1 Protected mode and segmented memory (it ain't where you think it is)
  4.2 Paging and virtual memory (it still ain't where you think it is)
5 Performance enhancements
  5.1 Memory cache
  5.2 Overlapped instruction execution (aka out-of-order execution)

Basic Architecture
The architecture that the 80x86 microprocessor-based computers use is based on a fundamental architecture first proposed by Dr. John von Neumann in 1946. This basic architecture is, therefore, known as the von Neumann architecture. Although based on this architecture, the 80x86 microprocessor architectures are highly enhanced over it. As a result of technological innovations and clever marketing, the Intel 80x86 architecture has become the de facto industry standard.

The von Neumann Machine


A von Neumann machine is a stored-program computer that uses a single store for both data and executable instructions. This store in our computers is mostly semiconductor-based memory. A von Neumann machine has 5 parts: arithmetic-logic unit (ALU), control unit (CU), memory, input-output, and a bus. The ALU, CU and the bus are generally considered to form the CPU. Since von Neumann computers spend a lot of time moving data between memory and the CPU (slowing down processing considerably), the bus is usually replaced by a bus unit (made of multiple separate busses). Many computers even today are based on this architecture, but several have additional enhancements made to them. Digital computers based on the von Neumann architecture loosely follow this pattern of operation:

1. Fetch the instruction at the current code location (pointed to by the program counter).
2. Update the current code location (add the length of the fetched instruction to the program counter).
3. Calculate addresses, if any.
4. Fetch any operands.
5. Perform the requested operation.
6. Store any results.
7. Go back to step 1, to execute the next instruction.

Inside the Microprocessor


For reasons that will soon become clearer, we begin our discussion of the workings of a microprocessor by highlighting a simple analogy between us and microprocessors.

A simple analogy
For a rough analogy, consider yourself and compare your brain with the microprocessor. When you were a toddler, a written symbol wouldn't have made much sense to you, except as a picture. As you grew up to become a kindergarten kid, you started identifying things, and learning the alphabet and the digits. Pictures started coming to life.

As you grew older, you came to know about how these individual picture symbols were grouped together to form words of various sizes and different meanings. Later on, you learnt about simple sentences, and then complex ones. You may also have used your index finger to point to words to easily locate them while slowly reading sentences. With age you began reading and comprehending entire paragraphs, and all this while you only got quicker and quicker at doing it. A microprocessor works in a similar manner. It contains a bus interface unit that enables it to communicate with external devices and an execution unit that executes the instructions fed to it.

Bus Interface Unit


The microprocessor has a part called the bus interface unit (BIU), which establishes the communication link between the microprocessor and external devices. "Bus" is a general computer term for a pathway consisting of a number of electronic signal lines through which data and signals are transferred. The number of path-lines in a bus determines its size. Each signal-line can carry only one of two voltage values (high or low) at a time, thus signaling either a logic-1 state or a logic-0 state (a sort of yes or no) to the microprocessor. The BIU is primarily made up of three busses: an address bus, a data bus, and a control bus.

Address Bus
The address bus is much like your index finger, which you can use to locate words while reading them. It tells the microprocessor where to fetch data from or where to send it to. The location can either be in memory or it can be an input/output port (connecting to an external device). The address bus in an 8086 microprocessor has 20 signal-lines, and therefore, can only hold an address of size 20 bits. Since each bit can have only one of two possible states and a group of n bits can have only 2^n total possible states, the number of different locations that the 8086 can address is 2^20 = 1,048,576 or 1 M (one Meg). The Pentium, on the other hand, has a 32-bit address bus, and can address up to 2^32 = 4096 M = 4 Gig locations.

Data Bus
The data bus is responsible for getting data into and out of the microprocessor. The size of the data bus decides how much data can be transferred through it at a time. The microprocessor has two types of data bus: an internal data bus and an external system data bus. The system data bus of the microprocessor communicates with the external devices and transfers data to the internal bus. The internal bus transfers data back and forth between the ALU, the registers, and the instruction decoder. A 16-bit microprocessor has an internal data bus width of 16 bits, a 32-bit microprocessor's internal bus has a width of 32 bits, and a 64-bit microprocessor's internal bus has a width of 64 bits.

Control Bus
This particular bus is the one that the Bus Interface Unit uses to notify the memory of its intentions. For example, if the microprocessor wants to write to memory, a write line on the control bus is activated, letting the memory know that the microprocessor's intention is to write a value to main memory. (ADD MORE - Read, Cache, etc.)

Execution Unit
To operate on data using instructions, a microprocessor contains an execution unit (EU). A square-wave oscillator, or clock circuit, generates the timing signals with which the processor synchronizes all its activities. It also determines the speed with which instructions are fetched and executed. The more instructions executed per clock cycle, the faster the processor. The binary instructions that a microprocessor understands as executable define its instruction set. You cannot feed just about anything to the microprocessor and tell it to execute it. 80x86 microprocessors are based on the CISC design and have large instruction sets. However, later microprocessors in this family are fully compatible with earlier ones, so that each newer instruction set overlays and extends the previous ones. This simply means that programs written for an 80386 will run comfortably on a Pentium-based computer, but may not on an 80286 one.

Registers
Within the execution unit are the registers we will use to program the x86 microprocessors.

Basic operation
The processor interface
The CPU (Central Processor Unit) is the part of the computer system that contains the logic for fetching instructions, deciphering them, and executing them. Attached to the CPU is storage for code and data, also known as memory. Coprocessors, and other units known as peripherals, can also be attached to the CPU. (Although memory acts as a store for code and data, there are clear distinctions between storage and memory in computer hardware terminology.) Data is transferred between these units via data paths. The number and configuration of units and data paths vary depending on who designs the system. Pentium-class processors provide one data path for data transfers between the processor chip and all other system units. Off-chip units include the memory shared by data and executable code, and various device controllers. The processor chip will read data from memory or a device, and write data to memory or a device. The processor chip also provides an address to select which device register or memory location to write to or read from.

A Pentium processor (chip) writes data by placing an address on the set of signal lines known as the address bus, and the data on the set of signal lines known as the data bus. The processor reads data by placing the address on the address bus, and capturing the data appearing on the data bus. Timing signals control the data transfer.

Memory management features


For computational purposes, the following memory management features are unnecessary. However, the use of these features explains why your program cannot easily alter or read the data of another program in multitasking systems such as Windows and Linux.

Protected mode and segmented memory (it ain't where you think it is)
Intel defined at least three operating modes for their 32-bit microprocessors: real, protected, and virtual-8086. We are primarily interested in protected mode because that is the mode our 32-bit programs in Windows and Linux operate under. Under protected mode, we can define segments, blocks of contiguous memory that hold code and data. They are managed by segment descriptors. Two types of segments are defined: code and data. Segments are allowed to overlap. Segment descriptors control read, write, and execute permissions. The following table shows all of the possible permission combinations. It shows that executable code must be in code segments, and writable data locations must be in data segments.

Segment Type   Execute   Read   Write
code           Yes       -      -
code           Yes       Yes    -
data           -         Yes    -
data           -         Yes    Yes

We can designate whether each segment is 16-bit or 32-bit. If a code segment is 32-bit, by default, instructions in it use 32-bit addressing and 32-bit operands (when instructions need more than one byte). If a code or data segment is 32-bit, the segment can be as large as 4G (allowing full 32-bit addressing).

Segment descriptors also hold base addresses that will be added to the effective addresses to get linear addresses. To access segments, you use a value called a selector. The selector contains an index into the descriptor table where segment descriptors are stored. When you load a segment register (CS, DS, ES, SS, FS, GS) with a selector, the indexed descriptor is loaded into a hidden register (effectively a cache) associated with the segment register. As MS-DOS assembler programmers know, every memory access uses a segment register, whether you specify a register or not. Thus, every memory access, code or data, is tested for permissions, and every memory access is modified by a base address. Windows does not take much advantage of segment registers. When your program runs, the segments associated with the four primary segment registers CS, DS, ES, and SS are set to the same base address. An effective address will be converted to the same linear address regardless of whether you are modifying it with CS (jumps), DS (most data accesses), ES (some string instructions), or SS (stack instructions). Except for execute and write privileges, the four segments are effectively the same single segment. This is the flat memory model.

Paging and virtual memory (it still ain't where you think it is)
In protected mode, the memory paging feature can be enabled. When discussing this feature, a "page" is no longer a 256 byte block of memory. When paging is enabled, a set of page tables are used to change the address again. This is the last possible alteration of the address before it goes out onto the address bus. The most recent Pentiums can generate 36-bit physical addresses with this feature. Whereas the segmentation feature gathers memory into segments of varying sizes, the paging feature breaks up memory into pages of fixed size (4096 bytes on a Win32 platform). Part of a linear address is treated as a page number, which is used to index into page tables to retrieve a page base address. The page base address is added to the rest of the linear address to create a physical address. The page base addresses allow the pages to be randomly distributed throughout physical (true, real, actual) memory, without crashing the code in them! Because software can update the page tables, we can make two programs occupy the "same memory" by making two sets of page tables. We use one set when executing one program, and we use the other set to execute the other program. This is why addresses in one program are normally invalid in another program -- the addresses map to different pages! Each page table entry also has a "present" bit, indicating if the page is loaded with page data. This bit is maintained by software, which allows us to implement "page swapping", the heart of virtual memory.

When the processor attempts to access a "not present" page, it generates a page fault exception. The OS decides if the memory is allocated. If not, the OS signals a bad memory access. Otherwise, the OS finds a suitable place to reload the "swapped out" page. If there is no room, the OS chooses a page to "swap out" to the hard disk, and then replaces it with the desired page from the hard disk. Then the page is marked as "present". Optimization note: The page table entry also has a dirty bit, which is set when a loaded page has been written to. A page that isn't dirty does not need to be swapped out, because a copy of the page already exists on the hard disk.

Performance enhancements
Memory cache

Overlapped instruction execution (aka out-of-order execution)

What are Registers?


Since computers are not magic, there must be some way to physically manipulate data in the real world as indicated by computer program instructions. Registers are such data areas that are physically located on the processor.

Why are Registers used?


As stated in the last paragraph, registers are used for data manipulation. This includes mathematical operations, logic operations, program control and other various operations. Almost every form of data transfer and data manipulation is processed through registers.

How are Registers used?


Registers are used by simply utilizing instructions that involve their use. Such instructions load/store data from/to RAM, other types of Memory or even I/O Devices.

Please reference Intel's Processor documentation for specific information about instructions, their purpose and their usage.

Types of Registers
There are various types of registers that, of course, have various purposes. The following are brief descriptions of all the major types of registers found in the x86 architecture. More detailed information can be found in Intel's Processor documentation manuals.

General Purpose Registers (GPR)


General Purpose Registers are hopefully self-explanatory: registers which are used for general programming purposes.

Accumulator Register (AL/AH/AX/EAX/RAX)


The Accumulator Register was initially designed to hold results from arithmetic operations, to send and receive data during I/O operations and to identify BIOS function calls.

Base Register (BL/BH/BX/EBX/RBX)


The Base Register was initially designed to be a base pointer for addressing memory locations.

Counter Register (CL/CH/CX/ECX/RCX)


The Counter Register was initially designed to perform as a counter for programmed loops and as an index number for shift operations.

Data Register (DL/DH/DX/EDX/RDX)


The Data Register was initially designed to assist in arithmetic operations and to be a pointer to I/O port addresses during I/O operations.

Source Index Register (SI/ESI/RSI)


The Source Index Register was initially designed to act as a pointer to the source of memory and string operations.

Destination Index Register (DI/EDI/RDI)


The Destination Index Register was initially designed to act as a pointer to the destination of memory and string operations.

Base Pointer Register (BP/EBP/RBP)


The Base Pointer Register was initially designed to hold the base address of the stack.

Stack Pointer Register (SP/ESP/RSP)


The Stack Pointer Register was initially designed to hold the limit (top) address of the stack.

Segment Registers
Segment Registers act as base address pointers to memory "segments" during operations that address any part of memory. These registers are a part of the x86 "Segmentation Memory Model" and are rarely used directly nowadays due to the advent of the "flat" memory space. Despite their deprecation during the evolution of the x86 architecture, these registers are still required to hold valid values during normal CPU operation.

Code Segment Register (CS)


The Code Segment Register was initially designed to act as a pointer to the code segment in which a program is currently running.

Data Segment Register (DS)


The Data Segment Register was initially designed to act as a pointer to the data segment in which a program's variables and data structures were being accessed.

Auxiliary Segment Registers (ES/FS/GS)


These Auxiliary Segment Registers were initially designed to assist programs with addressing various segments of memory due to the 64 KB limitation of segments during 16-bit Real Mode operation.

Extended Architecture Registers


The following registers are used with certain instructions, in which the support of those instructions varies depending on the release time of the processor. Please read Intel and AMD's Documentation Manuals for more information about their instruction-sets.

Floating Point Registers (ST)


These Floating Point Registers were initially designed during the addition of the 80387 (x87) Floating Point Unit (FPU). The FPU was available as a separate 387 or 487 coprocessor and has been standard in 586+ processors (Pentium and above). The FPU registers are used to store data during the floating-point operations of the FPU.

Matrix Math Extension Registers (MMX)


These Matrix Math Extension Registers were initially designed during the addition of MMX to the Pentium processor series. The MMX registers are used to store data during MMX operations.

System Registers
The following registers are used to control system operation and assess system/program status.

Instruction Pointer (IP/EIP/RIP)


The Instruction Pointer was initially designed to help guide program control.

Prior to instruction execution, the Instruction Pointer (IP) points to the location of that instruction in memory. During standard operation, the IP is automatically increased after the execution of each instruction.

Flags Register (FLAGS/EFLAGS/RFLAGS)


The Flags Register was initially designed to hold the state of the processor, including certain data pertaining to the currently running process.

Control Registers (CR0/CR2/CR3/CR4)


The following Control Registers were initially designed to support the enabling and/or disabling of various processor features in a programmable fashion.

General Outline
Lexical Issues
  Whitespace
  Comments
  Identifiers
  Literals
    Integers
    Characters
      ASCII
      Unicode
      DBCS
    Strings
      ASCII
      Unicode
      Types
        Null-Terminated (Zero-Terminated)
        Dollar-Terminated
        Length-Prefixed
        Descriptor-based
        Mixed-mode
        HLA Strings
        etc...
  Keywords
  Separators
Instruction Syntax
  General Instruction Syntax
  Operands
    Registers
    Memory variables
    Literals
    Expressions
  Labels, Variables and Data Definition
  Data Definition and Types
    Simple Types
      BYTE or DB
      WORD or DW
      DWORD or DD
      QWORD
      etc...
    Packed Data Types
      BCD
      etc...
Operators
  Separators
    Comma (,)
    Period (.)
    Line-Extension (\)
  Arithmetic
    * / + MOD
  Bitwise
    Bitwise Logical
      AND OR NOT XOR etc...
    Bit Manipulation
      SHR SHL etc...
  Relational
    == < > != ! => <=
  Grouping
    Parentheses ()
    Brackets []
    Braces {}
    Quotes "", ''
    Angled-Brackets <>
    etc...
  Assignment
    = EQU :=
  Special
    ?
    etc...
Assembler Directives (TODO)
Layout and Style
  Code
    Traditional
    Linear
    Indented
    Mixed
  Comments
    Procedure Details
    Line Details
    Single-Line
    Multi-line
  Labels

Instruction Syntax
There are many kinds of assembly language that differ from one another in many ways. We will be using the Intel Architecture 32-bit (IA-32) assembly language syntax throughout. The assembly syntax of various 80x86 assemblers may vary somewhat but all of them are essentially subsets of the Intel Architecture assembly language. So, whenever we refer to assembly language, in general, it will mean we are referring to the IA-32 syntax. An instruction in the IA-32 syntax format looks like this: label: mnemonic operand1, operand2, operand3 ; Comment

Example: mylabel: mov eax, 01 ; Copies 01 into the eax register.

where "MOV" is the mnemonic and "MOV EAX, 01" is the instruction. So, whenever we say MOV instruction, we are referring to the complete statement and not just "MOV" itself. To keep the syntax above clean and simple, we have not used any special markup to indicate optional components, but since it is necessary to do so, we provide another version of the above syntax to make things clearer. Remember, only in syntax definitions like this one, we will be using curly braces to mark out optional parts. {label:} mnemonic {operand1} {, operand2} {, operand3} {; Comment}

MASM Specific
Microsoft Macro Assembler (MASM) follows this syntax closely, so any example that is not marked as specific to another assembler is written for MASM.
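For instance, a small fragment in MASM syntax might look like this (an illustrative sketch, not a complete program):

mylabel: mov  eax, 5         ; destination operand first, source second
         add  eax, 10        ; EAX now holds 15
         jmp  mylabel        ; a label can be used as a jump target (this loops forever)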

HLA Specific

label: mnemonic( operand1, operand2, operand3 );   // Comment

An Example
mylabel: mov( 01, eax ); // Copies 01 into the eax register.

HLA's syntax is quite a bit different from MASM's. In general, you'll find that the operands are reversed (that is, the source operand is first and the destination operand is second, the opposite of MASM's (dest, src) organization). Also note that HLA uses a functional syntax for instructions, treating operands as though they were parameters to a function that does the operation. HLA supports an interesting feature known as instruction composition. This allows you to specify one instruction as an operand of another, e.g.,

mov( mov( 5, ebx ), eax );    // Copies 5 into both the ebx and eax registers.

Whenever an instruction appears as the operand to another, HLA will emit the interior instruction first and then substitute the destination (second) operand in place of the interior instruction when processing the outer instruction. Generally, you won't find instruction composition used as in this example, but it is quite useful when expanding macros, when calling procedures, and in high-level control structures, e.g., if( mov( i, eax ) ) then <> endif; // "true" if "i" contains a non-zero value.

Though HLA fully supports labels like any other assembler, the use of high-level control structures usually obviates the need for such labels in actual source code. Nevertheless, for those who prefer to eschew the high-level control structures and write "low-level" assembly code, labels use the same basic syntax in HLA as in other assemblers.

FASM Specific
(todo...)

GoASM Specific
(todo...)

NASM Specific
(todo...)

Building Programs
Assembly language allows you to code programs using mnemonics, but the computer doesn't understand these. What the computer does understand is simply a sequence of high and low voltage fluctuations represented by binary digits. So, we need some kind of program to convert our code into a form the computer can understand and execute. High-level languages make use of compilers for this purpose. Assembly language, however, requires the use of an assembler, which is a sort of compiler itself. Before we begin building programs using assemblers, we will first demonstrate a few things.

A crude "Hello, World"!


Using Debug
This "Hello, World!" example is unlike the programs that you may have written in other languages or assembly itself, because it does not use any direct function calls to print it to the screen nor does it print to the screen. It is very crude because it is only a binary file made of a series of bytes used to represent the characters in the string "Hello, World!". One important point to remember here is that a program is essentially a series of bytes. We use the command 'type' to display the contents of the resulting file. First, run DEBUG by typing 'DEBUG' at the command-prompt, and at the debug prompt (-), type as in the following (input emboldened): C:\>DEBUG -A 0B18:0100 0B18:010D

db "Hello, World!" ; press ENTER here

-R CX file CX 0000 000D N hello.bin W Q C:\>type hello.bin Hello, World! C:\>

; CX = 'number' of bytes we want to output to ; initial value of CX ; D = 13, the length of "Hello, World!", ENTER ; name of the output file ; save the data "Hello, World!" to file ; Quit DEBUG

'db', or define byte, is simply a directive to DEBUG to tell it to define some bytes for us.

The entire compilation process, called building, is not a single step though. The process is basically divided into three steps: resource compiling, assembling, and linking.

Resource Compilation (Windows specific)


Resources are objects, such as strings, bitmaps, icons, video, audio and menus, that you will use in your programs. Information concerning the resources that you want your program to use is contained in a text file called a resource script (.RC). The resource compiler program reads this resource script and combines all the resources referenced in it to form a packed resource file. Generally, for the Windows platform this resource file has the extension .RES. There are resource compilers, however, that also generate Windows object files (.OBJ) instead of, or in addition to, resource files (GoRC is such an example). Also, utilities to convert between the two formats exist. A Windows resource file (.RES) contains a series of packed resource entries, with no headers, footers, padding, etc. On the other hand, a Windows object file (.OBJ) contains more than just resource bytes: relocation lists, symbol lists, modules, segment data, checksum bytes, etc. You do not need to worry about these to start building programs--the assemblers do all the dirty work for you. (However, that shouldn't stop you from diggin' in and building the next big assembler.) Information regarding specific resource compilers can be found a little later.

Assembling
Assembly language source code files are strictly plain-text files having the extension ".ASM".
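To make this concrete, a typical build with the MASM32 toolchain might look like the following; the switches differ from assembler to assembler, and the file names here are made up for illustration:

C:\>ml /c /coff prog.asm                        ; assemble prog.asm into prog.obj
C:\>rc prog.rc                                  ; compile the resource script into prog.res
C:\>link /SUBSYSTEM:WINDOWS prog.obj prog.res   ; link object and resource files into prog.exe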

Data Transfer Instructions



The MOV Opcode


So far you will have read about registers and memory, but there was no mention of how to transfer data to a memory location or register. So let us embark on the journey to learn how to move data from memory to register, from register to memory, from register to register, and how to set the value in registers and memory. Of course, I will describe some tricks (i.e., some size optimizations) along the way.

Loading memory or register with a constant


To load a register with a constant you do the following:

mov eax, 10

In the above example, the opcode mov moves the constant 10 into eax. This means that the value in eax is now 10 after the instruction. To load memory with a constant you do the following:

mov [memory], 10

In the above example, the opcode mov moves the constant 10 into [memory]. The brackets tell the assembler that the label refers to a memory location in most assemblers (though some assemblers ignore them; refer to the assembler manuals for more details). This means that the value in [memory] is now 10 after the instruction.
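For instance, a complete fragment might look like this (a minimal MASM-style sketch; the variable name memvar is made up for illustration):

.data
memvar dd 0               ; a dword variable in memory

.code
mov eax, 10               ; register <- constant: eax is now 10
mov memvar, 10            ; memory <- constant: MASM infers the size from the declaration
mov dword ptr [ebx], 10   ; when only a register holds the address, the size must be stated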

So how does mov work?


The opcode mov works in the following (simplified) way:

mov dest, source    ; dest = source

where dest can be a register or memory and source can be a register, memory or constant. However, do note that the source and dest cannot both be memory at the same time. You cannot use the mov opcode to copy from memory to memory directly; moving data from one memory location to another will be discussed later. This convention might look strange to HLL coders, but fear not--one gets used to it after a while. Most opcodes are of the form "opcode dest, source". Of course, if you do not like the above convention, you can use other assemblers like HLA or GAS. To move data at a certain memory location to a register:

mov eax, [memory]   ; or any other register

To move data from a register to a memory location:

mov [memory], eax   ; or any other register

To move data from register to register:

mov eax, ecx        ; or any other register

Moving data from a memory location to another


As mentioned above, the mov opcode cannot move data from one memory location to another memory location. Don't panic, you can still move data from one memory location to another by other means: you either temporarily copy it to an available register and then copy that to the destination memory location, or you make use of the stack.

mov eax, [memory]    ; or any other available register
mov [memory2], eax

OR

push [memory]
pop  [memory2]

Conditional Statements
Conditional statements are the "if...else...endif", "switch...case", "while...wend", "do..while" etc. The statements that allow for branching (deviation to separate choices) based on conditions are called conditional branching statements. Those that allow for repetition of statements of code without rewriting code based on conditions are called conditional looping statements. However, assembly language does not natively support high-level representation of these conditional statements. Specialized instructions use the EFlags register to determine a condition and then a jump is executed based on the state. High-level constructs can be implemented for these instructions, but they are mostly either macros or incorporated assembler directives. This chapter focuses on teaching you how we implement these constructs using plain-vanilla assembly instructions.


The Flags
In assembler, conditional statements revolve around one thing: the EFLAGS register (more commonly known as the flag register). All the opcodes can be classified into two groups: those that modify EFLAGS and those that do not. The former group can be further classified by which flags each opcode modifies, and so on. The most important flags are the carry flag (CF), overflow flag (OF), zero flag (ZF), sign flag (SF) and parity flag (PF). Another somewhat important flag is the direction flag (DF), but it is only used by the string opcodes and can only be modified by cld (clear direction flag) and std (set direction flag).

Opcodes relating to Conditional Statements
The opcodes most commonly seen in conditional statements in assembler are the following:

Instruction      Description
JMP              Unconditional jump
Jcc              Jump if conditions met
JCXZ/JECXZ       Jump if cx/ecx equals 0
LOOP             Loop count
LOOPZ/LOOPE      Loop count while zero/equal
LOOPNZ/LOOPNE    Loop count while not zero/equal
CMOVcc           Conditional move
TEST             Logical compare
CMP              Compare 2 operands

For Jcc, CMOVcc and SETcc, there is actually a whole range of opcodes. The "cc" in Jcc, CMOVcc and SETcc represents the tttn (condition test field). Some of the condition test fields have aliases, so some of these opcodes are actually the same (for example, JZ is the same as JE). The tttn values are as follows (listed according to their encoding):

O (Overflow)                                       OF = 1
NO (No overflow)                                   OF = 0
C/B/NAE (Carry, Below, Not above or equal)         CF = 1
NC/NB/AE (No carry, Not below, Above or equal)     CF = 0
E/Z (Equal, Zero)                                  ZF = 1
NE/NZ (Not equal, Not zero)                        ZF = 0
BE/NA (Below or equal, Not above)                  CF = 1 or ZF = 1
NBE/A (Not below or equal, Above)                  CF = 0 and ZF = 0
S (Sign)                                           SF = 1
NS (Not sign)                                      SF = 0
P/PE (Parity, Parity even)                         PF = 1
NP/PO (Not parity, Parity odd)                     PF = 0
L/NGE (Less than, Not greater than or equal to)    SF <> OF
NL/GE (Not less than, Greater than or equal to)    SF = OF
LE/NG (Less than or equal to, Not greater than)    ZF = 1 or SF <> OF
NLE/G (Not less than or equal to, Greater than)    ZF = 0 and SF = OF

One would wonder what is the difference between ja and jg. Well the difference is that ja is jump if above (intended for unsigned numbers), while jg is jump if greater (intended for signed number). Alright so the above list could be classified into conditional jumps for signed numbers, conditional jumps for unsigned, and others. Conditional Jumps for signed numbers JL/JNGE JNL/JGE JLE/JNG JNLE/JG

Conditional Jumps for unsigned numbers JC/JB/JNAE JNC/JNB/JAE JBE/JNA JNBE/JA

Others JO

JNO JE/JZ JNE/JNZ JS JNS JP/JPE JNP/JPO

"JMP" is an unconditional jump. For JMP, there is 2 types of jump commonly used, one is jump near, relative, displacement relative to next instruction, the one is jump near, absolute indirect, address given in operand. Jcc is almost similiar to JMP, just that the jump is only taken if the conditions are right (For example for JZ label, the processor will only jump to label if ZF = 0). JCXZ/JECXZ is a jump if cx/ecx (dependent on opcode used) is zero. But take note that the displacement for JCXZ and JECXZ is only 1 byte, id est can only jump relative to JCXZ/JECXZ -127 to +127. LOOP/LOOPxx instruction makes us of ecx or cx as the counter. Each time LOOP instruction is executed, ecx or cx (depending on address size) is decremented, then if counter != 0, the code will jump to the label. So in short LOOP label is the same as the following label: dec jnz ecx label ; decrement counter in count register ; go back to label if ecx is not zero

LOOPZ and LOOPNZ are similar to LOOP, but the jump is also dependent on the value of the zero flag. For LOOPZ, the code will jump to the label if counter != 0 and the zero flag is set to 1. For LOOPNZ, the code will jump to the label if counter != 0 and the zero flag is set to 0 (or rather, is cleared). Do take note that Intel does not recommend LOOP/LOOPZ/LOOPNZ because they are complex instructions, and it is much better to use the code above in place of LOOP. Also, LOOP has a displacement of 1 byte, so the jump must be within a displacement of -128 to +127. SETcc sets the byte to 1 if the condition is met and clears it to 0 otherwise. Please bear in mind that SETcc only accepts an 8-bit register or 8-bit memory operand and nothing else; there is no support for 32-bit or 16-bit memory or registers. Though, if you wish to generate a 32-bit result from SETcc, that can be done by zero-extending the 8-bit register using the MOVZX instruction. CMOVcc is only available on 686 (P6) and later, but I personally think it is more useful than SETcc. For CMOVcc, if the condition is met, the code will move data from register to register or from memory to register. Do take note that the conditional move is only for 32-bit and 16-bit registers and memory; conditional move for 8-bit registers and memory is not supported. The CMP instruction compares the first operand with the second operand and sets the status flags in the EFLAGS register according to the result. The comparison is performed by subtracting the second operand from the first operand and setting the status flags in the same manner as SUB, but the result is discarded (the destination operand is not updated).

The TEST instruction computes the bit-wise logical AND of the first operand and the second operand and sets the SF, ZF and PF status flags according to the result. The result is then discarded. Now that the more commonly used opcodes for conditional statements have been introduced, let's dive into the topic itself.

How to implement conditional statements in assembler
In this section, I will give some pseudo code and then show how it would look in assembler.

IF statement
Pseudo code:
IF eax < 25
  //do something here
ENDIF

Assembler:
cmp eax, 25
jnc _out
;do something here
_out:

HLA (low-level syntax):
cmp( eax, 25 );
jnb _out;
// do something here
_out:

MASM/TASM (high-level syntax):
.if eax < 25
  ;do something here
.endif

HLA (high-level syntax):
if( eax < 25 ) then
  // do something here
endif;

Comment
I generally prefer jnc to jnb because jnc means "jump if the carry flag is not set", as opposed to jnb which means "jump if not below".

When using high-level control structures like HLA's "if" and MASM's ".if", you have to be careful when comparing registers against values. By default, most high-level assemblers assume you're using unsigned comparisons. The following rarely does what the author expects

if( eax > -1 ) then
  // do something
endif;

// equivalent low-level code:
cmp( eax, -1 );
jna endOfIf;
// do something
endOfIf:

The problem is that -1 is equivalent to $ffff_ffff (0ffffffffh) and EAX, when treated as an unsigned value, is never greater than this value (hence the expression above is always false). You'll have to explicitly tell the assembler if you want to do a signed comparison, e.g.,

if( (type int32 eax) > -1 ) then
  // do something
endif;

// equivalent low-level code:
cmp( eax, -1 );
jng endOfIf;
// do something
endOfIf:

Always be aware of the differences between signed and unsigned comparisons!

Pseudo code:
IF eax == 0
  //do something here
ENDIF

HLA high-level syntax:
if( !eax ) then
  // do something here
endif;

Assembler:
test eax, eax   ;set the flags
jnz _out
;do something here
_out:

HLA low-level syntax:
test( eax, eax );
jnz _out;
// do something here
_out:

or

or eax, eax     ;set the flags
jnz _out
;do something here
_out:

HLA syntax:
or( eax, eax );
jnz _out;
// do something here
_out:

or

xchg eax, ecx
jecxz _out
;do something here
_out:

HLA syntax:
xchg( eax, ecx );
jecxz _out;
// do something here
_out:

Comment
This is probably one of the more common pieces of code seen in assembler (quite a number of Windows APIs return 0 in eax on error). One reason why "cmp eax, 0" is not used is that "test eax, eax" and "or eax, eax" are shorter than the cmp. (The last example is 1 byte smaller than the test and or variants because "xchg eax, reg" is only 1 byte; the drawback is that the jump displacement must be within -128 to +127.) Call it size optimisation. The test instruction can be used to test whether a bit is set. For example:

Pseudo code:
IF eax is odd
  edx++
ENDIF

Assembler code:
test eax, 1
jz _even
inc edx
_even:

HLA syntax:
test( 1, eax );
jz _even;
inc( edx );
_even:

or

shr eax, 1
jnc _even
inc edx
_even:

HLA syntax:
shr( 1, eax );
jnc _even;
inc( edx );
_even:

or

bt eax, 0
jnc _even
inc edx
_even:

HLA syntax:
bt( 0, eax );
jnc _even;
inc( edx );
_even:

or

shr eax, 1
adc edx, 0

HLA syntax:
shr( 1, eax );
adc( 0, edx );

or

bt eax, 0
adc edx, 0

HLA syntax:
bt( 0, eax );
adc( 0, edx );

Comment
The above codes are just examples of testing for "even-ness", id est whether the last bit is set or not. The first should be easy to understand, the second makes use of the fact that the carry flag contains the last bit shifted out, while the bt versions make use of the bt instruction, which tests the bit and sets the carry flag according to whether the bit is set or not. All in all, the first and third do not destroy the value in eax, but the second does. If you want to preserve the value, go for the test version or the bt version.

Pseudo code:
IF eax > 47
  edx = eax
ENDIF

MASM syntax (high-level):
.if eax > 47
  mov edx, eax
.endif

HLA syntax (high-level):
if( eax > 47 ) then
  mov( eax, edx );
endif;

Assembler:
cmp eax, 47
cmova edx, eax

HLA syntax:
cmp( eax, 47 );
cmova( eax, edx );

or

cmp eax, 47
jna @F
mov edx, eax
@@:

HLA syntax:
cmp( eax, 47 );
jna atF;
mov( eax, edx );
atF:

Comment
This is just an example of how the CMOVcc instruction can be used (sweet and short, huh?). It could be replaced by a conditional jump (as shown in the second variant), but mispredicted jumps cost a lot of cycles. Just take note that CMOVcc was introduced in the P6 family of processors and may not be supported on all IA-32 processors.

Pseudo code:
IF eax == 9
  ecx = 1
ENDIF

Assembler:
cmp eax, 9
setz cl
movzx ecx, cl

HLA syntax:
cmp( eax, 9 );
setz( cl );
movzx( cl, ecx );

Comment
This is just an example of how the SETcc instruction can be used. The movzx zero-extends the value in cl (which is set depending on the value in eax) into ecx.

FOR statement (C version)


Pseudo code:
FOR (ecx=0; ecx<=9; ecx++) {
  array[ecx] = array2[ecx]
}

HLA high-level syntax:
for( xor( ecx, ecx ); ecx <= 9; inc( ecx )) do
  mov( array2[ ecx*4 ], eax );
  mov( eax, array[ ecx*4 ] );
endfor;

Assembler:
xor ecx, ecx
_loopstart:
mov eax, array2[ecx*4]
mov array[ecx*4], eax
inc ecx
cmp ecx, 9
jbe _loopstart

HLA syntax:
xor( ecx, ecx );
_loopstart:
mov( array2[ ecx*4 ], eax );
mov( eax, array[ ecx*4 ] );
inc( ecx );
cmp( ecx, 9 );
jbe _loopstart;

or

mov ecx, 9
_loopstart:
mov eax, array2[ecx*4]
mov array[ecx*4], eax
dec ecx
jnz _loopstart

HLA syntax:
mov( 9, ecx );
_loopstart:
mov( array2[ ecx*4 ], eax );
mov( eax, array[ ecx*4 ] );
dec( ecx );
jnz _loopstart;

Comment
Both examples do roughly the same thing in this context, but the second example is one instruction shorter (note, though, that it runs ecx from 9 down to 1, so element 0 is not copied). Also, in both examples ecx is used as the counter. This need not be the case; you can use any of the other registers as the counter.

IF-THEN-ELSE statement
Pseudo code:
IF (ecx < eax)
  edx = 8
ELSE
  edx = 16
ENDIF

Assembler:
cmp ecx, eax   ; CF = 1 if ecx < eax
sbb edx, edx   ; edx = -1 if ecx < eax, else 0
and edx, -8    ; edx = -8 if ecx < eax, else 0
add edx, 16    ; edx =  8 if ecx < eax, else 16

HLA syntax:
cmp( ecx, eax );
sbb( edx, edx );
and( -8, edx );
add( 16, edx );

or

cmp ecx, eax
jnc @F
mov edx, 8
jmp _@@
@@:
mov edx, 16
_@@:

HLA syntax:
cmp( ecx, eax );
jnb notLessThan;
mov( 8, edx );
jmp endOfIF;
notLessThan:
mov( 16, edx );
endOfIF:

Comment
Personally I prefer the first version because there is no branching. Instead the carry flag is used to set edx to -1 or 0, the and instruction turns that into -8 or 0, and finally the add instruction fixes the result up to the desired 8 or 16.

Advanced IF statements
Pseudo code:
IF EAX >= "0" && EAX <= "9"
  ;do something
ENDIF

HLA high-level syntax:
if( eax >= '0' && eax <= '9' ) then
  // do something
endif;

also:
if( eax in '0'..'9' ) then
  // do something
endif;

Assembler:
cmp eax, "0"
jc @F
cmp eax, "9"
ja @F
;do something
@@:

HLA low-level syntax:
cmp( eax, '0' );
jnae atF;
cmp( eax, '9' );
jnbe atF;
// do something
atF:

or

lea ecx, [eax-"0"]
cmp ecx, "9"-"0"
ja @F
;do something
@@:

HLA syntax:
lea( ecx, [eax-@byte('0')] );
cmp( ecx, 9 );
jnbe atF;
// do something
atF:

or

sub( '0', eax );   // or xor( '0', eax );
cmp( eax, 9 );
jnbe atF;
// do something
atF:

Comment
The second version makes use of an extra register but needs only one conditional jump, while the first version makes use of two conditional jumps. Generally the best optimised code is the one that does not need conditional jumps at all. Kudos to Nexo for coming up with the second version.

SWITCH-CASE statements
Pseudo code:
SWITCH eax
  case 0:
    mov ecx, 7
    break
  case 1:
    mov edx, 8
    break
  case 2:
    mov ecx, 9
    break
  case 3:
    mov edx, 10
    break
  case 4:
    mov ecx, 11
    break
  default:
    or ecx, -1
    break
END SWITCH

HLA switch macro (from the HLA standard library):
switch( eax )
  case( 0 ) mov( 7, ecx );
  case( 1 ) mov( 8, edx );
  case( 2 ) mov( 9, ecx );
  case( 3 ) mov( 10, edx );
  case( 4 ) mov( 11, ecx );
  default   or( -1, ecx );
endswitch;

Assembler:
.data
jmptable dd offset _0, offset _1, offset _2, offset _3, offset _4
.code
cmp eax, 4
ja _default
jmp jmptable[eax*4]
_0:
mov ecx, 7
jmp @F
_1:
mov edx, 8
jmp @F
_2:
mov ecx, 9
jmp @F
_3:
mov edx, 10
jmp @F
_4:
mov ecx, 11
jmp @F
_default:
or ecx, -1
@@:

HLA Syntax:
static
  jmptable: dword[] := [ &_0, &_1, &_2, &_3, &_4 ];
endstatic;

cmp( eax, 4 );
ja _default;
jmp( jmptable[eax*4] );
_0:
mov( 7, ecx );
jmp atF;
_1:
mov( 8, edx );
jmp atF;
_2:
mov( 9, ecx );
jmp atF;
_3:
mov( 10, edx );
jmp atF;
_4:
mov( 11, ecx );
jmp atF;
_default:
or( -1, ecx );
atF:

Comment
The preceding example just introduces you to the idea of a jump table. One may wonder why only one conditional jump is needed to test whether eax is within range. This is because negative numbers are "bigger" than normal numbers when an unsigned comparison is used (which is also the concept behind Nexo's code in the previous example).

Bit Operations
Bitwise operations
There are certain operations that you can use on individual bits or groups of bits. These operations are called bitwise operations, and the operators we use to perform them are called bitwise operators.

There is a clear distinction between bitwise operators and bitwise instructions. Bitwise operators are symbolic operators that we shall use for clarifying operational concepts. Bitwise instructions are the equivalent machine instructions provided by the microprocessor to simulate these operators. The logical bitwise instructions for 80386 and higher microprocessors are NOT, AND, OR, XOR, TEST, BSF, BSR, BT, BTC, BTS, BTR, and SETcc. These aid in manipulating bit values. Additional instructions, called the bit manipulation instructions, that aid in moving bits include SHL, SAL, SHR, SAR, SHLD, SHRD, ROL, ROR, RCL, and RCR. Since their uses are important to every programmer, a good grasp of the bitwise operations, operators, and their machine-equivalent instructions is essential to learning ASM.

WHY USE BITWISE OPERATORS
There are several reasons why we need to or should use bitwise operators. One of the most popular ones is: most programmers prefer using the individual bits of a variable as boolean options for their code rather than using separate boolean variables, as shown in the example (not asm code) below. This saves memory space and speeds up the application program. Instead of using separate variables like this

bShowAmbientLight = TRUE
bNoSound = FALSE
bNoVideo = TRUE
bNoSeek = TRUE
.
.
.

we could create a variable dwCodeOptions and assign options to its bits like this

bit 0 to bShowAmbientLight
bit 1 to bNoSound
bit 2 to bNoVideo
bit 3 to bNoSeek
and so on...

So, whenever bit 2 in dwCodeOptions is set to 1, the video preview will not be shown. If it were cleared to 0, then videos will be enabled. The same applies for the other options.
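As a sketch of how such option bits could be defined and tested in MASM (the constant and label names below are made up for illustration):

SHOWAMBIENT  equ 1            ; bit 0
NOSOUND      equ 2            ; bit 1
NOVIDEO      equ 4            ; bit 2
NOSEEK       equ 8            ; bit 3

.data
dwCodeOptions dd 0

.code
or   dwCodeOptions, NOVIDEO   ; set bit 2: disable the video preview
test dwCodeOptions, NOVIDEO   ; is bit 2 set?
jnz  skip_preview             ; yes, so skip showing the preview
; ... show the preview here ...
skip_preview: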

Setting, clearing, and toggling bits
Setting a bit means changing its value to 1, while clearing a bit means changing its value to 0. By toggling a bit, we mean changing its current boolean value to the other boolean value: if it is currently 1, then toggling it will change it to 0. Knowing these terms is quite essential for understanding how to apply the operators discussed below.

Number of Options
The number of bits you can use as boolean options in your code is limited by the size of the variable you use for storing them. If you create a 32-bit (DWORD-sized) variable, then 32 bits are available for use as boolean options. Creating a 16-bit (WORD-sized) variable will only allow for 16 bits to be used. Similarly, 8-bit (BYTE-sized) variables will restrict your options to 8 bits.

Bit Masks
A bit mask is a temporary value that we use while setting, clearing, and toggling bits. As an example, see the one discussed under the AND Operator heading.
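Putting those terms together with such masks, setting, clearing, and toggling bit 3 of AL look like this in MASM (a minimal sketch):

or  al, 00001000b    ; set bit 3 (OR with a mask that has a 1 in bit 3)
and al, 11110111b    ; clear bit 3 (AND with a mask that has a 0 in bit 3)
xor al, 00001000b    ; toggle bit 3 (XOR with a mask that has a 1 in bit 3)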

THE LOGICAL OPERATORS

NOT bitwise operator
NOT is probably the easiest bitwise operator to understand. The NOT operator is a unary operator, meaning that it takes only one operand. NOTing a value changes it to its opposite--NOT toggles values. If you NOT 1, you get 0. If you NOT 0, you get 1.

NOT(0) = 1   ; NOT changes "I am not going" to "I am going"
NOT(1) = 0   ; NOT changes "I am going" to "I am not going"
             ; The parentheses only suggest that NOT is unary.

It should be noted, however, that NOT toggles the values of all the bits its operand contains. If you want to toggle the values of specific bits in an operand directly, you can use the XOR operator instead, as discussed a little later.

Example:
NOT 1100 1011b
-------------
=   0011 0100b

NOT 0101 0111b
-------------
=   1010 1000b

Assembly Syntax

MASM:
NOT reg/mem

HLA:
NOT( reg/mem );

Use the NOT instruction to:
* Toggle all the bits in a value to get its 1's complement. To get the 2's complement, use the NEG instruction or add 1 to the 1's complement.

A word of caution: NOT is not equal to NEG (NOT a = NEG a - 1, or NEG a = NOT a + 1). Never confuse the NOT instruction with the NEG instruction in assembly language. NOTing a value doesn't negate it but gives us the 1's complement (all bits flipped). The 2's complement notation is used to represent negative numbers on x86-based systems. The relationship between the two operations can be shown as below:

NEG(0101 1100b) = NOT(0101 1100b) + 1   ; => 2's complement = 1's complement + 1
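To see that relationship in actual code, here is a small worked example (a minimal MASM sketch; the values are worked out by hand):

mov al, 5Ch       ; al = 0101 1100b
not al            ; al = 0A3h (1010 0011b), the 1's complement
inc al            ; al = 0A4h, the 2's complement -- the same result NEG 5Ch would give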

AND BITWISE OPERATOR AND is quite simple to understand, in that when you AND a bit (one of two states) with another, if one of the bits is 0, the result would be zero. However if both bits are 1, the result would be 1. "To result in TRUE, AND requires that BOTH of its operands be TRUE."

0 AND 0 = 0   ; If both Mary AND Julia are not going to the party, then I am not going too.
0 AND 1 = 0   ; If Mary is not ready to go AND Julia is, then I am not going.
1 AND 0 = 0   ; If Mary is ready to go AND Julia is not, then I am not going.
1 AND 1 = 1   ; If both Mary AND Julia are going, then I am also going.

Example:

    0101 1110b
AND 1010 1010b
------------
=   0000 1010b

    1011 1110b
AND 0001 0101b
------------
=   0001 0100b

Assembly Syntax

MASM:
AND dest, src
AND reg/mem, reg
AND reg, reg/mem
AND reg/mem, const

HLA:
AND( src, dest );
AND( reg, reg/mem );
AND( reg/mem, reg );
AND( const, reg/mem );

Use the AND operator to:
* Clear bits. ANDing a bit with 0 clears it. We use a temporary value called a bit mask for this purpose. Consider the conversion of lowercase letters to uppercase. The lowercase letters have binary representations wherein bit 5 of any value is a 1. Each uppercase letter value has a 0 as its bit 5. To convert 'a' (0110 0001b, 61h) to 'A' (0100 0001b, 41h), we use a bit mask with bit 5 as a 0 and the rest as 1s, so that ANDing clears bit 5 of 'a' and results in 'A'.

    0110 0001b   ; 'a'
AND 1101 1111b   ; bit mask with bit 5 = 0
-------------
=   0100 0001b   ; which is 'A'
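In code, the same conversion is a single instruction (a minimal MASM sketch, valid for the letters 'a'..'z'):

mov al, 'a'       ; al = 61h = 0110 0001b
and al, 0DFh      ; mask 1101 1111b clears bit 5: al = 41h = 'A'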

OR bitwise operator

OR is somewhat the opposite of AND. When you OR a bit with another, if one of the bits is 1, the result will be 1. However, if both are 0, the result will be 0. "To result in TRUE, OR requires that ONE OR BOTH of its operands be TRUE."

0 OR 0 = 0
0 OR 1 = 1
1 OR 0 = 1
1 OR 1 = 1

Example:

   11100101b
OR 01010111b
-----------
   11110111b

   00000100b
OR 10101010b
------------
   10101110b

Assembly Syntax

MASM:
OR dest, src
OR reg/mem, reg
OR reg, reg/mem
OR reg/mem, const

HLA:
OR( src, dest );
OR( reg, reg/mem );
OR( reg/mem, reg );
OR( const, reg/mem );

Use the OR instruction/operator to:
* Set a bit. ORing a bit with 1 will set it. You'll use this operator very much when programming Windows. For example, when setting options for window styles you use the OR operator. For more information, see the Windows ASM volume.
* Compare the value in a register to 0. Although this is another little trick, it may prove useful and you may also see it used sometimes. It works because of the way the OR instruction affects the bits in the flags register. For example

; MASM
OR eax, eax        ; equivalent to cmp eax, 0

; HLA
OR( eax, eax );

ORing eax with itself compares eax with 0 and sets the Zero Flag bit in the flags register accordingly. You can then use a ZF-testing conditional jump (JNZ, JZ) to proceed with control flow. This instruction is equivalent to "cmp eax, 0", but since it produces shorter code, it is preferable to use it.

cmp eax, 0     ; compiles into 4 bytes - 66 83 F8 00
or  eax, eax   ; compiles into 3 bytes - 66 0B C0

XOR bitwise operator
XOR, or exclusive OR, is a binary operator like OR but with a slight difference. Frame what is said below, hang it on your favorite wall, and don't ever ask again. "To result in TRUE, XOR requires that ONLY ONE of its two operands be TRUE." So, when you XOR two bits that are both valued 1, the result will ALWAYS be 0. XOR is used in some encryption technology such as XOR encryption (XOR encryption is weak) and for the XOR swap. XOR allows you to swap the contents of two variables without using a third variable--how we can do that is shown in actual code a little later.

0 XOR 0 = 0
0 XOR 1 = 1
1 XOR 0 = 1
1 XOR 1 = 0

Example:

    0011 1000b
XOR 0011 1000b
------------
=   0000 0000b

    0101 0101b
XOR 1111 1000b
------------
=   1010 1101b

Assembly Syntax

MASM:
XOR dest, src
XOR reg/mem, reg
XOR reg, reg/mem
XOR reg/mem, const

HLA:
XOR( src, dest );
XOR( reg, reg/mem );
XOR( reg/mem, reg );
XOR( const, reg/mem );

Use the XOR instruction to:
* Swap values. The x86 assembly XOR instruction allows us to swap two values without using a third placeholder. For example, to swap the values of the microprocessor registers eax and edx, you use the following XOR instructions.

XOR eax, edx
XOR edx, eax
XOR eax, edx

Although this is a neat trick, the x86 already provides an XCHG instruction for swapping registers. This instruction is covered in the Data Transfer Instructions chapter. Note also that the XOR instruction affects the x86 flags, whereas the XCHG instruction does not. The XCHG instruction, when you supply a memory operand, has an implicit LOCK associated with it, which slows the execution of the instruction by a tremendous amount. Using XOR is a possibility when you don't want a bus lock associated with the XCHG operation.

* Toggle bits in a value. To toggle all the bits in 1000 1011b (to 0111 0100b) you can use NOT or you can XOR the value with -1 (1111 1111b). XORing the operand with -1 does exactly what NOTing it does. To toggle specific bits, you use a bit mask wherein the bits to be toggled should be valued 1, while the others should be 0. For example, to toggle bit 5 and bit 3 in 1011 0101b, we use 0010 1000b as the bit mask.

; not code, only the operator
    1011 0101b
XOR 0010 1000b
---------------
=   1001 1101b

; code it like this
XOR 10110101b, 00101000b

* Clear a register to 0. As interesting as it is, you can use the XOR operator to perform a very subtle operation that may not be very self-evident at first but is truly another marvel. XORing a register with itself will clear it to 0. Here's an example

XOR eax, eax    ; same as "MOV eax, 0"

This instruction has the same effect as assigning 0 to the register; "MOV eax, 0" is its equivalent. However, "XOR eax, eax" is generally used (mostly to return 0 in conjunction with a RET instruction). The reason it is used is that the register-to-register form is a little faster and much smaller. For example, "xor eax,eax" compiles as

66 33 C0

while "mov eax,0" compiles as

66 B8 00 00 00 00

It occupies only 3 bytes as opposed to the 6 of the MOV version. The advantage to smaller instructions is that you can hold more in the instruction cache, speeding up program execution. Also, they require fewer memory fetches, which slow down programs as well.

* Detect errors and compute parity. Tricky XOR can "undo" itself. Exactly how that is done is explained a little later, but first, let's answer: "What is parity?" Parity is "the state of being odd or even used as the basis of a method of detecting errors in binary-coded data" according to the Merriam-Webster Dictionary. Parity information is simply redundant information that is calculated from an actual set of values. Simply stated: if you have N values stored, you use these values to calculate an extra value (the parity information) so that the number of values stored now becomes N + 1. If you happen to lose ANY ONE of the N + 1 values, you can recalculate it using the remaining N. Parity can be odd or even. Parity is calculated by first counting the number of 1s in a unit of binary data, and then adding a 0 or a 1 (the parity value) to make the count odd or even. In even parity the total number of 1s (including the parity value) should be even. In odd parity, the total number of 1s (including the parity value) should be odd.

Example: Consider the bits in 1001b:
1. 1001 has two 1-bits.
2. We now add a parity bit to 1001.
   a. If we want even parity, the number of 1s (incl. the parity bit) should be even. So the parity bit for 1001 in this case is 0, resulting in:
      1001 0    ;--- parity bit = 0
   b. If we want odd parity, the number of 1s (incl. the parity bit) should be odd. So the parity bit for 1001 in this case is 1, resulting in:
      1001 1    ;--- parity bit = 1

Take another example, 1101b:
1. 1101 has three 1-bits.
2. Add a parity bit:
   a. For even parity: 1101 1    ;--- parity bit = 1
   b. For odd parity:  1101 0    ;--- parity bit = 0
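As a small taste of how XOR "undoes itself" for this kind of error detection (a minimal sketch; al and bl stand in for two stored data bytes):

mov al, 10010110b    ; first data byte
mov bl, 01010101b    ; second data byte
xor al, bl           ; al now holds the XOR "parity byte" of the two values
; if the first byte is ever lost, XORing the parity byte with the
; remaining byte recreates it:
xor al, bl           ; al = 10010110b again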

SHR bitwise operator
SHR means shift right. Take a number as binary; when SHR n is performed on it, all the bits move right by n positions. The use of shift right is that when you shift a number right by n, it is like dividing the number by 2^n. Division on a computer is much slower than using shifts, so it is better to replace divisions by powers of two with right shifts.

Example:
01011010b SHR 3 = 00001011b      (90 SHR 3 = 11)
10101110b SHR 2 = 00101011b      (174 SHR 2 = 43)

Assembly Syntax

MASM:
SHR dest, cnt
SHR reg/mem, const
SHR reg/mem, CL

HLA:
SHR( cnt, dest );
SHR( const, reg/mem );
SHR( CL, reg/mem );
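The 174/43 example above, written as code (a minimal sketch):

mov eax, 174
shr eax, 2        ; eax = 43 (unsigned 174 divided by 2^2)
mov cl, 3
shr eax, cl       ; the shift count may also come from CL: eax = 5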

SHL bitwise operator
SHL means shift left. Take a number as binary; when SHL n is performed on it, all the bits in the number move left by n positions. The use of shift left is that when you shift a number left by n, it is like multiplying the number by 2^n. Multiplication on a computer is much slower than using shifts, so it is better to replace multiplications by powers of two with left shifts.

Example:
10010101b SHL 2 = 1001010100b    (149 SHL 2 = 596)
00011110b SHL 3 = 11110000b      (30 SHL 3 = 240)

Assembly Syntax

MASM:
SHL dest, cnt
SHL reg/mem, const
SHL reg/mem, CL

HLA:
SHL( cnt, dest );
SHL( const, reg/mem );
SHL( CL, reg/mem );
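Shifts can also be combined to multiply by constants that are not powers of two; here is a minimal sketch that multiplies eax by 10 (it clobbers ecx):

mov ecx, eax      ; keep a copy of x
shl eax, 3        ; eax = x * 8
shl ecx, 1        ; ecx = x * 2
add eax, ecx      ; eax = x * 10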

Addressing Modes

Building memory addresses


We often work with blocks of data spanning several memory addresses. For example, a dword is stored in memory as four consecutive bytes. We pick one of the addresses as a reference point - the base address. This is usually the lowest address of the data block. (For an example where this is not true, see the stack frame in The Stack.) If we have a large data block (say a 20-byte data structure), but only need to look at a few bytes of data embedded in it, then we can locate the data by adding an offset or a displacement to the base address. Thus the calculated address (or effective address) is the sum of a base address and a displacement.

Varying the address


Both the base address and displacement can be constant. The assembler will combine these two into a single value, the direct address. Either the base or the displacement (or both) can be varied. We do this on the x86 by loading a register with the variable part. A register loaded with a base address acts as a base register. This is the basis for indirect and based addressing. A register loaded with a displacement is often called an index register. This is the basis for array or indexed addressing.

Scaling the index


In a HLL, an array index with a value n is used to access the n-th array item (or element). At the machine level, we need to convert this index into a displacement. We do this by multiplying the index by the item size. (See All About Arrays for added details.) This computation is called scaling. The x86 has the built-in capacity to scale the value of one register (by 1, 2, 4, or 8) before computing the effective address.

The CPU doesn't care


The x86 is capable of adding together three numbers (one constant, two variable) to create an address. The CPU doesn't care which number is the base address. It only cares that the final value is a valid address. And in the case of the LEA instruction, the address doesn't need to be valid at all. That last case is the reason you may see code that performs non-address arithmetic with the LEA instruction.
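That is why you will sometimes see LEA used for plain arithmetic, for example (a minimal sketch):

lea eax, [ebx+ebx*4]      ; eax = ebx * 5; computed by the addressing unit, no memory access
lea edx, [eax+ecx*8+16]   ; edx = eax + ecx*8 + 16 in a single instruction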

Doing it in assembly
Constant (static) base only -- this is also known as "direct addressing"
MASM: mov eax,dword_data FASM: mov eax,[dword_data] HLA: mov( dword_data, eax );

Constant (static) base + constant displacement -- this is also "direct addressing"


MASM: mov eax,dword_data+4 FASM: mov eax,[dword_data+4] HLA: mov( dword_data[4], eax );

Constant (static) base + scaled index


MASM: mov eax,dword_data[ecx*4] FASM: mov eax,[dword_data+ecx*4] HLA: mov( dword_data[ecx*4], eax );

Constant (static) base + double indexing


MASM: mov eax,dword_data[ebx+ecx*4]

FASM: mov eax, [dword_data+ebx+ecx*4] HLA: mov( dword_data[ebx+ecx*4], eax );

Variable base only -- this is also known as "indirect addressing"


MASM/FASM: mov eax,[ebx] HLA: mov( [ebx], eax );

Variable base + constant displacement -- this is also known as "based addressing"


MASM: mov eax, 4[ebx] mov eax,[ebx+4] FASM: mov eax,[ebx+4] HLA: mov( [ebx+4], eax );

Variable base + scaled index


MASM/FASM: mov eax,[ebx+ecx*4] HLA: mov( [ebx+ecx*4], eax );

Variable base + scaled index + constant displacement


MASM: mov eax,24[ebx+ecx*4] mov eax,[ebx+ecx*4+24] FASM: mov eax, [ebx+ecx*4+24] HLA: mov( [ebx+ecx*4+24], eax );

A.
xor eax, eax
mov eax, [ebx*4+eax]

or

B.
mov eax, [ebx*4]

Most people will think that A will be longer in size, but in fact that is wrong. A is 5 bytes (xor eax, eax = 33 C0, mov eax, [ebx*4+eax] = 8B 04 98) while B is 7 bytes (mov eax, [ebx*4] = 8B 04 9D 00 00 00 00). This is because when the SIB byte is encoded, the only way to encode a scaled index with no base register is to use a 4-byte displacement. For more about this you have to learn the opcode format.

The Stack
Stack
While programming in assembly language, you will often need to save the contents of a register to free it for other purposes. You could either copy the contents of the register to another available register or to a memory location reserved for such a purpose. An array of such reserved memory locations is called a stack. It is used for a number of things but mainly for local variables. The stack is a linear data structure--an array of allocatable locations in memory. Memory allocations and deallocations in the stack occur on a last-in-first-out (LIFO) basis: the first data to come into the stack becomes the last one to go out. So, whenever you PUSH some data onto the stack, remember to POP it out in the reverse order. It is important to make sure that everything that you push onto the stack is popped off it as well. This is called balancing the stack, and if it is not done your program will crash in short order. For example, say you PUSHed 1 2 3 4 one after another. You'll have to POP them out in the reverse order: 4 3 2 1.

You can think of the stack as a closed box of fixed size with only one of the sides (top) open. Because the size of the box is fixed, you can put only an allowable number of CDs inside. If the box can contain 16 CDs, you can't put a 17th CD inside. Nonetheless, the one that you put in first will always be the last one to come out. Also, the last one you put in will be the first one to be picked out. This simply means that you'll have to pick out the CDs in the reverse order of placement.

The structure of the stack
In the early DOS times, the .COM (COre iMage) executable format was limited to 65,536 bytes. The code, data, and the stack had to fit into this tiny space, and saving room was a high priority. Code and data occupied space from byte 0100h (the origin) toward higher addresses--the code was usually placed first, then the static data, then what we now call the virtual data, eventually growing upward--while the stack was organized backward from byte 0FFFFh, growing down toward them. Stacks in executables are still placed proceeding backward. There can be many stacks present at a time in memory, but there's only one current stack. Each stack is located in memory in its own segment to avoid overwriting other parts of memory. This current stack segment is pointed to by the stack segment (SS) register. The offset address of the most recently allocated stack location, called the top of the stack (TOS), is held in another register called the stack pointer (SP). The 16-bit SP register is used only in the 16-bit environment; in a 32-bit environment, the 32-bit extended stack pointer (ESP) register is used to point to the TOS. A pointer called the stack base (SB) points to the first entry that you put into the stack. Adding an entry to the stack fills the next allocatable location (a lower memory address) and changes (decrements) the stack pointer (SP) to point to that location. Removing an entry empties the current location and changes (increments) the stack pointer to point to the previously allocated location (a higher memory address). Each entry pushed onto the stack makes the allocated section of the stack grow downward (toward lower memory addresses) and decreases SP by the unit size. Each entry popped off shortens the allocated section of the stack upward (toward higher memory addresses) and increases SP by the unit size (4 bytes for 32-bit; 2 for 16-bit).

Comment
To picture a stack growing or shrinking, think of icicles. Just as icicles grow from top to bottom, so does the stack.

Caution: The stack is a fixed number of memory locations for temporary use. Allocations and deallocations made in a stack don't make the stack grow or shrink--only the number of allocations made increases or decreases. The data in a stack doesn't move and shouldn't move--data moving, or a stack growing or shrinking, defeats its purpose. A variable stack or a corrupted stack is no reliable place to save data--and after all, that's what the stack is primarily used for.

Example:
; (TODO)

Consider a stack of a capacity of 16 bytes. It can hold only 16 bytes, or 8 words, or 4 dwords of temporary data. If you try to allocate memory any further in the stack after it is full, a stack overflow occurs. The most recent data that you tried to push into the stack is lost.

Using the stack We use the PUSH instruction to put data into the next allocatable stack location and the POP instruction to remove a data entry from the current stack location (SP). Besides local variables there are two other very important functions of the stack. The first is to pass data to procedures including Windows API functions, for example, the following INVOKE call INVOKE SendMessage,[hWnd],WM_CLOSE,0,0 is actually assembled as follows push 0 push 0 push WM_CLOSE push [hWnd] call [SendMessage] The parameters are pushed in reverse order because Windows uses the STDCALL convention in which the stack is reversed but the function will balance it for you on return. For calling conventions that you can use in Windows, refer to Win CallingConventions. The second function of the stack is to hold the return addresses of procedures. When you call a procedure, the address that it was called from is pushed onto the stack then the program jumps to the procedure. When the CPU encounters a RET it will pop the return address off the stack and jump back to that address. It is, therefore, critical to balance the stack--if the return address is not where it is expected to be on RET the program will crash. For the most part if you do not manipulate the stack directly, you never have to worry about this but you will find that the stack is a very powerful tool and eventually you will need to keep these things in mind. A common question is: "Where is the stack located?" The stack is located in memory and is reserved for use by your program. esp and ebp are stack-related pointers. esp is the stack pointer, while ebp is the base pointer. When you enter/step into most functions, usually a stack frame would be created, and you can use ebp to access parameters and local variables. However functions can be created without the creating of a stack frame. An important point to note is that the stack should be aligned to DWORD (align to 4), if not it would raise some general protection fault (or simply known as GPF), and NT are extremely touchy to stack alignment issue. According to fodder, one of our friends on the Win32 ASM board, "Misaligned stack doesn't necessarily give GPF, but it does have weird effects - and it's true that especially NT is very picky." Example (MASM): .386 .model flat,stdcall option casemap:none include /masm32/include/user32.inc include /masm32/include/kernel32.inc

includelib /masm32/lib/user32.lib includelib /masm32/lib/kernel32.lib .code start: testing @@: jmp @F db sub invoke invoke end start Now, more about stacks and its related opcodes. The most common opcodes related to the stack are 'push' and 'pop'. The usage for push is something like push eax, as in you push the data on eax to the stack. The esp (which holds the pointer to the stack) is then decemented by the size of data you pushed onto the stack. Similarly,when you pop eax, the data to the stack is moved to eax. The esp is then incremented by the size of the data moved from the stack. Example: push size pop size ; add esp,4 eax ; mov [esp],eax ; => mov eax,[esp] ;pop increments esp by byte eax ; => sub esp,4 ;push decrements esp by byte "Stack needs to be aligned to dword" esp,2 ; remove dword align to crash app on Windows NT MessageBox,0,OFFSET testing,0,0 ExitProcess,0

However, mov is often faster than pushes and pops since it requires fewer clock cycles to execute. Thus some members (stryker/arkane) at the win32asm forums came up with the xcall macro, which is supposed to be faster than invoke, as it replaces all the pushes with mov and sub (note, however, that the result is larger than if you use push instructions). Of course there are some limitations: the macro cannot handle direct memory operands and cannot handle BYTE, WORD, QWORD or TBYTE sized parameters.

;by gfalen
@str MACRO _str:VARARG
    LOCAL @@1
    IF @InStr(1, <_str>, <!"> )
        .DATA
        @@1 DB _str, 0
        .CODE
        EXITM <OFFSET @@1>
    ELSE
        EXITM <_str>
    ENDIF
ENDM

;by stryker
xcall MACRO function:REQ, parameters:VARARG
    LOCAL psize, paddr, plen
    IFNB <parameters>
        psize = 0
        FOR param, <parameters>
            psize = psize + 4
        ENDM
        IF psize EQ 4
            push parameters
        ELSE
            sub esp, psize
            psize = 0
            FOR param, <parameters>
                IF @SizeStr(<param>) GT 4
                    paddr SUBSTR <param>, 1, 5
                    IFIDNI paddr, <ADDR >
                        paddr SUBSTR <param>, 6, @SizeStr(<param>) - 5
                        lea eax, paddr
                        mov DWORD PTR [esp+psize*4], eax
                    ELSE
                        mov DWORD PTR [esp+psize*4], @str(<param>)
                    ENDIF
                ELSE
                    mov DWORD PTR [esp+psize*4], @str(<param>)
                ENDIF
                psize = psize + 1
            ENDM
        ENDIF
    ENDIF
    call function
ENDM

The uses of push and pop are to store data temporarily (on the stack) and to pass parameters (pop is not used for that, though). There are some opcodes that help to store and later restore the values in the registers, namely pushad (pusha being the 16-bit version), popad (popa being the 16-bit version), pushfd (pushf being the 16-bit version) and popfd (popf being the 16-bit version). For pushad, all general registers are pushed onto the stack in the following order: eax, ecx, edx, ebx, esp, ebp, esi and edi. For popad, the registers are popped off the stack in the reverse order: edi, esi, ebp, esp (whose pushed value is discarded), ebx, edx, ecx and eax. For pushfd, the flags register (EFLAGS) is transferred onto the stack. For popfd, the data from the stack is popped into the flags register (EFLAGS).

Stack frame
Earlier on, I mentioned that ebp is the base pointer and that its use is to access the local variables and parameters passed to the function. Below I have listed a sample MASM procedure (MASM has certain internal macros, one of which sets up a stack frame) and the code produced (viewed from a disassembler). The following code shows how parameters can be accessed, and how a stack frame is created so as to access the parameters with ebp.

; MASM example:
test47 proc par1:DWORD, para2, para3, para4
    mov eax, par1
    mov ecx, para2
    mov edx, para3
    mov ebx, para4
    ret
test47 endp

// HLA example
procedure test47( par1:dword; para2:dword; para3:dword; para4:dword );
    @nostackalign; @nodisplay; @stdcall;
begin test47;

    mov( par1,  eax );
    mov( para2, ecx );
    mov( para3, edx );
    mov( para4, ebx );

end test47;

becomes this after compiling (due to some MASM internal macros, which set up the stack frame)

test47:
    push ebp                ; store value of ebp on stack
    mov  ebp, esp           ; copy value of esp to ebp
    mov  eax, [ebp+08h]     ; saved ebp is at [ebp+00h], return address at [ebp+04h],
                            ; DWORD PTR [ebp+08h] = par1
    mov  ecx, [ebp+0Ch]     ; DWORD PTR [ebp+0Ch] = par2
    mov  edx, [ebp+10h]     ; DWORD PTR [ebp+10h] = par3
    mov  ebx, [ebp+14h]     ; DWORD PTR [ebp+14h] = par4
    leave
    ret  10h                ; sizeof parameter * number of parameters = 4*4

; Actual MASM code output from the HLA compiler:
L1_test47__hla_ proc near32
    push ebp
    mov  ebp, esp
    mov  eax, dword ptr [ebp+8]    ;/* par1 */
    mov  ecx, dword ptr [ebp+12]   ;/* para2 */
    mov  edx, dword ptr [ebp+16]   ;/* para3 */
    mov  ebx, dword ptr [ebp+20]   ;/* para4 */
xL1_test47__hla___hla_:
    leave
    ret  16
L1_test47__hla_ endp

The code "push ebp" and "mov ebp,esp" creates a stack frame. The instruction "leave" removes the stack frame by esp and ebp back to their condition before the stack frame is initialized (mov esp,ebp pop ebp). There is opcode that does the opposite of leave, but is not used as it is slow. The "ret 10h" tells the processor to transfers control from the procedure back to the instruction address saved on the stack (surprise, surprise the stack is used to store the initial value of the instruction pointer when "calling" a function. The address of the function is loaded to eip and code continues with excution according to eip). "ret

10h" (something like add esp, 16 - thereby removing function parameters from the stack)is because of the STDCALL calling convention, while C calling convention would only do "ret" and leave it to the caller to adjust esp. One may ask why the first parameter is stored in DWORD PTR[ebp+08h] and not DWORD PTR[ebp+04h]. This is due to the fact that ebp is pushed onto the stack, thus DWORD PTR[ebp+04h]] contains the original value of ebp. Parameters could be accessed via DWORD PTR [ebp+4+4*positionofparameter] The above code shows how a stack frame is created and how ebp is used to access the parameters passed to the functions. The following code (MASM & HLA) would show how ebp can be used to access local variables (Local variables are acutally data stored on the stack). test124 proc par1:DWORD, para2, para3, para4 LOCAL buffer[32]:BYTE LOCAL dd1:DWORD LOCAL dd2:DWORD mov eax,dd1 ; Spare me the crap, this is just an example. mov dd2,eax lea eax,buffer ret test124 endp // HLA example: procedure test124( par1:dword; para2:dword; para3:dword; para4:dword ); @nostackalign; @leave; @nodisplay; @stdcall; var buffer: byte[32]; dd1: dword; dd2: dword; begin test124; mov( dd1, eax ); mov( eax, dd2 ); lea( eax, buffer ); end test124; becomes this after compiling (due to some MASM internal macros, which sets up the stack frame) push ebp mov ebp,esp add esp, -28h ; reserve stack mov eax,[ebp-24h] ; DWORD PTR mov [ebp-28h],eax ; DWORD PTR lea eax, [ebp-20h]; [ebp-20h] leave ret 10h ; sizeof parameters *

space for local variables ~[ebp-24h] = dd1 ~[ebp-28h] = dd2 = address of first byte in the array number of parameters = 4*4

; Actual HLA compiler output:
L2_test124__hla_ proc near32
    push ebp
    mov  ebp, esp
    sub  esp, 40
    mov  eax, dword ptr [ebp-36]   ;/* dd1 */
    mov  dword ptr [ebp-40], eax   ;/* dd2 */
    lea  eax, byte ptr [ebp-32]    ;/* buffer */
xL2_test124__hla___hla_:
    leave
    ret  16
L2_test124__hla_ endp

Okay, so the code is almost similar to the above code, creating a stack frame. The instruction "add esp,-28h" ("sub esp, 40" in the HLA output) might seem weird, but it has its purpose: it ensures the values stored in local variables are not corrupted when something is pushed onto the stack. (Hopefully I make some sense.) However, I cannot comprehend why MASM produces "add esp,-28h" instead of "sub esp,28h". Maybe it is due to some macro defined deep inside MASM. Local variables differ from parameters in that they are accessed by a negative displacement (remember that when you push something, the value of esp decreases). I think it is easier to understand how to access local variables by examining how the displacement for a certain local variable is calculated (by looking at the above example) than by my explanation. Some code gurus definitely care about how big their code is and how fast it runs. To optimise their code, they might not even have a stack frame in their functions (yes, it is possible, and I will show you how). Removing the stack frame can shave off some clocks and some bytes (push ebp = 1 byte, mov ebp,esp = 2 bytes, leave = 1 byte, total bytes saved = 4). When the stack frame is removed, remember that pushing data causes a change in the value of esp, so you need to manually adjust the offsets from esp. Also, if you don't have a stack frame you don't want 'leave'. The following are ways to create functions without a stack frame. According to f0dder, "Usually, the reason for not using a stack frame is either that you don't need it, or that you want to use EBP as a general purpose register - not so much to save the push ebp. Push/pop ebp would still be needed if you want to use EBP as a general purpose register, since you have to return it in it's original state". He further states that "If you don't use a stack frame, you cannot use local variables (in the automated masm way), nor can you access function parameters the usual way - you have to handcode (well, unless somebody has macros) all ESP references, and remember to further adjust these if you do push/pop".

call function1
...
function1:
    nop                     ; to represent whatever code present
    ret 4*numberofparameter

or

OPTION PROLOGUE:NONE
OPTION EPILOGUE:NONE
function2 proc par1:DWORD,par2,par3,par4
    nop                     ; to represent whatever code present
    ret 4*4
function2 endp
OPTION PROLOGUE:PROLOGUEDEF
OPTION EPILOGUE:EPILOGUEDEF

or

function3 proc
par1 equ <esp+4>
par2 equ <esp+8>
par3 equ <esp+12>
par4 equ <esp+16>
    nop                     ; to represent whatever code present
    ret 4*4
function3 endp

In HLA, you can tell the compiler to skip the generation of the stack frame by using the @noframe procedure option and the @basereg and @parmoffset compile-time variables. Note that when using this option you cannot use the @stdcall scheme, so the parameters will be passed on the stack in the opposite order.

?@basereg := esp;       // Tell HLA to use ESP as the base register.
?@parmoffset := 4;      // Parameters start at offset 4 since we're not pushing EBP anymore.

procedure nostk( par1:dword; para2:dword; para3:dword; para4:dword );
    @nodisplay;         // Don't automatically generate code
    @noframe;           // for the stack frame.
    // Note: no @stdcall option!
begin nostk;

    mov( par1,  eax );
    mov( para2, ecx );
    mov( para3, edx );
    mov( para4, ebx );
    ret( _parms_ );     // "_parms_" is a constant HLA creates that specifies
                        // the number of bytes of parameters.

end nostk;

?@basereg := ebp;       // Now switch back to EBP as the base register.

Here's the code the HLA compiler emits:

L3_nostk__hla_ proc near32
    mov eax, dword ptr [esp+16]   ;/* par1 */
    mov ecx, dword ptr [esp+12]   ;/* para2 */
    mov edx, dword ptr [esp+8]    ;/* para3 */
    mov ebx, dword ptr [esp+4]    ;/* para4 */
    ret 16
xL3_nostk__hla___hla_:
L3_nostk__hla_ endp

If you really need to use the stdcall calling sequence with HLA without building a stack frame, you can use equates, just as you do with MASM, e.g., procedure nostk2( _par1:dword; _para2:dword; _para3:dword; _para4:dword ); @nodisplay; @noframe; const par1 :text := "(type dword [esp+4])"; para2 :text := "(type dword [esp+8])"; para3 :text := "(type dword [esp+12])"; para4 :text := "(type dword [esp+16])"; begin nostk2; mov( mov( mov( mov( ret( par1, eax ); para2, ecx ); para3, edx ); para4, ebx ); _parms_ );

end nostk2;

-- Output from HLA compiler:

L4_nostk2__hla_ proc near32
        mov     eax, dword ptr [esp+4]      ;/* (type dword [esp+4]) */
        mov     ecx, dword ptr [esp+8]      ;/* (type dword [esp+8]) */
        mov     edx, dword ptr [esp+12]     ;/* (type dword [esp+12]) */
        mov     ebx, dword ptr [esp+16]     ;/* (type dword [esp+16]) */
        ret     16
xL4_nostk2__hla___hla_:
L4_nostk2__hla_ endp

Further notes on the stack

Anyone who has seriously messed around with the stack will have noticed that Windows sometimes mysteriously terminates the program, or raises a strange GPF (general protection fault), depending on the variant of Windows. So what is the limiting factor? The stack is bounded by two things, namely the lower stack boundary and the upper stack boundary. The lower stack boundary is located at fs:[8] and the upper stack boundary at fs:[4]. These values ensure that when you enter certain parts of the kernel (which check ESP against fs:[4] and fs:[8]) the OS won't kill your program. For example

mov dword ptr fs:[4], 0ffffffffh
mov dword ptr fs:[8], 0

The above example allows the stack to be located at any memory location, as long as the memory is committed and accessible. In short, you can create your own stack. I would strongly suggest that the values in fs:[4] and fs:[8] be restored on exit of your program. It only takes a few instructions and helps ensure compatibility across the different versions of the OS.
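As a rough sketch of that suggestion (assuming MASM syntax in a flat 32-bit model; the variable names are made up), the original boundaries can be copied into static storage before they are changed and written back before exit:

.data
savedUpper  dd 0                    ; original fs:[4]
savedLower  dd 0                    ; original fs:[8]

.code
    assume fs:nothing               ; may be needed so MASM accepts fs: overrides
    mov eax, dword ptr fs:[4]
    mov savedUpper, eax             ; save the upper stack boundary
    mov eax, dword ptr fs:[8]
    mov savedLower, eax             ; save the lower stack boundary

    ; ... change the boundaries / relocate the stack here ...

    mov eax, savedUpper
    mov dword ptr fs:[4], eax       ; restore before exiting
    mov eax, savedLower
    mov dword ptr fs:[8], eax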

Alignment
Writing efficient code is an art. Although hand optimizations can squeeze the last bit of juice out of the microprocessor, a little alertness and a few precautions here and there while coding can also save you a fortune. Misalignment of data is one of the problems you need to take care of when writing efficient code. The CPU "feels" better when data is aligned on 4-BYTE boundaries, or in some cases 16-BYTE boundaries. Alignment makes code run faster and, in some cases, the data must be aligned in order to use certain CPU features. The processor cannot access misaligned data in a way that is "natural" to it: misaligned data is data located at an address the processor cannot access efficiently. A 32-bit microprocessor "naturally" accesses data positioned at address boundaries evenly divisible by 4. Also, some operating systems require alignment of some structures to DWORD boundaries. Whether a piece of data is aligned depends not only on the address where it's located, but also on its size. 1-BYTE data is always aligned, 2-BYTE (WORD) data is aligned when located at an even address, and 4-BYTE (DWORD) data is aligned when located at an address evenly divisible by 4. This is called natural alignment.
Contents

1 A Simple Example
2 Causes of Misalignment
  2.1 Improper Structures
  2.2 Data type organization
  2.3 Misaligned Stack Data
3 Aligning Data

A Simple Example
Boundaries are evenly divisible memory addresses. For example, an address that is aligned on a 4-BYTE (DWORD) boundary is evenly divisible by 4. The processor will always get its data from DWORD boundaries and in DWORD sizes. So, if you had the following

1111 2222 3333 4444...

and you wanted to get the second DWORD, the processor would find it on an address divisible by 4 (a boundary) and get it in one fetch. However, if the data was misaligned like this

1122 2233 3344 4400...

and you wanted the same DWORD, the processor would:

1. Get the first DWORD (FETCH 1)
2. Chop off the leftmost 3 bytes
3. Get the second DWORD (FETCH 2)
4. Chop off the rightmost 1 byte
5. Put the two pieces together

This requires two memory fetches and takes longer to execute. That is how it applies to DWORDs, as in the previous example. For WORD-sized values there would be no change here, because both bytes are available in the first fetch in either case. There are certain instructions that work better on 16-BYTE boundaries (such as movsd) and some that require it (some FPU instructions).
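If you want to see at run time whether a given address is DWORD-aligned, a quick sketch (the register choice and label are arbitrary) is to test the low two bits of the pointer; they are zero exactly when the address is evenly divisible by 4:

    test esi, 3                 ; are either of the two lowest address bits set?
    jz   dataIsAligned          ; zero -> address evenly divisible by 4
    ; handle the misaligned case here
dataIsAligned: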

Detecting misalignment at debug time is difficult, and the effect it has depends on the architecture of the microprocessor. Common after-effects of misalignment are:

* More memory fetches required to access the data, which slows down execution.
* General protection faults (GPFs).

Note that on modern processors with decent cache designs, the effects of misaligned data access are somewhat mitigated. In particular, misaligned accesses within a cache line generally do not require additional cycles to access. However, misaligned accesses across a cache line incur the penalty.

Causes of Misalignment
(TODO)

Improper Structures
(TODO)

Data type organization


(TODO: Strings and data types order)

Misaligned Stack Data


(TODO)

Aligning Data
It is worth aligning code labels that are frequent jump targets, because speed increases are often observed. With data, however, it is important to align to at least 4-byte (DWORD) boundaries; otherwise the processor has to make two reads to get the value, which slows down processing considerably. Some of the SIMD (Single Instruction Multiple Data) instructions require memory aligned on 32-BYTE boundaries, which usually means allocating a little extra memory and aligning the start position you read and write to. As a general rule, you should try to define the larger-sized data first: define DWORDs before WORDs, and WORDs before BYTEs. You should make it a point to align data after you've defined your strings.

(TODO: Structure and stack Alignment)

The stack should always be aligned to 4 in Windows-based programs, because misalignment often causes some API functions to fail.

Begin MASM Specific

To align data using MASM, use the ALIGN directive. The ALIGN directive aligns the next instruction or data to the boundary specified. To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.

Syntax:

ALIGN [[boundary]]

Example:

ALIGN 4     ; align next data or instruction to DWORD boundary.
ALIGN 16    ; align next data or instruction to 16-BYTE boundary.

These are the two most common alignment directives but, generally, you can use any even number from 2 through 16. MASM, however, will complain if you ask for alignment that is greater than the segment alignment. If you use full segment definitions and specify "page", you can specify up to "ALIGN 256". (This page is not the same 4 KB page that the 80x86 microprocessor uses for paging with segment descriptors defined as in Windows. Rather, it is 256 bytes.) MASM will properly align variables declared with LOCAL to their natural boundaries up to DWORD. QWORDs are not properly aligned. With a 16-bit stack, it seems to do the same alignment of variables, but it makes no effort to align the stack properly, so its alignment of the DWORD variables will not be of much use half the time on average.

Here, or with a 32-bit stack that makes extensive use of variables larger than 32 bits, efficiency can be improved by forgoing the convenience of PROC and manually assigning variables and aligning the stack. I have found this somewhat awkward to do, especially if you want to preserve all registers on entry.
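As a sketch of what forgoing the PROC convenience can look like (the label is hypothetical and this is not a drop-in recipe), you can build the frame by hand, force ESP onto an 8-byte boundary, and then address the QWORD locals relative to ESP:

MyQwordFunc:
    push ebp
    mov  ebp, esp
    and  esp, -8                ; force ESP onto an 8-byte boundary
    sub  esp, 16                ; room for two QWORD locals at [esp] and [esp+8]
    ; ... work with the aligned locals here ...
    mov  esp, ebp               ; throw the locals away
    pop  ebp
    ret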

Example: aligning after defining strings.

string1 db "this is a string",0     ; a 17-byte string - one cause of misalignment
ALIGN 4                             ; align next piece of data at the next 4-byte boundary
dwValue dd 0                        ; this data is now aligned

In the above example, the string is 17 bytes long. If you do not use the ALIGN 4 directive, the next piece of data gets deposited at the next byte (byte 18). In order to get the value of dwValue, the microprocessor would then have to fetch data twice. You don't want it to do that, do you? We guess not.

End MASM Specific

Begin HLA Specific

To align data using HLA, use the ALIGN directive or procedure option. The ALIGN directive aligns the next instruction or data to the boundary specified. To align labels, the ALIGN directive places NOP (no operation) instructions wherever needed.

Syntax:

ALIGN( <<boundary>> );

Example:

ALIGN( 4 );     // align next data or instruction to DWORD boundary.
ALIGN( 16 );    // align next data or instruction to 16-BYTE boundary.

These are the two most common alignment directives but, generally, you can use any even number from 2 through 16, though in general you should use a power of two. In theory, HLA supports alignments of any value, but in certain circumstances you may not be allowed to use values greater than 16. Also, as HLA's alignment capabilities depend on the underlying assembler that processes HLA's output, there may be additional restrictions based on the assembler you're using with HLA. To force the first instruction of a procedure to begin on some boundary, you may use the HLA align procedure option as follows

procedure procName( <<OptionalParameters>> );
    align(4);
    <<otherOptions>>
begin procName;
    <<statements>>
end procName;

This aligns the first instruction of the procedure on the specified boundary.

HLA automatically pads all procedure parameters to 32 bits (a requirement of Windows). HLA does not, however, provide this padding for local variables. If you want to align the addresses of your local variables on the stack to some particular boundary, you can use the align directive for this purpose:

procedure procName( <<OptionalParameters>> );
    <<otherOptions>>
var
    b:byte;
    align(4);
    d:dword;
begin procName;
    <<statements>>
end procName;

Note that the alignment is only within the activation record; true address alignment depends on the stack being properly aligned upon entry into the procedure. Most of the time you can count on the stack being aligned on a double-word boundary upon entry into your procedure. However, it's possible to mess with the stack prior to calling a procedure, invalidating this assumption. To help overcome this problem HLA, by default, emits some extra code to align the stack upon entry into a procedure. For example, compiling the following HLA code

procedure TestProc(parameter: dword);
    @nodisplay;
var
    b:byte;
    w:word;
    d:dword;
begin TestProc;

    mov( b, al );
    mov( w, ax );
    mov( d, eax );

end TestProc;

produces the following MASM code

L1_TestProc__hla_ proc near32
        push    ebp
        mov     ebp, esp
        sub     esp, 8                  ;Allocate storage for 7 bytes + 1 byte padding.
        and     esp, 0fffffffch         ;Align stack to four-byte boundary!
        mov     al, byte ptr [ebp-1]    ;/* b */
        mov     ax, word ptr [ebp-3]    ;/* w */
        mov     eax, dword ptr [ebp-7]  ;/* d */
        mov     esp, ebp
        pop     ebp
        ret     4
L1_TestProc__hla_ endp

Unfortunately, the "and esp, 0fffffffch" instruction does not align the current activation record to a four-byte boundary, but if TestProc calls any other procedures, those procedures' stacks will be dword aligned (unless TestProc also messes with the stack before calling those procedures).

If your program doesn't mess up the dword alignment of the stack, you can use the @noalignstack procedure option to tell HLA not to bother emitting the "and esp, 0fffffffch" instruction, thus making your code a tiny bit more efficient:

program t;

procedure TestProc(parameter: dword);
    @nodisplay;
    @noalignstack;
var
    b:byte;
    w:word;
    d:dword;
begin TestProc;

    mov( b, al );
    mov( w, ax );
    mov( d, eax );

end TestProc;

begin t;
end t;

Emits the following MASM code

L1_TestProc__hla_ proc near32
        push    ebp
        mov     ebp, esp
        sub     esp, 8
        mov     al, byte ptr [ebp-1]    ;/* b */
        mov     ax, word ptr [ebp-3]    ;/* w */
        mov     eax, dword ptr [ebp-7]  ;/* d */
        mov     esp, ebp
        pop     ebp
        ret     4
L1_TestProc__hla_ endp

Note in the examples to this point that the w and d local variables have been misaligned in the activation record. This is easy to fix with an align directive in the VAR section of the procedure:

program t;

procedure TestProc(parameter: dword);
    @nodisplay;
    @noalignstack;
var
    b:byte;
    align(2);
    w:word;
    align(4);
    d:dword;
begin TestProc;

    mov( b, al );
    mov( w, ax );
    mov( d, eax );

end TestProc;

begin t;
end t;

MASM code generated by the HLA compiler:

L1_TestProc__hla_ proc near32
        push    ebp
        mov     ebp, esp
        sub     esp, 8
        mov     al, byte ptr [ebp-1]    ;/* b */
        mov     ax, word ptr [ebp-4]    ;/* w */
        mov     eax, dword ptr [ebp-8]  ;/* d */
        mov     esp, ebp
        pop     ebp
        ret     4
L1_TestProc__hla_ endp

Note that HLA always guarantees that literal string constants you create in an HLA program are stored in memory aligned to a four-byte boundary and always consume a multiple of four bytes. For example, consider the following HLA string constants appearing in a program

program t;
static
    s1: string := "Hello World";
    s2: string := "Hello World.";
    s3: string := "Hello World..";
    s4: string := "Hello World...";

begin t;
end t;

Note the code that HLA emits for this string data (keep in mind that HLA prefixes string data with the maximum length and current length of the string)

                align   4               ;align to dword boundary
L2_len__hla_    label   dword
                dword   0bh             ;maximum length
                dword   0bh             ;current length
L2_str__hla_    label   byte
                db      "Hello World"
                db      0               ;zero terminating byte

                align   4
L4_len__hla_    label   dword
                dword   0ch
                dword   0ch
L4_str__hla_    label   byte
                db      "Hello World."
                db      0
                byte    0               ;Extra padding to ensure that string
                byte    0               ; object is a multiple of four bytes long
                byte    0

                align   4
L6_len__hla_    label   dword
                dword   0dh
                dword   0dh
L6_str__hla_    label   byte
                db      "Hello World.."
                db      0
                byte    0               ;Extra padding to ensure that string
                byte    0               ; object is a multiple of four bytes long

                align   4
L8_len__hla_    label   dword
                dword   0eh
                dword   0eh
L8_str__hla_    label   byte
                db      "Hello World..."
                db      0
                byte    0               ;Extra padding for dword alignment.

End HLA Specific

Begin FASM Specific

(TODO)

End FASM Specific

Begin GoASM Specific

Achieving correct data alignment in GoAsm

Good alignment can usually be achieved automatically by declaring data in size sequence in the data section. So you would declare all qwords first, then dwords, then words, then bytes and strings. Twords, being 10 bytes, would upset the sequence - you could do them all first and then correct the alignment using ALIGN. Example:

DATA
TWORDINTEGER   DT 0.0               ;for floating point operations
TWORDRESULT    DT 0.0
ALIGN 8                             ;re-align data to 8-byte boundary
QWORD_DATA1    DQ 0
QWORD_DATA2    DQ 0
COUNTD1        DD 0
COUNTD2        DD 0
COUNTW1        DW 0
COUNTW2        DW 0
COUNTB         DB 0
Mess1          DB 'Input message',0
Mess2          DB 'Output message',0

Here ALIGN is used to pad the DATA section with zeroes to bring it back into alignment for the qwords. The same can be done in a CONST section or for uninitialized data (using ? as the initializer).

For Win32, GoAsm automatically aligns structures on a dword boundary, both when they are declared as local data and in the data section. For Win64, GoAsm automatically aligns structures and structure members to suit the natural boundary of the structure and its members, and pads the size of the structure to suit. GoAsm also automatically aligns the stack pointer (RSP) ready for an API call. See the GoAsm help file for more details.

Code alignment in GoAsm

Correct code alignment will differ between processors. There are some speed tests in TestBug which show what difference correct alignment can make when reading from, writing to, or comparing the contents of memory. When you use ALIGN in a CODE section, GoAsm pads with the NOP instruction (opcode 90h), which performs no operation.

End GoASM Specific

Aligning a pointer to x (if x is a power of 2) is simple. For example, to align to 16:

add esi, 16-1
and esi, -16        ; esi is the pointer to be aligned
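The same add/and trick is what you use when you "allocate a little extra memory" for SIMD, as mentioned earlier. A small hedged sketch in MASM (the buffer name and size are made up): reserve alignment-1 spare bytes and round the pointer up.

.data?
rawBuffer   db 256+15 dup(?)        ; 256 bytes we want, plus 15 spare for alignment

.code
    mov esi, offset rawBuffer
    add esi, 16-1                   ; round up...
    and esi, -16                    ; ...to the next 16-byte boundary
    ; ESI now points at a 16-byte aligned 256-byte region inside rawBuffer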

The Floating Point Unit (FPU)


The x87 coprocessor, or floating-point unit (FPU), executes approximately 70 instructions. This chapter describes the instruction set of the FPU up to the Pentium processor, including the data transfer, arithmetic, comparison, transcendental, and constant instructions. The FPU is very useful for calculation-heavy applications such as 3D graphics and audio processing. The FPU uses a different system for moving data around in the processor. Instead of using named registers like the CPU, it uses a stack. The stack is referred to as ST(x), where x is a position on the stack. This causes a lot of confusion for newcomers because, unlike programming on the main CPU, where eax is always eax, when you move data into a register on the FPU everything below it moves down, so you need to keep track of your register stack at all times. Throughout this article, ST(0) will be used to represent the top of the FPU stack, though many people refer to it as ST without a number. The FPU also contains a status register, whose purpose is to hold status flags for operations such as comparisons and bit tests. The FSTSW instruction moves the status word register into the CPU, so you can use it for flow control, exception detection and response, and the like.
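As a small illustration of the FSTSW idea (a sketch with made-up variable names, assuming two REAL8 values in memory): copy the status word into AX, move the condition bits into the CPU flags with SAHF, and then branch with ordinary conditional jumps.

    fld   val1                  ; ST(0) = val1
    fcom  val2                  ; compare ST(0) with val2, sets the FPU condition bits
    fstsw ax                    ; copy the FPU status word into AX
    sahf                        ; transfer C0/C2/C3 into the CPU flags
    fstp  st(0)                 ; pop val1, we only needed the comparison
    jb    val1IsSmaller         ; taken when val1 < val2
    ; ... val1 >= val2 path ...
val1IsSmaller: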
Contents

1 FPU Data Formats
  1.1 Signed Integers
  1.2 Binary-Coded Decimal (BCD)
2 List of FPU Instructions

FPU Data Formats


The FPU uses three different types of data: signed integer, BCD, and floating point. The following table shows the various data types used by the FPU along with their sizes and approximate ranges.

Type            Length    Range
Word Integer    16-bit    -32,768 to 32,767
Short Integer   32-bit    -2.14e9 to 2.14e9
Long Integer    64-bit    -9.22e18 to 9.22e18
Single Real     32-bit    1.18e-38 to 3.40e38
Double Real     64-bit    2.23e-308 to 1.79e308
Extended Real   80-bit    3.37e-4932 to 1.18e4932
Packed BCD      80-bit    -1e18 to 1e18
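For reference, here is how these formats might be declared in MASM (a sketch; the names and initial values are arbitrary). REAL4, REAL8 and REAL10 are MASM's type names for the three floating-point sizes:

wordInt     dw     -32768           ; 16-bit word integer
shortInt    dd     123456789        ; 32-bit short integer
longInt     dq     -1234567890      ; 64-bit long integer
singleVal   real4  3.14159          ; 32-bit single real
doubleVal   real8  2.718281828      ; 64-bit double real
extendVal   real10 1.6180339887     ; 80-bit extended real
bcdVal      dt     ?                ; 80-bit packed BCD slot (filled via FBSTP at run time)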

Signed Integers
Positive signed integers are stored in normal format, with the left-most sign bit set to 0. Negative numbers are stored in 2's-complement form, with the left-most sign bit equal to 1. The following shows how to define the various integer types and what their hexadecimal representations are:

;---------------------------+---------------------+
; Definition                 | Hexadecimal         |
;---------------------------+---------------------+
var1 dw 24                   ; 0018                |
var2 dw -2                   ; FFFE                |
var3 dd 1234                 ; 000004D2            |
var4 dd -123                 ; FFFFFF85            |
var5 dq 9876                 ; 0000000000002694    |
var6 dq -321                 ; FFFFFFFFFFFFFEBF    |
;---------------------------+---------------------+
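A short sketch of how the FPU consumes these integers (reusing var1 and var3 from the table above): FILD converts a signed integer to the internal extended-real format, and FISTP converts back to an integer and pops.

    fild  var3                  ; load the 32-bit signed integer 1234 onto ST(0)
    fiadd var1                  ; add the 16-bit integer 24 to ST(0)
    fistp var3                  ; round to integer, store back into var3 and pop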

Binary-Coded Decimal (BCD)


BCD numbers are 10 bytes in size. Each number is stored as 18 digits, with 2 digits per byte. The highest-order byte stores the sign of the number, with the highest-order bit of this byte being the sign bit. Note that both positive and negative numbers are stored in true form, not in complemented form.

+----------+-----+-----+-----+-----+-----+-----+-----+-----+----+----+----+----+----+----+----+----+----+----+
| Sign bit | D17 | D16 | D15 | D14 | D13 | D12 | D11 | D10 | D9 | D8 | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |
+----------+-----+-----+-----+-----+-----+-----+-----+-----+----+----+----+----+----+----+----+----+----+----+
bit 79                                                                                                   bit 0
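Only FBLD and FBSTP deal with this packed BCD format directly; internally the value is converted to extended real. A minimal sketch (the variable names are made up):

.data
bcdIn   dt ?                    ; 10-byte packed BCD input
bcdOut  dt ?                    ; 10-byte packed BCD result

.code
    fbld  bcdIn                 ; packed BCD -> extended real on ST(0)
    fadd  st(0), st(0)          ; any FPU arithmetic works on it now (here: double it)
    fbstp bcdOut                ; round, convert back to packed BCD, store and pop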

List of FPU Instructions


This list of instructions was compiled from Ray Filiatreault's online floating point tutorial.

F2XM1      2 to the X power minus 1
FABS       Absolute value of ST(0)
FADD       Add two floating point values
FADDP      Add two floating point values and pop ST(0)
FBLD       Load BCD data from memory
FBSTP      Store BCD data to memory
FCHS       Change the sign of ST(0)
FCLEX      Clear exceptions
FCMOVcc*   Conditional move based on CPU flags
FCOM       Compare ST(0) to a floating point value
FCOMI      Compare ST(0) to ST(i) and set CPU flags
FCOMIP     Compare ST(0) to ST(i) and set CPU flags and pop ST(0)
FCOMP      Compare ST(0) to a floating point value and pop ST(0)
FCOMPP     Compare ST(0) to ST(1) and pop both registers
FCOS       Cosine of the angle value in ST(0)
FDECSTP    Decrease stack pointer
FDIV       Divide two floating point values
FDIVP      Divide two floating point values and pop ST(0)
FDIVR      Divide in reverse two floating point values
FDIVRP     Divide in reverse two floating point values and pop ST(0)
FFREE      Free a data register
FIADD      Add an Integer located in memory to ST(0)
FICOM      Compare ST(0) to an integer value
FICOMP     Compare ST(0) to an integer value and pop ST(0)
FIDIV      Divide ST(0) by an Integer located in memory
FIDIVR     Divide an Integer located in memory by ST(0)
FILD       Load integer from memory
FIMUL      Multiply ST(0) by an Integer located in memory
FINCSTP    Increase stack pointer
FINIT      Initialize the FPU
FIST       Store integer to memory
FISTP      Store integer to memory and pop ST(0)
FISUB      Subtract an Integer located in memory from ST(0)
FISUBR     Subtract ST(0) from an Integer located in memory
FLD        Load real number
FLD1       Load the value of 1
FLDCW      Load control word
FLDENV     Load environment
FLDL2E     Load the log base 2 of e (Napierian constant)
FLDL2T     Load the log base 2 of Ten
FLDLG2     Load the log base 10 of 2 (common log of 2)
FLDLN2     Load the log base e of 2 (natural log of 2)
FLDPI      Load the value of PI
FLDZ       Load the value of Zero
FMUL       Multiply two floating point values
FMULP      Multiply two floating point values and pop ST(0)
FNCLEX     Clear exceptions (no wait)
FNINIT     Initialize the FPU (no wait)
FNOP       No operation
FNSAVE     Save state of FPU (no wait)
FNSTCW     Store control word (no wait)
FNSTENV    Store environment (no wait)
FNSTSW     Store status word (no wait)
FPATAN     Partial arctangent of the ratio ST(1)/ST(0)
FPREM      Partial remainder
FPREM1     Partial remainder 1
FPTAN      Partial tangent of the angle value in ST(0)
FRNDINT    Round ST(0) to an integer
FRSTOR     Restore all registers
FSAVE      Save state of FPU
FSCALE     Scale ST(0) by ST(1)
FSIN       Sine of the angle value in ST(0)
FSINCOS    Sine and cosine of the angle value in ST(0)
FSQRT      Square root of ST(0)
FST        Store real number
FSTCW      Store control word
FSTENV     Store environment
FSTP       Store real number and pop ST(0)
FSTSW      Store status word
FSUB       Subtract two floating point values
FSUBP      Subtract two floating point values and pop ST(0)
FSUBR      Subtract in reverse two floating point values
FSUBRP     Subtract in reverse two floating point values and pop ST(0)
FTST       Test ST(0) by comparing it to +0.0
FUCOM      Unordered Compare ST(0) to a floating point value
FUCOMI     Unordered Compare ST(0) to ST(i) and set CPU flags
FUCOMIP    Unordered Compare ST(0) to ST(i) and set CPU flags and pop ST(0)
FUCOMP     Unordered Compare ST(0) to a floating point value and pop ST(0)
FUCOMPP    Unordered Compare ST(0) to ST(1) and pop both registers
FWAIT      Wait while FPU is busy
FXAM       Examine the content of ST(0)
FXCH       Exchange the top data register with another data register
FXTRACT    Extract exponent and significand
FYL2X      Y*Log2(X)
FYL2XP1    Y*Log2(X+1)

* cc refers to any of these variations:

FCMOVB     Move if below (CF=1)
FCMOVE     Move if equal (ZF=1)
FCMOVBE    Move if below or equal (CF=1 or ZF=1)
FCMOVU     Move if unordered (PF=1)
FCMOVNB    Move if not below (CF=0)
FCMOVNE    Move if not equal (ZF=0)
FCMOVNBE   Move if not below or equal (CF=0 and ZF=0)
FCMOVNU    Move if not unordered (PF=0)
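As an example of where FCMOVcc pays off, here is a hedged sketch (with hypothetical REAL8 variables) of a branch-free minimum of two values, using FCOMI to set the CPU flags and FCMOVNB to pick the smaller one; this requires a P6-class (Pentium Pro or later) processor.

    fld   valA                  ; ST(0) = valA
    fld   valB                  ; ST(0) = valB, ST(1) = valA
    fcomi st(0), st(1)          ; compare valB with valA, results go to the CPU flags
    fcmovnb st(0), st(1)        ; if valB is not below valA, replace ST(0) with valA
    fstp  minVal                ; store min(valA, valB) and pop
    fstp  st(0)                 ; discard the leftover copy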
