We are currently working on ways to publish this text in a form other than HTML (e.g., Postscript, PDF, Frameviewer, hard copy, etc.). This, however, is a low-priority project. Please do not contact Randall Hyde concerning this effort. When something happens, an announcement will appear on "Randall Hyde's Assembly Language Page." Please visit this web site at http://webster.ucr.edu for the latest scoop.
There are four additional instructions (above and beyond mov) that will prove handy in the discussion throughout the rest of this chapter. These are the load effective address (lea), load es and general purpose register (les), addition (add), and multiply (mul). These instructions, along with the mov instruction, provide all the necessary power to access the different data types this chapter discusses.
The lea instruction takes the form:

                lea     reg16, memory

reg16 is a 16 bit general purpose register. Memory is a memory location represented by a mod/reg/rm byte (except that it must be a memory location; it cannot be a register).

The instruction lea ax,1000h[bx][si], for example, would load ax with the address of the memory location pointed at by 1000h[bx][si]. This, of course, is the value 1000h+bx+si. Lea is also quite useful for obtaining the address of a variable. If you have a variable I somewhere in memory, lea bx,I will load the bx register with the address (offset) of I.
The les instruction takes the form

                les     reg16, memory32

This instruction loads the es register and one of the 16 bit general purpose registers from the specified memory address. Note that any memory address you can specify with a mod/reg/rm byte is legal but, like the lea instruction, it must be a memory location, not a register.

The les instruction loads the specified general purpose register from the word at the given address; it loads the es register from the following word in memory. This instruction and its companion lds (which loads ds) are the only instructions on pre-80386 machines that manipulate 32 bits at a time.
The add instruction, like its x86 counterpart, adds two values on the 80x86. This instruction takes several forms. There are five forms that concern us here. They are

                add     reg, reg
                add     reg, memory
                add     memory, reg
                add     reg, constant
                add     memory, constant

All these instructions add the second operand to the first, leaving the sum in the first operand. For example, add bx,5 computes bx := bx + 5.
The last instruction to look at is the mul (multiply) instruction. This instruction has only a single operand and takes the form:

                mul     reg/memory

There are many important details concerning mul that this chapter ignores. For the sake of the discussion that follows, assume that the register or memory location is a 16 bit register or memory location. In such a case this instruction computes dx:ax := ax*reg/mem. Note that there is no immediate mode for this instruction.
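The four instructions described above can be seen working together in a short sketch. This is only an illustration, not code from this text: the names MyVar and MyVarPtr, the stack segment sseg, and the DOS exit sequence (int 21h, function 4Ch) are assumptions made so that the fragment can be assembled on its own with MASM 6.x.

```asm
dseg            segment para public 'data'
MyVar           word    7               ;Hypothetical 16 bit variable.
MyVarPtr        dword   MyVar           ;Hypothetical segment:offset of MyVar.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                lea     bx, MyVar       ;bx := offset of MyVar.
                mov     si, 2
                lea     ax, 1000h[bx][si] ;ax := 1000h+bx+si (no memory access).

                les     di, MyVarPtr    ;di := first word, es := second word.

                mov     ax, 6
                add     ax, 5           ;ax := ax + 5, so ax = 11.
                mul     MyVar           ;dx:ax := ax*MyVar = 77, so dx = 0.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

Note that lea only computes an address; it never reads the memory location its operand names, whereas les and mul both fetch their memory operands.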
program useless(input,output);
var     i,j:integer;
begin
        i := 10;
        write('Enter a value for j:');
        readln(j);
        i := i*j + j*j;
        writeln('The result is ',i);
end.

When the computer executes the statement i:=10; it makes a copy of the value 10 and somehow remembers this value for use later on. To accomplish this, the compiler sets aside a memory location specifically for the exclusive use of the variable i. Assuming the compiler arbitrarily assigned location DS:10h for this purpose it could use the instruction mov ds:[10h],10 to accomplish this. If i is a 16 bit word, the compiler would probably assign the variable j to the word starting at location 12h or 0Eh. Assuming it's location 12h, the second assignment statement in the program might wind up looking like the following:

                mov     ax, ds:[10h]    ;Fetch value of I
                mul     ds:[12h]        ;Multiply by J
                mov     ds:[10h], ax    ;Save in I (ignore overflow)
                mov     ax, ds:[12h]    ;Fetch J
                mul     ds:[12h]        ;Compute J*J
                add     ds:[10h], ax    ;Add I*J + J*J, store into I

Although there are a few details missing from this code, it is fairly straightforward and you can easily see what is going on in this program.
i? Indeed, why should you even care that the variable i is at location 10h and j is at location 12h? Why shouldn't you be able to use names like i and j rather than worrying about these numerical addresses? It seems reasonable to rewrite the code above as:

                mov     ax, i
                mul     j
                mov     i, ax
                mov     ax, j
                mul     j
                add     i, ax

Of course you can do this in assembly language! Indeed, one of the primary jobs of an assembler like MASM is to let you use symbolic names for memory locations. Furthermore, the assembler will even assign locations to the names automatically for you. You needn't concern yourself with the fact that variable i is really the word at memory location DS:10h unless you're curious.
ds will point to the dseg segment in the SHELL.ASM file. Indeed, setting up ds so that it points at dseg is one of the first things that happens in the SHELL.ASM main program. Therefore, all you've got to do is tell the assembler to reserve some storage for your variables in dseg and associate the offsets of those variables with their names. This is a very simple process and is the subject of the next several sections.
The variables i and j in the preceding section are examples of scalar variables. Examples of data structures that are not scalars include arrays, records, sets, and lists. These latter data types are made up from scalar values. They are the composite types. You'll see the composite types a little later; first you need to learn to deal with the scalar types.
ByteVar         byte    ?

ByteVar is a label. It should begin at column one on the line somewhere in the dseg segment (that is, between the dseg segment and dseg ends statements). You'll find out all about labels in a few chapters; for now you can assume that most legal Pascal/C/Ada identifiers are also valid assembly language labels.

j located at the same address now, would it?). After declaring said variable, MASM will allow you to refer to that variable by name rather than by location in your program. For example, after inserting the above statement into the data segment (dseg), you could use instructions like mov ByteVar,al in your program.
dseg            segment para public 'data'

bytevar         byte    ?       ;byte allocates bytes
wordvar         word    ?       ;word allocates words
dwordvar        dword   ?       ;dword allocs dbl words
byte2           byte    ?
word2           word    ?

dseg            ends

MASM allocates storage for bytevar at location DS:0. Because bytevar is one byte long, the next available memory location is going to be DS:1. MASM, therefore, allocates storage for wordvar at location DS:1. Since words require two bytes, the next available memory location after wordvar is DS:3, which is where MASM allocates storage for dwordvar. Dwordvar is four bytes long, so MASM allocates storage for byte2 starting at location DS:7. Likewise, MASM allocates storage for word2 at location DS:8. Were you to stick another variable after word2, MASM would allocate storage for it at location DS:0Ah.
MASM automatically converts the mov ax,wordvar instruction into mov ax,ds:[1]. So now you can use symbolic names for your variables and completely ignore the fact that these variables are really memory locations with corresponding offsets into the data segment.
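The correspondence between names and offsets can be checked with a small sketch. It repeats the five declarations from the dseg example and assumes they are the only variables in the segment; the word ptr and offset operators, the stack segment, and the DOS exit sequence are not taken from this section and are used here only so the fragment is complete.

```asm
dseg            segment para public 'data'
bytevar         byte    ?               ;Offset 0.
wordvar         word    ?               ;Offset 1.
dwordvar        dword   ?               ;Offset 3.
byte2           byte    ?               ;Offset 7.
word2           word    ?               ;Offset 8.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     ax, wordvar             ;Load the word by name.
                mov     ax, word ptr ds:[1]     ;Load the same word by offset.
                mov     bx, offset word2        ;bx := 8, the offset of word2.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

The first two mov instructions refer to the same two bytes of memory; the only difference is whether you or the assembler supplies the number.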
        colors = (red, blue, green, purple, orange, yellow, white, black);

Most Pascal compilers will assign the value zero to red, one to blue, two to green, etc.

The instruction mov color,2 is the same thing as saying color:=green in Pascal. (Later, you'll even learn how to use more meaningful statements like mov color,green to assign the color green to the color variable.)
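One possible way to get a statement like mov color,green to assemble is sketched below. It assumes MASM's numeric equate directive (=) and follows the Pascal numbering given above; this is an illustration only and is not necessarily the technique this text presents later. The stack segment and DOS exit sequence are likewise assumptions.

```asm
red             =       0               ;Assumed numbering, following the
blue            =       1               ; usual Pascal assignment.
green           =       2
purple          =       3
orange          =       4
yellow          =       5
white           =       6
black           =       7

dseg            segment para public 'data'
color           byte    ?
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     color, green    ;Assembles exactly like mov color,2.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

The equates occupy no memory; the assembler simply substitutes the number wherever the name appears.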
identifier      db      ?
identifier      byte    ?
and
identifier      sbyte   ?

identifier represents the name of your byte variable. "db" is an older term that predates MASM 6.x. You will see this directive used quite a bit by other programmers (especially those who are not using MASM 6.x or later) but Microsoft considers it to be an obsolete term; you should always use the byte and sbyte declarations instead.

The byte declaration declares unsigned byte variables. You should use this declaration for all byte variables except small signed integers. For signed integer values, use the sbyte (signed byte) directive.
i               db      ?
j               byte    ?
k               sbyte   ?
                 .
                 .
                 .
                mov     i, 0
                mov     j, 245
                mov     k, -5
                mov     al, i
                mov     j, al
                etc.

Although MASM 6.x performs a small amount of type checking, you should not get the idea that assembly language is a strongly typed language. In fact, MASM 6.x will only check the values you're moving around to verify that they will fit in the target location. All of the following are legal in MASM 6.x:

                mov     k, 255
                mov     j, -5
                mov     i, -127

Since all of these variables are byte-sized variables, and all the associated constants will fit into eight bits, MASM happily allows each of these statements. Yet if you look at them, they are logically incorrect. What does it mean to move -5 into an unsigned byte variable? Since signed byte values must be in the range -128...127, what happens when you store the value 255 into a signed byte variable? Well, MASM simply converts these values to their eight bit equivalents (-5 becomes 0FBh, 255 becomes 0FFh [-1], etc.).
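The eight bit equivalents mentioned above can be confirmed with a few instructions. The sketch below is an illustration under stated assumptions: the cmp (compare) instruction has not been introduced in this chapter, and the stack segment and DOS exit sequence are included only to make the fragment complete.

```asm
dseg            segment para public 'data'
j               byte    ?               ;Unsigned byte.
k               sbyte   ?               ;Signed byte.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     j, -5           ;Stores the bit pattern 0FBh.
                mov     k, 255          ;Stores the bit pattern 0FFh.

                mov     al, j
                cmp     al, 0FBh        ;Equal, so the zero flag is set.
                mov     al, k
                cmp     al, -1          ;Also equal: 0FFh is -1 when signed.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

In both comparisons the two operands are the same eight bits; only the way you choose to read them differs.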
                mov     al, -5
                 .
                 .      ;Any number of statements which do not affect AL
                 .
                mov     j, al

There is, unfortunately, no way the assembler is going to be able to tell you that you're storing an illegal value into j. The registers, by their very nature, are neither signed nor unsigned. Therefore the assembler will let you store a register into a variable regardless of the value that may be in that register.
                mov     i, ax           ;Cannot move 16 bits into eight
                mov     i, 300          ;300 won't fit in eight bits.
                mov     k, -130         ;-130 won't fit into eight bits.

You might ask "if the assembler doesn't really differentiate signed and unsigned values, why bother with them? Why not simply use db all the time?" Well, there are two reasons. First, it makes your programs easier to read and understand if you explicitly state (by using byte and sbyte) which variables are signed and which are unsigned. Second, who said anything about the assembler ignoring whether the variables are signed or unsigned? The mov instruction ignores the difference, but there are other instructions that do not.
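As a hedged illustration of an operation where the signed/unsigned distinction matters, the sketch below widens two bytes holding the same bit pattern (0FFh) to 16 bits. The cbw (convert byte to word) instruction is an 8086 instruction this chapter has not discussed; the variable names wj and wk, the stack segment, and the DOS exit sequence are assumptions. Note that the assembler does not choose between these sequences for you; the programmer picks the one that matches the declaration.

```asm
dseg            segment para public 'data'
j               byte    255             ;Unsigned: bit pattern 0FFh.
k               sbyte   -1              ;Signed: the same bit pattern, 0FFh.
wj              word    ?               ;Hypothetical 16 bit copy of j.
wk              sword   ?               ;Hypothetical 16 bit copy of k.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     al, j
                mov     ah, 0           ;Zero extend: ax = 00FFh (255).
                mov     wj, ax

                mov     al, k
                cbw                     ;Sign extend: ax = 0FFFFh (-1).
                mov     wk, ax

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

Applying the wrong extension would turn 255 into -1 or -1 into 255, which is one reason to record the intended interpretation in the declaration.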
i               db      0
j               byte    255
k               sbyte   -1

In this example, the assembler will initialize i, j, and k to zero, 255, and -1, respectively, when the program loads into memory. This fact will prove quite useful later on, especially when discussing tables and arrays. Once again, the assembler only checks the sizes of the operands. It does not check to make sure that the operand for the byte directive is positive or that the value in the operand field of sbyte is in the range -128...127. MASM will allow any value in the range -128...255 in the operand field of any of these statements.

byte variables as positive values.
Use the dw, word, and sword statements to declare word variables. The following examples demonstrate their use:

NoSignedWord    dw      ?
UnsignedWord    word    ?
SignedWord      sword   ?
Initialized0    word    0
InitializedM1   sword   -1
InitializedBig  word    65535
InitializedOfs  dw      NoSignedWord

Most of these declarations are slight modifications of the byte declarations you saw in the last section. Of course you may initialize any word variable to a value in the range -32768...65535 (the union of the range for signed and unsigned 16 bit constants). The last declaration above, however, is new. In this case a label appears in the operand field (specifically, the name of the NoSignedWord variable). When a label appears in the operand field the assembler will substitute the offset of that label (within the variable's segment). If these were the only declarations in dseg and they appeared in this order, the last declaration above would initialize InitializedOfs with the value zero since NoSignedWord's offset is zero within the data segment. This form of initialization is quite useful for initializing pointers. But more on that subject later.
CodeView differentiates dw/word variables and sword variables. It always displays the unsigned values as positive integers. On the other hand, it will display sword variables as signed values (complete with minus sign, if the value is negative). Debugging support is one of the main reasons you'll want to use word or sword as appropriate.
Use the dd, dword, and sdword statements to declare four-byte integers, pointers, and other variable types. Such variables will allow values in the range -2,147,483,648...4,294,967,295 (the union of the range of signed and unsigned four-byte integers). You use these declarations like the word declarations:

NoSignedDWord   dd      ?
UnsignedDWord   dword   ?
SignedDWord     sdword  ?
InitBig         dword   4000000000
InitNegative    sdword  -1
InitPtr         dd      InitBig

The last example initializes a double word pointer with the segment:offset address of the InitBig variable.
Since the only 32 bit mov instructions on processors earlier than the 80386 are les and lds, you will get an error if you attempt to access dword variables on these earlier processors using a mov instruction. Of course, even on the 80386 you cannot move a 32 bit variable into a 16 bit register; you must use the 32 bit registers. Later, you'll learn how to manipulate 32 bit variables, even on a 16 bit processor. Until then, just pretend that you can't.
CodeView differentiates between dd/dword and sdword. This will help you see the actual values your variables have when you're debugging your programs. CodeView only does this, though, if you use the proper declarations for your variables. Always use sdword for signed values and dd or dword (dword is best) for unsigned values.
You can declare six, eight, and ten byte variables using the df/fword, dq/qword, and dt/tbyte statements. Declarations using these statements were originally intended for floating point and BCD values. There are better directives for the floating point variables and you don't need to concern yourself with the other data types you'd use these directives for. The following discussion is for completeness' sake.

The df/fword statement's main utility is declaring 48 bit pointers for use in 32 bit protected mode on the 80386 and later. Although you could use this directive to create an arbitrary six byte variable, there are better directives for doing that. You should only use this directive for 48 bit far pointers on the 80386.

The dq/qword directive lets you declare quadword (eight byte) variables. The original purpose of this directive was to let you create 64 bit double precision floating point variables and 64 bit integer variables. There are better directives for creating floating point variables. As for 64 bit integers, you won't need them very often on the 80x86 CPU (at least, not until Intel releases a member of the 80x86 family with 64 bit general purpose registers).

The dt/tbyte directives allocate ten bytes of storage. There are two data types indigenous to the 80x87 (math coprocessor) family that use a ten byte data type: ten byte BCD values and extended precision (80 bit) floating point values. This text will pretty much ignore the BCD data type. As for the floating point type, once again there is a better way to do it.
Use the real4, real8, and real10 statements to declare floating point variables. Like dd, dq, and dt these statements reserve four, eight, and ten bytes. The operand fields for these statements may contain a question mark (if you don't want to initialize the variable) or an initial value in floating point form. The following examples demonstrate their use:

x               real4   1.5
y               real8   1.0e-25
z               real10  -1.2594e+10

Note that the operand field must contain a valid floating point constant using either decimal or scientific notation. In particular, pure integer constants are not allowed. The assembler will complain if you use an operand like the following:

x               real4   1

To correct this, change the operand field to "1.0".
You cannot simply use the add instruction to add two floating point values. This text will cover floating point arithmetic in a later chapter. Nonetheless, it's appropriate to discuss how to declare floating point variables in the chapter on data structures.
You could also use dd, dq, and dt to declare floating point variables (since these directives reserve the necessary four, eight, or ten bytes of space). You can even initialize such variables with floating point constants in the operand field. But there are two major drawbacks to declaring variables this way. First, as with bytes, words, and double words, the CodeView debugger will only display your floating point variables properly if you use the real4, real8, or real10 directives. If you use dd, dq, or dt, CodeView will display your values as four, eight, or ten byte unsigned integers. Another, potentially bigger, problem with using dd, dq, and dt is that they allow both integer and floating point constant initializers (remember, real4, real8, and real10 do not). Now this might seem like a good feature at first glance. However, the integer representation for the value one is not the same as the floating point representation for the value 1.0. So if you accidentally enter the value "1" in the operand field when you really meant "1.0", the assembler would happily digest this and then give you incorrect results. Hence, you should always use the real4, real8, and real10 statements to declare floating point variables.
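The difference between the integer one and the floating point 1.0 is easy to see in the bit patterns. The sketch below assumes the usual IEEE single precision encoding, in which 1.0 is 3F800000h; the names IntOne and FltOne, the word ptr operator, the stack segment, and the DOS exit sequence are assumptions made for this illustration.

```asm
dseg            segment para public 'data'
IntOne          dd      1               ;Integer one:        00000001h.
FltOne          real4   1.0             ;Floating point 1.0: 3F800000h.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     ax, word ptr IntOne     ;Low word:  ax = 0001h.
                mov     bx, word ptr IntOne+2   ;High word: bx = 0000h.
                mov     cx, word ptr FltOne     ;Low word:  cx = 0000h.
                mov     dx, word ptr FltOne+2   ;High word: dx = 3F80h.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

Floating point code that reads IntOne as a real4 value would see a tiny number very close to zero rather than 1.0, which is exactly the kind of silent error real4 prevents by rejecting the integer initializer.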
integer         typedef sword
char            typedef byte
boolean         typedef byte
float           typedef real4
colors          typedef byte

Now you can declare your variables with more meaningful statements like:

i               integer ?
ch              char    ?
FoundIt         boolean ?
x               float   ?
HouseColor      colors  ?

If you are an Ada, C, or FORTRAN programmer (or any other language, for that matter), you can pick type names you're more comfortable with. Of course, this doesn't change how the 80x86 or MASM reacts to these variables one iota, but it does let you create programs that are easier to read and understand since the type names are more indicative of the actual underlying types.
If you define integer as the sword type, CodeView will display variables of type integer as signed values. Likewise, if you define float to mean real4, CodeView will still properly display float variables as four-byte floating point values.
        M: array [0..1023] of integer;

Even if you don't know Pascal, the concept here is pretty easy to understand. M is an array with 1024 integers in it, indexed from M[0] to M[1023]. Each one of these array elements can hold an integer value that is independent of all the others. In other words, this array gives you 1024 different integer variables, each of which you refer to by number (the array index) rather than by name.
If you encountered the statement M[0]:=100 you probably wouldn't have to think at all about what is happening with this statement. It is storing the value 100 into the first element of the array M. Now consider the following two statements:

        i := 0;         (* Assume "i" is an integer variable *)
        M [i] := 100;

You should agree, without too much hesitation, that these two statements perform the same exact operation as M[0]:=100;. Indeed, you're probably willing to agree that you can use any integer expression in the range 0...1023 as an index into this array. The following statements still perform the same operation as our single assignment to index zero:

        i := 5;         (* assume all variables are integers *)
        j := 10;
        k := 50;
        m [i*j-k] := 100;

"Okay, so what's the point?" you're probably thinking. "Anything that produces an integer in the range 0...1023 is legal. So what?" Okay, how about the following:

        M [1] := 0;
        M [ M [1] ] := 100;

Whoa! Now that takes a few moments to digest. However, if you take it slowly, it makes sense and you'll discover that these two instructions perform the exact same operation you've been doing all along. The first statement stores zero into array element M[1]. The second statement fetches the value of M[1], which is an integer so you can use it as an array index into M, and uses that value (zero) to control where it stores the value 100.
M[1] is a pointer! Well, not really, but if you were to change "M" to "memory" and treat this array as all of memory, this is the exact definition of a pointer.
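The Pascal statements M[1] := 0; M[M[1]] := 100; might translate into assembly language along the following lines. This is a minimal sketch, assuming M is an array of 1024 two-byte signed integers; the dup operator, the shl (shift left) instruction, the stack segment, and the DOS exit sequence have not been covered in this chapter and are used here only for illustration.

```asm
dseg            segment para public 'data'
M               sword   1024 dup (?)    ;1024 integers, two bytes each.
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     M[2], 0         ;M[1] := 0 (element 1 is at byte 2).
                mov     bx, M[2]        ;Fetch the index stored in M[1].
                shl     bx, 1           ;Index*2 = byte offset into M.
                mov     M[bx], 100      ;M[M[1]] := 100, that is, M[0] := 100.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

Unlike Pascal, the number in brackets in MASM is a byte offset rather than an element number, which is why the code doubles the index before using it.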
If you have a word variable p that contains 1000h, then p "points" at memory location 1000h in dseg. To access the word that p points at, you could use code like the following:

                mov     bx, p           ;Load BX with pointer.
                mov     ax, [bx]        ;Fetch data that p points at.

By loading the value of p into bx this code loads the value 1000h into bx (assuming p contains 1000h and, therefore, points at memory location 1000h in dseg). The second instruction above loads the ax register with the word starting at the location whose offset appears in bx. Since bx now contains 1000h, this will load ax from locations DS:1000h and DS:1001h.
Why not just load ax directly from location 1000h using an instruction like mov ax,ds:[1000h]? Well, there are lots of reasons. But the primary reason is that this single instruction always loads ax from location 1000h. Unless you are willing to mess around with self-modifying code, you cannot change the location from which it loads ax. The previous two instructions, however, always load ax from the location that p points at. This is very easy to change under program control, without using self-modifying code. In fact, the simple instruction mov p,2000h will cause those two instructions above to load ax from memory location DS:2000h the next time they execute. Consider the following instructions:
                lea     bx, i           ;This can actually be done with
                mov     p, bx           ; a single instruction as you'll
                 .                      ; see in Chapter Eight.
                 .
        < Some code that skips over the next two instructions >

                lea     bx, j           ;Assume the above code skips these
                mov     p, bx           ; two instructions, that you get
                 .                      ; here by jumping to this point from
                 .                      ; somewhere else.
                mov     bx, p           ;Assume both code paths above wind
                mov     ax, [bx]        ; up down here.

This short example demonstrates two execution paths through the program. The first path loads the variable p with the address of the variable i (remember, lea loads bx with the offset of the second operand). The second path through the code loads p with the address of the variable j. Both execution paths converge on the last two mov instructions that load ax with i or j depending upon which execution path was taken. In many respects, this is like a parameter to a procedure in a high level language like Pascal. Executing the same instructions accesses different variables depending on whose address (i or j) winds up in p.

To access data through a 32 bit far pointer, you would load the offset portion into bx, bp, si, or di and the segment portion into a segment register (typically es). Then you could access the object using the register indirect addressing mode. Since the les instruction is so convenient for this operation, it is the perfect choice for loading es and one of the above four registers with a pointer value. The following sample code stores the value in al into the byte pointed at by the far pointer p:

                les     bx, p           ;Load p into ES:BX
                mov     es:[bx], al     ;Store away AL

Since near pointers are 16 bits long and far pointers are 32 bits long, you could simply use the
dw/word and dd/dword directives to allocate storage for your pointers (pointers are inherently unsigned, so you wouldn't normally use sword or sdword to declare a pointer). However, there is a much better way to do this by using the typedef statement. Consider the following general forms:

typename        typedef near ptr basetype
typename        typedef far ptr basetype

In these two examples typename represents the name of the new type you're creating while basetype is the name of the type you want to create a pointer for. Let's look at some specific examples:

nbytptr         typedef near ptr byte
fbytptr         typedef far ptr byte
colorsptr       typedef far ptr colors
wptr            typedef near ptr word
intptr          typedef near ptr integer
intHandle       typedef near ptr intptr

(These declarations assume that you've previously defined the types colors and integer with the typedef statement.) The typedef statements with the near ptr operands produce 16 bit near pointers. Those with the far ptr operands produce 32 bit far pointers. MASM 6.x ignores the base type supplied after the near ptr or far ptr. However, CodeView uses the base type to display the object a pointer refers to in its correct format.
bytestr         nbytptr         ?
bytestr2        fbytptr         ?
CurrentColor    colorsptr       ?
CurrentItem     wptr            ?
LastInt         intptr          ?

Of course, you can initialize these pointers at assembly time if you know where they are going to point when the program first starts running. For example, you could initialize the bytestr variable above with the offset of MyString using the following declaration:

bytestr         nbytptr         MyString
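Putting the pointer declarations to work might look like the sketch below. It is an illustration under several assumptions: MyString is declared here as a zero terminated byte string (the text does not show its declaration), the far pointer bytestr2 is assumed to accept a label initializer in the same way dd does, and the stack segment and DOS exit sequence are added only so the fragment is complete.

```asm
nbytptr         typedef near ptr byte
fbytptr         typedef far ptr byte

dseg            segment para public 'data'
MyString        byte    "Hello", 0      ;Hypothetical string to point at.
bytestr         nbytptr MyString        ;Near pointer: offset of MyString.
bytestr2        fbytptr MyString        ;Far pointer: segment:offset (assumed).
dseg            ends

cseg            segment para public 'code'
                assume  cs:cseg, ds:dseg
Main            proc
                mov     ax, dseg        ;Point ds at the data segment.
                mov     ds, ax

                mov     bx, bytestr     ;Load the near pointer.
                mov     al, [bx]        ;al := first byte of MyString ('H').

                les     bx, bytestr2    ;Load the far pointer into es:bx.
                mov     ah, es:[bx]     ;ah := the same byte, via es.

                mov     ax, 4C00h       ;Return to DOS (assumed environment).
                int     21h
Main            endp
cseg            ends

sseg            segment para stack 'stack'
                word    64 dup (?)
sseg            ends
                end     Main
```

The near pointer version relies on ds already pointing at the segment that holds MyString; the far pointer version carries its own segment value and so works regardless of what ds contains.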