# Data Representation Chapter Three


A major stumbling block many beginners encounter when attempting to learn assembly language is the common use of the binary and hexadecimal numbering systems. Many programmers think that hexadecimal (or hex¹) numbers represent absolute proof that God never intended anyone to work in assembly language. While it is true that hexadecimal numbers are a little different from what you may be used to, their advantages outweigh their disadvantages by a large margin. In any case, understanding these numbering systems is important because their use simplifies other complex topics, including boolean algebra and logic design, signed numeric representation, character codes, and packed data.

3.1 Chapter Overview
This chapter discusses several important concepts including the binary and hexadecimal numbering systems, binary data organization (bits, nibbles, bytes, words, and double words), signed and unsigned numbering systems, arithmetic, logical, shift, and rotate operations on binary values, bit fields, and packed data. This is basic material and the remainder of this text depends upon your understanding of these concepts. If you are already familiar with these terms from other courses or study, you should at least skim this material before proceeding to the next chapter. If you are unfamiliar with this material, or only vaguely familiar with it, you should study it carefully before proceeding. All of the material in this chapter is important! Do not skip over any material. In addition to the basic material, this chapter also introduces some new HLA statements and HLA Standard Library routines.

3.2 Numbering Systems
Most modern computer systems do not represent numeric values using the decimal system. Instead, they typically use a binary or two’s complement numbering system. To understand the limitations of computer arithmetic, you must understand how computers represent numbers.

3.2.1 A Review of the Decimal System
You’ve been using the decimal (base 10) numbering system for so long that you probably take it for granted. When you see a number like “123”, you don’t think about the value 123; rather, you generate a mental image of how many items this value represents. In reality, however, the number 123 represents:
$$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0 = 100 + 20 + 3$$
In the positional numbering system, each digit appearing to the left of the decimal point represents a value between zero and nine times an increasing power of ten. Digits appearing to the right of the decimal point represent a value between zero and nine times an increasing negative power of ten. For example, the value 123.456 means:
$$1 \times 10^2 + 2 \times 10^1 + 3 \times 10^0 + 4 \times 10^{-1} + 5 \times 10^{-2} + 6 \times 10^{-3} = 100 + 20 + 3 + 0.4 + 0.05 + 0.006$$
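The expansion above is mechanical: each digit contributes its value times a power of the radix. As a minimal sketch (in Python rather than HLA, purely so the arithmetic is easy to check), the hypothetical function `positional_value` below sums those terms. It uses `fractions.Fraction` so that the negative powers of ten stay exact instead of becoming rounded floating-point values.

```python
from fractions import Fraction


def positional_value(numeral: str, radix: int = 10) -> Fraction:
    """Evaluate a numeral such as "123.456" digit by digit.

    A digit k places left of the point contributes digit * radix**k
    (k = 0, 1, 2, ...); a digit k places right of the point contributes
    digit * radix**-k (k = 1, 2, 3, ...).
    """
    whole, _, frac = numeral.partition(".")
    base = Fraction(radix)
    total = Fraction(0)
    for k, ch in enumerate(reversed(whole)):      # positive powers
        total += int(ch, radix) * base ** k
    for k, ch in enumerate(frac, start=1):        # negative powers
        total += int(ch, radix) * base ** -k
    return total


print(positional_value("123"))                              # 123
print(positional_value("123.456") == Fraction("123.456"))   # True
```

The `radix` parameter is an addition for illustration only; with `radix=2` or `radix=16` the same loop evaluates the binary and hexadecimal numbers discussed later in this chapter.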

1. Hexadecimal is often abbreviated as hex even though, technically speaking, hex means base six, not base sixteen.

Beta Draft - Do not distribute


Volume 1

3.2.2 The Binary Numbering System
Most modern computer systems (including PCs) operate using binary logic. The computer represents values using two voltage levels (usually 0 V and a level somewhere between +2.4 V and +5 V). With two such levels we can represent exactly two different values. These could be any two different values, but they typically represent the values zero and one. These two values, coincidentally, correspond to the two digits used by the binary numbering system. Since there is a correspondence between the logic levels used by the 80x86 and the two digits used in the binary numbering system, it should come as no surprise that the PC employs the binary numbering system.
The binary numbering system works just like the decimal numbering system, with two exceptions: binary only allows the digits 0 and 1 (rather than 0-9), and binary uses powers of two rather than powers of ten. Therefore, it is very easy to convert a binary number to decimal. For each “1” in the binary string, add in $2^n$, where $n$ is the zero-based position of the binary digit. For example, the binary value $11001010_2$ represents:

$$1 \times 2^7 + 1 \times 2^6 + 0 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 1 \times 2^1 + 0 \times 2^0 = 128 + 64 + 8 + 2 = 202_{10}$$
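The “add in $2^n$ for each 1” rule becomes a short loop. This Python sketch uses a hypothetical function name; Python's built-in `int(text, 2)` performs the same conversion and appears only as a cross-check.

```python
def binary_to_decimal(bits: str) -> int:
    """Add 2**n for every '1' in the string, where n is the digit's
    zero-based position counted from the right (the L.O. end)."""
    total = 0
    for n, digit in enumerate(reversed(bits)):
        if digit == "1":
            total += 2 ** n
        elif digit != "0":
            raise ValueError(f"not a binary digit: {digit!r}")
    return total


print(binary_to_decimal("11001010"))                        # 202
print(binary_to_decimal("11001010") == int("11001010", 2))  # True
```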
To convert decimal to binary is slightly more difficult. You must find those powers of two which, when added together, produce the decimal result. One method is to work from a large power of two down to $2^0$. Consider the decimal value 1359:
• $2^{10} = 1024$ and $2^{11} = 2048$, so 1024 is the largest power of two less than 1359. Subtract 1024 from 1359 and begin the binary value on the left with a “1” digit. Binary = “1”, decimal result is 1359 - 1024 = 335.
• The next lower power of two ($2^9 = 512$) is greater than the result from above, so add a “0” to the end of the binary string. Binary = “10”, decimal result is still 335.
• The next lower power of two is 256 ($2^8$). Subtract this from 335 and add a “1” digit to the end of the binary number. Binary = “101”, decimal result is 79.
• 128 ($2^7$) is greater than 79, so tack a “0” onto the end of the binary string. Binary = “1010”, decimal result remains 79.
• The next lower power of two ($2^6 = 64$) is less than 79, so subtract 64 and append a “1” to the end of the binary string. Binary = “10101”, decimal result is 15.
• 15 is less than the next power of two ($2^5 = 32$), so simply add a “0” to the end of the binary string. Binary = “101010”, decimal result is still 15.
• 16 ($2^4$) is greater than the remainder so far, so append a “0” to the end of the binary string. Binary = “1010100”, decimal result is still 15.
• $2^3$ (eight) is less than 15, so subtract eight and stick another “1” digit on the end of the binary string. Binary = “10101001”, decimal result is 7.
• $2^2$ (four) is less than seven, so subtract four from seven and append another “1” to the binary string. Binary = “101010011”, decimal result is 3.
• $2^1$ (two) is less than three, so append a “1” to the end of the binary string and subtract two from the decimal value. Binary = “1010100111”, decimal result is now 1.
• Finally, the decimal result is one, which is $2^0$, so add a final “1” to the end of the binary string. The final binary result is “10101001111”.
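The subtract-the-powers procedure above can be sketched in Python as follows (the function name is hypothetical). One detail differs from the prose: the test is “less than or equal to” rather than “less than”, which is what the final step actually relies on, since the last remainder (1) equals $2^0$.

```python
def decimal_to_binary_by_powers(value: int) -> str:
    """Work from the largest power of two <= value down to 2**0: emit '1'
    and subtract when the power fits, otherwise emit '0'."""
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return "0"
    power = 1
    while power * 2 <= value:      # largest power of two not exceeding value
        power *= 2
    digits = []
    while power >= 1:
        if power <= value:         # this power fits: emit 1 and subtract it
            digits.append("1")
            value -= power
        else:                      # too big: emit 0, remainder unchanged
            digits.append("0")
        power //= 2
    return "".join(digits)


print(decimal_to_binary_by_powers(1359))  # 10101001111
```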
If you actually have to convert a decimal number to binary by hand, the algorithm above probably isn’t the easiest to master. A simpler solution is the “even/odd – divide by two” algorithm. This algorithm uses the following steps:
1) If the number is even, emit a zero. If the number is odd, emit a one.
2) Divide the number by two and throw away any fractional component or remainder.
3) If the quotient is zero, the algorithm is complete.
4) If the quotient is not zero and is odd, insert a one before the current string; if the quotient is even, prefix your binary string with a zero.
5) Go back to step two and repeat, treating the quotient as the new number.
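A Python sketch of the even/odd method, with comments keyed to the numbered steps (the function name is hypothetical). Because each new digit is prefixed, the digits produced first end up at the L.O. end of the string.

```python
def decimal_to_binary_even_odd(value: int) -> str:
    """The even/odd, divide-by-two method, one step per comment."""
    if value < 0:
        raise ValueError("value must be non-negative")
    bits = "1" if value % 2 else "0"   # step 1: odd -> 1, even -> 0
    value //= 2                        # step 2: divide, discard the remainder
    while value != 0:                  # step 3: a zero quotient ends the loop
        # step 4: prefix the parity of the (non-zero) quotient
        bits = ("1" if value % 2 else "0") + bits
        value //= 2                    # step 5: back to step 2
    return bits


print(decimal_to_binary_even_odd(13))    # 1101
print(decimal_to_binary_even_odd(1359))  # 10101001111
```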
Fortunately, you’ll rarely need to convert decimal numbers directly to binary strings, so neither of these algorithms is particularly important in real life.
Binary numbers, although they have little importance in high-level languages, appear everywhere in assembly language programs (even if you don’t convert between decimal and binary). So you should be somewhat comfortable with them.

3.2.3 Binary Formats

In the purest sense, every binary number contains an infinite number of digits (or bits, which is short for binary digits). For example, we can represent the number five by:

101

00000101

0000000000101

... 000000000000101

Any number of leading zero bits may precede the binary number without changing its value.

We will adopt the convention of ignoring any leading zeros if present in a value. For example, $101_2$ represents the number five. Since the 80x86 works with groups of eight bits, however, we’ll find it much easier to zero extend all binary numbers to some multiple of four or eight bits. Therefore, following this convention, we’d represent the number five as $0101_2$ or $00000101_2$.
In the United States, most people separate every three digits with a comma to make larger numbers easier to read. For example, 1,023,435,208 is much easier to read and comprehend than 1023435208. We’ll adopt a similar convention in this text for binary numbers. We will separate each group of four binary bits with an underscore. For example, we will write the binary value 1010111110110010 as 1010_1111_1011_0010.
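Zero extension and underscore grouping are both purely presentational, so a small formatting helper captures them. This is a hedged Python sketch; `to_grouped_binary` and its `min_bits` parameter are hypothetical names chosen for illustration.

```python
def to_grouped_binary(value: int, min_bits: int = 8) -> str:
    """Zero-extend a non-negative value to a multiple of four bits (at
    least min_bits wide) and separate each group of four with '_'."""
    if value < 0:
        raise ValueError("value must be non-negative")
    width = max(min_bits, value.bit_length(), 1)
    width = (width + 3) // 4 * 4            # round up to a multiple of four
    bits = format(value, f"0{width}b")      # zero-extended binary digits
    return "_".join(bits[i:i + 4] for i in range(0, width, 4))


print(to_grouped_binary(5, min_bits=4))       # 0101
print(to_grouped_binary(5))                   # 0000_0101
print(to_grouped_binary(0b1010111110110010))  # 1010_1111_1011_0010
```

Python's format mini-language can also do the grouping on its own: `format(value, "_b")` inserts an underscore every four binary digits, although by itself it does not zero-extend.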

We often pack several values together into the same binary number. One form of the 80x86 MOV instruction uses the binary encoding 1011_0rrr_dddd_dddd to pack three items into 16 bits: a five-bit operation code (1_0110), a three-bit register field (rrr), and an eight-bit immediate value (dddd_dddd). For convenience, we’ll assign a numeric value to each bit position. We’ll number each bit as follows:

1) The rightmost bit in a binary number is bit position zero.
2) Each bit to the left is given the next successive bit number.

An eight-bit binary value uses bits zero through seven: $X_7\,X_6\,X_5\,X_4\,X_3\,X_2\,X_1\,X_0$
A 16-bit binary value uses bit positions zero through fifteen: $X_{15}\,X_{14}\,X_{13}\,X_{12}\,X_{11}\,X_{10}\,X_9\,X_8\,X_7\,X_6\,X_5\,X_4\,X_3\,X_2\,X_1\,X_0$
A 32-bit binary value uses bit positions zero through 31, etc. Bit zero is usually referred to as the low order (L.O.) bit (some refer to this as the least significant bit).
The left-most bit is typically called the high order (H.O.) bit (or the most significant bit). We’ll refer to the intermediate bits by their respective bit numbers.
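Numbering bits from zero at the right makes “bit n” a simple shift followed by a mask, and the same idea packs and unpacks the MOV fields described above. This Python sketch treats the 16 bits exactly as the text writes them (opcode field leftmost); the helper names are hypothetical. On a real 80x86 the opcode byte %1011_0rrr is stored in memory before the immediate byte, and rrr = %000 selects the AL register.

```python
def bit(value: int, n: int) -> int:
    """Return bit n of value, where bit 0 is the L.O. bit."""
    return (value >> n) & 1


# Hypothetical helpers for the 16-bit layout 1011_0rrr_dddd_dddd as written
# in the text: opcode in bits 15..11, register in bits 10..8, and the
# immediate value in bits 7..0.
MOV_OPCODE = 0b1_0110


def pack_mov(reg: int, imm: int) -> int:
    if not (0 <= reg <= 0b111 and 0 <= imm <= 0xFF):
        raise ValueError("register needs 3 bits, immediate needs 8 bits")
    return (MOV_OPCODE << 11) | (reg << 8) | imm


def unpack_mov(word: int):
    """Return (opcode, reg, imm) by shifting each field down and masking."""
    return (word >> 11) & 0b1_1111, (word >> 8) & 0b111, word & 0xFF


w = pack_mov(reg=0b000, imm=5)
print(format(w, "016b"))      # 1011000000000101
print(bit(w, 15), bit(w, 0))  # 1 1
print(unpack_mov(w))          # (22, 0, 5)
```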


3.3 Data Organization
In pure mathematics a value may take an arbitrary number of bits. Computers, on the other hand, generally work with some specific number of bits. Common collections are single bits, groups of four bits (called nibbles), groups of eight bits (bytes), groups of 16 bits (words), groups of 32 bits (double words or dwords), groups of 64 bits (quad words or qwords), and more. The sizes are not arbitrary; there is a good reason for these particular values. This section will describe the bit groups commonly used on the Intel 80x86 chips.

3.3.1 Bits
The smallest “unit” of data on a binary computer is a single bit. Since a single bit is capable of representing only two different values (typically zero or one), you may get the impression that there are a very small number of items you can represent with a single bit. Not true! There is an infinite number of items you can represent with a single bit.
With a single bit, you can represent any two distinct items. Examples include zero or one, true or false, on or off, male or female, and right or wrong. However, you are not limited to representing binary data types (that is, those objects which have only two distinct values). You could use a single bit to represent the numbers 723 and 1,245. Or perhaps 6,254 and 5. You could also use a single bit to represent the colors red and blue. You could even represent two unrelated objects with a single bit. For example, you could represent the color red and the number 3,256 with a single bit. You can represent any two different values with a single bit. However, you can represent only two different values with a single bit.
To confuse things even more, different bits can represent different things. For example, one bit might be used to represent the values zero and one, while an adjacent bit might be used to represent the values true and false. How can you tell by looking at the bits? The answer, of course, is that you can’t. But this illustrates the whole idea behind computer data structures: data is what you define it to be. If you use a bit to represent a boolean (true/false) value then that bit (by your definition) represents true or false. For the bit to have any real meaning, you must be consistent. That is, if you’re using a bit to represent true or false at one point in your program, you shouldn’t use the true/false value stored in that bit to represent red or blue later.
Since most items you’ll be trying to model require more than two different values, single bit values aren’t the most popular data type you’ll use. However, since everything else consists of groups of bits, bits will play an important role in your programs. Of course, there are several data types that require two distinct values, so it would seem that bits are important by themselves. However, you will soon see that individual bits are difficult to manipulate, so we’ll often use other data types to represent boolean values.

3.3.2 Nibbles
A nibble is a collection of four bits. It wouldn’t be a particularly interesting data structure except for two items: BCD (binary coded decimal) numbers² and hexadecimal numbers. It takes four bits to represent a single BCD or hexadecimal digit. With a nibble, we can represent up to 16 distinct values since there are 16 unique combinations of a string of four bits:
0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111

2. Binary coded decimal is a numeric scheme used to represent decimal numbers using four bits for each decimal digit.
In the case of hexadecimal numbers, the values 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, and F are represented with four bits (see “The Hexadecimal Numbering System” on page 60). BCD uses ten different digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9) and requires four bits (since you can only represent eight different values with three bits). In fact, any sixteen distinct values can be represented with a nibble, but hexadecimal and BCD digits are the primary items we can represent with a single nibble.
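To make the BCD idea concrete, the following Python sketch packs one decimal digit into each nibble and unpacks it again. It assumes packed BCD (two digits per byte) with the L.O. digit in the L.O. nibble; the function names are hypothetical. Only ten of the sixteen nibble patterns are legal BCD digits, so the decoder rejects %1010 through %1111.

```python
def to_packed_bcd(value: int) -> int:
    """Place each decimal digit of a non-negative value in its own nibble,
    with the L.O. decimal digit in the L.O. nibble."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    result, shift = 0, 0
    while True:
        result |= (value % 10) << shift   # current decimal digit -> nibble
        value //= 10
        shift += 4                        # move up to the next nibble
        if value == 0:
            return result


def from_packed_bcd(bcd: int) -> int:
    """Reverse the packing, rejecting nibbles %1010..%1111 (not BCD digits)."""
    value, scale = 0, 1
    while bcd:
        digit = bcd & 0xF
        if digit > 9:
            raise ValueError(f"invalid BCD nibble: {digit:04b}")
        value += digit * scale
        scale *= 10
        bcd >>= 4
    return value


print(hex(to_packed_bcd(1359)))  # 0x1359
print(from_packed_bcd(0x1359))   # 1359
```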
3.3.3 Bytes
Without question, the most important data structure used by the 80x86 microprocessor is the byte. A byte consists of eight bits and is the smallest addressable datum (data item) on the 80x86 microprocessor. Main memory and I/O addresses on the 80x86 are all byte addresses. This means that the smallest item that can be individually accessed by an 80x86 program is an eight-bit value. To access anything smaller requires that you read the byte containing the data and mask out the unwanted bits. The bits in a byte are normally numbered from zero to seven as shown in Figure 3.1.
[Figure 3.1: Bit Numbering. The bits of a byte are numbered 7 (leftmost) through 0 (rightmost).]

Bit 0 is the low order bit or least significant bit; bit 7 is the high order bit or most significant bit of the byte. We’ll refer to all other bits by their number. Note that a byte also contains exactly two nibbles (see Figure 3.2).
[Figure 3.2: The Two Nibbles in a Byte. H.O. nibble = bits 7..4; L.O. nibble = bits 3..0.]

Bits 0..3 comprise the low order nibble, bits 4..7 form the high order nibble. Since a byte contains exactly two nibbles, byte values require two hexadecimal digits.
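Splitting a byte into its two nibbles is a shift plus a mask, which is also the “mask out the unwanted bits” technique mentioned above for reaching data smaller than a byte. A minimal Python sketch with hypothetical helper names:

```python
HEX_DIGITS = "0123456789ABCDEF"


def split_byte(byte: int):
    """Return (H.O. nibble, L.O. nibble): shift bits 7..4 down for the H.O.
    nibble, and mask off everything but bits 3..0 for the L.O. nibble."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("not an 8-bit value")
    return (byte >> 4) & 0x0F, byte & 0x0F


def byte_to_hex(byte: int) -> str:
    """One hexadecimal digit per nibble, H.O. nibble first."""
    high, low = split_byte(byte)
    return HEX_DIGITS[high] + HEX_DIGITS[low]


print(split_byte(0b1100_1010))   # (12, 10)
print(byte_to_hex(0b1100_1010))  # CA
```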
Since a byte contains eight bits, it can represent $2^8$, or 256, different values. Generally, we’ll use a byte to represent unsigned numeric values in the range 0..255, signed numbers in the range -128..+127 (see “Signed and Unsigned Numbers” on page 69), ASCII/IBM character codes, and other special data types requiring no more than 256 different values. Many data types have fewer than 256 items, so eight bits is usually sufficient.


Since the 80x86 is a byte addressable machine (see the next volume), it turns out to be more efficient to manipulate a whole byte than an individual bit or nibble. For this reason, most programmers use a whole byte to represent data types that require no more than 256 items, even if fewer than eight bits would suffice. For example, we’ll often represent the boolean values true and false by $00000001_2$ and $00000000_2$ (respectively).
Probably the most important use for a byte is holding a character code. Characters typed at the keyboard, displayed on the screen, and printed on the printer all have numeric values. To communicate with the rest of the world, PCs use a variant of the ASCII character set (see “The ASCII Character Encoding” on page 97). There are 128 defined codes in the ASCII character set. PCs typically use the remaining 128 possible values for extended character codes including European characters, graphic symbols, Greek letters, and math symbols.
Because bytes are the smallest unit of storage in the 80x86 memory space, bytes also happen to be the smallest variable you can create in an HLA program. As you saw in the last chapter, you can declare an eight-bit signed integer variable using the int8 data type. Since int8 objects are signed, you can represent values in the range -128..+127 using an int8 variable (see “Signed and Unsigned Numbers” on page 69 for a discussion of signed number formats). You should only store signed values into int8 variables; if you want to create an arbitrary byte variable, you should use the byte data type, as follows:
```
static
    byteVar: byte;
```

The byte data type is a partially untyped data type. The only type information associated with byte objects is their size (one byte). You may store any one-byte object (small signed integers, small unsigned integers, characters, etc.) into a byte variable. It is up to you to keep track of the type of object you’ve put into a byte variable.
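Because a byte variable carries only a size, the same eight bits can be read in more than one way. This Python sketch (hypothetical function name) reads one byte as an unsigned value, as a signed two's complement value (covered in “Signed and Unsigned Numbers”), and as a character. Using code page 437, the original IBM PC character set, for the character view is an assumption; other PC code pages assign different characters to values above 127.

```python
def interpret_byte(value: int):
    """Read the same eight bits three ways: unsigned, signed (two's
    complement), and as a character in code page 437 (an assumption
    about which PC character set is in use)."""
    raw = bytes([value])                  # ValueError unless 0 <= value <= 255
    unsigned = raw[0]
    signed = int.from_bytes(raw, "little", signed=True)
    char = raw.decode("cp437")
    return unsigned, signed, char


print(interpret_byte(0x41))      # (65, 65, 'A')
unsigned, signed, _ = interpret_byte(0xC8)
print(unsigned, signed)          # 200 -56
```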

3.3.4 Words
A word is a group of 16 bits. We’ll number the bits in a word starting from zero on up to fifteen. The bit numbering appears in Figure 3.3.

[Figure 3.3: Bit Numbers in a Word. Bits are numbered 15 (leftmost) through 0 (rightmost).]

Like the byte, bit 0 is the low order bit. For words, bit 15 is the high order bit. When referencing the other bits in a word, use their bit position number.
Notice that a word contains exactly two bytes. Bits 0 through 7 form the low order byte, bits 8 through 15 form the high order byte (see Figure 3.4).

[Figure 3.4: The Two Bytes in a Word. H.O. byte = bits 15..8; L.O. byte = bits 7..0.]

Naturally, a word may be further broken down into four nibbles as shown in Figure 3.5.
[Figure 3.5: Nibbles in a Word. Nibble #3 (the H.O. nibble, bits 15..12), nibble #2, nibble #1, and nibble #0 (the L.O. nibble, bits 3..0).]

Nibble zero is the low order nibble in the word and nibble three is the high order nibble of the word. We’ll simply refer to the other two nibbles as “nibble one” or “nibble two.”
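The byte and nibble divisions of a word follow the same shift-and-mask pattern used for bytes. A Python sketch with a hypothetical `split_word` helper:

```python
def split_word(word: int):
    """Return the H.O. byte, the L.O. byte, and nibbles #0..#3 of a word."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("not a 16-bit value")
    ho_byte = (word >> 8) & 0xFF    # bits 15..8
    lo_byte = word & 0xFF           # bits 7..0
    # nibble #i occupies bits 4i+3..4i, so nibbles[0] is the L.O. nibble
    nibbles = [(word >> (4 * i)) & 0xF for i in range(4)]
    return ho_byte, lo_byte, nibbles


ho, lo, nibbles = split_word(0x1234)
print(hex(ho), hex(lo))  # 0x12 0x34
print(nibbles)           # [4, 3, 2, 1]
```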
With 16 bits, you can represent $2^{16}$ (65,536) different values. These could be the values in the range 0..65,535 or, as is usually the case, -32,768..+32,767, or any other data type with no more than 65,536 values. The three major uses for words are signed integer values, unsigned integer values, and UNICODE characters.
Words can represent integer values in the range 0..65,535 or -32,768..32,767. Unsigned numeric values are represented by the binary value corresponding to the bits in the word. Signed numeric values use the two’s complement form for numeric values (see “Signed and Unsigned Numbers” on page 69). As UNICODE characters, words can represent up to 65,536 different characters, allowing the use of non-Roman character sets in a computer program. UNICODE is an international standard, like ASCII, that allows computers to process non-Roman characters like Asian, Greek, and Russian characters.
Like bytes, you can also create word variables in an HLA program. Of course, in the last chapter you saw how to create sixteen-bit signed integer variables using the int16 data type. To create an arbitrary word variable, just use the word data type, as follows:
```
static
    w: word;
```

3.3.5 Double Words
A double word is exactly what its name implies, a pair of words. Therefore, a double word quantity is 32 bits long as shown in Figure 3.6.


[Figure 3.6: Bit Numbers in a Double Word. Bits are numbered 31 (leftmost) through 0 (rightmost).]

Naturally, this double word can be divided into a high order word and a low order word, four different bytes, or eight different nibbles (see Figure 3.7).

[Figure 3.7: Nibbles, Bytes, and Words in a Double Word. H.O. word = bits 31..16 and L.O. word = bits 15..0; H.O. byte, byte #2, byte #1, and L.O. byte; nibbles #7 (H.O.) through #0 (L.O.).]

Double words can represent all kinds of different things. A common item you will represent with a double word is a 32-bit integer value (which allows unsigned numbers in the range 0..4,294,967,295 or signed numbers in the range -2,147,483,648..2,147,483,647). 32-bit floating point values also fit into a double word. Another common use for dword objects is to store pointer variables.
In the previous chapter, you saw how to create 32-bit (dword) signed integer variables using the int32 data type. You can also create an arbitrary double word variable using the dword data type as the following example demonstrates:
```
static
    d: dword;
```
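The unsigned and signed ranges quoted for bytes, words, and double words all come from two formulas for an $n$-bit value: from $0$ to $2^n - 1$ unsigned, and from $-2^{n-1}$ to $2^{n-1} - 1$ signed (two's complement). A short Python sketch (function names hypothetical) reproduces the figures given in this section:

```python
def unsigned_range(bits: int):
    """Smallest and largest unsigned values that fit in `bits` bits."""
    return 0, 2 ** bits - 1


def signed_range(bits: int):
    """Smallest and largest two's complement values that fit in `bits` bits."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in [("byte", 8), ("word", 16), ("dword", 32)]:
    print(name, unsigned_range(bits), signed_range(bits))
# byte (0, 255) (-128, 127)
# word (0, 65535) (-32768, 32767)
# dword (0, 4294967295) (-2147483648, 2147483647)
```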

3.4 The Hexadecimal Numbering System

A big problem with the binary system is verbosity. To represent the value $202_{10}$ requires eight binary digits. The decimal version requires only three decimal digits and, thus, represents numbers much more compactly than does the binary numbering system. This fact was not lost on the engineers who designed binary computer systems. When dealing with large values, binary numbers quickly become too unwieldy.

Unfortunately, the computer thinks in binary, so most of the time it is convenient to use the binary numbering system. Although we can convert between decimal and binary, the conversion is not a trivial task. The hexadecimal (base 16) numbering system solves these problems. Hexadecimal numbers offer the two features we’re looking for: they’re very compact, and it’s simple to convert them to binary and vice versa. Because of this, most computer systems engineers use the hexadecimal numbering system. Since the radix (base) of a hexadecimal number is 16, each hexadecimal digit to the left of the hexadecimal point represents some value times a successive power of 16. For example, the number $1234_{16}$ is equal to:

$$1 \times 16^3 + 2 \times 16^2 + 3 \times 16^1 + 4 \times 16^0 = 4096 + 512 + 48 + 4 = 4660_{10}$$
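Instead of computing each power of 16 separately, the same value can be accumulated with Horner's rule, which rewrites the sum as $((1 \times 16 + 2) \times 16 + 3) \times 16 + 4$. Horner's rule is a substitute technique rather than the one the text uses, but it produces identical results. A Python sketch (hypothetical function name), using the letters A through F for the digit values 10 through 15 as described in the next paragraph:

```python
HEX_DIGITS = "0123456789ABCDEF"


def hex_to_decimal(text: str) -> int:
    """Evaluate hexadecimal digits with Horner's rule: for each digit,
    multiply the running total by 16 and add the digit's value."""
    total = 0
    for ch in text.upper():
        digit = HEX_DIGITS.find(ch)   # A..F map to 10..15
        if digit < 0:
            raise ValueError(f"not a hexadecimal digit: {ch!r}")
        total = total * 16 + digit
    return total


print(hex_to_decimal("1234"))  # 4660
print(hex_to_decimal("DEAD"))  # 57005
```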
Each hexadecimal digit can represent one of sixteen values between 0 and $15_{10}$. Since there are only ten decimal digits, we need to invent six additional digits to represent the values in the range $10_{10}$ through $15_{10}$. Rather than create new symbols for these digits, we’ll use the letters A through F. The following are all examples of valid hexadecimal numbers:
$1234_{16}$ $\mathrm{DEAD}_{16}$ $\mathrm{BEEF}_{16}$ $\mathrm{0AFB}_{16}$ $\mathrm{FEED}_{16}$ $\mathrm{DEAF}_{16}$

Since we’ll often need to enter hexadecimal numbers into the computer system, we’ll need a different mechanism for representing hexadecimal numbers. After all, on most computer systems you cannot enter a subscript to denote the radix of the associated value. We’ll adopt the following conventions:
• All hexadecimal values begin with a “\$” character, e.g., \$123A4.
• All binary values begin with a percent sign (“%”).
• Decimal numbers do not have a prefix character.
• If the radix is clear from the context, this text may drop the leading “\$” or “%” character.
Using these conventions, the hexadecimal examples above become:

\$1234 \$DEAD \$BEEF \$AFB \$FEED \$DEAF
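The prefix conventions are simple enough to parse mechanically. This Python sketch (hypothetical `parse_literal`) treats a leading “\$” as hexadecimal, a leading “%” as binary, and no prefix as decimal, and it ignores the underscore group separators adopted earlier in this chapter:

```python
def parse_literal(text: str) -> int:
    """Parse a literal written with this text's conventions: a leading '$'
    means hexadecimal, a leading '%' means binary, and no prefix means
    decimal. Underscore group separators are ignored."""
    digits = text.replace("_", "")
    if digits.startswith("$"):
        return int(digits[1:], 16)
    if digits.startswith("%"):
        return int(digits[1:], 2)
    return int(digits, 10)


print(parse_literal("$1234"))                 # 4660
print(parse_literal("%1010_1111_1011_0010"))  # 44978
print(parse_literal("202"))                   # 202
```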
As you can see, hexadecimal numbers are compact and easy to read. In addition, you can easily convert between hexadecimal and binary. Consider the following table:

Table 4: Binary/Hex Conversion

| Binary | Hexadecimal |
|--------|-------------|
| %0000 | \$0 |
| %0001 | \$1 |
| %0010 | \$2 |
| %0011 | \$3 |
| %0100 | \$4 |
| %0101 | \$5 |
| %0110 | \$6 |
| %0111 | \$7 |
| %1000 | \$8 |
| %1001 | \$9 |
| %1010 | \$A |
| %1011 | \$B |
| %1100 | \$C |
| %1101 | \$D |
| %1110 | \$E |
| %1111 | \$F |

This table provides all the information you’ll ever need to convert any hexadecimal number into a binary number or vice versa.

To convert a hexadecimal number into a binary number, simply substitute the corresponding four bits for each hexadecimal digit in the number. For example, to convert \$ABCD into a binary value, simply convert each hexadecimal digit according to the table above:

| Hexadecimal | 0 | A | B | C | D |
|-------------|------|------|------|------|------|
| Binary | 0000 | 1010 | 1011 | 1100 | 1101 |

To convert a binary number into hexadecimal format is almost as easy. The first step is to pad the binary number with zeros to make sure that there is a multiple of four bits in the number. For example, given the binary number 1011001010, the first step would be to add two bits to the left of the number so that it contains 12 bits. The converted binary value is 001011001010. The next step is to separate the binary value into groups of four bits, e.g., 0010_1100_1010. Finally, look up these binary values in the table above and substitute the appropriate hexadecimal digits, i.e., \$2CA. Contrast this with the difficulty of conversion between decimal and binary or decimal and hexadecimal!
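Both directions of the table lookup can be sketched in Python; the table itself is generated rather than typed in, and the helper names are hypothetical. No arithmetic is needed in either direction, only the substitution of four bits per hexadecimal digit:

```python
# Table 4 as two lookup tables: each hex digit is exactly four bits.
HEX_TO_BITS = {f"{n:X}": f"{n:04b}" for n in range(16)}
BITS_TO_HEX = {bits: digit for digit, bits in HEX_TO_BITS.items()}


def hex_to_binary(hex_digits: str) -> str:
    """Substitute four bits for every hexadecimal digit."""
    return "_".join(HEX_TO_BITS[d] for d in hex_digits.upper())


def binary_to_hex(bits: str) -> str:
    """Pad on the left to a multiple of four bits, split the string into
    groups of four, and look each group up in the table."""
    bits = bits.replace("_", "")
    bits = bits.zfill((len(bits) + 3) // 4 * 4)
    return "".join(BITS_TO_HEX[bits[i:i + 4]] for i in range(0, len(bits), 4))


print(hex_to_binary("ABCD"))        # 1010_1011_1100_1101
print(binary_to_hex("1011001010"))  # 2CA
```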

Since converting between hexadecimal and binary is an operation you will need to perform over and over again, you should take a few minutes and memorize the table above. Even if you have a calculator that will do the conversion for you, you’ll find manual conversion to be a lot faster and more convenient when converting between binary and hex.

3.5 Arithmetic Operations on Binary and Hexadecimal Numbers
There are several operations we can perform on binary and hexadecimal numbers. For example, we can add, subtract, multiply, divide, and perform other arithmetic operations. Although you needn’t become an expert at it, you should be able to, in a pinch, perform these operations manually using a piece of paper and a pencil. Having said that, the most practical way to perform such arithmetic is with a calculator that does it for you. There are several such calculators on the market; the following list names some of the manufacturers who produce such devices:
Some manufacturers of Hexadecimal Calculators (circa 2002):
• Casio
• Hewlett-Packard
• Sharp
• Texas Instruments
This list is by no means exhaustive. Other calculator manufacturers probably produce these devices as well. The Hewlett-Packard devices are arguably the best of the bunch. However, they are more expensive
