An important use of logic circuits is to compute
various mathematical operations such as addition, multiplication,
trigonometric functions, etc.
We must therefore have a way of representing numbers as binary
data.
The easiest numbers to represent are the nonnegative integers. To
see how this can be done, recall how we represent numbers in the
decimal system. A number such as 2034 is interpreted
as:
2*10^3 + 0*10^2 + 3*10^1 + 4*10^0
But there is nothing special about the base 10, so we
can just as well use base 2. In base 2,
each digit value is either 0 or 1, which
we can represent for instance by false and true,
respectively.
In fact, we have already hinted at this possibility, since we
usually write 0 and 1 instead of
false and true.
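For example, the binary numeral 1011 is interpreted as
1*2^3 + 0*2^2 + 1*2^1 + 1*2^0
which is the decimal number 11.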
All the normal algorithms for decimal arithmetic have versions
for binary arithmetic, except that they are usually simpler.
For adding two numbers, it suffices to notice that a carry of
1 is generated into the next position whenever we add 1
and 1, or whenever we add 0 and 1 together with an
incoming carry of 1:
1   1 1        (carries)
  1 0 1 1
+ 1 0 0 1
---------
1 0 1 0 0
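To make the column-by-column algorithm concrete, here is a small
sketch of it in Python. The function name and the representation of
numbers as lists of bits (most significant bit first) are purely
illustrative choices, not part of any actual circuit:

    def add_bits(a, b):
        """Add two binary numbers given as lists of bits,
        most significant bit first (e.g. [1, 0, 1, 1] is 11)."""
        # Pad the shorter number with leading zeros.
        n = max(len(a), len(b))
        a = [0] * (n - len(a)) + a
        b = [0] * (n - len(b)) + b
        carry = 0
        result = []
        # Work from the least significant bit to the most significant.
        for x, y in zip(reversed(a), reversed(b)):
            s = x + y + carry
            result.append(s % 2)   # digit to write down
            carry = s // 2         # carry into the next position
        if carry:
            result.append(carry)
        return list(reversed(result))

    # The worked example above: 1011 + 1001 = 10100.
    print(add_bits([1, 0, 1, 1], [1, 0, 0, 1]))  # [1, 0, 1, 0, 0]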
Subtraction is no harder:
   10 10       (borrows)
1  0  0  1
-  1  1  0
----------
0  0  1  1
For multiplying two numbers, the algorithm is simpler, since we
only multiply by 0 (which gives all zeros), or 1
(which gives the original number):
      1 1 0 1
    *   1 0 1
-------------
      1 1 0 1
    0 0 0 0
  1 1 0 1
-------------
1 0 0 0 0 0 1
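The shift-and-add algorithm translates just as directly into
software. The following Python sketch (again only illustrative, and
reusing the add_bits helper from the addition sketch above) forms one
shifted partial product per multiplier bit and adds them up:

    def multiply_bits(a, b):
        """Multiply two binary numbers given as bit lists, MSB first."""
        product = [0]
        # Examine the multiplier from its least significant bit upward.
        for shift, bit in enumerate(reversed(b)):
            if bit == 1:
                # Partial product: the multiplicand shifted left by 'shift'.
                product = add_bits(product, a + [0] * shift)
        return product

    # The worked example above: 1101 * 101 = 1000001 (13 * 5 = 65).
    print(multiply_bits([1, 1, 0, 1], [1, 0, 1]))  # [1, 0, 0, 0, 0, 0, 1]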
Finally, division is just repeated subtraction as in decimal
arithmetic:
      0 1 1 0
      -------
1 0 | 1 1 0 1
      1 0
      ---
        1 0
        1 0
        ---
          0 1
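As an illustration only, the following Python sketch performs the
same long division. For brevity it keeps the running remainder as an
ordinary Python integer rather than as a list of bits; all names are
just illustrative:

    def divide_bits(dividend, divisor):
        """Binary long division on bit lists (MSB first).
        Returns (quotient_bits, remainder_bits)."""
        remainder = 0
        divisor_value = int("".join(map(str, divisor)), 2)
        quotient = []
        for bit in dividend:
            # Bring down the next bit of the dividend.
            remainder = remainder * 2 + bit
            if remainder >= divisor_value:
                quotient.append(1)       # the divisor "goes into" the prefix
                remainder -= divisor_value
            else:
                quotient.append(0)
        # Remainder, padded to the width of the divisor.
        rem_bits = [int(d) for d in bin(remainder)[2:].zfill(len(divisor))]
        return quotient, rem_bits

    # The worked example above: 1101 / 10 = 0110, remainder 1.
    print(divide_bits([1, 1, 0, 1], [1, 0]))  # ([0, 1, 1, 0], [0, 1])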
Things are easy as long as we stick to nonnegative integers. They
become more complicated when we want to represent negative integers
as well.
It may seem that we can do just what we do in decimal
representation, i.e., indicate with a special sign whether the
number is negative or not. In binary arithmetic, we could simply
reserve one bit to determine the sign. In the circuitry for
addition, we would have one circuit for adding two numbers, and
another for subtracting two numbers. The combination of signs of the
two inputs would determine which circuit to use on the absolute
values, as well as the sign of the output.
While this method works, it turns out that there is one that is
much easier for electronic circuits to deal with. This method is
called the "two's complement" method. It turns out that with this
method, we do not need a special circuit for subtracting two
numbers.
In order to explain this method, we first show how it would work
in decimal arithmetic with infinite precision. Then we show how it
works with binary arithmetic, and finally how it works with finite
precision.
Imagine the odometer of an automobile. It has a certain number of
wheels, each with the ten digits on it. When one wheel goes from
9 to 0, the wheel immediately to the left
of it advances by one position. If that wheel already
showed 9, it too goes to 0 and advances
the wheel to its left, etc. Suppose we run the car
backwards. Then the reverse happens, i.e. when a wheel goes from
0 to 9, the wheel to its left decreases by
one.
Now suppose we have an odometer with an infinite number of
wheels. We are going to use this infinite odometer to represent
all the integers.
When all the wheels are 0, we interpret the value as
the integer 0.
A positive integer n is represented by an odometer
position obtained by advancing the rightmost wheel n
positions from 0. Notice that for each such positive
number, there will be an infinite number of wheels with the value
0 to the left.
A negative integer n is represented by an odometer
position obtained by decreasing the rightmost wheel |n|
positions from 0. Notice that for each such negative
number, there will be an infinite number of wheels with the value
9 to the left.
In fact, we don't need an infinite number of wheels. For
each number only a finite number of wheels is needed. We simply
assume that the leftmost wheel (which will be either 0
or 9) is duplicated an infinite number of times to the
left.
While for each number we only need a finite number of wheels, the
number of wheels is unbounded, i.e., we cannot use a
particular finite number of wheels to represent all the
numbers. The difference is subtle but important (but perhaps not
that important for this particular course). If we need an infinite
number of wheels, then there is no hope of ever using this
representation in a program, since that would require an
infinite-size memory. If we only need an unbounded number of wheels,
we may run out of memory, but we can represent a lot of numbers
(each of finite size) in a useful way. Since any program that runs
in finite time only uses a finite number of numbers, with a large
enough memory, we might be able to run our program.
Now suppose we have an addition circuit that can handle nonnegative
integers with an infinite number of digits. In other words, when
given a number starting with an infinite number of 9s,
it will interpret this as an infinitely large positive number,
whereas our interpretation of it will be a negative number. Let us
say we give this circuit the two numbers ...9998
(which we interpret as -2) and ...0005
(which we interpret as +5). It will add the two
numbers. First it adds 8 and 5 which gives
3 and a carry of 1. Next, it adds 9
and the carry 1, giving 0 and a carry of
1. For all remaining (infinitely many) positions, the
value will be 0 with a carry of 1, so the
final result is ...0003. This result is the correct
one, even with our interpretation of negative numbers. You may argue
that the carry must end up somewhere, and it does, but in infinity.
In some ways, we are doing arithmetic modulo infinity.
Some implementations of some programming languages with arbitrary
precision integer arithmetic (Lisp for instance) use
exactly this representation of negative integers.
Let us finish this section by giving a simple method for
computing the absolute value of a negative integer in our
representation. It suffices to take each individual digit, replace
it by 9 minus its original value, and then at the end,
add 1 to the number obtained. So for instance, the
number ...9998 becomes 1 plus
...0001 which is ...0002. This method works both
ways, i.e. you can also use it to negate a positive number.
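Here is a small Python sketch of this rule, working on numbers
written with a fixed number of decimal digits (so that ...9998 is
written as, say, 9998 with as many leading 9s or 0s as we care to
keep). The function name is of course just an illustration:

    def tens_complement_negate(digits):
        """Negate a number given as a string of decimal digits,
        using the rule: replace each digit d by 9 - d, then add 1."""
        complemented = "".join(str(9 - int(d)) for d in digits)
        negated = int(complemented) + 1
        # Keep the same number of digits (a final carry falls off the left).
        return str(negated).zfill(len(digits))[-len(digits):]

    # ...9998 (our representation of -2) becomes ...0002, i.e. 2.
    print(tens_complement_negate("9998"))  # '0002'
    # The rule works both ways: 0002 becomes 9998, i.e. -2.
    print(tens_complement_negate("0002"))  # '9998'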
What we have said in the previous section works almost as well
with a fixed bounded number of odometer wheels. The only problem is
that we have to deal with overflow and underflow.
Suppose we have only a fixed number of wheels, say 3.
In this case, we shall use the convention that if the leftmost wheel
shows a digit between 0 and 4 inclusive,
then we have a positive number, equal to its representation. When
instead the leftmost wheel shows a digit between 5 and
9 inclusive, we have a negative number, whose absolute
value can be computed with the method that we described in the previous
section.
We now assume that we have a circuit that can add positive
three-digit numbers, and we shall see how we can use it to add
negative numbers in our representation.
Suppose again we want to add -2 and +5.
The representations for these numbers with three wheels are
998 and 005 respectively. Our addition circuit
will attempt to add the two positive numbers 998 and
005, which gives 1003. But since the
addition circuit only has three digits, it will truncate the result
to 003, which is the right answer for our
interpretation.
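In software we can mimic the three-wheel odometer simply by doing
ordinary addition and keeping only the last three digits, i.e. by
working modulo 1000. The following Python sketch illustrates this;
the names are merely illustrative:

    WHEELS = 3
    MODULUS = 10 ** WHEELS   # 1000

    def to_odometer(n):
        """Encode a signed integer as a 3-digit odometer reading."""
        return n % MODULUS

    def from_odometer(r):
        """Decode: 000-499 mean 0..499, 500-999 mean -500..-1."""
        return r if r < MODULUS // 2 else r - MODULUS

    def odometer_add(a, b):
        """Add two odometer readings with a plain unsigned adder,
        truncating the result to three digits."""
        return (a + b) % MODULUS

    # The worked example: -2 is 998, +5 is 005; 998 + 005 = 1003 -> 003.
    r = odometer_add(to_odometer(-2), to_odometer(5))
    print(r, from_odometer(r))   # 3 3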
A valid question at this point is in which situation our finite
addition circuit will not work. The answer is somewhat complicated.
It is clear that it always gives the correct result when a positive
and a negative number are added. It is incorrect in two situations.
The first situation is when two positive numbers are added, and the
result comes out looking like a negative number, i.e., with a first
digit somewhere between 5 and 9. You
should convince yourself that no addition of two positive numbers
can yield an overflow and still look like a positive number. The
second situation is when two negative numbers are added and the
result comes out looking like a nonnegative number, i.e., with a
first digit somewhere between 0 and 4.
Again, you should convince yourself that no addition of two negative
numbers can yield an underflow and still look like a negative
number.
We now have a circuit for addition of integers (positive or
negative) in our representation. We simply use a circuit for
addition of only positive numbers, plus some circuits that check:
- If both numbers are positive and the result is negative, then
  report overflow.
- If both numbers are negative and the result is positive, then
  report underflow.
So far, we have studied the representation of negative numbers
using ten's complement. In a computer, we prefer using base two
rather than base ten. Luckily, the exact method described in the
previous section works just as well for base two. For an n-bit
adder (n is usually 32 or 64), we can represent positive
numbers with a leftmost digit of 0, which gives values
between 0 and 2^(n-1) - 1, and
negative numbers with a leftmost digit of 1, which
gives values between -2^(n-1) and -1.
The exact same rule for overflow and underflow detection works.
If, when adding two positive numbers, we get a result that looks
negative (i.e. with its leftmost bit 1), then we have
an overflow. Similarly, if, when adding two negative numbers, we get
a result that looks positive (i.e. with its leftmost bit 0),
then we have an underflow.
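The following Python sketch illustrates the same idea for an n-bit
two's complement adder built from a plain unsigned (truncating)
addition plus a check on the sign bits. The width of 8 bits and all
names are just illustrative assumptions:

    N = 8                     # width of the adder, for illustration
    MASK = (1 << N) - 1       # keeps only the low N bits

    def signed(value):
        """Interpret an N-bit pattern as a two's complement number."""
        return value - (1 << N) if value & (1 << (N - 1)) else value

    def add_with_overflow_check(a, b):
        """Add two N-bit two's complement numbers with an unsigned adder,
        and detect overflow/underflow by looking at the sign bits."""
        result = (a + b) & MASK          # the truncating, unsigned addition
        a_neg = bool(a & (1 << (N - 1)))
        b_neg = bool(b & (1 << (N - 1)))
        r_neg = bool(result & (1 << (N - 1)))
        overflow = (not a_neg and not b_neg and r_neg)   # pos + pos looks neg
        underflow = (a_neg and b_neg and not r_neg)      # neg + neg looks pos
        return result, overflow, underflow

    # -2 + 5: 0b11111110 + 0b00000101 = 0b00000011, no overflow.
    r, ov, un = add_with_overflow_check(0b11111110, 0b00000101)
    print(signed(r), ov, un)   # 3 False False

    # 100 + 100 = 200 does not fit in 8 signed bits: reported as overflow.
    r, ov, un = add_with_overflow_check(100, 100)
    print(signed(r), ov, un)   # -56 True False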
Integers are useful, but sometimes we need to compute with
numbers that are not integers.
An obvious idea is to use rational numbers. Many
algorithms, such as the simplex algorithm for linear optimization,
use only rational arithmetic whenever the input is rational.
There is no particular difficulty in representing rational
numbers in a computer. It suffices to have a pair of integers, one
for the numerator and one for the denominator.
To implement arithmetic on rational numbers, we can use some
additional restrictions on our representation. We may, for instance,
decide that:
- positive rational numbers are always represented as two
  positive integers (the other possibility is as two negative
  numbers),
- negative rational numbers are always represented with a
  negative numerator and a positive denominator (the other
  possibility is with a positive numerator and a negative
  denominator),
- the numerator and the denominator are always relatively prime
  (they have no common factors).
Such a set of rules makes sure that our representation is
canonical, i.e., that the representation for a value is unique,
even though, a priori, many representations would work.
Circuits for implementing rational arithmetic would have to take
such rules into account. In particular, the last rule implies
dividing the two integers resulting from every arithmetic operation
by their greatest common divisor to obtain the canonical
representation.
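As an illustration, here is a small Python sketch of such a
canonicalization step. The function is only meant to show the idea;
Python's standard fractions module does essentially the same job:

    from math import gcd

    def canonical_rational(num, den):
        """Bring a numerator/denominator pair into the canonical form
        described above: the sign lives in the numerator, the denominator
        is positive, and the two have no common factor."""
        if den == 0:
            raise ZeroDivisionError("denominator must not be zero")
        if den < 0:                 # move the sign to the numerator
            num, den = -num, -den
        g = gcd(abs(num), den)      # divide out the greatest common divisor
        return num // g, den // g

    print(canonical_rational(4, -6))    # (-2, 3)
    print(canonical_rational(10, 5))    # (2, 1)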
Rational numbers and rational arithmetic are not very common in
the hardware of a computer. The reason is probably that rational
numbers don't behave very well with respect to the size of the
representation. For rational numbers to be truly useful, their
components, i.e., the numerator and the denominator, both need to be
arbitrary-precision integers. As we have mentioned before, arbitrary
precision anything does not go very well with fixed-size circuits
inside the CPU of a computer.
Programming languages, on the other hand, sometimes use
arbitrary-precision rational numbers. This is the case, in
particular, with the language Lisp.
Instead of using the obvious representation of rational numbers
presented in the previous section, most computers use a different
representation of a subset of the rational numbers. We call these
numbers floating-point numbers.
Floating-point numbers use inexact arithmetic, and in return
require only a fixed-size representation. For many computations
(so-called scientific computations, as if other computations weren't
scientific) such a representation has the great advantage that it is
fast, while at the same time usually giving adequate precision.
There are some (sometimes spectacular) exceptions to the
"adequate precision" statement in the previous paragraph, though. As
a result, an entire discipline of applied mathematics, called
numerical analysis, has been created for the purpose of analyzing
how algorithms behave with respect to maintaining adequate
precision, and of inventing new algorithms with better properties in
this respect.
The basic idea behind floating-point numbers is to represent a
number as a mantissa and an exponent, each with a
fixed number of bits of precision. If we denote the mantissa with
m and the exponent with e, then the number thus
represented is m * 2^e.
Again, we have a problem that a number can have several
representations. To obtain a canonical form, we simply add a rule
that m must be greater than or equal to 1/2
and strictly less than 1. If we write such a mantissa
in binal (analogous to decimal) form, we always get a
number that starts with 0.1. This initial information
therefore does not have to be represented, and we represent only the
remaining "binals".
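As it happens, Python's math.frexp uses exactly this normalization
convention, returning a mantissa m with 0.5 <= |m| < 1 and an
exponent e such that x = m * 2^e, so we can use it to illustrate the
decomposition. This only shows the idea, not the actual bit layout
in memory:

    import math

    # Decompose a few numbers into mantissa and exponent, x = m * 2**e,
    # with the mantissa normalized to the interval [0.5, 1).
    for x in [6.0, 0.15625, 1000.0]:
        m, e = math.frexp(x)
        print(f"{x} = {m} * 2**{e}")
    # 6.0     = 0.75      * 2**3
    # 0.15625 = 0.625     * 2**-2
    # 1000.0  = 0.9765625 * 2**10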
The reason floating-point representations work well for so-called
scientific applications is that in such applications we more often
need to multiply or divide two numbers than to add or subtract
them. Multiplication of two floating-point
numbers is easy to obtain. It suffices to multiply the mantissas and
add the exponents. The resulting mantissa might be smaller than
1/2; in fact, it can be as small as 1/4.
In this case, the result needs to be canonicalized. We do this by
shifting the mantissa left by one position and subtracting one from
the exponent. Division is only slightly more complicated. Notice
that the imprecision in the result of a multiplication or a division
is only due to the imprecision in the original operands. No
additional imprecision is introduced by the operation itself (except
possibly 1 unit in the least significant digit). Floating-point
addition and subtraction do not have this property.
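A Python sketch of such a multiplication on (mantissa, exponent)
pairs, including the renormalization step, might look as follows.
The representation and all names are just illustrative assumptions:

    def fp_multiply(m1, e1, m2, e2):
        """Multiply two floating-point numbers given as (mantissa, exponent)
        pairs with 0.5 <= mantissa < 1, and renormalize the result."""
        m = m1 * m2            # product mantissa, somewhere in [0.25, 1)
        e = e1 + e2            # exponents are simply added
        if m < 0.5:            # too small: shift mantissa left one position
            m *= 2
            e -= 1
        return m, e

    # 0.75 * 2**3 (= 6) times 0.625 * 2**-2 (= 0.15625) is 0.9375.
    m, e = fp_multiply(0.75, 3, 0.625, -2)
    print(m, e, m * 2**e)   # 0.9375 0 0.9375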
To add two floating-point numbers, the one with the smaller
exponent must first have its mantissa shifted right by n
steps, where n is the difference of the exponents. If n
is greater than the number of bits in the representation of the
mantissa, the smaller number will be treated as 0 as far
as addition is concerned. The situation is even worse for
subtraction (or addition of one positive and one negative number).
If the numbers have roughly the same absolute value, the result of
the operation is roughly zero, and the resulting representation may
have no correct significant digits.
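Both effects are easy to demonstrate with Python's ordinary
double-precision floats:

    # Absorption: the small operand is shifted entirely out of the mantissa.
    print(1.0e16 + 1.0 - 1.0e16)      # 0.0, the 1.0 was lost

    # Cancellation: subtracting nearly equal numbers leaves few correct
    # significant digits.
    print((1.0 + 1e-15) - 1.0)        # 1.1102230246251565e-15, about 11% off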
The two's complement representation that we mentioned above is
mostly useful for addition and subtraction. It only complicates
things for multiplication and division. For multiplication and
division, it is better to use a representation with sign + absolute
value. Since multiplication and division are more common with
floating-point numbers, and since they result in multiplication and
division of the mantissa, it is more advantageous to have the
mantissa represented as sign + absolute value. The exponents are
added, so it is more common to use two's complement (or some related
representation) for the exponent.
Usually, computers manipulate data in chunks of 8,
16, 32, 64, or 128
bits. It is therefore useful to fit a single floating-point number
with both mantissa and exponent in such a chunk. In such a chunk, we
need to have room for the sign (1 bit), the mantissa, and the
exponent. While there are many different ways of dividing the
remaining bits between the mantissa and the exponent, in practice
most computers now use a standard called IEEE 754, which mandates,
among others, a 32-bit single-precision format (1 sign bit, 8
exponent bits, 23 mantissa bits) and a 64-bit double-precision
format (1 sign bit, 11 exponent bits, 52 mantissa bits).
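As an illustration, we can use Python's struct module to look at the
three fields of a stored 64-bit float. Note that IEEE 754 normalizes
the mantissa to the interval [1, 2) with an implicit leading 1 bit,
a slight variation on the [1/2, 1) convention used above:

    import struct

    def double_fields(x):
        """Split a 64-bit IEEE 754 double into its sign, exponent,
        and mantissa (fraction) bit fields."""
        (bits,) = struct.unpack(">Q", struct.pack(">d", x))
        sign = bits >> 63
        exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
        mantissa = bits & ((1 << 52) - 1)      # 52 explicitly stored bits
        return sign, exponent, mantissa

    print(double_fields(6.0))    # (0, 1025, 2251799813685248)
    print(double_fields(-0.5))   # (1, 1022, 0)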
One sometimes hears variations on the phrase "computers can't
represent real numbers exactly". This, of course, is not true.
Nothing prevents us from representing (say) the square root of two
as the number two and a bit indicating that the value is the square
root of the representation. Some useful operations could be very
fast this way. It is true, though, that we cannot represent all
real numbers exactly. In fact, we have a problem similar to the one
we have with rational numbers, in that it is hard to pick a useful
subset that we can represent exactly, other than the
floating-point numbers.
For this reason, no widespread hardware contains built-in real
numbers other than the usual approximations in the form of
floating-point. |