fpt - NYU Computer Science

Floating-Point
and
High-Level Languages
Programming Languages
Fall 2003
Floating-Point, the Basics
Floating-point numbers are
approximations of real numbers, but they
are not real numbers.
 Typical format in a machine is
sign exponent mantissa
 Exponent determines range available
 Mantissa determines precision
 Base is usually 2 (rarely 16, never 10)

The Notion of Precision
Precision is relative. Large numbers have
less absolute precision than small numbers.
 For example, if we have a 24-bit mantissa,
then relative precision is 1 part in 2**24
 For 1.0, this is an absolute precision of
1.0*2**(-24).
 For 100, this is an absolute precision of
100*2**(-24).

Representing Numbers
Some numbers can typically be
represented exactly, e.g. 1.0, 2**(-13),
2**(+20) [assuming a 24-bit mantissa].
 But other numbers are represented only
approximately or not at all

Problems in Representation
2**(-9999999)
Too small, underflows to 0.0
 2**(+9999999)
Too large, error or infinity
 0.1
Cannot be represented exactly in
binary (a repeating fraction in binary)
 145678325785.25
Representable in binary, but 24-bit mantissa
too small to represent exactly

Floating-Point Operations
Result may be representable exactly
r = 81.0;
s = 3.0;
x = r / s;
 Machines typically have a floating-point
division instruction 
 But it may not give the exact result 

Floating-Point Operations
Result may not be representable exactly
r = 1.0;
s = 10.0;
t = r / s;
 Result cannot be precisely correct
 Will it be rounded to the nearest bit,
truncated towards zero, or perhaps
something even less accurate? All are
possible.

Unexpected Results
Let’s look at this code
a = 1.0;
b = 10.0;
c = a / b;
if (c == 0.1) printf ("hello1");
if (c == 1.0/10.0) printf ("goodbye");
if (c == a/b) printf ("what the %$!");
 We may get nothing printed!

Why was Nothing Printed?
if (c == 0.1) …
In this case, we have stored in c the
result of the run-time computation of 0.1,
which is not quite exact.
 The other operand has been converted to
a constant by the compiler.
 Both are good approximations of 0.1
 But neither is exact
 And perhaps they are a little bit different

Why Was Nothing Printed?
if (c == 1.0 / 10.0) …
 The compiler may compute 1.0/10.0 at
compile time and treat it as though it had
seen 0.1, and get a different result
 Really ends up being the same as last
case

Why Was Nothing Printed?
if (c == a/b)
 Now surely we should get the same
computation.
 Maybe not: the compiler may be clever
enough to know that a/b is 0.1 in one
case and not in the other.

Now Let’s Get Something Printed!
Read in value of a at run time
 Read in value of b at run time
 Compiler knows nothing
 Now we will get some output or else!
c = a / b;
if (c == a/b) printf ("This will print!");

Still Nothing Printed!!!
How can that be?
 First a bit of background
 Typically we have two or more different
precisions of floating-point values, with
different length mantissas
 In registers we use only the higher
precision form, expanding on a load,
rounding on a store.

What Happened?
c = a / b;
if (c == a/b) printf ("This will print!");
 First compute a/b in high precision
 Now round to fit in low precision c, losing
significant bits
 Compute a/b in high precision into a
register, load c into a register, expanding
 Comparison does not say equal

Surprises in Precision
Let’s compute x**4
 Two methods:

 Result = x*x*x*x
 Result = (x*x)**2
Second has only two multiplications
instead of three, so it must be more accurate.
 Nope, first is more accurate!

Subtleties of Rounding
Suppose we insist on floating-point
operations being properly rounded.
 What does properly rounded mean for 0.5?
 Typical rule, round up always if half way
 Introduces Bias
 Some computations sensitive to this bias
 Computation of the orbit of Pluto was
significantly off because of this problem

Moral of this Story
Floating-point is full of surprises
 If you base your expectations on real
arithmetic, you will be surprised
 On any given machine, floating-point
operations are well defined
 But may be more or less peculiar
 And the semantics will differ from machine
to machine

What to do in High Level Languages
We can punt. We just say that floating-point
numbers are some approximation of real
numbers, and that the results of floating-point
operations are some approximation of the real
results.
 Nice and simple from a language definition point
of view
 Fortran and C historically did this
 Not so simple for a poor application programmer

Doing a Bit Better
Parametrize the machine model of
floating-point: what exponent range does
it have, and what precision of mantissa?
 Define fpt model in terms of these
parameters.
 Insist on results being accurate where
possible, or one of two end points if not.
 This is the approach of Ada

Doing Quite a Bit Better
What if all machines had exactly the same
floating-point model?
 IEEE floating-point heads in that direction
 Precisely defines two floating-point
formats (32-bit and 64-bit) and precisely
defines operations on them.

More on IEEE
We could define our language to require IEEE
semantics for floating-point.
 But what if the machine does not efficiently
implement IEEE?
 For example, x86 implements the two formats,
but all registers have an 80-bit format, so you
get extra precision
 Which sounds good but, as we have seen, is a
possible reason for surprising behavior.

IEEE and High Level Languages
Java and Python both expect/require IEEE
semantics for arithmetic.
 Java wants high efficiency, which causes a
clash if the machine does not support
IEEE in the “right” way.
 Java is potentially inefficient on x86
machines. Solution: cheat 
 Python requires IEEE too, but does not
care so much about efficiency.
