Problem Description
Here's a program that performs the basic
arithmetic operations on floating point numbers of arbitrary
size.
Try this one on your Windows calculator!
1234567890123456789012345678901234567890 E -10000
X
98765432109876543210987654321 E +10001
1.219326311370217952261850327337448559633622923332 E 68
Background & Techniques
There is nothing very useful about the results produced here. It
is mainly an investigation into how to combine exponents with the
output from our Big Integer arithmetic unit to apply +, -, x and (maybe) ÷ operations
to floating point numbers of arbitrary size.
The program was prompted by a
viewer's question about limitations of the Windows Calculator
program. The Windows Calc.exe calculator limits input and output to 32
significant digits and input of exponents (using the Exp key) to 4
digits. Using the X^Y key we can enter exponents up to
43300. The largest intrinsic Delphi floating point type,
Extended, uses an 80-bit (10-byte) format with 64 bits reserved for the significant
digits. This is equivalent to 19 or 20 decimal
digits, so some sort of large number handling routines must be used to
reach 32-digit accuracy. It is likely that
the limitations are imposed by display space considerations and,
perhaps, by the math library Microsoft uses to calculate function
values. By the way, the PowerToy Calculator, available as
a free download from Microsoft, allows users to choose 32, 64, 128 or 512
digit accuracy, with a warning that 512 digit mode may be very
slow. I guess!
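The 19-or-20-digit figure for Extended comes straight from the size of its 64-bit significand:

$$2^{64} = 18\,446\,744\,073\,709\,551\,616 \approx 1.8 \times 10^{19}, \qquad 64 \log_{10} 2 \approx 19.27$$

So every integer of up to 19 decimal digits fits exactly in the significand, but only the 20-digit integers below 2^64 do, which is still well short of 32 digits.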
I allow up to 50 digits to be used, primarily to keep the display
space to a reasonable size.
Programmer's Notes
I define a TBigFloat class which contains three
fields: Exponent, the exponent of the number; DecPart, a TInteger big
integer object containing the digits; and an integer field reflecting the
number of significant digits to display.
The DecPart field represents a decimal
value between -1 and +1. Exponent is the power of ten that DecPart
must be multiplied by to get the real value of the number. So
negative exponents reflect the number of 0's that must be inserted between
the decimal point and the leftmost digit of DecPart.
Positive exponents give the number of places the decimal point moves
right, starting just left of the leftmost digit of DecPart, with 0's added to the right
end of the number if required.
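To make the DecPart/Exponent convention concrete, here is a minimal Python sketch (not the Delphi TBigFloat code) that keeps the digits in a plain Python integer and rebuilds a "Normal" decimal string using the rules just described. The class name BigFloatSketch and its fields are hypothetical stand-ins for TBigFloat's DecPart and Exponent, and the sketch assumes the leading digit of DecPart is nonzero.

```python
from dataclasses import dataclass


@dataclass
class BigFloatSketch:
    """Hypothetical stand-in for TBigFloat: value = 0.<digits> * 10**exponent."""
    digits: int    # signed integer holding the significant digits (DecPart)
    exponent: int  # power of ten applied to 0.<digits>

    def to_normal_string(self) -> str:
        if self.digits == 0:
            return "0"
        sign = "-" if self.digits < 0 else ""
        d = str(abs(self.digits))
        if self.exponent <= 0:
            # Zero or negative exponent: insert -exponent zeros between
            # the decimal point and the leftmost digit.
            return f"{sign}0.{'0' * -self.exponent}{d}"
        if self.exponent >= len(d):
            # The point moves past the last digit: pad zeros on the right.
            return sign + d + "0" * (self.exponent - len(d))
        # The point lands inside the digit string.
        return f"{sign}{d[:self.exponent]}.{d[self.exponent:]}"
```

For example, BigFloatSketch(125, 2) prints as 12.5, BigFloatSketch(125, -2) as 0.00125, and BigFloatSketch(125, 5) as 12500.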
Procedure GetNumber
converts a string to its internal representation and ShowNumber
builds a string version of the number in "Normal" or
"Scientific" format. Split is a method
that uses the number of digits and the exponent value to determine how
many digits are to the left and right of the implied decimal point.
This allows the appropriate operand to be shifted left (multiplied by
10) as many times as necessary to align the decimal points before
adding.
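The alignment step can be sketched in a few lines of Python, assuming each operand is a (digits, exponent) pair in the DecPart convention above (value = 0.<digits> x 10^exponent). The helper names are hypothetical, and instead of a Split routine the sketch lines up the rightmost digits of the two operands, which aligns the decimal points just as well.

```python
def ndigits(n: int) -> int:
    """Number of decimal digits in |n| (0 counts as one digit)."""
    return len(str(abs(n)))


def add_sketch(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Add two (digits, exponent) pairs where value = 0.<digits> * 10**exponent.

    digits is a signed Python int; the result uses the same convention.
    """
    (da, ea), (db, eb) = a, b
    # Power of ten of each operand's rightmost digit, since
    # 0.<digits> * 10**e == digits * 10**(e - ndigits(digits)).
    la = ea - ndigits(da)
    lb = eb - ndigits(db)
    low = min(la, lb)
    # Shift the operand whose rightmost digit sits higher to the left
    # (one multiplication by 10 per place) so the decimal points line up,
    # then add as ordinary big integers.
    total = da * 10 ** (la - low) + db * 10 ** (lb - low)
    if total == 0:
        return (0, 0)
    # Re-express the integer sum in the 0.<digits> form.
    return (total, low + ndigits(total))
```

Adding 12.5, the pair (125, 2), to 0.0375, the pair (375, -1), shifts 125 left three places and returns (125375, 2), i.e. 12.5375. Negative digit values handle subtraction with the same code.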
Addendum March 23, 2003: A viewer suggested an algorithm
for division, which I implemented today. The new version
divides by successive approximation. An initial range for the
quotient is set at 0 for the lowest value and 10^(dividend exponent -
divisor exponent) as the maximum. The algorithm loops,
making quotient guesses by splitting the difference between the last guess
that was too high and the last guess that was too low. During each
iteration, the quotient guess is checked by multiplying it by the divisor
and comparing the result to the dividend. This may not be
the optimum method, but it seems to work on the cases tested so
far. A "rounding" procedure was required to
trim results back to the specified number of significant digits;
otherwise exact divisors tended to produce quotients ending in
9999999999... The ShowNumber procedure was
modified to remove extra trailing 0's after a decimal point, regardless of
the significant digits specification.
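The successive-approximation idea can be sketched in Python with exact fractions. This is an illustration under stated assumptions, not the UBigFloat code: it handles positive operands only, uses Python's fractions.Fraction in place of TBigFloat, and the names exponent, round_to_sig and divide_by_bisection are hypothetical. One deliberate change: the starting maximum is 10^(dividend exponent - divisor exponent + 1), because when both digit parts lie in [0.1, 1) their ratio can approach 10, and the extra power of ten keeps the true quotient inside the starting range.

```python
import math
from fractions import Fraction

TEN = Fraction(10)


def exponent(x: Fraction) -> int:
    """e with 10**(e-1) <= x < 10**e, i.e. x = 0.d1d2... * 10**e (x > 0)."""
    e = 0
    while x >= TEN ** e:
        e += 1
    while x < TEN ** (e - 1):
        e -= 1
    return e


def round_to_sig(x: Fraction, sig: int) -> Fraction:
    """Round x > 0 to `sig` significant digits, halves rounded up."""
    scale = TEN ** (sig - exponent(x))
    return Fraction(math.floor(x * scale + Fraction(1, 2))) / scale


def divide_by_bisection(a: Fraction, b: Fraction, sig: int = 20) -> Fraction:
    """Approximate a / b for a, b > 0 by successive approximation."""
    ea, eb = exponent(a), exponent(b)
    lo = Fraction(0)                  # last guess known to be too low
    hi = TEN ** (ea - eb + 1)         # last guess known to be too high
    tol = TEN ** (ea - eb - 1 - sig)  # finer than the last digit we keep
    while hi - lo > tol:
        mid = (lo + hi) / 2           # split the difference
        if mid * b > a:               # check guess by multiplying back
            hi = mid
        else:
            lo = mid
    # lo creeps up on the quotient from below, so trim it back
    # to `sig` significant digits.
    return round_to_sig(lo, sig)
```

Because every guess is a midpoint of the current range, the low end approaches an exact quotient such as 1/4 from below (0.2499999...), which is the effect described above; rounding to the requested number of significant digits turns it back into 0.25.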
Addendum April 13, 2005: The TInteger class used by
TBigFloat has been moved to the DFF Library, in unit UBigIntsV2.
If you wish to recompile this program, a one-time download of the
library will be necessary.
Addendum April 4, 2006: UBigFloatV2 was
added to the DFF Library file and removed from the source zip
file. The BigFloatTest program has added the Compare
operation for testing.
Addendum December 5, 2006: Charles Doumar has contributed a
number of additions to the UBigFloatV2 unit including Power,
Log (natural and base 10), and Exp functions.
The DFF Library file has been upgraded to DFFLibV08 to include
the revised UBigFloatV2 unit and a few others.
BigFloatTest was upgraded to test the new big float functions.
Addendum February 7, 2007: A few changes/enhancements were
posted today in UBigFloatV3, contained in a new library release,
DFFLibV10. Most were "cleanup" activities except for the
change in the definition of the Round procedure.
- TFloatInt, the large integer descendant of our
  TInteger class, now contains several shift routines that were
  formerly part of TInteger but existed only for use here.
  TFloatInt holds the digit values for TBigFloat numbers.
- Support procedures AssignHalf, AssignTwo,
  AssignThree, AssignFour, Squareraw, GetNumber, and
  ShowNumber were moved to the Protected section.
- "Maxsigs" (maximum significant digits) parameters were changed
  from cardinal to integer type to avoid Delphi widening both parameters
  when comparing them to integer types.
- Moved zlog... variables used internally from the Interface
  to the Implementation section.
- Changed the old "Round" procedure to RoundToPrec
  (round to a specified number of significant digits) and defined a new
  Round procedure to agree with the Trunc, Ceiling
  and Floor "round to" digits definition. The parameter specifies
  the "round to" position relative to the decimal point. So 0 returns the
  integer value, 1 "rounds" to 1/10, 2 "rounds" to 1/100, -1 "rounds"
  to a multiple of 10, etc. The specific "rounding" operation
  performed depends on the procedure called.
The revised BigFloatTest program now allows results for all 5 procedures
to be calculated.
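The "round to" position convention can be illustrated with a small Python sketch over exact fractions. The function round_to and its mode names are hypothetical, and the tie-breaking rule used for "round" (halves move away from zero) is an assumption; the text does not say which rule the new Round procedure uses.

```python
import math
from fractions import Fraction


def round_to(x: Fraction, places: int, mode: str = "round") -> Fraction:
    """'Round to' position `places` relative to the decimal point.

    places = 0 -> integer, 1 -> tenths, 2 -> hundredths,
    -1 -> multiple of 10, and so on.
    mode is one of "round", "trunc", "ceiling", "floor".
    """
    scale = Fraction(10) ** places  # 10**-1 == 1/10 for negative places
    y = x * scale                   # move the target digit to the units place
    if mode == "trunc":
        n = math.trunc(y)
    elif mode == "ceiling":
        n = math.ceil(y)
    elif mode == "floor":
        n = math.floor(y)
    elif mode == "round":
        # Assumption: halves are rounded away from zero.
        n = math.floor(abs(y) + Fraction(1, 2))
        if y < 0:
            n = -n
    else:
        raise ValueError(f"unknown mode: {mode}")
    return n / scale                # move the digits back
```

With x = 1234.5678, places = 2 in floor mode gives 1234.56 and places = -1 in ceiling mode gives 1240. Only the integer-conversion step differs between the four modes, which mirrors the text's point that the "rounding" performed depends on the procedure called.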
Addendum October 16, 2009: BigFloatTest was
reposted today to incorporate two small changes to UBigFloatV3.
The Add procedure, which adds one TBigFloat number to another,
could produce erroneous results when a number with a very large exponent
was added to another with a very large negative exponent (a very small
number). Also, procedure Reciprocal could loop
and produce an "Out of Memory" error under certain conditions.
Division uses Reciprocal to divide by multiplying, and a user
encountered the error when computing 1/99,999,999,999,999,999,999.
Thanks to Charles Doumar for the corrections posted yesterday in
UBigFloatV3 in library file DFFLibV13.
August 29, 2012: The "Round" procedure in
UBigFloatV3 was rewritten today to correct erroneous results when
0 digits to the right of the decimal point were specified and values were
between -1 and +1.
May 11, 2015: A memory leak in the UBigIntsForFloatV4 unit was
corrected today. BigFloatTest was changed to report allocated memory after
each test to verify the correction and to check future changes.
Programmers can avoid re-downloading the library zip file by adding the line
"inherited;" as the last statement of the TInteger.Free method in
UBigIntsForFloatV4.
September 20, 2016: A viewer recently reported significant errors
in the BigFloat "Divide" and "Reciprocal" operations. Divide works by
computing the reciprocal of the denominator and multiplying by the
numerator. If the value passed to Reciprocal, directly or as a
denominator, was negative, the values returned were incorrect. The
error has existed for several months, so hopefully negative denominators are
rare. If you use this unit and 1/-1 does not return -1 as the
result, you need this fix!
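The text does not say how Reciprocal is computed, so the following Python sketch uses Newton's iteration x <- x(2 - d*x), a common choice for reciprocals that is assumed here rather than taken from UBigFloatV3. It shows two safeguards against the kinds of failure reported above: the sign is removed before iterating and restored afterwards, so negative inputs follow exactly the same path as positive ones, and the loop has a hard iteration cap so it cannot run forever. The function name reciprocal, the cap of 200 and the ten guard digits are hypothetical choices.

```python
from decimal import Decimal, localcontext


def reciprocal(d: Decimal, sig: int = 30) -> Decimal:
    """Approximate 1/d to `sig` significant digits with Newton's iteration."""
    if d == 0:
        raise ZeroDivisionError("reciprocal of zero")
    # Iterate on |d| only; the sign is reapplied at the end.
    negative = d < 0
    d = d.copy_abs()  # copy_abs() does not round, unlike abs()
    with localcontext() as ctx:
        ctx.prec = sig + 10  # guard digits during the iteration
        # adjusted() is e with 10**e <= d < 10**(e + 1), so the starting
        # guess 10**-(e + 1) gives 0.1 <= d*x < 1, inside Newton's
        # convergence range 0 < d*x < 2.
        x = Decimal(1).scaleb(-(d.adjusted() + 1))
        for _ in range(200):  # hard cap: never loop forever
            nxt = x * (2 - d * x)
            if nxt == x:  # fixed point reached at working precision
                break
            x = nxt
    with localcontext() as ctx:
        ctx.prec = sig
        result = +x  # unary plus rounds to the current context precision
    # copy_negate() flips the sign without re-rounding.
    return result.copy_negate() if negative else result
```

With these choices, reciprocal(Decimal(-4)) returns -0.25, and the 2009 test case 1/99,999,999,999,999,999,999 comes out as 1.00000000000000000001E-20 at 30 significant digits.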
Our DFF Library zip file has been updated with the corrected
UBigFloatV3.pas file and reposted as file DFFLibV14_20Sep2016.zip. The
updated version is also included in the source code download, so there is no need to
download the library to recompile and test this program.
Running/Exploring the Program
Note: The Lazarus (Free Pascal based) programs downloadable below are not
currently maintained.
Suggestions for Further Explorations
(Done 3/23/03: see addendum note above)
Need to complete the "divide" operation
just to learn a little more about what division really
means. If you ever take a class in formal logic, one of the tautologies
(logic theorems) you learn is named "Modus
Tollens": given statements A and B, if we
know that "If A is true then B is
true" and that "B is false", we
can conclude that "A is false". In other words, the statements "If A is true then B is true" and "If B is false then
A is false" are logically
equivalent. This reflects my attitude toward
programming and problem solving in general: "If I
understand the problem, I can solve it", which
implies "If I can't solve a problem, then I just don't
understand it!". If nothing else, this
approach to problem solving inspires the persistence required.
Original Date: March 11, 2003
Modified: September 21, 2016

