Thread: floating point representation

floating point representation

From
Hiroshi Inoue
Date:
Hi all,

I have a question of PostgreSQL's floating point
representation.

create table t (fl1 float4, fl2 float4, fl3 float4);
insert into t values (1.234567, 1.23457, 1.23457);
select * from t;  fl1   |   fl2   |   fl3
---------+---------+---------1.23457 | 1.23457 | 1.23457
(1 row)

select * from t where fl1=fl2;fl1 | fl2 | fl3
-----+-----+-----
(0 rows)

select * from t where t where fl2=fl3;  fl1   |   fl2   |   fl3
---------+---------+---------1.23457 | 1.23457 | 1.23457
(1 row)

OK, fl1 != fl2 and fl2 == fl3 but

copy t to stdout;
1.23457 1.23457 1.23457

The output of pg_dump is same. Then
after restoring from the pg_dump 
output, we would get a tuple such
that fl1==fl2==fl3.

Is it reasonable ?

In addtion this makes a client library like ODBC
driver very unhappy with the handlig of floating
point data. For example, once a floating point
data like fl1(1.234567) was stored, MS-Access
couldn't update the tuple any more.

Is there a way to change the precision of floating
point representation from clients ?

Regards,
Hiroshi Inoue


Re: floating point representation

From
Peter Eisentraut
Date:
Hiroshi Inoue writes:

> Is there a way to change the precision of floating
> point representation from clients ?

Not currently, but I image it couldn't be too hard to introduce a
parameter that changes the format string used by float*out to something
else.

The GNU C library now offers a %a (and %A) format that prints floating
point numbers in a semi-internal form that is meant to be portable.  (I
image this was done because of C99, but I'm speculating.)  It might be
useful to offer this to preserve accurate data across dumps.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: floating point representation

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> The GNU C library now offers a %a (and %A) format that prints floating
> point numbers in a semi-internal form that is meant to be portable.  (I
> image this was done because of C99, but I'm speculating.)  It might be
> useful to offer this to preserve accurate data across dumps.

Here's what I find in the C99 draft:
      a,A     A double argument representing a (finite)  floating-              point    number    is   converted   in
the   style              [-]0xh.hhhhp�d, where there is one hexadecimal digit                          ^ � == "+/-" ...
tgl             (which  is  nonzero  if the argument is a normalized              floating-point number and is
otherwise unspecified)              before   the  decimal-point  character (219) and the              number of
hexadecimaldigits after it  is  equal  to              the  precision;  if  the  precision  is  missing and
FLT_RADIX is a power of 2,  then  the  precision  is              sufficient for an exact representation of the value;
           if the precision is missing and FLT_RADIX is  not  a              power  of  2,  then  the  precision is
sufficientto              distinguish (220) values of type double, except that              trailing zeros may be
omitted;if the  precision  is              zero  and  the  # flag is not specified, no decimal-              point
characterappears.   The  letters  abcdef  are              used  for  a conversion and the letters ABCDEF for A
    conversion.  The A conversion specifier  produces  a              number  with  X  and  P  instead  of  x  and p.
The             exponent always contains at  least  one  digit,  and              only  as  many more digits as
necessaryto represent              the decimal exponent of 2.  If the  value  is  zero,              the exponent is
zero.
              A double argument representing an infinity or NaN is              converted in the style  of  an  f  or
F conversion              specifier.
 
      ____________________
      219Binary  implementations  can choose the hexadecimal digit         to the  left  of  the  decimal-point
character so  that         subsequent digits align to nibble (4-bit) boundaries.
 
      220The precision p is sufficient to  distinguish  values  of         the source type if 16p-1>bn where b is
FLT_RADIXand n is         the number of base-b digits in  the  significand  of  the         source  type.  A smaller p
mightsuffice depending on the         implementation's scheme for determining the digit to  the         left of the
decimal-pointcharacter.
 
      7.19.6.1                   Library                  7.19.6.1
      314          Committee Draft  --  August 3, 1998   WG14/N843


So, it looks like C99-compliant libc implementations will have this,
but I'd hesitate to rely on it for pg_dump purposes; it would certainly
not be very portable for awhile yet.

Peter's idea of a SET variable to control float display format might
not be a bad idea, but what if anything should pg_dump do with it?
Maybe just crank the precision up a couple digits from the current
defaults?
        regards, tom lane


RE: floating point representation

From
"Hiroshi Inoue"
Date:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> 
> Peter Eisentraut <peter_e@gmx.net> writes:
> > The GNU C library now offers a %a (and %A) format that prints floating
> > point numbers in a semi-internal form that is meant to be portable.  (I
> > image this was done because of C99, but I'm speculating.)  It might be
> > useful to offer this to preserve accurate data across dumps.
> 

[snip]
> 
> So, it looks like C99-compliant libc implementations will have this,
> but I'd hesitate to rely on it for pg_dump purposes; it would certainly
> not be very portable for awhile yet.
> 

Agreed.

> Peter's idea of a SET variable to control float display format might
> not be a bad idea, but what if anything should pg_dump do with it?
> Maybe just crank the precision up a couple digits from the current
> defaults?
>

Currently the precision of float display format is FLT_DIG(DBL_DIG).
It's not sufficent to distinguish float values. As Peter already suggested,
the quickest solution would be to change XXX_DIG constants to variables
and provide a routine to SET the variables. Strictly speaking the precision
needed to distigush float values seems OS-dependent. It seems preferable
to have a symbol to specify the precision. 

Regards,
Hiroshi Inoue


Re: floating point representation

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> The 7.1-release seems near.
> May I provide the followings ?
>     SET FLOAT4_PRECISION TO ..
>     SET FLOAT8_PRECISION TO ..

> Or must we postpone to fix it ?

This seems a small enough change that I do not fear fixing it at this
late date.  However, I do not like the idea of making the SET variables
be just number of digits precision.  As long as we're going to have SET
variables, let's go for the full flexibility offered by sprintf: define
the SET variables as the sprintf format strings to use.  The defaults
would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
digits we are currently using).  This way, someone could select the C99
%a format if he knew that his libc supported it.  Or he could force a
particular format like %7.3f if that's what he needed in a specific
application.
        regards, tom lane


Re: floating point representation

From
Hiroshi Inoue
Date:
I wrote:
> 
> > -----Original Message-----
> > From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> >
> > Peter Eisentraut <peter_e@gmx.net> writes:

[snip]

> 
> > Peter's idea of a SET variable to control float display format might
> > not be a bad idea, but what if anything should pg_dump do with it?
> > Maybe just crank the precision up a couple digits from the current
> > defaults?
> >
> 
> Currently the precision of float display format is FLT_DIG(DBL_DIG).
> It's not sufficent to distinguish float values. As Peter already suggested,
> the quickest solution would be to change XXX_DIG constants to variables
> and provide a routine to SET the variables. Strictly speaking the precision
> needed to distigush float values seems OS-dependent. It seems preferable
> to have a symbol to specify the precision.
> 

The 7.1-release seems near.
May I provide the followings ?SET FLOAT4_PRECISION TO ..SET FLOAT8_PRECISION TO ..

Or must we postpone to fix it ?

Regards,
Hiroshi Inoue


Re: floating point representation

From
Tom Lane
Date:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> Tom Lane wrote:
>> The defaults
>> would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
>> digits we are currently using).

> Wouldn't changing current '%.6g','%.15g'(on many platforms)
> cause the regression test failure ? 

I didn't check my numbers.  If the current behavior is '%.6g','%.15g'
then we should stay with that as the default.

Hmm, on looking at the code, this might mean we need some configure
pushups to extract FLT_DIG and DBL_DIG and put those into the default
strings.  Do we support any platforms where these are not 6 & 15?
        regards, tom lane


Re: floating point representation

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > The 7.1-release seems near.
> > May I provide the followings ?
> >       SET FLOAT4_PRECISION TO ..
> >       SET FLOAT8_PRECISION TO ..
> 
> > Or must we postpone to fix it ?
> 
> This seems a small enough change that I do not fear fixing it at this
> late date.  However, I do not like the idea of making the SET variables
> be just number of digits precision.  As long as we're going to have SET
> variables, let's go for the full flexibility offered by sprintf: define
> the SET variables as the sprintf format strings to use. 

Agreed.

> The defaults
> would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
> digits we are currently using).

Wouldn't changing current '%.6g','%.15g'(on many platforms)
cause the regression test failure ? 

> This way, someone could select the C99
> %a format if he knew that his libc supported it.  Or he could force a
> particular format like %7.3f if that's what he needed in a specific
> application.
> 

Regards,
Hiroshi Inoue


Re: floating point representation

From
Bruce Momjian
Date:
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > The 7.1-release seems near.
> > May I provide the followings ?
> >     SET FLOAT4_PRECISION TO ..
> >     SET FLOAT8_PRECISION TO ..
> 
> > Or must we postpone to fix it ?
> 
> This seems a small enough change that I do not fear fixing it at this
> late date.  However, I do not like the idea of making the SET variables
> be just number of digits precision.  As long as we're going to have SET
> variables, let's go for the full flexibility offered by sprintf: define
> the SET variables as the sprintf format strings to use.  The defaults
> would be "%.7g" and "%.17g" (or thereabouts, not sure what number of
> digits we are currently using).  This way, someone could select the C99
> %a format if he knew that his libc supported it.  Or he could force a
> particular format like %7.3f if that's what he needed in a specific
> application.

Added to TODO:
* Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representation

From
Peter Eisentraut
Date:
Hiroshi Inoue writes:

> The 7.1-release seems near.
> May I provide the followings ?
>     SET FLOAT4_PRECISION TO ..
>     SET FLOAT8_PRECISION TO ..

I'd prefer names that go with the SQL type names:

REAL_FORMAT
DOUBLE_PRECISION_FORMAT

Seems a bit tacky, but a lot of work has been put in to make these names
more prominent.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: floating point representation

From
Peter Eisentraut
Date:
Hiroshi Inoue writes:

> The 7.1-release seems near.
> May I provide the followings ?
>     SET FLOAT4_PRECISION TO ..
>     SET FLOAT8_PRECISION TO ..
>
> Or must we postpone to fix it ?

Actually, you're going to have to recode the float*in() functions, using
scanf, and scanf's formats are not always equivalent to printf's.

And what about the geometry types that are based on floats?

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: floating point representation

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> Added to TODO:
>     * Add SET FLOAT4_PRECISION and SET FLOAT8_PRECISION using printf args

foo_PRECISION is not the right name if these variables will contain
printf format strings.  Perhaps foo_FORMAT?  Anyone have a better idea?

After further thought I think that we ought to standardize on %.6g and
%.15g even if the local <float.h> offers slightly different values of
FLT_DIG and DBL_DIG.  IEEE or near-IEEE float math is so close to
universal that I don't think it's worth worrying about the possibility
that different precisions would be more appropriate for some platforms.
Furthermore, having cross-platform consistency of display format seems
more useful than not.

Something else we should perhaps think about, though we are very late
in beta: once these variables exist, we could have the geometry regress
test set them to suppress a couple of digits, and eliminate most if not
all of the need for platform-specific geometry results.  Doing this
would be a no-brainer at any other time in the development cycle, but
right now I am worried about whether we'd be able to reconfirm regress
results on all the currently-supported platforms before release.
Comments?
        regards, tom lane


Re: floating point representation

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> Or must we postpone to fix it ?

> Actually, you're going to have to recode the float*in() functions, using
> scanf, and scanf's formats are not always equivalent to printf's.

Hmm... that wouldn't matter, except for this %a format.  Maybe we'd
better not try to make this happen in the waning days of the 7.1 cycle.

> And what about the geometry types that are based on floats?

They should track the float8 format, certainly.
        regards, tom lane


Re: floating point representation

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Actually, you're going to have to recode the float*in() functions, using
> scanf, and scanf's formats are not always equivalent to printf's.

Further thought: one answer to this is to institute four SET variables,
two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
and similarly for FLOAT4.  The input formats would normally just be
"%lg" and "%g" but could be changed for special cases (like reading
table dumps prepared with %a output format).

However, it's becoming quite clear to me that this feature needs more
thought than first appeared.  Accordingly, I now vote that we not try
to fit it into 7.1, but do it in a more considered fashion for 7.2.
        regards, tom lane


Re: floating point representation

From
Bruce Momjian
Date:
> Hiroshi Inoue writes:
> 
> > The 7.1-release seems near.
> > May I provide the followings ?
> >     SET FLOAT4_PRECISION TO ..
> >     SET FLOAT8_PRECISION TO ..
> 
> I'd prefer names that go with the SQL type names:
> 
> REAL_FORMAT
> DOUBLE_PRECISION_FORMAT
> 
> Seems a bit tacky, but a lot of work has been put in to make these names
> more prominent.

TODO updated:
* Add SET REAL_FORMAT and SET DOUBLE_PRECISION_FORMAT using printf args

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representation

From
Hiroshi Inoue
Date:
Tom Lane wrote:
> 
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Actually, you're going to have to recode the float*in() functions, using
> > scanf, and scanf's formats are not always equivalent to printf's.
> 
> Further thought: one answer to this is to institute four SET variables,
> two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
> and similarly for FLOAT4.  The input formats would normally just be
> "%lg" and "%g" but could be changed for special cases (like reading
> table dumps prepared with %a output format).
> 

From the first I don't want to change the current default
output format"%." #FLT_DIG "g" (REAL)"%." #DBL_DIG "g" (DOUBLE PRECISION)
for 7.1 because their changes would cause a regress
test failure.

> However, it's becoming quite clear to me that this feature needs more
> thought than first appeared.  Accordingly, I now vote that we not try
> to fit it into 7.1, but do it in a more considered fashion for 7.2.
> 

The simplest way to fix it quickly would be to not provide
XXXX_IN_FORMAT and restrict XXXX_FORMAT to "%.*g" at present.

Regards,
Hiroshi Inoue


Re: floating point representationu

From
Bruce Momjian
Date:
> 
> Tom Lane wrote:
> > 
> > Peter Eisentraut <peter_e@gmx.net> writes:
> > > Actually, you're going to have to recode the float*in() functions, using
> > > scanf, and scanf's formats are not always equivalent to printf's.
> > 
> > Further thought: one answer to this is to institute four SET variables,
> > two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
> > and similarly for FLOAT4.  The input formats would normally just be
> > "%lg" and "%g" but could be changed for special cases (like reading
> > table dumps prepared with %a output format).
> > 
> 
> >From the first I don't want to change the current default
> output format
>     "%." #FLT_DIG "g" (REAL)
>     "%." #DBL_DIG "g" (DOUBLE PRECISION)
> for 7.1 because their changes would cause a regress
> test failure.

But we run regress with the proper setting, right?  How does giving
people the ability to change the defaults affect the regression tests?


--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representationu

From
Hiroshi Inoue
Date:
Bruce Momjian wrote:
> 
> >
> > Tom Lane wrote:
> > >
> > > Peter Eisentraut <peter_e@gmx.net> writes:
> > > > Actually, you're going to have to recode the float*in() functions, using
> > > > scanf, and scanf's formats are not always equivalent to printf's.
> > >
> > > Further thought: one answer to this is to institute four SET variables,
> > > two for output and two for input; perhaps FLOAT8_FORMAT, FLOAT8_IN_FORMAT,
> > > and similarly for FLOAT4.  The input formats would normally just be
> > > "%lg" and "%g" but could be changed for special cases (like reading
> > > table dumps prepared with %a output format).
> > >
> >
> > >From the first I don't want to change the current default
> > output format
> >       "%." #FLT_DIG "g" (REAL)
> >       "%." #DBL_DIG "g" (DOUBLE PRECISION)
> > for 7.1 because their changes would cause a regress
> > test failure.
> 
> But we run regress with the proper setting, right?> How does giving
> people the ability to change the defaults affect the regression tests?
> 

Hmm I'm afraid I'm misunderstanding your point.
If the default float4(8) output format would be the
same as current output format then we would have no
problem with the current regress test. But there
could be a choise to change default output format
to have a large enough presision to distinguish
float4(8).

Regards,
Hiroshi Inoue


Re: floating point representationu

From
Bruce Momjian
Date:
> > > >From the first I don't want to change the current default
> > > output format
> > >       "%." #FLT_DIG "g" (REAL)
> > >       "%." #DBL_DIG "g" (DOUBLE PRECISION)
> > > for 7.1 because their changes would cause a regress
> > > test failure.
> > 
> > But we run regress with the proper setting, right?> How does giving
> > people the ability to change the defaults affect the regression tests?
> > 
> 
> Hmm I'm afraid I'm misunderstanding your point.
> If the default float4(8) output format would be the
> same as current output format then we would have no
> problem with the current regress test. But there
> could be a choise to change default output format
> to have a large enough presision to distinguish
> float4(8).

But are they going to change the default to run the regression tests? 
How do they change it?  in ~/.psqlrc?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representation

From
Thomas Lockhart
Date:
> Hmm, on looking at the code, this might mean we need some configure
> pushups to extract FLT_DIG and DBL_DIG and put those into the default
> strings.  Do we support any platforms where these are not 6 & 15?

In principle, yes. VAX does not use IEEE math (by default anyway) and
has less range and more precision. Most machines nowadays use the IEEE
definitions, but having at least one counterexample will help keep us
honest ;)
                 - Thomas


Re: floating point representationu

From
Hiroshi Inoue
Date:
Bruce Momjian wrote:
> 
> > > > >From the first I don't want to change the current default
> > > > output format
> > > >       "%." #FLT_DIG "g" (REAL)
> > > >       "%." #DBL_DIG "g" (DOUBLE PRECISION)
> > > > for 7.1 because their changes would cause a regress
> > > > test failure.
> > >
> > > But we run regress with the proper setting, right?> How does giving
> > > people the ability to change the defaults affect the regression tests?
> > >
> >
> > Hmm I'm afraid I'm misunderstanding your point.
> > If the default float4(8) output format would be the
> > same as current output format then we would have no
> > problem with the current regress test. But there
> > could be a choise to change default output format
> > to have a large enough presision to distinguish
> > float4(8).
> 
> But are they going to change the default to run the regression tests?
> How do they change it?  in ~/.psqlrc?
> 

Probably there's a misunderstanding between you and I
but unfortunaltely I don't understand what it is in my
poor English.
Anyway in my plan(current format as default) there would
be no problem with regress test at least for 7.1.

Regards,
Hiroshi Inoue


Re: floating point representationu

From
Bruce Momjian
Date:
> > > same as current output format then we would have no
> > > problem with the current regress test. But there
> > > could be a choise to change default output format
> > > to have a large enough presision to distinguish
> > > float4(8).
> > 
> > But are they going to change the default to run the regression tests?
> > How do they change it?  in ~/.psqlrc?
> > 
> 
> Probably there's a misunderstanding between you and I
> but unfortunaltely I don't understand what it is in my
> poor English.
> Anyway in my plan(current format as default) there would
> be no problem with regress test at least for 7.1.

Oh, I see.  I can't see any way we can make this change in 7.1.  It has
to be done in 7.2.  You are right, changing it at this late date would
be a regression disaster.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representationu

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> But are they going to change the default to run the regression tests? 

You're barking up the wrong tree, Bruce.  Hiroshi specifically said
that he does *not* want to change the default behavior.
        regards, tom lane


Re: floating point representation

From
Thomas Lockhart
Date:
> ...right now I am worried about whether we'd be able to reconfirm regress
> results on all the currently-supported platforms before release.

This would be an excellent topic for a full development cycle ;)
                   - Thomas


Re: floating point representation

From
Bruce Momjian
Date:
> > ...right now I am worried about whether we'd be able to reconfirm regress
> > results on all the currently-supported platforms before release.
> 
> This would be an excellent topic for a full development cycle ;)

Oh, I see.  Never mind.  I was lost.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representationu

From
Bruce Momjian
Date:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > But are they going to change the default to run the regression tests? 
> 
> You're barking up the wrong tree, Bruce.  Hiroshi specifically said
> that he does *not* want to change the default behavior.

OK, I am confused.  Can someone straighten me out?

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representation

From
Pete Forman
Date:
Tom Lane writes:> Hiroshi Inoue <Inoue@tpf.co.jp> writes:> > Tom Lane wrote:> >> The defaults> >> would be "%.7g" and
"%.17g"(or thereabouts, not sure what number of> >> digits we are currently using).> > > Wouldn't changing current
'%.6g','%.15g'(onmany platforms)> > cause the regression test failure ? > > I didn't check my numbers.  If the current
behavioris '%.6g','%.15g'> then we should stay with that as the default.> > Hmm, on looking at the code, this might
meanwe need some configure> pushups to extract FLT_DIG and DBL_DIG and put those into the default> strings.  Do we
supportany platforms where these are not 6 & 15?
 

Please remind me what we are trying to do.  6 & 15 are values to
suppress trailing digits at the end of a number in a standard printf.
For example, 0.1 prints as 0.10000000000000001 at %.17g but as 0.1 at
%.16g.  However those shorter formats are less precise.  There are
several other doubles that will also print the same result.  A round
trip of printf/scanf will not generally preserve the number.

Printing for display purposes may not be adequate for dumping with a
view to restoring.  Are we talking about display or dump?

The ideal is to print just enough digits to be able to read the number
back.  There should be no redundant digits at the end.  Printf is
unable to do this by itself.  The reason is that the correct number of
decimal digits for a %.*g is a function of the number being printed.

There are algorithms to do the right thing but they can be expensive.
I play with some in a program at the URI below.  There is a minor typo
in the usage and a missing (optional) file.  I'll correct those when
the site allows uploads again.  The files' contents are currently
available at http://petef.8k.com/.
-- 
Pete Forman                 -./\.- Disclaimer: This post is originated
WesternGeco                   -./\.-  by myself and does not represent
pete.forman@westerngeco.com     -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.


Re: floating point representation

From
Tom Lane
Date:
Pete Forman <pete.forman@westerngeco.com> writes:
> Please remind me what we are trying to do.

The real point is that we need to serve several different purposes
that aren't necessarily fully compatible.

The existing default of FLT_DIG or DBL_DIG digits seems like a good
general-purpose policy, but it doesn't meet all needs.  For pg_dump,
we clearly would like to promise exact dump and restore.  On the
other side, the geometry regress tests would like to suppress a few
of the noisier low-order digits.  And we frequently see questions from
users about how they can display fewer digits than the system wants to
give them --- or, more generally, format the output in some special
form.

I think the idea of making a user-settable format string is a good one.
I'm just afraid of the idea of trying to shoehorn in a solution at the
last minute; if we do, we may find it's not quite right and then have
a backwards-compatibility problem with fixing it.  Besides, we are in
"no new features" mode during beta.  I think it should wait for 7.2.
        regards, tom lane


Re: floating point representation

From
Bruce Momjian
Date:
> I think the idea of making a user-settable format string is a good one.
> I'm just afraid of the idea of trying to shoehorn in a solution at the
> last minute; if we do, we may find it's not quite right and then have
> a backwards-compatibility problem with fixing it.  Besides, we are in
> "no new features" mode during beta.  I think it should wait for 7.2.

Agreed.  I have the items on the TODO list.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: floating point representation

From
Peter Eisentraut
Date:
Tom Lane writes:

> And we frequently see questions from users about how they can display
> fewer digits than the system wants to give them --- or, more
> generally, format the output in some special form.

to_char() should serve those people.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/



Re: floating point representation

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> And we frequently see questions from users about how they can display
>> fewer digits than the system wants to give them --- or, more
>> generally, format the output in some special form.

> to_char() should serve those people.

Only if they're willing to go through and change all their queries.
The geometry regress tests seem a good counterexample: one SET at the
top versus a lot of rewriting.
        regards, tom lane


Re: floating point representation

From
Philip Warner
Date:
At 15:48 20/02/01 -0500, Tom Lane wrote:
>
>The existing default of FLT_DIG or DBL_DIG digits seems like a good
>general-purpose policy, but it doesn't meet all needs.  For pg_dump,
>we clearly would like to promise exact dump and restore.  On the
>other side, the geometry regress tests would like to suppress a few
>of the noisier low-order digits.  And we frequently see questions from
>users about how they can display fewer digits than the system wants to
>give them --- or, more generally, format the output in some special
>form.
>

If I could add another 'nice-to-have' in here: the ability on a
per-attribute basis to specify the preferred output format. This could
apply to real, date, integer etc etc. Clearly not a 7.1 feature.




----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: floating point representation

From
Philip Warner
Date:
At 22:29 20/02/01 +0100, Peter Eisentraut wrote:
>Tom Lane writes:
>> And we frequently see questions from users about how they can display
>> fewer digits than the system wants to give them --- or, more
>> generally, format the output in some special form.
>
>to_char() should serve those people.
>

This is not a good solution if what you want (as a user) is consistency of
output no matter who retrieves the data; people should not have to wrap
every SELECT field in to_char to get the precision/format they want.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: floating point representation

From
Robert Schrem
Date:
I think a formating mode where only the relevant digits are written
to the output would be great as an alternative to the discussed 
fixed formatting strings. In this context i think of 'relevant' 
as in the following:

'Output as few characters as possible but ensure that scanf is 
still able to rebuild the binary reprressentation of the floating 
point number exactly.'

To make this happen we would need to compute a seperate formatting
string for each floating point value:

E.g. if the binary value is exactly '1.00000E00' then we just
write '1' to the output, because the rest is just 'ASCII noise'
and not neccessary for rebuilding the identical binary value for
the given floating point value.

The advantage would be, that we only generate as much ASCII data
as absolutly neccessary to rebuild the original data exactly.
At least this is what I would expect from pg_dump.

robert schrem


Re: Re: floating point representation

From
Philip Warner
Date:
At 10:19 21/02/01 +0100, Robert Schrem wrote:
>The advantage would be, that we only generate as much ASCII data
>as absolutly neccessary to rebuild the original data exactly.
>At least this is what I would expect from pg_dump.

pg_dump is only one side of thre problem, but the simplest solution might
be to add an option to dump the hex mantissa, exponent & sign. This should
be low-cost and an exact representation of the machine version of the number.

The other issues, like what is sent to psql & via interfaces like odbc
(currently text) should be application/DBA based and setable on a
per-attribute basis. eg. some applications want 1.0000 because the data
came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
0.00005 etc. Maybe this is just an argument for a new 'number with error'
type...


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.B.N. 75 008 659 498)          |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Re: floating point representation

From
Robert Schrem
Date:
On Wed, 21 Feb 2001, you wrote:
> At 10:19 21/02/01 +0100, Robert Schrem wrote:
> >The advantage would be, that we only generate as much ASCII data
> >as absolutly neccessary to rebuild the original data exactly.
> >At least this is what I would expect from pg_dump.
> 
> pg_dump is only one side of thre problem, but the simplest solution might
> be to add an option to dump the hex mantissa, exponent & sign. This should
> be low-cost and an exact representation of the machine version of the number.

The hex dumps should be done in a machine independant way - I think
that's what you ment when stating mantissa, exponent & sign seperatly,
right? I think this whould be a good solution...

> The other issues, like what is sent to psql & via interfaces like odbc
> (currently text) should be application/DBA based and setable on a
> per-attribute basis. 

You think of an additional tag in a CREATE TABLE instruction like

CREATE TABLE temperature (   id             id,   messure_time   timestamp default now() formatted as "hhmmmdd",
value         float formatted as "%3.2f"
 
);

or maybe

CREATE TABLE temperature (   id                       id,   messure_time("hhmmmdd")  timestamp default now(),
value("%3.2f")          float
 
);

or is there something in SQL99 ?

> eg. some applications want 1.0000 because the data
> came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
> 0.00005 etc. Maybe this is just an argument for a new 'number with error'
> type...

I think a float value in a database column has no known error 
range and therefore we should not care about the 'physical' error 
of a value in this context. Just think of a computed column in a 
VIEW - how can we know for shure how percise such a result is if
we don't have any additional information about the messurement 
errors of all operands (or constants) involved.

If you would introduce a new type - 'number with error' - this 
would be totally different and a big contribution to the solution 
of this. Then you can also handle errors like 1.0000+/-0.00002 
percisely - witch you can't by only formatting 1.0000.

robert schrem









Re: Re: floating point representation

From
Tom Lane
Date:
Philip Warner <pjw@rhyme.com.au> writes:
> The other issues, like what is sent to psql & via interfaces like odbc
> (currently text) should be application/DBA based and setable on a
> per-attribute basis. eg. some applications want 1.0000 because the data
> came from a piece of hardware with a know error, and 1.0000 means 1.0000+/-
> 0.00005 etc. Maybe this is just an argument for a new 'number with error'
> type...

FWIW, there is a number-with-error type in contrib/seg ...
        regards, tom lane


Re: floating point representation

From
Peter Eisentraut
Date:
Philip Warner writes:

> At 22:29 20/02/01 +0100, Peter Eisentraut wrote:
> >Tom Lane writes:
> >> And we frequently see questions from users about how they can display
> >> fewer digits than the system wants to give them --- or, more
> >> generally, format the output in some special form.
> >
> >to_char() should serve those people.
> >
>
> This is not a good solution if what you want (as a user) is consistency of
> output no matter who retrieves the data; people should not have to wrap
> every SELECT field in to_char to get the precision/format they want.

Views should serve those people.

-- 
Peter Eisentraut      peter_e@gmx.net       http://yi.org/peter-e/