Thread: Ryu floating point output patch

Ryu floating point output patch

From

Andrew Gierth

Date:

13 December 2018, 22:41:55

This is a mostly cleaned-up version of the test patch I posted
previously for floating-point output using the Ryu algorithm.

Upstream source: github.com/ulfjack/ryu/ryu at commit 795c8b57a

From the upstream, I've taken only specific files which are
Boost-licensed, removed code not of interest to us, removed C99-isms,
applied project style for things like type names and use of INT64CONST,
removed some ad-hoc configuration #ifs in favour of using values
established by pg_config.h, and ran the whole thing through pgindent and
fixed up the resulting wreckage.

On top of that I made these functional changes:

1. Added an alternative fixed-point output format so that values with
exponents not less than -4 and less than the default precision for the
type are formatted in fixed-point, as sprintf does.

2. Exponents now always have a sign and at least two digits, also as per
sprintf formatting rules.

The fixed-point output hasn't been micro-optimized to the same extent as
the rest of the code, and there is a measurable performance effect - but
it's not that significant compared to the speedup over the existing code.

As for the extent of that speedup: in my tests, float8 output is now as
fast or marginally faster than bigint output, and roughly one order of
magnitude faster than the existing float8 output.

This version of the patch continues to use extra_float_digits to select
whether Ryu output is used - but I've also changed the default, so that
Ryu is used unless you explicitly set extra_float_digits to 0 or less.

This does mean that at the moment, about 13 regression tests fail on
this patch due to the higher precision of float output. I have
intentionally not yet adjusted those results, pending discussion.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu.patch

Re: Ryu floating point output patch

From

Andres Freund

Date:

13 December 2018, 22:56:16

Hi,

On 2018-12-13 19:41:55 +0000, Andrew Gierth wrote:
> This is a mostly cleaned-up version of the test patch I posted
> previously for floating-point output using the Ryu algorithm.

Nice!


> From the upstream, I've taken only specific files which are
> Boost-licensed, removed code not of interest to us, removed C99-isms,
> applied project style for things like type names and use of INT64CONST,
> removed some ad-hoc configuration #ifs in favour of using values
> established by pg_config.h, and ran the whole thing through pgindent and
> fixed up the resulting wreckage.

Removed C99isms? Why, we allow that these days? Did you mean C11?



> +static inline uint64
> +mulShiftAll(const uint64 m, const uint64 *const mul, const int32 j,
> +            uint64 *const vp, uint64 *const vm, const uint32 mmShift)
> +{
> +/*   m <<= 2; */
> +/*   uint128 b0 = ((uint128) m) * mul[0]; // 0 */
> +/*   uint128 b2 = ((uint128) m) * mul[1]; // 64 */
> +/*  */
> +/*   uint128 hi = (b0 >> 64) + b2; */
> +/*   uint128 lo = b0 & 0xffffffffffffffffull; */
> +/*   uint128 factor = (((uint128) mul[1]) << 64) + mul[0]; */
> +/*   uint128 vpLo = lo + (factor << 1); */
> +/*   *vp = (uint64) ((hi + (vpLo >> 64)) >> (j - 64)); */
> +/*   uint128 vmLo = lo - (factor << mmShift); */
> +/*   *vm = (uint64) ((hi + (vmLo >> 64) - (((uint128) 1ull) << 64)) >> (j - 64)); */
> +/*   return (uint64) (hi >> (j - 64)); */
> +    *vp = mulShift(4 * m + 2, mul, j);
> +    *vm = mulShift(4 * m - 1 - mmShift, mul, j);
> +    return mulShift(4 * m, mul, j);
> +}

What's with all the commented out code?  I'm not sure that keeping close
alignment with the original codebase with things like that has much
value given the extent of the reformatting and functional changes?


> +/*  These tables are generated by PrintDoubleLookupTable. */

This kind of reference is essentially dangling now, so perhaps we should
rewrite that?


> +++ b/src/common/d2s_intrinsics.h

> +static inline uint64
> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
> +{
> +    return _umul128(a, b, productHi);
> +}

These are fairly generic names, perhaps we ought to prefix them?



Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

13 December 2018, 23:23:51

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> From the upstream, I've taken only specific files which are
 >> Boost-licensed, removed code not of interest to us, removed
 >> C99-isms, applied project style for things like type names and use
 >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
 >> using values established by pg_config.h, and ran the whole thing
 >> through pgindent and fixed up the resulting wreckage.

 Andres> Removed C99isms? Why, we allow that these days? Did you mean
 Andres> C11?

My understanding is that we do not allow // comments or mixed
declarations and code (other than loop control variables).

This code in particular was very heavily into using mixed declarations
and code throughout. Removing those was moderately painful.

 Andres> What's with all the commented out code?

Just stuff from upstream I didn't take out.

 Andres> I'm not sure that keeping close alignment with the original
 Andres> codebase with things like that has much value given the extent
 Andres> of the reformatting and functional changes?

Probably not, but otherwise I did not remove much in the way of upstream
comments or edit them except for formatting.

 >> +/*  These tables are generated by PrintDoubleLookupTable. */

 Andres> This kind of reference is essentially dangling now, so perhaps
 Andres> we should rewrite that?

Well, it's probably still useful to point out that they're generated
(though not by us); I guess it should make clear where the generator is.

 >> +++ b/src/common/d2s_intrinsics.h

 >> +static inline uint64
 >> +umul128(const uint64 a, const uint64 b, uint64 *const productHi)
 >> +{
 >> +    return _umul128(a, b, productHi);
 >> +}

 Andres> These are fairly generic names, perhaps we ought to prefix
 Andres> them?

Maybe, but presumably nothing else is going to include this file.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andres Freund

Date:

13 December 2018, 23:32:54

Hi,

On 2018-12-13 20:23:51 +0000, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
> 
>  >> From the upstream, I've taken only specific files which are
>  >> Boost-licensed, removed code not of interest to us, removed
>  >> C99-isms, applied project style for things like type names and use
>  >> of INT64CONST, removed some ad-hoc configuration #ifs in favour of
>  >> using values established by pg_config.h, and ran the whole thing
>  >> through pgindent and fixed up the resulting wreckage.
> 
>  Andres> Removed C99isms? Why, we allow that these days? Did you mean
>  Andres> C11?
> 
> My understanding is that we do not allow // comments or mixed
> declarations and code (other than loop control variables).

Ah, that makes sense.


> This code in particular was very heavily into using mixed declarations
> and code throughout. Removing those was moderately painful.

I wonder if we should instead relax those restriction for the largely
foreign pieces of code?



>  >> +/*  These tables are generated by PrintDoubleLookupTable. */
> 
>  Andres> This kind of reference is essentially dangling now, so perhaps
>  Andres> we should rewrite that?
> 
> Well, it's probably still useful to point out that they're generated
> (though not by us); I guess it should make clear where the generator is.

Right it should definitely be explained. But I do think pointing at the
source, would be good. I kind of wonder if we should, if we were to
accept something like this patch, clone the original ryu repository to
either git.pg.org, or at least into pg's github repository?  It'd be
annoying if the original authors moved on and deleted the repository.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

13 December 2018, 23:53:23

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> This code in particular was very heavily into using mixed
 >> declarations and code throughout. Removing those was moderately
 >> painful.

 Andres> I wonder if we should instead relax those restriction for the
 Andres> largely foreign pieces of code?

Do we have any reason for the restriction beyond stylistic preference?

 Andres> Right it should definitely be explained. But I do think
 Andres> pointing at the source, would be good. I kind of wonder if we
 Andres> should, if we were to accept something like this patch, clone
 Andres> the original ryu repository to either git.pg.org, or at least
 Andres> into pg's github repository? It'd be annoying if the original
 Andres> authors moved on and deleted the repository.

Keeping a copy on git.pg.org seems like a reasonable thing to do.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andres Freund

Date:

14 December 2018, 00:00:12

Hi,

On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:
> >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
> 
>  >> This code in particular was very heavily into using mixed
>  >> declarations and code throughout. Removing those was moderately
>  >> painful.
> 
>  Andres> I wonder if we should instead relax those restriction for the
>  Andres> largely foreign pieces of code?
> 
> Do we have any reason for the restriction beyond stylistic preference?

No. Robert and Tom were against allowing mixing declarations and code, a
few more against // comments. I don't quite agree with either, but
getting to the rest of C99 seemed more important than fighting those out
at that moment.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 00:01:22

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

And it turns out I missed one, sigh.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 00:24:21

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

Updated patch.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu2.patch

Re: Ryu floating point output patch

From

Thomas Munro

Date:

14 December 2018, 00:27:49

On Fri, Dec 14, 2018 at 8:00 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2018-12-13 20:53:23 +0000, Andrew Gierth wrote:
> > >>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:
> >
> >  >> This code in particular was very heavily into using mixed
> >  >> declarations and code throughout. Removing those was moderately
> >  >> painful.
> >
> >  Andres> I wonder if we should instead relax those restriction for the
> >  Andres> largely foreign pieces of code?
> >
> > Do we have any reason for the restriction beyond stylistic preference?
>
> No. Robert and Tom were against allowing mixing declarations and code, a
> few more against // comments. I don't quite agree with either, but
> getting to the rest of C99 seemed more important than fighting those out
> at that moment.

-1 for making superficial changes to code that we are "vendoring",
unless it is known to be dead/abandoned and we're definitively forking
it.  It just makes it harder to track upstream's bug fixes and
improvements, surely?

-- 
Thomas Munro
http://www.enterprisedb.com

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 00:48:07

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> This code in particular was very heavily into using mixed
 Andrew> declarations and code throughout. Removing those was moderately
 Andrew> painful.

 Andrew> And it turns out I missed one, sigh.

 Andrew> Updated patch.

And again to fix the windows build - why does Mkvcbuild.pm have its own
private copy of the file list for src/common?

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu3.patch

Re: Ryu floating point output patch

From

Alvaro Herrera

Date:

14 December 2018, 01:20:42

On 2018-Dec-13, Andrew Gierth wrote:

> And again to fix the windows build - why does Mkvcbuild.pm have its own
> private copy of the file list for src/common?

I think at some point the Makefile parsing code was too stupid to deal
with the src/port Makefile, so it was hardcoded; later probably I
cargo-culted that into the src/common makefile.  Maybe the parser has
already improved (or the makefile simplified) that the msvc tooling can
extract the files from the makefile?  Dunno.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 10:57:17

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> And again to fix the windows build

And again to see if it actually compiles now...

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu4.patch

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 21:01:19

>>>>> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:

 >>> Do we have any reason for the restriction beyond stylistic preference?

 >> No. Robert and Tom were against allowing mixing declarations and
 >> code, a few more against // comments. I don't quite agree with
 >> either, but getting to the rest of C99 seemed more important than
 >> fighting those out at that moment.

 Thomas> -1 for making superficial changes to code that we are
 Thomas> "vendoring", unless it is known to be dead/abandoned and we're
 Thomas> definitively forking it. It just makes it harder to track
 Thomas> upstream's bug fixes and improvements, surely?

Well, the question there is how far to take that principle; do we avoid
pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
etc., and so on.

(I notice that a stray uint8_t that I left behind broke the Windows
build but not the linux/bsd one, so something would have to be fixed in
the windows build in order to avoid having to change that.)

The Ryu code is perhaps an extreme example because it follows almost the
exact reverse of our coding standard.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

14 December 2018, 21:25:29

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Thomas" == Thomas Munro <thomas.munro@enterprisedb.com> writes:
>  Thomas> -1 for making superficial changes to code that we are
>  Thomas> "vendoring", unless it is known to be dead/abandoned and we're
>  Thomas> definitively forking it. It just makes it harder to track
>  Thomas> upstream's bug fixes and improvements, surely?

> Well, the question there is how far to take that principle; do we avoid
> pgindent'ing the code, do we avoid changing uint64_t etc. to uint64
> etc., and so on.
> The Ryu code is perhaps an extreme example because it follows almost the
> exact reverse of our coding standard.

My heart kind of sinks when looking at this patch, because it's a
large addition of not-terribly-well-documented code that nobody here
really understands, never mind the problem that it's mostly not close
to our own coding standards.

Our track record in borrowing code from "upstream" projects is pretty
miserable: almost without exception, we've found ourselves stuck with
maintaining such code ourselves after a few years.  I don't see any
reason to think that wouldn't be true here; in fact there's much more
reason to worry here than we had for, eg, borrowing the regex code.
The maintenance track record of this github repo appears to span six
months, and it's now been about four months since the last commit.
It might be abandonware already.

Is this a path we really want to go down?  I'm not convinced the
cost/benefit ratio is attractive.

If we do go down this path, though, I'm not too worried about making
it simple to absorb upstream fixes; I bet there will be few to none.
(Still, you might want to try to automate and document the coding
format conversion steps, along the line of what I've done recently
for our copy of tzcode.)

            regards, tom lane

Re: Ryu floating point output patch

From

Andres Freund

Date:

14 December 2018, 21:29:22

Hi,

On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
> Our track record in borrowing code from "upstream" projects is pretty
> miserable: almost without exception, we've found ourselves stuck with
> maintaining such code ourselves after a few years.  I don't see any
> reason to think that wouldn't be true here; in fact there's much more
> reason to worry here than we had for, eg, borrowing the regex code.
> The maintenance track record of this github repo appears to span six
> months, and it's now been about four months since the last commit.
> It might be abandonware already.

It's been absorbed into MSVC's standard library and a bunch of other
projects, so there's certainly some other prominent adoption.

The last commit was a month ago, no? November 6th afaict.

> Is this a path we really want to go down?  I'm not convinced the
> cost/benefit ratio is attractive.

float->text conversion is one of the major bottlenecks when backing up
postgres, it's definitely a pain-point in practice. I've not really seen
a nicer implementation anywhere, not even close.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Tom Lane

Date:

14 December 2018, 21:47:53

Andres Freund <andres@anarazel.de> writes:
> On 2018-12-14 13:25:29 -0500, Tom Lane wrote:
>> The maintenance track record of this github repo appears to span six
>> months, and it's now been about four months since the last commit.
>> It might be abandonware already.

> The last commit was a month ago, no? November 6th afaict.

Hmm, the commit history I was looking at ended on Aug 19, but I must've
done something wrong, because going in from a different angle shows
me commits up to November.

> It's been absorbed into MSVC's standard library and a bunch of other
> projects, so there's certainly some other prominent adoption.

If we think that's going on, maybe we should just sit tight till glibc
absorbs it.

            regards, tom lane

Re: Ryu floating point output patch

From

Andres Freund

Date:

14 December 2018, 21:53:01

Hi,

On 2018-12-14 13:47:53 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It's been absorbed into MSVC's standard library and a bunch of other
> > projects, so there's certainly some other prominent adoption.
> 
> If we think that's going on, maybe we should just sit tight till glibc
> absorbs it.

All the stuff it'll have to put above it will make it slower
anyway. It's not like this'll be a feature that's hard to rip out if all
libc's have fast roundtrip safe conversion routines, or if something
better comes along.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 22:22:29

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> The maintenance track record of this github repo appears to span
 Tom> six months,

The algorithm was only published six months ago.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

14 December 2018, 22:27:20

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> The maintenance track record of this github repo appears to span
>  Tom> six months,

> The algorithm was only published six months ago.

Doesn't really affect my point: there's little reason to believe that
there will be an active upstream several years down the pike.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 23:06:30

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> (Still, you might want to try to automate and document the coding
 Tom> format conversion steps, along the line of what I've done recently
 Tom> for our copy of tzcode.)

Working around our declaration-after-statement prohibition required
manually moving some lines of code, manually separating some
declarations from their assignments (removing const qualifiers), and as
a last resort introducing new code blocks. I doubt it could be automated
in any sane way. (This might be an argument to relax that rule.)

Mechanical search-and-replace accounted for the _t suffix on types, the
addition of the UINT64CONSTs, and changing assert to Assert; that much
could be automated.

pgindent did a pretty horrible job on the comments, a script could
probably do better.

I guess removal of the unwanted #ifs could be automated without too much
difficulty.

(But I'm not likely going to do any of this in the forseeable future,
because I don't expect it to be useful.)

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 December 2018, 23:37:46

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> Is this a path we really want to go down? I'm not convinced the
 >> cost/benefit ratio is attractive.

 Andres> float->text conversion is one of the major bottlenecks when
 Andres> backing up postgres, it's definitely a pain-point in practice.

Also an issue with queries. I got into this whole area after diagnosing
a problem for a user on IRC who was seeing a 4x slowdown for PG as
compared to MSSQL, from 30 seconds to 120 seconds. We determined that
the _entire_ difference was accounted for by float conversion.

Now that was an unusual case, downloading a large dataset of floats at
once into a statistics program, but it's very easy to show
proportionally large overheads from float output:

(using this table:
  create table flttst4 as
    select i a, i b, i c, i d, i::float8 e, random() f
      from generate_series(1::bigint, 10000000::bigint) i;
)

postgres=# copy flttst4(d) to '/dev/null';  -- bigint column
COPY 10000000
Time: 2166.001 ms (00:02.166)

postgres=# copy flttst4(e) to '/dev/null';  -- float8 col, integer values
COPY 10000000
Time: 4233.005 ms (00:04.233)

postgres=# copy flttst4(f) to '/dev/null';  -- float8 col, random()
COPY 10000000
Time: 7261.421 ms (00:07.261)

-- vs. timings with Ryu:

postgres=# copy flttst4(e) to '/dev/null';  -- float8 col, integer values
COPY 10000000
Time: 2391.725 ms (00:02.392)

postgres=# copy flttst4(f) to '/dev/null';  -- float8 col, random()
COPY 10000000
Time: 2245.699 ms (00:02.246)

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

11 January 2019, 05:40:25

Rebased this patch onto current master. No functional changes; just
fixed up the copyright dates and a couple of stray missing UINT64CONSTs.

Regression tests still fail because how to fix them depends on the
issues below. Likewise docs are not changed yet for the same reason.

Decisions that need to be made in order to proceed:

1. Should exact output become the default, or is it more important to
   preserve the historical output?

  I will argue very strongly that the default output should be exact.

2. How far do we need to go to support existing uses of
   extra_float_digits? For example, do we need a way for clients to
   request that they get the exact same output as previously for
   extra_float_digits=2 or 3, rather than just assuming that any
   round-trip-exact value will do?

  (this would likely mean adding a new GUC, rather than basing
  everything off extra_float_digits as the patch does now)

3. Do we need to do anything about how conversions from floats to
   numeric work? At present they _ignore_ extra_float_digits (presumably
   on immutability grounds) and convert using only DBL_DIG digits of
   precision.

  I have no very strong position on this but incline to keeping the
  existing behavior, though possibly it would be good to add functions
  to get the round-trip value and possibly even the true exact value.
  (It would be sort of nice if CAST(floatval AS numeric(400,18)) or
  similar could work as you'd expect, but currently we treat that as
  floatval::numeric::numeric(400,18) so the cast function doesn't know
  about the eventual typmod.)

4. Can we allow declaration-after-statement please? That would allow
   keeping this code significantly closer to its upstream.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu5.patch

Re: Ryu floating point output patch

From

Chapman Flack

Date:

11 January 2019, 07:02:01

Does the project have an established view on non-ASCII names in
sources or docs?

AFAICS [1], the name of the algorithm may be Ryū.

-Chap


[1] https://dl.acm.org/citation.cfm?id=3192369

Re: Ryu floating point output patch

From

Andres Freund

Date:

11 January 2019, 07:04:37

Hi,

On 2019-01-10 23:02:01 -0500, Chapman Flack wrote:
> Does the project have an established view on non-ASCII names in
> sources or docs?
> 
> AFAICS [1], the name of the algorithm may be Ryū.

I think it'd be a really bad idea to start having non-ascii
filenames, and I quite doubt that I'm alone in that.

- Andres

Re: Ryu floating point output patch

From

Tom Lane

Date:

11 January 2019, 07:09:15

Andres Freund <andres@anarazel.de> writes:
> On 2019-01-10 23:02:01 -0500, Chapman Flack wrote:
>> Does the project have an established view on non-ASCII names in
>> sources or docs?
>> AFAICS [1], the name of the algorithm may be Ryū.

> I think it'd be a really bad idea to start having non-ascii
> filenames, and I quite doubt that I'm alone in that.

Non-ASCII filenames seem right out.  I thought the question was
about whether we even want to have non-ASCII characters within
source code (my view: avoid if possible) or in docs (we do,
but it's better if you can make it into html entities).

            regards, tom lane

Re: Ryu floating point output patch

From

Robert Haas

Date:

11 January 2019, 20:13:50

On Thu, Jan 10, 2019 at 11:10 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2019-01-10 23:02:01 -0500, Chapman Flack wrote:
> >> Does the project have an established view on non-ASCII names in
> >> sources or docs?
> >> AFAICS [1], the name of the algorithm may be Ryū.
>
> > I think it'd be a really bad idea to start having non-ascii
> > filenames, and I quite doubt that I'm alone in that.
>
> Non-ASCII filenames seem right out.  I thought the question was
> about whether we even want to have non-ASCII characters within
> source code (my view: avoid if possible) or in docs (we do,
> but it's better if you can make it into html entities).

+1 to all that.

I don't have an opinion on the code that is part of this patch, but
the benchmark results are impressive.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

11 January 2019, 20:33:19

>>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:

 Robert> I don't have an opinion on the code that is part of this patch

At this point, frankly, I'm less interested in opinions on the code
itself (which can be addressed later if need be) than on answers to (or
discussion of) the questions asked upthread.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Robert Haas

Date:

11 January 2019, 20:55:54

On Fri, Jan 11, 2019 at 12:33 PM Andrew Gierth
<andrew@tao11.riddles.org.uk> wrote:
> >>>>> "Robert" == Robert Haas <robertmhaas@gmail.com> writes:
>  Robert> I don't have an opinion on the code that is part of this patch
>
> At this point, frankly, I'm less interested in opinions on the code
> itself (which can be addressed later if need be) than on answers to (or
> discussion of) the questions asked upthread.

Unfortunately, I can't help you very much there.  My knowledge is
insufficient to give intelligent answers to those questions, and I
don't think giving you unintelligent answers is a good idea.  About as
much as I can say is that if you commit this patch, and as a result, a
dump-and-restore from v11 to v12 causes a change in the bits on disk
that would not have occurred without this patch, that's very bad.
What ramifications there will be if the output changes in ways that
don't affect the bits that get written to the platter -- I don't know.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Ryu floating point output patch

From

Andres Freund

Date:

11 January 2019, 21:09:42

Hi,

On 2019-01-11 02:40:25 +0000, Andrew Gierth wrote:
> Decisions that need to be made in order to proceed:
> 
> 1. Should exact output become the default, or is it more important to
>    preserve the historical output?
> 
>   I will argue very strongly that the default output should be exact.

Same.


> 2. How far do we need to go to support existing uses of
>    extra_float_digits? For example, do we need a way for clients to
>    request that they get the exact same output as previously for
>    extra_float_digits=2 or 3, rather than just assuming that any
>    round-trip-exact value will do?

I think it might be ok to say that any positive value will yield
round-trip-exact values, even if they're not precisely the same as the
ones before.


> 3. Do we need to do anything about how conversions from floats to
>    numeric work? At present they _ignore_ extra_float_digits (presumably
>    on immutability grounds) and convert using only DBL_DIG digits of
>    precision.
> 
>   I have no very strong position on this but incline to keeping the
>   existing behavior, though possibly it would be good to add functions
>   to get the round-trip value and possibly even the true exact value.
>   (It would be sort of nice if CAST(floatval AS numeric(400,18)) or
>   similar could work as you'd expect, but currently we treat that as
>   floatval::numeric::numeric(400,18) so the cast function doesn't know
>   about the eventual typmod.)

Hm. What's the argument to not just always use roundtrip-safe conversion
here?


> 4. Can we allow declaration-after-statement please? That would allow
>    keeping this code significantly closer to its upstream.

As I said on IRC: I'm personally on-board with changing this styilistic
requirement, but I also think that it'd be OK to just disable the
warning for the individual object files when they're imported (likely
using target specific makefile variable assignements) if we cannot get
agreement on the global policy change.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Andres Freund

Date:

11 January 2019, 21:30:15

Hi,

On 2019-01-11 12:55:54 -0500, Robert Haas wrote:
> About as much as I can say is that if you commit this patch, and as a
> result, a dump-and-restore from v11 to v12 causes a change in the bits
> on disk that would not have occurred without this patch, that's very
> bad.  What ramifications there will be if the output changes in ways
> that don't affect the bits that get written to the platter -- I don't
> know.

Are you trying to highlight the potential dangers of this patch, or are
you seeing a concrete risk of that in the discussion? Both ryu's and the
current output (as used by pg_dump, with extra_float_digits = 3) ought
to be roundtrip safe.  There's possibly some chance for differences in
insignificant bits, but that ought to be pretty much it?

The current discussion would probably change the data that's stored on disk
between ryu/non-ryu for queries like
CREATE TABLE AS SELECT floatcol::text::float FROM ...
or
CREATE TABLE AS SELECT floatcol::text FROM ...

when not specifyiung extra_float_digits = 3, but the ryu case would be
more precise, so that ought to be good?

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Robert Haas

Date:

11 January 2019, 21:33:54

On Fri, Jan 11, 2019 at 1:30 PM Andres Freund <andres@anarazel.de> wrote:
> On 2019-01-11 12:55:54 -0500, Robert Haas wrote:
> > About as much as I can say is that if you commit this patch, and as a
> > result, a dump-and-restore from v11 to v12 causes a change in the bits
> > on disk that would not have occurred without this patch, that's very
> > bad.  What ramifications there will be if the output changes in ways
> > that don't affect the bits that get written to the platter -- I don't
> > know.
>
> Are you trying to highlight the potential dangers of this patch, or are
> you seeing a concrete risk of that in the discussion? Both ryu's and the
> current output (as used by pg_dump, with extra_float_digits = 3) ought
> to be roundtrip safe.  There's possibly some chance for differences in
> insignificant bits, but that ought to be pretty much it?
>
> The current discussion would probably change the data that's stored on disk
> between ryu/non-ryu for queries like
> CREATE TABLE AS SELECT floatcol::text::float FROM ...
> or
> CREATE TABLE AS SELECT floatcol::text FROM ...
>
> when not specifyiung extra_float_digits = 3, but the ryu case would be
> more precise, so that ought to be good?

I'm not complaining about the patch.  I'm saying that the patch can't
change people's existing data when they dump and restore.  If you're
saying it doesn't, cool.  I don't mind if the answers to queries get
more precise -- that's good, as you say -- just that we don't change
their data.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Ryu floating point output patch

From

Andres Freund

Date:

11 January 2019, 21:40:49

Hi,

On 2019-01-11 13:33:54 -0500, Robert Haas wrote:
> On Fri, Jan 11, 2019 at 1:30 PM Andres Freund <andres@anarazel.de> wrote:
> > On 2019-01-11 12:55:54 -0500, Robert Haas wrote:
> > > About as much as I can say is that if you commit this patch, and as a
> > > result, a dump-and-restore from v11 to v12 causes a change in the bits
> > > on disk that would not have occurred without this patch, that's very
> > > bad.  What ramifications there will be if the output changes in ways
> > > that don't affect the bits that get written to the platter -- I don't
> > > know.
> >
> > Are you trying to highlight the potential dangers of this patch, or are
> > you seeing a concrete risk of that in the discussion? Both ryu's and the
> > current output (as used by pg_dump, with extra_float_digits = 3) ought
> > to be roundtrip safe.  There's possibly some chance for differences in
> > insignificant bits, but that ought to be pretty much it?
> >
> > The current discussion would probably change the data that's stored on disk
> > between ryu/non-ryu for queries like
> > CREATE TABLE AS SELECT floatcol::text::float FROM ...
> > or
> > CREATE TABLE AS SELECT floatcol::text FROM ...
> >
> > when not specifyiung extra_float_digits = 3, but the ryu case would be
> > more precise, so that ought to be good?
> 
> I'm not complaining about the patch.  I'm saying that the patch can't
> change people's existing data when they dump and restore.  If you're
> saying it doesn't, cool.  I don't mind if the answers to queries get
> more precise -- that's good, as you say -- just that we don't change
> their data.

It'd potentially change data between an non-ryu->{ryu,non-ryu} and
ryu->ryu versions, for dumps that didn't specify extra_float_digits =
3. But that'd not be pg_dump, and already broken previously, so I don't
have particularly much sympathy?

And of course it'd change the dump's text contents between ryu and
non-ryu backends even with extra_float_digits = 3, but the resulting
floats ought to be the same. It's just that ryu is better at figuring
out what the minimal text representation is than the current code.

Greetings,

Andres Freund

Re: Ryu floating point output patch

From

Robert Haas

Date:

11 January 2019, 21:42:25

On Fri, Jan 11, 2019 at 1:40 PM Andres Freund <andres@anarazel.de> wrote:
> It'd potentially change data between an non-ryu->{ryu,non-ryu} and
> ryu->ryu versions, for dumps that didn't specify extra_float_digits =
> 3. But that'd not be pg_dump, and already broken previously, so I don't
> have particularly much sympathy?

Agreed.

> And of course it'd change the dump's text contents between ryu and
> non-ryu backends even with extra_float_digits = 3, but the resulting
> floats ought to be the same. It's just that ryu is better at figuring
> out what the minimal text representation is than the current code.

wfm.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Ryu floating point output patch

From

Tom Lane

Date:

11 January 2019, 22:54:55

Andres Freund <andres@anarazel.de> writes:
> And of course it'd change the dump's text contents between ryu and
> non-ryu backends even with extra_float_digits = 3, but the resulting
> floats ought to be the same. It's just that ryu is better at figuring
> out what the minimal text representation is than the current code.

I'm fairly concerned about this blithe assertion that the ryu code is
smarter than the rest of the world.  In particular, how does it know
how every strtod() on the planet will react to specific input?

            regards, tom lane

Re: Ryu floating point output patch

From

Andres Freund

Date:

11 January 2019, 23:03:11

Hi,

On 2019-01-11 14:54:55 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > And of course it'd change the dump's text contents between ryu and
> > non-ryu backends even with extra_float_digits = 3, but the resulting
> > floats ought to be the same. It's just that ryu is better at figuring
> > out what the minimal text representation is than the current code.
> 
> I'm fairly concerned about this blithe assertion that the ryu code is
> smarter than the rest of the world.

The paper on which ryu is based seems to have quite some careful
analysis behind it, explaining why it's correct (i.e. why the algorithm
generates the minimal output, but not output that's imprecise). Although
I certainly didn't invest substantial amounts of time reviewing the
correctness of the logic therein, and would quite possibly fail due to
insufficient maths chops if I tried.

> In particular, how does it know how every strtod() on the planet will
> react to specific input?

strtod()'s job ought to computationally be significantly easier than the
other way round, no? And if there's buggy strtod() implementations out
there, why would they be guaranteed to do the correct thing with our
current output, but not with ryu's?

Greetings,

Andres Freund

declaration-after-statement (was Re: Ryu floating point output patch)

From

Andrew Gierth

Date:

12 January 2019, 02:04:51

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> 4. Can we allow declaration-after-statement please? That would allow
 >> keeping this code significantly closer to its upstream.

 Andres> As I said on IRC: I'm personally on-board with changing this
 Andres> styilistic requirement, but I also think that it'd be OK to
 Andres> just disable the warning for the individual object files when
 Andres> they're imported (likely using target specific makefile
 Andres> variable assignements) if we cannot get agreement on the global
 Andres> policy change.

So are there any strong disagreements with the idea of disabling the
warning at least for these specific files? If I don't hear any
objections, I'll go for that in the next version of the patch.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

12 January 2019, 05:53:14

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> 3. Do we need to do anything about how conversions from floats to
 >> numeric work? At present they _ignore_ extra_float_digits
 >> (presumably on immutability grounds) and convert using only DBL_DIG
 >> digits of precision.
 >> 
 >> I have no very strong position on this but incline to keeping the
 >> existing behavior, though possibly it would be good to add functions
 >> to get the round-trip value and possibly even the true exact value.
 >> (It would be sort of nice if CAST(floatval AS numeric(400,18)) or
 >> similar could work as you'd expect, but currently we treat that as
 >> floatval::numeric::numeric(400,18) so the cast function doesn't know
 >> about the eventual typmod.)

 Andres> Hm. What's the argument to not just always use roundtrip-safe
 Andres> conversion here?

Hmm.

I was thinking that the fact that the existing conversion didn't respect
extra_float_digits probably meant that people didn't care. But
searching, I do find some complaints about it, though mostly they get
brushed off with some variant of "you're doing it wrong". (Arguably any
float to numeric conversion is a fairly dubious practice.)

But I guess the case could be made for using the shortest-roundtrip-safe
value here too.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Donald Dong

Date:

14 January 2019, 04:23:45

> On Jan 11, 2019, at 10:40 AM, Andres Freund <andres@anarazel.de> wrote:
>
> And of course it'd change the dump's text contents between ryu and
> non-ryu backends even with extra_float_digits = 3, but the resulting
> floats ought to be the same. It's just that ryu is better at figuring
> out what the minimal text representation is than the current code.

+1. I spent some time reading the Ryu paper, and I'm convinced.

I think Ryu will be a good choice we want to have as many digits as possible
(and roundtrip-safe). Since we have the extra_float_digits and ndig, the Ryu may
give us more digits than we asked.

Maybe one way to respect ndig is to clear the last few digits of (a, b, c)
(This is the notation in the paper. I think the code used different var names)
in base 10,
https://github.com/ulfjack/ryu/blob/b2ae7a712937711375a79f894106a0c6d92b035f/ryu/d2s.c#L264-L266
  // Step 3: Convert to a decimal power base using 128-bit arithmetic.
  uint64_t vr, vp, vm;
  int32_t e10;
for Ryu to generate the desired string?

Also I wonder why do we need to make this change?
-int                    extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int                    extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */

Re: Ryu floating point output patch

From

"David G. Johnston"

Date:

14 January 2019, 06:35:09

On Sunday, January 13, 2019, Donald Dong <xdong@csumb.edu> wrote:

Also I wonder why do we need to make this change?
-int extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */

From the original post:

“This version of the patch continues to use extra_float_digits to select

whether Ryu output is used - but I've also changed the default, so that
Ryu is used unless you explicitly set extra_float_digits to 0 or less.”

David J.

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

15 January 2019, 13:37:58

>>>>> "Andres" == Andres Freund <andres@anarazel.de> writes:

 >> In particular, how does it know how every strtod() on the planet
 >> will react to specific input?

 Andres> strtod()'s job ought to computationally be significantly easier
 Andres> than the other way round, no? And if there's buggy strtod()
 Andres> implementations out there, why would they be guaranteed to do
 Andres> the correct thing with our current output, but not with ryu's?

Funny thing: I've been devoting considerable effort to testing this, and
the one failure mode I've found is very interesting; it's not a problem
with strtod(), in fact it's a bug in our float4in caused by _misuse_ of
strtod().

In particular, float4in thinks it's ok to do strtod() on the input, and
then round the result to a float. It is wrong to think that, and here's
why:

Consider the (float4) bit pattern x15ae43fd. The true decimal value of
this, and its adjacent values are:

x15ae43fc = 7.0385300755536269169437150392273653469292493678466371420654468238353729248046875e-26
  midpoint  7.0385303837024180189014515281838361605176203339429008565275580622255802154541015625e-28
x15ae43fd = 7.038530691851209120859188017140306974105991300039164570989669300615787506103515625e-26
  midpoint  7.0385310000000002228169245060967777876943622661354282854517805390059947967529296875e-26
x15ae43fe = 7.03853130814879132477466099505324860128273323223169199991389177739620208740234375e-26

Now look at what happens if you pass '7.038531e-26' as input for float4.

From the above values, the correct result is x15ae43fd, because any
other value would be strictly further away. But if you input the value
as a double first, here are the adjacent representations:

x3ab5c87fafffffff = 7.038530999999999074873222531206633286975087635036133540...e-26
         midpoint   7.0385309999999996488450735186517055373347249505857809130...e-28
x3ab5c87fb0000000 = 7.03853100000000022281692450609677778769436226613542828545...e-28
         midpoint   7.03853100000000079678877549354185003805399958168507565784...e-28
x3ab5c87fb0000001 = 7.03853100000000137076062648098692228841363689723472303024...e-28

So the double value is x3ab5c87fb0000000, which has a mantissa of
(1).01011100100001111111101100000...

which when rounded to 23 bits (excluding the implied (1)), is
010 1110 0100 0011 1111 1110  = 2e43fe

So the rounded result is incorrectly x15ae43fe, rather than the expected
correct value of x15ae43fd.

strtof() exists as part of the relevant standards, so float4in should be
using that in place of strtod, and that seems to fix this case for me.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

15 January 2019, 22:39:38

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> Funny thing: I've been devoting considerable effort to testing this, and
> the one failure mode I've found is very interesting; it's not a problem
> with strtod(), in fact it's a bug in our float4in caused by _misuse_ of
> strtod().

> In particular, float4in thinks it's ok to do strtod() on the input, and
> then round the result to a float.

Hm.  I'm not sure how much weight to put on this example, since we don't
guarantee that float4 values will round-trip with less than 9 digits of
printed precision.  This example corresponds to extra_float_digits = 1
which doesn't meet that minimum.  However, I agree that it's rounding
the presented input in the less desirable direction.

> strtof() exists as part of the relevant standards, so float4in should be
> using that in place of strtod, and that seems to fix this case for me.

To clarify that: strtof() is defined in C99 but not older standards
(C89 and SUSv2 lack it).

While we've agreed that we'll require C99 *compilers* going forward,
I'm less enthused about supposing that the platform's libc is up to
date, not least because replacing libc on an old box is orders of
magnitude harder than installing a nondefault version of gcc.

As a concrete example, gaur lacks strtof(), and I'd be pretty much
forced to retire it if we start requiring that function.  Putting
a more modern gcc on that machine wasn't too painful, but I'm not
messing with its libc.

But I'd not object to adding a configure check and teaching
float4in to use strtof() if available.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

15 January 2019, 23:51:15

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

On a slightly unrelated note, is the small-is-zero variant output file
for the float tests still required?

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

16 January 2019, 01:04:02

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> On a slightly unrelated note, is the small-is-zero variant output file
> for the float tests still required?

Hm, good question.  I had been about to respond that it must be,
but looking more closely at pg_regress.c I see that the default
expected-file will be tried if the one fingered by resultmap isn't
an exact match.  So if everybody is matching float8.out these
days, we wouldn't know it.

The only one of the resultmap entries I'm in a position to try
right now is x86 freebsd, and I can report that FreeBSD/x86
matches the default float8.out, and has done at least back to
FreeBSD 8.4.

It wouldn't be an unreasonable suspicion to guess that the other
BSDen have likewise been brought up to speed to produce ERANGE
on underflow, but I can't check it ATM.

On the other hand, cygwin is probably pretty frozen-in-time,
so it might still have the old behavior.  Can anyone check that?

            regards, tom lane

Re: Ryu floating point output patch

From

Donald Dong

Date:

17 January 2019, 08:10:33

> On Jan 15, 2019, at 2:37 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
> Andres> strtod()'s job ought to computationally be significantly easier
> Andres> than the other way round, no? And if there's buggy strtod()
> Andres> implementations out there, why would they be guaranteed to do
> Andres> the correct thing with our current output, but not with ryu's?
>
> Funny thing: I've been devoting considerable effort to testing this, and
> the one failure mode I've found is very interesting; it's not a problem
> with strtod(), in fact it's a bug in our float4in caused by _misuse_ of
> strtod().

Hi,

I'm trying to reproduce the results by calling float4in in my test code. But I
have some difficulties linking the code:

testfloat4.c:(.text+0x34): undefined reference to `float4in'
testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll'

I tried offering float.o to the linker in addition to my test program. I also
tried to link all the objects (*.o). But the errors still exist. I attached my
changes as a patch.

I wonder if creating separated test programs is a good way of testing the code,
and I'm interested in learning what I missed.

Thank you.

Attachment

test_float4.patch

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

17 January 2019, 08:29:59

>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:

 Donald> Hi,

 Donald> I'm trying to reproduce the results by calling float4in in my
 Donald> test code. But I have some difficulties linking the code:

 Donald> testfloat4.c:(.text+0x34): undefined reference to `float4in'
 Donald> testfloat4.c:(.text+0x3c): undefined reference to `DirectFunctionCall1Coll'

The way you're doing that isn't really a good way to test it - those
other programs being built by that rule are all frontend code, whereas
float4in is a backend function. If you want to add your own test it, you
could do a loadable module for it, or just do tests from SQL.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 January 2019, 13:03:29

Next version.

Changes since previous patch:

1. Remove "ryu_" from the name of interface functions; it seems better
to name them according to functionality rather than enshrining the
algorithm name. double_to_shortest_decimal*() and
float_to_shortest_decimal*() are the new names. common/ryu.h renamed
to common/shortest_dec.h accordingly.

2. Revert the removal of upstream's declaration-after-statements, and
add makefile foo to compile without warning about same. Other style
aspects (indentation, comments) still follow our style rather than
upstream's.

3. This patch _includes_ the strtof() patch that was also posted
separately; this is because we need accurate float4in in order to get
round-trip conversions to work for the edge case.

4. Passes regression tests (existing tests either by changing the
output, where the result is reasonably expected to be bit-exact; or by
locally setting extra_float_digits=0 to restore the old output, where
the output is not expected to be bit-exact).

Still to be done: docs, which are still waiting on final decisions on
the questions I asked upthread.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu6.patch

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 January 2019, 13:05:53

>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:

 Donald> I attached my changes as a patch.

BTW, doing that in a thread about a commitfest patch confuses the
commitfest app and cfbot (both of which think it's a new version of the
patch under discussion), so best avoided.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 January 2019, 14:37:30

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> 4. Passes regression tests

Well, almost. I missed a contrib module, and there's some issues with
rounding and subnormal values on Windows that I need to track down.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

19 January 2019, 17:06:08

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> 4. Passes regression tests

 Andrew> Well, almost. I missed a contrib module, and there's some
 Andrew> issues with rounding and subnormal values on Windows that I
 Andrew> need to track down.

This should fix all the known issues.

This does make one significant change to the algorithm: it will no
longer return the exact midpoint between two values (thereby relying on
the reader to get the correct rounding direction). This gains
portability without compromising the performance (which is not affected
by the change).

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu7.patch

Re: Ryu floating point output patch

From

Donald Dong

Date:

20 January 2019, 02:59:13

> On Jan 18, 2019, at 2:05 AM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
> BTW, doing that in a thread about a commitfest patch confuses the
> commitfest app and cfbot (both of which think it's a new version of the
> patch under discussion), so best avoided.

Oops. Thank you. Noted.

I think the previous additional digits behavior still exist
in the latest patch. For example:

=# set extra_float_digits = 0;
SET
=# select float4in('1.123456789');
 float4in
----------
  1.12346
(1 row)

=# set extra_float_digits = 1;
SET
=# select float4in('1.123456789');
 float4in
-----------
 1.1234568
(1 row)

The extra_float_digits is increased by 1, but two digits are
added. Without the patch, it will render ' 1.123457' instead.

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

20 January 2019, 04:12:36

>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:

 Donald> I think the previous additional digits behavior still exist
 Donald> in the latest patch. For example:

 Donald> =# set extra_float_digits = 0;
 Donald> SET
 Donald> =# select float4in('1.123456789');
 Donald>  float4in
 Donald> ----------
 Donald>   1.12346
 Donald> (1 row)

 Donald> =# set extra_float_digits = 1;
 Donald> SET
 Donald> =# select float4in('1.123456789');
 Donald>  float4in
 Donald> -----------
 Donald>  1.1234568
 Donald> (1 row)

 Donald> The extra_float_digits is increased by 1, but two digits are
 Donald> added.

That's intentional; this patch takes any value of extra_float_digits
greater than 0 to mean "generate the shortest precise output".

In practice setting extra_float_digits to 1 is rare to nonexistent;
almost all users of extra_float_digits set it to either 3 (to get
precise values), 2 (for historical reasons since we didn't allow more
than 2 in older versions), or less than 0 (to get rounded output for
more convenient display when precision is not required).

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Donald Dong

Date:

20 January 2019, 04:20:00

> On Jan 19, 2019, at 5:12 PM, Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>
>>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:
>
> Donald> I think the previous additional digits behavior still exist
> Donald> in the latest patch. For example:
>
> Donald> =# set extra_float_digits = 0;
> Donald> SET
> Donald> =# select float4in('1.123456789');
> Donald>  float4in
> Donald> ----------
> Donald>   1.12346
> Donald> (1 row)
>
> Donald> =# set extra_float_digits = 1;
> Donald> SET
> Donald> =# select float4in('1.123456789');
> Donald>  float4in
> Donald> -----------
> Donald>  1.1234568
> Donald> (1 row)
>
> Donald> The extra_float_digits is increased by 1, but two digits are
> Donald> added.
>
> That's intentional; this patch takes any value of extra_float_digits
> greater than 0 to mean "generate the shortest precise output".
>
> In practice setting extra_float_digits to 1 is rare to nonexistent;
> almost all users of extra_float_digits set it to either 3 (to get
> precise values), 2 (for historical reasons since we didn't allow more
> than 2 in older versions), or less than 0 (to get rounded output for
> more convenient display when precision is not required).

Makes sense! Thank you for explaining.

I wonder if it's necessary to update the doc accordingly? Such as
https://www.postgresql.org/docs/11/datatype-numeric.html#DATATYPE-FLOAT
and
https://www.postgresql.org/docs/11/runtime-config-client.html
.

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

20 January 2019, 05:03:11

>>>>> "Donald" == Donald Dong <xdong@csumb.edu> writes:

 Donald> I wonder if it's necessary to update the doc accordingly?

Yes, I specifically mentioned that doc updates were needed, and that
they were not included because they depend on final decisions on the
various questions that I have asked.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

01 February 2019, 22:01:49

Latest patch.

No semantic changes; fixes postgresql.conf.sample and adds docs.

In the absence of feedback to the contrary, I've documented the current
behavior of extra_float_digits. I've also removed language from the docs
regarding non-IEEE float formats, and made it clear that all supported
platforms now use IEEE format (whether or not they implement the actual
arithmetic correctly).

I thought of what might be a good reason to leave float->numeric
conversions alone: since they're currently immutable (unlike float
output), they might appear in indexes, and pg_upgrade would have to cope
with that. I'm therefore going to consider float to numeric conversion
as being an issue for a separate patch if at all, rather than part of
this one.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu8.patch

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

02 February 2019, 23:14:18

And another version. (I should have checked upstream before posting the
last one, sigh)

This one incorporates (with heavy adaptations) a new optimization from
upstream, adding a fastpath for values that happen to be small nonzero
integers. The overhead is low enough for this to be worthwhile.

Additionally, I've now micro-optimized the fixed-point output format
code (which is not in upstream), so it no longer makes a noticable
difference to performance.

There probably isn't any need for more optimization here since it's now
measurably faster (sometimes almost 2x faster) than bigint output in all
my tests (which probably means we could improve on bigint's performance,
but that's a job for another day).

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu9.patch

Re: Ryu floating point output patch

From

Tom Lane

Date:

03 February 2019, 02:57:50

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> [ ryu9.patch ]

I recall having tried some earlier version on my old HPUX dinosaur,
and it worked, but this one doesn't compile:

float.c: In function `float4in':
float.c:240: error: `HUGE_VALF' undeclared (first use in this function)
float.c:240: error: (Each undeclared identifier is reported only once
float.c:240: error: for each function it appears in.)

As far as I can find, that symbol doesn't appear anywhere in the
system headers on this platform.

FWIW, I did get a successful build and core regression test run on
NetBSD 8.99/aarch64.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

03 February 2019, 08:56:27

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> [ ryu9.patch ]

 Tom> I recall having tried some earlier version on my old HPUX dinosaur,
 Tom> and it worked, but this one doesn't compile:

 Tom> float.c: In function `float4in':
 Tom> float.c:240: error: `HUGE_VALF' undeclared (first use in this function)

(mutter mutter)

Try this one.

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu10.patch

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

03 February 2019, 14:04:37

Some final (I hope) updates:

 - adopt Noah's configure code for -Wno-declaration-after-statement
 - remove some code that proved to be dead, replaced by comments and
   asserts
 - add header comments for the public entry points
 - move definition of buffer lengths to public header and comment them
 - comment some assumptions that might otherwise be overlooked
 - simplify some of the fixed-point optimizations slightly
 - clarify some comments
 - change _buf variants to return the string length, just because they can

-- 
Andrew (irc:RhodiumToad)

Attachment

ryu11.patch

Re: Ryu floating point output patch

From

Tom Lane

Date:

04 February 2019, 02:56:28

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> [ ryu11.patch ]

I can confirm this compiles and passes core regression tests on
my HPPA dinosaur.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

13 February 2019, 20:09:03

>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

 Andrew> 2. How far do we need to go to support existing uses of
 Andrew>    extra_float_digits? For example, do we need a way for clients to
 Andrew>    request that they get the exact same output as previously for
 Andrew>    extra_float_digits=2 or 3, rather than just assuming that any
 Andrew>    round-trip-exact value will do?

 Andrew>   (this would likely mean adding a new GUC, rather than basing
 Andrew>   everything off extra_float_digits as the patch does now)

So it turns out, for reasons that should have been obvious in advance,
that _of course_ we need this, because otherwise there's no way to do
the cross-version upgrade regression check.

So we need something like exact_float_output='on' (default) that can be
selectively set to 'off' to restore the previous behavior of
extra_float_digits.

Thoughts?

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

14 February 2019, 18:14:14

On 2/13/19 12:09 PM, Andrew Gierth wrote:
>>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>  Andrew> 2. How far do we need to go to support existing uses of
>  Andrew>    extra_float_digits? For example, do we need a way for clients to
>  Andrew>    request that they get the exact same output as previously for
>  Andrew>    extra_float_digits=2 or 3, rather than just assuming that any
>  Andrew>    round-trip-exact value will do?
>
>  Andrew>   (this would likely mean adding a new GUC, rather than basing
>  Andrew>   everything off extra_float_digits as the patch does now)
>
> So it turns out, for reasons that should have been obvious in advance,
> that _of course_ we need this, because otherwise there's no way to do
> the cross-version upgrade regression check.
>
> So we need something like exact_float_output='on' (default) that can be
> selectively set to 'off' to restore the previous behavior of
> extra_float_digits.
>
> Thoughts?



I applied this:


diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 9edc7b9a02..a6b804ad2c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -1062,7 +1062,15 @@ setup_connection(Archive *AH, const char
*dumpencoding,
     * Set extra_float_digits so that we can dump float data exactly (given
     * correctly implemented float I/O code, anyway)
     */
-   if (AH->remoteVersion >= 90000)
+   if (getenv("PGDUMP_EXTRA_FLOAT_DIGITS"))
+   {
+       PQExpBuffer q = createPQExpBuffer();
+       appendPQExpBuffer(q, "SET extra_float_digits TO %s",
+                         getenv("PGDUMP_EXTRA_FLOAT_DIGITS"));
+       ExecuteSqlStatement(AH, q->data);
+       destroyPQExpBuffer(q);
+   }
+   else if (AH->remoteVersion >= 90000)
        ExecuteSqlStatement(AH, "SET extra_float_digits TO 3");
    else
        ExecuteSqlStatement(AH, "SET extra_float_digits TO 2");


Then I got the buildfarm cross-version-upgrade module to set the
environment value to 0 in the appropriate cases. All the tests passed,
as far back as 9.2, no new GUC needed.

We might not want to use that in more real-world cases of pg_dump use,
but I think for this purpose it should be fine.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

14 February 2019, 20:24:16

Andrew,

Is there any chance you can get me the regress/results/float[48].out
files from lorikeet and jacana? It would help a lot.

Seeing the diffs isn't enough, because I want to know if the float8 test
(which passes, so there's no diff) is matching the standard file or the
-small-is-zero file.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

14 February 2019, 20:42:51

On 2/14/19 12:24 PM, Andrew Gierth wrote:
> Andrew,
>
> Is there any chance you can get me the regress/results/float[48].out
> files from lorikeet and jacana? It would help a lot.
>
> Seeing the diffs isn't enough, because I want to know if the float8 test
> (which passes, so there's no diff) is matching the standard file or the
> -small-is-zero file.
>

Yeah, later on today or tomorrow.


Maybe we should have pg_regress report on the files it matches ...


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

15 February 2019, 01:17:47

On 2/14/19 12:42 PM, Andrew Dunstan wrote:
> On 2/14/19 12:24 PM, Andrew Gierth wrote:
>> Andrew,
>>
>> Is there any chance you can get me the regress/results/float[48].out
>> files from lorikeet and jacana? It would help a lot.
>>
>> Seeing the diffs isn't enough, because I want to know if the float8 test
>> (which passes, so there's no diff) is matching the standard file or the
>> -small-is-zero file.
>>
> Yeah, later on today or tomorrow.
>
>
> Maybe we should have pg_regress report on the files it matches ...
>
>


Rather than give you the files, I will just tell you, in both cases it
is matching float8.out, not the small-is-zero file.  As you can see in
the buildfarm web, the float4 tests are failing and being compared
against float4.out


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

15 February 2019, 02:28:39

>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 Andrew> Rather than give you the files, I will just tell you, in both
 Andrew> cases it is matching float8.out, not the small-is-zero file.

OK, thanks. That lets me fix the float4 failures in a reasonably
straightforward way.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

17 February 2019, 16:44:41

On 2/14/19 10:14 AM, Andrew Dunstan wrote:
> On 2/13/19 12:09 PM, Andrew Gierth wrote:
>>>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>>  Andrew> 2. How far do we need to go to support existing uses of
>>  Andrew>    extra_float_digits? For example, do we need a way for clients to
>>  Andrew>    request that they get the exact same output as previously for
>>  Andrew>    extra_float_digits=2 or 3, rather than just assuming that any
>>  Andrew>    round-trip-exact value will do?
>>
>>  Andrew>   (this would likely mean adding a new GUC, rather than basing
>>  Andrew>   everything off extra_float_digits as the patch does now)
>>
>> So it turns out, for reasons that should have been obvious in advance,
>> that _of course_ we need this, because otherwise there's no way to do
>> the cross-version upgrade regression check.
>>
>> So we need something like exact_float_output='on' (default) that can be
>> selectively set to 'off' to restore the previous behavior of
>> extra_float_digits.
>>
>> Thoughts?
>
>
> I applied this:
>
>
> diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
> index 9edc7b9a02..a6b804ad2c 100644
> --- a/src/bin/pg_dump/pg_dump.c
> +++ b/src/bin/pg_dump/pg_dump.c
> @@ -1062,7 +1062,15 @@ setup_connection(Archive *AH, const char
> *dumpencoding,
>      * Set extra_float_digits so that we can dump float data exactly (given
>      * correctly implemented float I/O code, anyway)
>      */
> -   if (AH->remoteVersion >= 90000)
> +   if (getenv("PGDUMP_EXTRA_FLOAT_DIGITS"))
> +   {
> +       PQExpBuffer q = createPQExpBuffer();
> +       appendPQExpBuffer(q, "SET extra_float_digits TO %s",
> +                         getenv("PGDUMP_EXTRA_FLOAT_DIGITS"));
> +       ExecuteSqlStatement(AH, q->data);
> +       destroyPQExpBuffer(q);
> +   }
> +   else if (AH->remoteVersion >= 90000)
>         ExecuteSqlStatement(AH, "SET extra_float_digits TO 3");
>     else
>         ExecuteSqlStatement(AH, "SET extra_float_digits TO 2");
>
>
> Then I got the buildfarm cross-version-upgrade module to set the
> environment value to 0 in the appropriate cases. All the tests passed,
> as far back as 9.2, no new GUC needed.
>
> We might not want to use that in more real-world cases of pg_dump use,
> but I think for this purpose it should be fine.
>
>


I haven't seen a response to this. Cross version upgrade testing is
still broken. I don't think we need a GUC to fix it, but we do need this
or a new switch to tell pg_dump what to set extra_float_digits to,
unless someone has a better idea.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

17 February 2019, 18:34:36

>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 >> [horrible environment variable hack]
 >> 
 >> We might not want to use that in more real-world cases of pg_dump use,
 >> but I think for this purpose it should be fine.

 Andrew> I haven't seen a response to this. Cross version upgrade
 Andrew> testing is still broken. I don't think we need a GUC to fix it,
 Andrew> but we do need this or a new switch to tell pg_dump what to set
 Andrew> extra_float_digits to, unless someone has a better idea.

I'd been holding off responding in the hope of other opinions, but for
what it's worth, I *really* dislike having pg_dump depend magically on
some new environment variable. I would suggest instead:

a) pg_dump could check if PGOPTIONS or the connect string contained an
extra_float_digits setting and defer to that if so.

Downside of this is that if someone is already using that in the
environment and pg_dump suddenly starts respecting it, they could get
imprecise values in their dumps unexpectedly. Option (a2) would be to
honour extra_float_digits only if it showed up in a connect string and
not in PGOPTIONS, which would be more explicit.

b) new command-line option, e.g. pg_dump --extra-float-digits=0

This is probably the safest option, IMO. Any preferences as to the
option name?

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

17 February 2019, 18:56:06

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> I'd been holding off responding in the hope of other opinions, but for
> what it's worth, I *really* dislike having pg_dump depend magically on
> some new environment variable.

Likewise.

> b) new command-line option, e.g. pg_dump --extra-float-digits=0

> This is probably the safest option, IMO. Any preferences as to the
> option name?

I like that too, assuming that it can be made to fit into the
structure of the cross-version-upgrade tests (which would have
to know to use it only with >= v12 pg_dump).

--extra-float-digits seems fine as a name.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

17 February 2019, 19:19:05

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> b) new command-line option, e.g. pg_dump --extra-float-digits=0

 >> This is probably the safest option, IMO. Any preferences as to the
 >> option name?

 Tom> I like that too, assuming that it can be made to fit into the
 Tom> structure of the cross-version-upgrade tests (which would have
 Tom> to know to use it only with >= v12 pg_dump).

 Tom> --extra-float-digits seems fine as a name.

As far as I can tell, test.sh only uses the new version's pg_dump (in
fact the only binaries directly used from the old version are initdb and
pg_ctl).

... wait, this initdb in test.sh is using an option that doesn't exist
before pg11? Is the cross-version test actually using some different
script? Andrew?

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

17 February 2019, 19:45:47

On 2/17/19 10:56 AM, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> I'd been holding off responding in the hope of other opinions, but for
>> what it's worth, I *really* dislike having pg_dump depend magically on
>> some new environment variable.
> Likewise.



Sure, no problem here - that was just a POC. I did it that way to
minimize the code involved.



>
>> b) new command-line option, e.g. pg_dump --extra-float-digits=0
>> This is probably the safest option, IMO. Any preferences as to the
>> option name?
> I like that too, assuming that it can be made to fit into the
> structure of the cross-version-upgrade tests (which would have
> to know to use it only with >= v12 pg_dump).
>
> --extra-float-digits seems fine as a name.
>
>             



There are all sorts of version-specific things done there. The code will
be pretty minimal. We'll only invoke it if the target version is >= 12
and the source version is <= 11.


I can do this if we're agreed.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

17 February 2019, 19:49:50

On 2/17/19 11:19 AM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  >> b) new command-line option, e.g. pg_dump --extra-float-digits=0
>
>  >> This is probably the safest option, IMO. Any preferences as to the
>  >> option name?
>
>  Tom> I like that too, assuming that it can be made to fit into the
>  Tom> structure of the cross-version-upgrade tests (which would have
>  Tom> to know to use it only with >= v12 pg_dump).
>
>  Tom> --extra-float-digits seems fine as a name.
>
> As far as I can tell, test.sh only uses the new version's pg_dump (in
> fact the only binaries directly used from the old version are initdb and
> pg_ctl).
>
> ... wait, this initdb in test.sh is using an option that doesn't exist
> before pg11? Is the cross-version test actually using some different
> script? Andrew?
>



The buildfarm cross-version upgrade tests do not run test.sh. See
<https://github.com/PGBuildFarm/client-code/blob/master/PGBuild/Modules/TestUpgradeXversion.pm>


Essentially it takes the result of a previous buildfarm run on the
source branch, dumps it, upgrades it, and dumps again, then compares the
two dumps. It doesn't itself run the regression tests.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

17 February 2019, 20:09:52

>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 Andrew> There are all sorts of version-specific things done there. The
 Andrew> code will be pretty minimal. We'll only invoke it if the target
 Andrew> version is >= 12 and the source version is <= 11.

 Andrew> I can do this if we're agreed.

Sure, go ahead.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

18 February 2019, 00:34:58

BTW, another minor problem with this patch: various buildfarm members
are whining that

 prairiedog    | 2019-02-17 14:25:15 | ryu_common.h:111: warning: array subscript has type 'char'

This evidently is from the next-to-last line in

static inline int
copy_special_str(char *const result, const bool sign, const bool exponent, const bool mantissa)
{
    if (mantissa)
    {
        memcpy(result, "NaN", 3);
        return 3;
    }
    if (sign)
    {
        result[0] = '-';
    }
    if (exponent)
    {
        memcpy(result + sign, "Infinity", 8);
        return sign + 8;
    }
    result[sign] = '0';
    return sign + 1;
}

IMO, this is just awful coding, using a "bool" argument in
just one place as a boolean and in four other ones as an integer.
Aside from being cowboy coding, this has very serious risks of
misbehaving on platforms where "bool" isn't C99-like, so that
"sign" could potentially have values other than 0 and 1.

Please clean up that type-punning.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 February 2019, 01:22:03

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom>     result[sign] = '0';

 Tom> IMO, this is just awful coding, using a "bool" argument in just
 Tom> one place as a boolean and in four other ones as an integer. Aside
 Tom> from being cowboy coding, this has very serious risks of
 Tom> misbehaving on platforms where "bool" isn't C99-like, so that
 Tom> "sign" could potentially have values other than 0 and 1.

Elsewhere in the code it relies fairly heavily on the result of boolean
expressions being exactly 0 or 1 (which I'm pretty sure is required in
C89, not just C99); the only two places in the upstream that actually
required C99-specific boolean semantics were fixed in da6520be7.

Obviously I'll fix the warning, but how strict do you want to be about
the rest of the code? There are a lot of examples of things like,

    const uint32 q = log10Pow5(-e2) - (-e2 > 1);

    output = vr + (vr == vm || roundUp);

The code as it stands now follows the rule that nothing is ever assigned
to a "bool" variable, parameter, or result except the result of a
logical expression; and given that, that the value of any "bool"
variable interpreted as an integer is 0 or 1 and no other value.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

18 February 2019, 01:43:59

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>  Tom> IMO, this is just awful coding, using a "bool" argument in just
>  Tom> one place as a boolean and in four other ones as an integer. Aside
>  Tom> from being cowboy coding, this has very serious risks of
>  Tom> misbehaving on platforms where "bool" isn't C99-like, so that
>  Tom> "sign" could potentially have values other than 0 and 1.

> Elsewhere in the code it relies fairly heavily on the result of boolean
> expressions being exactly 0 or 1

Ugh.

> (which I'm pretty sure is required in C89, not just C99);

For truth-valued expressions, yes.  The question here is what can be
assumed about the integer value of a variable declared to be type "bool",
a question C89 does not address.

> Obviously I'll fix the warning, but how strict do you want to be about
> the rest of the code?

Well, given that we're now requiring C99 compilers, you'd think that
assuming stdbool semantics would be all right.  The problem on prairiedog
and locust (which seem to be the only complainants) is that stdbool
provides a _Bool type that has size 4, so c.h decides not to use stdbool:

#if defined(HAVE_STDBOOL_H) && SIZEOF_BOOL == 1
#include <stdbool.h>
#define USE_STDBOOL 1
#else

#ifndef bool
typedef char bool;
#endif

I believe that we could suppress these warnings by changing that last
to be

typedef unsigned char bool;

since what these warnings are really on about is the uncertain signedness
of the type "char".  Perhaps that's a reasonable way to deal with it.
If 0/1 assumptions are all over the place in the Ryu code, avoiding them
in this one place isn't going to move the goalposts for portability.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

18 February 2019, 01:46:26

On 2/17/19 12:09 PM, Andrew Gierth wrote:
>>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>  Andrew> There are all sorts of version-specific things done there. The
>  Andrew> code will be pretty minimal. We'll only invoke it if the target
>  Andrew> version is >= 12 and the source version is <= 11.
>
>  Andrew> I can do this if we're agreed.
>
> Sure, go ahead.
>

Here's a patch without docco. How much do we actually want to document
this? I'm not sure it has any great value outside of cross-version
upgrade testing.


cheers


andrew


-- 

Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

pg_dump_extra_float_digits.patch

Re: Ryu floating point output patch

From

Tom Lane

Date:

18 February 2019, 02:04:22

Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
> Here's a patch without docco. How much do we actually want to document
> this? I'm not sure it has any great value outside of cross-version
> upgrade testing.

I think we have to document it, but it can be fairly terse perhaps.

    Use the specified value of extra_float_digits when dumping
    floating-point data, rather than pg_dump's default.

I don't think we need to go into detail about when/why you might want
to use it; I can see people thinking of their own reasons.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 February 2019, 02:13:54

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Obviously I'll fix the warning, but how strict do you want to be
 >> about the rest of the code?

 Tom> Well, given that we're now requiring C99 compilers, you'd think
 Tom> that assuming stdbool semantics would be all right. The problem on
 Tom> prairiedog and locust (which seem to be the only complainants) is
 Tom> that stdbool provides a _Bool type that has size 4, so c.h decides
 Tom> not to use stdbool:

Yes, this was the cause of the earlier regression test failures that
were fixed by da6520be7; I realized exactly what was going on after
writing that commit message (otherwise I'd have been more explicit).

 Tom> typedef char bool;
 Tom> #endif

 Tom> I believe that we could suppress these warnings by changing that
 Tom> last to be

 Tom> typedef unsigned char bool;

*squint* I _think_, going through the integer promotion rules, that that
should be safe.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 February 2019, 02:15:42

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
 >> Here's a patch without docco. How much do we actually want to
 >> document this? I'm not sure it has any great value outside of
 >> cross-version upgrade testing.

 Tom> I think we have to document it, but it can be fairly terse perhaps.

 Tom>     Use the specified value of extra_float_digits when dumping
 Tom>     floating-point data, rather than pg_dump's default.

 Tom> I don't think we need to go into detail about when/why you might
 Tom> want to use it; I can see people thinking of their own reasons.

It should probably at least say that (a) the default is to dump with the
maximum available precision, and at least hint at (b) that ordinary use
doesn't require the option. We shouldn't mislead people into thinking
they need it when they don't.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

18 February 2019, 02:21:56

On 2/17/19 6:15 PM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  > Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>  >> Here's a patch without docco. How much do we actually want to
>  >> document this? I'm not sure it has any great value outside of
>  >> cross-version upgrade testing.
>
>  Tom> I think we have to document it, but it can be fairly terse perhaps.
>
>  Tom>     Use the specified value of extra_float_digits when dumping
>  Tom>     floating-point data, rather than pg_dump's default.
>
>  Tom> I don't think we need to go into detail about when/why you might
>  Tom> want to use it; I can see people thinking of their own reasons.
>
> It should probably at least say that (a) the default is to dump with the
> maximum available precision, and at least hint at (b) that ordinary use
> doesn't require the option. We shouldn't mislead people into thinking
> they need it when they don't.
>

Would you like to reformulate this, then?


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

18 February 2019, 02:42:08

>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 Tom> I think we have to document it, but it can be fairly terse perhaps.

 Tom> Use the specified value of extra_float_digits when dumping
 Tom> floating-point data, rather than pg_dump's default.

 Tom> I don't think we need to go into detail about when/why you might
 Tom> want to use it; I can see people thinking of their own reasons.

 >> It should probably at least say that (a) the default is to dump with the
 >> maximum available precision, and at least hint at (b) that ordinary use
 >> doesn't require the option. We shouldn't mislead people into thinking
 >> they need it when they don't.

 Andrew> Would you like to reformulate this, then?

How about:

    Use the specified value of extra_float_digits when dumping
    floating-point data, instead of the maximum available precision.
    Routine dumps made for backup purposes should not use this option.

-- 
Andrew (irc:RhodiumToad)

Re: Ryu floating point output patch

From

Tom Lane

Date:

18 February 2019, 02:49:04

Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>  Andrew> Would you like to reformulate this, then?

> How about:

>     Use the specified value of extra_float_digits when dumping
>     floating-point data, instead of the maximum available precision.
>     Routine dumps made for backup purposes should not use this option.

OK by me.

            regards, tom lane

Re: Ryu floating point output patch

From

Andrew Dunstan

Date:

18 February 2019, 16:56:19

On 2/17/19 6:49 PM, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:
>>  Andrew> Would you like to reformulate this, then?
>> How about:
>>     Use the specified value of extra_float_digits when dumping
>>     floating-point data, instead of the maximum available precision.
>>     Routine dumps made for backup purposes should not use this option.
> OK by me.
>
>             


Done.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Ryu floating point output patch

From

Andrew Gierth

Date:

21 February 2019, 03:10:14

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> typedef char bool;
 Tom> #endif

 Tom> I believe that we could suppress these warnings by changing that
 Tom> last to be

 Tom> typedef unsigned char bool;

After testing this locally with a hand-tweaked configure result (since I
don't have hardware that triggers it) it all looked ok, so I've pushed
it and at least locust seems to be happy with the result.

-- 
Andrew (irc:RhodiumToad)