Re: Performance improvements for src/port/snprintf.c - Mailing list pgsql-hackers

From	Andrew Gierth
Subject	Re: Performance improvements for src/port/snprintf.c
Date	October 7, 2018 14:59:18
Msg-id	878t3a9w7r.fsf@news-spur.riddles.org.uk
In response to	Re: Performance improvements for src/port/snprintf.c (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Performance improvements for src/port/snprintf.c Re: Performance improvements for src/port/snprintf.c
List	pgsql-hackers

>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Now, "shortest value that converts back exactly" is technically
 Tom> cool, but I am not sure that it solves any real-world problem that
 Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.
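
To make the landmine concrete, here is a minimal sketch in plain C, not taken from the PostgreSQL source. It assumes the default output uses DBL_DIG (15) significant digits, and round_trips is a hypothetical helper name. It formats a double, parses the text back, and checks whether the original value survives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Format v with the given number of significant digits, parse the text
 * back, and report whether the result equals the original value.
 * (round_trips is a hypothetical helper, not a PostgreSQL function.)
 */
bool
round_trips(double v, int sig_digits)
{
    char buf[64];
    int  len = snprintf(buf, sizeof(buf), "%.*g", sig_digits, v);

    /* 64 bytes is ample for any "%g" output with at most 17 digits */
    assert(len > 0 && (size_t) len < sizeof(buf));
    return strtod(buf, NULL) == v;
}
```

On an IEEE 754 platform, round_trips(0.1 + 0.2, 15) should be false, because the 15-digit text is "0.3" and that parses to a different double, while round_trips(0.1 + 0.2, 17) should be true. That is the same mismatch a user hits when pasting the displayed value into a WHERE clause.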

 Tom> I'm also worried that introducing it would result in complaints like
 Tom> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly, for a >20x performance improvement in float8out, I don't think
that's an especially big deal.

 Tom> As for #2, my *very* short once-over of the code led me to think
 Tom> that the speed win comes mostly from use of wide integer
 Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

 Tom> and maybe from throwing big lookup tables at the problem. If so,
 Tom> it's very likely possible that we could adopt those techniques
 Tom> without necessarily buying into the shortest-exact rule for how
 Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.
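
For contrast, the property itself ("shortest text that converts back exactly") can be approximated by brute force. The sketch below is explicitly not Ryu and not code from the patch: it just retries snprintf/strtod with increasing precision, which is the slow approach that the paper's fixed-size integer method avoids. shortest_roundtrip is a hypothetical name, and its output is not guaranteed to match Ryu's digit for digit in every edge case.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Naive search for the shortest "%g"-style text that parses back to v.
 * This is NOT Ryu: it may call snprintf/strtod up to 17 times, whereas
 * Ryu produces the digits directly with fixed-size integer arithmetic.
 * Returns the number of significant digits used.
 */
int
shortest_roundtrip(double v, char *buf, size_t buflen)
{
    int prec;

    assert(buf != NULL && buflen >= 32);
    for (prec = 1; prec < 17; prec++)
    {
        snprintf(buf, buflen, "%.*g", prec, v);
        if (strtod(buf, NULL) == v)
            return prec;
    }
    /* 17 significant digits always suffice for a finite IEEE 754 double */
    snprintf(buf, buflen, "%.17g", v);
    return 17;
}
```

With this, 1.9 comes out as "1.9" (2 digits), while 0.1 + 0.2 needs all 17 digits.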

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).
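
The overly-exact case is easy to reproduce outside the server. This is a sketch under the assumption that the printf-based path prints DBL_DIG + extra_float_digits significant digits; format_with_extra_digits is a hypothetical helper, not the actual float8out code.

```c
#include <assert.h>
#include <float.h>
#include <stdio.h>

/*
 * Format v with DBL_DIG + extra_float_digits significant digits, which
 * is assumed here to be what a printf-based float8out does.
 * Returns snprintf's result (the length of the formatted text).
 */
int
format_with_extra_digits(double v, int extra_float_digits,
                         char *buf, size_t buflen)
{
    int ndig = DBL_DIG + extra_float_digits;

    assert(buf != NULL && buflen > 0);
    if (ndig < 1)
        ndig = 1;               /* "%.0g" would behave like "%.1g" anyway */
    return snprintf(buf, buflen, "%.*g", ndig, v);
}
```

With a C library that prints exact digits (glibc, for instance), 1.9 formats as "1.9" at extra_float_digits=0 and as "1.89999999999999991" at extra_float_digits=3. The second form is 18 digits of the exact binary value, one more than the 17 that round-tripping can ever need.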

 >> One option would be to stick with snprintf if extra_float_digits is
 >> less than 0 (or less than or equal to 0 and make the default 1) and
 >> use ryu otherwise, so that the option to get rounded floats is still
 >> there. (Apparently some people do use negative values of
 >> extra_float_digits.) Unlike other format-changing GUCs, this one
 >> already exists and is already used by people who want more or less
 >> precision, including by pg_dump where round-trip conversion is the
 >> requirement.

 Tom> I wouldn't necessarily object to having some value of
 Tom> extra_float_digits that selects the shortest-exact rule, but I'm
 Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

-- 
Andrew (irc:RhodiumToad)
