Re: Performance improvements for src/port/snprintf.c - Mailing list pgsql-hackers

From Andrew Gierth
Subject Re: Performance improvements for src/port/snprintf.c
Date
Msg-id 878t3a9w7r.fsf@news-spur.riddles.org.uk
In response to Re: Performance improvements for src/port/snprintf.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Performance improvements for src/port/snprintf.c  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Performance improvements for src/port/snprintf.c  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 Tom> Now, "shortest value that converts back exactly" is technically
 Tom> cool, but I am not sure that it solves any real-world problem that
 Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.

 Tom> I'm also worried that introducing it would result in complaints like
 Tom> https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

 Tom> As for #2, my *very* short once-over of the code led me to think
 Tom> that the speed win comes mostly from use of wide integer
 Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

 Tom> and maybe from throwing big lookup tables at the problem. If so,
 Tom> it's very likely possible that we could adopt those techniques
 Tom> without necessarily buying into the shortest-exact rule for how
 Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).

 >> One option would be to stick with snprintf if extra_float_digits is
 >> less than 0 (or less than or equal to 0 and make the default 1) and
 >> use ryu otherwise, so that the option to get rounded floats is still
 >> there. (Apparently some people do use negative values of
 >> extra_float_digits.) Unlike other format-changing GUCs, this one
 >> already exists and is already used by people who want more or less
 >> precision, including by pg_dump where round-trip conversion is the
 >> requirement.

 Tom> I wouldn't necessarily object to having some value of
 Tom> extra_float_digits that selects the shortest-exact rule, but I'm
 Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

-- 
Andrew (irc:RhodiumToad)

