Re: Poorly-thought-out handling of double variables in pgbench

From:           Fabien COELHO
Subject:        Re: Poorly-thought-out handling of double variables in pgbench
Msg-id:         alpine.DEB.2.10.1605052126560.30701@sto
In response to: Poorly-thought-out handling of double variables in pgbench (Tom Lane <tgl@sss.pgh.pa.us>)
List:           pgsql-hackers

Hello Tom,

> While testing 7a622b273 I happened to notice this:
>
> \set x greatest(3, 2, 4.9)
> create table mytab (x numeric);
> insert into mytab values(:x);
>          x
> ----------------------
> 4.900000000000000355
>
> The reason for that is that the result of a "double" calculation
> is coerced to text like this:
>
>                sprintf(res, "%.18e", result.u.dval);
>
> apparently on the theory that this will result in being able to convert it
> back to double with no loss of low-order bits.

Yep. I did that for this very reason.
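
For illustration, a minimal standalone C sketch (not the actual pgbench
code) of the round trip in question: "%.18e" keeps enough digits that
strtod() recovers the exact same double, at the price of the noise digits
Tom shows.

   #include <stdio.h>
   #include <stdlib.h>

   int main(void)
   {
       char    res[32];
       double  d = 4.9;

       /* the coercion to text: 18 digits after the decimal point */
       sprintf(res, "%.18e", d);
       printf("as text: %s\n", res);   /* 4.900000000000000355e+00 */

       /* ... but the text converts back to the very same double */
       printf("round-trips: %s\n", strtod(res, NULL) == d ? "yes" : "no");
       return 0;
   }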

ISTM that the alternative was to reimplement the variable handling to add 
and manage types, but that would have meant significant changes, and such 
changes are likely to be rejected, or at least to raise long debates and 
questions and make the patch harder to check... so I just avoided them.

> But of course the last two or three digits of such output are pretty 
> much garbage to the naked eye.

Sure, it is binary-to-decimal conversion noise.

> Then what gets interpolated as the variable value is something like 
> '4.900000000000000355e+00'.
>
> I think this is a very poor decision; it's going to cause surprises and 
> probably bug reports.

> Moreover, if we were testing this behavior in the buildfarm (which we 
> are not, but only because the test coverage for pgbench is so shoddy),

I think that "none" is the right word?

> we would be seeing failures, because those low-order digits are going to 
> be platform dependent.

Possibly.

> The only value of doing it like this would be if people chained multiple
> floating-point calculations in successive pgbench \set commands and needed
> full precision to be retained from one \set to the next ... but how likely
> is that?

The constraint for me is that a stored double should not lose precision. 
Any solution which achieves this goal is fine with me.

> A minimum-change fix would be to print only DBL_DIG digits here.

DBL_DIG is probably 15, but I think there is some rounding implied (?). 
I was not sure that DBL_DIG+1 digits would be enough for the round trip, 
so I put 18 to be safe. I did not envision that someone would store the 
result in a NUMERIC :-) Your example would work fine with a DOUBLE 
PRECISION attribute.
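
For reference, on IEEE-754 platforms DBL_DIG is 15, the number of decimal
digits guaranteed to survive a decimal -> double -> decimal round trip,
while 17 significant digits are needed for an exact double -> decimal ->
double round trip. A quick sketch of the trade-off:

   #include <stdio.h>
   #include <float.h>

   int main(void)
   {
       double d = 4.9;

       /* DBL_DIG digits: clean for humans, may lose low-order bits */
       printf("%.*g\n", DBL_DIG, d);   /* 4.9 */

       /* 17 digits: exact round trip, but exposes the noise */
       printf("%.17g\n", d);           /* 4.9000000000000004 */
       return 0;
   }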

> A better answer, perhaps, would be to store double-valued variables in 
> double format to begin with, coercing to text only when and if the value 
> is interpolated into a string.

Yep, but that would have meant yet more changes for a limited benefit, and 
would have increased the probability of the patch being rejected.
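
To make the trade-off concrete, here is a rough sketch of what a typed
variable could look like (hypothetical names, not actual pgbench code),
with the coercion to text deferred until the value is interpolated:

   #include <stdio.h>
   #include <stdint.h>
   #include <float.h>

   typedef enum { VAR_INT, VAR_DOUBLE } VarType;

   typedef struct Variable
   {
       char       *name;
       VarType     type;       /* discriminates the union below */
       union
       {
           int64_t ival;
           double  dval;
       }           u;
   } Variable;

   /* coerce to text only when the value is interpolated */
   static void
   variableToText(const Variable *var, char *buf, size_t len)
   {
       if (var->type == VAR_INT)
           snprintf(buf, len, "%lld", (long long) var->u.ival);
       else
           snprintf(buf, len, "%.*g", DBL_DIG, var->u.dval);
   }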

> That's probably a bigger change than we want to be putting in right now, 
> though I'm a bit tempted to go try it.

> Thoughts?

My 0.02€:

I definitely agree that the text-variable solution is pretty ugly, but it 
was the minimum-change solution, and I do not have much time available.

I do not think that rounding values is desirable, on principle.

Now, doubles are only really there because the random_*() functions need a 
double argument; otherwise only integers make much sense as keys for 
selecting tuples in a performance benchmark. So this is not a key feature, 
just a side effect of having more realistic non-uniform random generators 
and expressions. In an early submission I did not bother to include double 
variables because I could not see much actual use for them, and there were 
implementation issues given how integer variables were managed.

Also, I think that the above NUMERIC/DOUBLE case is a little contrived, so 
I would wait for someone to actually complain with a good use case before 
spending significant time changing how variables are stored in pgbench.

I would expect a long review process for such a patch if I submitted it, 
because there are details to get right and the benefit is quite limited.

> BTW, just to add insult to injury, the debug() function prints double
> values with "%.f", which evidently had even less thought put into it.

AFAICR the thought I put into it was that it is definitely useful to be 
able to get a simple trace for debugging purposes. The precision of this 
debug output is not very important, so I probably just used the simplest 
possible format.

> That one should definitely be %g with DBL_DIG precision.

That would be fine as well.
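
For what it is worth, the difference is easy to see: "%.f" is "%f" with
zero fractional digits, so it rounds to an integer, while "%g" with
DBL_DIG precision keeps the value readable. A tiny sketch:

   #include <stdio.h>
   #include <float.h>

   int main(void)
   {
       double d = 4.9;

       printf("%.f\n", d);             /* "5": no fractional digits */
       printf("%.*g\n", DBL_DIG, d);   /* "4.9": the suggested format */
       return 0;
   }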

-- 
Fabien.
