Re: improving GROUP BY estimation - Mailing list pgsql-hackers

From Tom Lane
Subject Re: improving GROUP BY estimation
Msg-id 20518.1459451903@sss.pgh.pa.us
In response to Re: improving GROUP BY estimation  (Dean Rasheed <dean.a.rasheed@gmail.com>)
List pgsql-hackers
Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On 30 March 2016 at 14:03, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>> Attached is v4 of the patch

> Thanks, I think this is good to go, except that I think we need to use
> pow() rather than powl() because AIUI powl() is new to C99, and so
> won't necessarily be available on all supported platforms. I don't
> think we need worry about loss of precision, since that would only be
> an issue if rel->rows / rel->tuples were smaller than maybe 10^-14 or
> so, and it seems unlikely we'll get anywhere near that any time soon.

I took a quick look.  I concur with using pow() not powl(); the latter
is not in SUS v2 which is our baseline portability expectation, and in
fact there is *noplace* where we expect long double to work.  Moreover,
I don't believe that any of the estimates we're working with are so
accurate that a double-width power result would be a useful improvement.

Also, I wonder if it'd be a good idea to provide a guard against division
by zero --- we know rel->tuples > 0 at this point, but I'm less sure that
reldistinct can't be zero.  In the same vein, I'm worried about the first
argument of pow() being slightly negative due to roundoff error, leading
to a NaN result.

Maybe we should also consider clamping the final reldistinct estimate to
an integer with clamp_row_est().  The existing code doesn't do that but
it seems like a good idea on general principles.

Another minor gripe is the use of a random URL as justification.  This
code will still be around when that URL exists nowhere but the Wayback
Machine.  Can't we find a more formal citation to use?
        regards, tom lane
