Re: More stable query plans via more predictable column statistics - Mailing list pgsql-hackers

From Shulgin, Oleksandr
Subject Re: More stable query plans via more predictable column statistics
Date
Msg-id CACACo5RA9D+GZ3AjNLqTLRJtO4_L+Z8vAoiTXD60T2_DjLjePQ@mail.gmail.com
Whole thread Raw
In response to Re: More stable query plans via more predictable column statistics  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: More stable query plans via more predictable column statistics
List pgsql-hackers
On Wed, Mar 9, 2016 at 1:33 PM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
Hi,

On Wed, 2016-03-09 at 11:23 +0100, Shulgin, Oleksandr wrote:
> On Tue, Mar 8, 2016 at 8:16 PM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
>
>         Also, I can't quite figure out why the "else" now in line 2131
>         is now "else if track_cnt != 0".  What happens if track_cnt is
>         zero?
>         The comment above the "if" block doesn't provide any guidance.
>
>
> It is there only to avoid potentially dividing zero by zero when
> calculating avgcount (which will not be used after that anyway).  I
> agree it deserves a comment.

That definitely deserves a comment. It's not immediately clear why
(track_cnt != 0) would prevent division by zero in the code. The only
way such error could happen is if ndistinct==0, because that's the
actual denominator. Which means this

    ndistinct = ndistinct * sample_cnt

would have to evaluate to 0. But ndistinct==0 can't happen as we're in
the (nonnull_cnt > 0) branch, and that guarantees (standistinct != 0).

Thus the only possibility seems to be (nonnull_cnt==toowide_cnt). Why
not to use this condition instead?

Yes, I now recall that my actual concern was that sample_cnt may calculate to 0 due to the latest condition above, but that also implies track_cnt == 0, and then we have a for loop there which will not run at all due to this, so I figured we can avoid calculating avgcount and running the loop altogether with that check.  I'm not opposed to changing the condition if that makes the code easier to understand (or dropping it altogether if calculating 0/0 is believed to be harmless anyway).

--
Alex

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: More stable query plans via more predictable column statistics
Next
From: Tom Lane
Date:
Subject: Re: More stable query plans via more predictable column statistics