Re: [HACKERS] Making clausesel.c Smarter - Mailing list pgsql-hackers

From Claudio Freire
Subject Re: [HACKERS] Making clausesel.c Smarter
Date
Msg-id CAGTBQpbbHV0f8JgRr43dVNS559Ve-OS8tCt6aBVcxvFmq52MiA@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Making clausesel.c Smarter  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [HACKERS] Making clausesel.c Smarter  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers
On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> If you ask me, I'd just leave:
>>>
>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>> DEFAULT_INEQ_SEL)
>>> + {
>>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>> implies we have no stats */
>>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>>> + }
>>> + else
>>> + {
>>> ...
>>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>> +   notnullsel = 1.0 - nullsel;
>>> +
>>> +   /* Adjust for double-exclusion of NULLs */
>>> +   s2 += nullsel;
>>> +   if (s2 <= 0.0) {
>>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>> +      {
>>> +           /* Most likely contradictory clauses found */
>>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>> +      }
>>> +      else
>>> +      {
>>> +           /* Could be a rounding error */
>>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>> +      }
>>> +   }
>>> + }
>>>
>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>> cautious) estimation of the amount of rounding error that could be
>>> there with 32-bit floats.
>>>
>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>> an estimation based on stats, maybe approximate, but one that makes
>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>> negative values are either sign of a contradiction or rounding error.
>>> The previous code left the possibility of negatives coming out of
>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>> but that doesn't seem possible coming out of scalarineqsel.
>>>
>>> But I'd like it if Tom could comment on that, since he's the one that
>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>> in 2004).
>>>
>>> Barring that, I'd just change the
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>
>>> To
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>
>>> Which makes a lot more sense.
>>
>> I think to fix this properly the selfuncs would have to return the
>> estimate, and nullfrac separately, so the nullfrac could just be
>> applied once per expression. There's already hacks in there to get rid
>> of the double null filtering. I'm not proposing that as something for
>> this patch. It would be pretty invasive to change this.
>
> There's no need, you can compute the nullfrac with nulltestsel. While
> there's some rework involved, it's not a big deal (it just reads the
> stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior)



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
Next
From: Andres Freund
Date:
Subject: Re: [HACKERS] [COMMITTERS] pgsql: Increase parallel bitmap scan test coverage.