Thread: [HACKERS] Making clausesel.c Smarter

[HACKERS] Making clausesel.c Smarter

From
David Rowley
Date:
I've stumbled over an interesting query out in the wild where the
query was testing whether a nullable timestamptz column was earlier
than some point in time, and also checking that the column IS NOT NULL.

The use pattern here was that records which required processing had
this timestamp set to something other than NULL, a worker would come
along and process those, and UPDATE the record to NULL to mark the
fact that it was now processed. So what we are left with was a table
with a small number of rows with a value in this timestamp column, and
an ever-growing number of rows with a NULL value.

A highly simplified version of the query was checking for records that
required processing before some date, say:

SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;

Of course, the ts IS NOT NULL is not required here, but I can
understand how someone might make the mistake of adding this. The
simple solution to the problem was to have that null test removed, but
that seemingly innocent little mistake caused some pain due to very
slow-running queries which stuck with a nested loop plan for 33 times
longer than they should have. Basically, what was happening here is
that clauselist_selectivity() was applying the selectivity with the
~0.97 null_fact from pg_statistic on top of the already-applied
estimate of the number of rows below the constant timestamp.

Since the diagnosis of this problem was not instant, and some amount
of pain was suffered before the solution was found,  I wondered if it
might be worth smartening up the planner a bit in this area.

We're already pretty good at handling conditions like: SELECT * FROM a
WHERE x < 10 and x < 1; where we'll effectively ignore the x < 10
estimate since x < 1 is more restrictive, so if we just build on that
ability a bit we could get enough code to cover the above case.

I've attached a draft patch which improves the situation here.

Given the test case:

create table ts (ts timestamptz);
insert into ts
  select case when x%1000=0
              then '2017-01-01'::date + (x::text || ' sec')::interval
              else null end
  from generate_series(1,1000000) x (x);
analyze ts;

explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is not null;

With the patch we get:

                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15938.83..15938.84 rows=1 width=8) (actual
time=101.003..101.003 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=733 width=0) (actual
time=0.184..100.868 rows=1000 loops=1)
         Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 999000
 Planning time: 0.153 ms
 Execution time: 101.063 ms

Whereas master gives us:

                                                QUERY PLAN
-----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15937.00..15937.01 rows=1 width=8) (actual
time=119.256..119.256 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=1 width=0) (actual
time=0.172..119.062 rows=1000 loops=1)
         Filter: ((ts IS NOT NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 999000
 Planning time: 0.851 ms
 Execution time: 119.401 ms
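To make the difference concrete, here's a rough back-of-envelope
reproduction of the two row estimates in Python, using the numbers from
the plans above (the arithmetic inside clausesel.c is more involved, so
treat this as an illustrative sketch, not the actual implementation):

```python
total_rows = 1_000_000

# Selectivity of ts <= '2017-02-01': the stats already account for NULLs,
# so this alone predicts ~733 rows.
range_sel = 733 / total_rows

# Selectivity of ts IS NOT NULL: only ~1000 of the 1M rows are non-NULL.
notnull_sel = 1_000 / total_rows

# Master multiplies the two as if independent, double-counting the NULL
# fraction and collapsing the estimate to a single row.
rows_master = max(1, round(total_rows * range_sel * notnull_sel))

# The patch notices the IS NOT NULL test is implied by the strict "<="
# clause and ignores it.
rows_patched = round(total_rows * range_sel)

print(rows_master, rows_patched)  # 1 733
```

The actual row count is 1000, so neither estimate is perfect, but 733 is
far closer than 1, which is what matters for join planning.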

A side effect of this is that we're now able to better detect
impossible cases such as:

postgres=# explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is null;
                                                QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15937.00..15937.01 rows=1 width=8) (actual
time=135.012..135.012 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=1 width=0) (actual
time=135.007..135.007 rows=0 loops=1)
         Filter: ((ts IS NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 1000000
 Planning time: 0.067 ms
 Execution time: 135.050 ms
(6 rows)

Master is not able to see the impossibility of this case:

postgres=# explain analyze select count(*) from ts where ts <=
'2017-02-01'::timestamptz and ts is null;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15938.83..15938.84 rows=1 width=8) (actual
time=131.681..131.681 rows=1 loops=1)
   ->  Seq Scan on ts  (cost=0.00..15937.00 rows=733 width=0) (actual
time=131.676..131.676 rows=0 loops=1)
         Filter: ((ts IS NULL) AND (ts <= '2017-02-01
00:00:00+13'::timestamp with time zone))
         Rows Removed by Filter: 1000000
 Planning time: 0.090 ms
 Execution time: 131.719 ms
(6 rows)

Now one thing I was unsure of in the patch was this "Other clauses"
concept that I've come up with. In the patch we have:

default:
addCachedSelectivityOtherClause(&cslist, var, s2);
break;

I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
imagination here won't stretch far enough to imagine an OpExpr which
passes with a NULL operand. If one did, my logic would be broken, and
if that's the case then I think limiting "Others" to mean F_EQSEL and
F_NEQSEL would be the workaround.

Before I do any more work on this, I'd like to know, is this something
that we might want to fix?

It would be good to improve the situation here in the back branches
too, but I'm thinking that the attached might be a little invasive for
that?

Thoughts?

Regards

David Rowley


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Making clausesel.c Smarter

From
Robert Haas
Date:
On Fri, Feb 24, 2017 at 3:30 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> It would be good to improve the situation here in the back branches
> too, but I'm thinking that the attached might be a little invasive for
> that?

My experience has been that customers - at least EnterpriseDB
customers - do not appreciate plan changes when they upgrade to a new
minor release.  Most people have production installations that are
basically working; if not, they wouldn't be in production.  Even if
they're getting a good plan only by some happy accident, they're still
getting it, and a change can cause a good plan to flop over into a bad
plan, which can easily turn into a service outage.  The people who had
a bad plan and get flipped to a good plan will be happy once they
realize what has changed, of course, but that's not really enough to
make up for the panicked calls from customers whose stuff falls over
when they try to apply the critical security update.

I think the basic thing you are trying to accomplish is sensible,
though.  I haven't reviewed the patch.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Making clausesel.c Smarter

From
David Steele
Date:
On 2/26/17 1:41 AM, Robert Haas wrote:
> On Fri, Feb 24, 2017 at 3:30 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> It would be good to improve the situation here in the back branches
>> too, but I'm thinking that the attached might be a little invasive for
>> that?
> 
> My experience has been that customers - at least EnterpriseDB
> customers - do not appreciate plan changes when they upgrade to a new
> minor release.  Most people have production installations that are
> basically working; if not, they wouldn't be in production.  Even if
> they're getting a good plan only by some happy accident, they're still
> getting it, and a change can cause a good plan to flop over into a bad
> plan, which can easily turn into a service outage.  The people who had
> a bad plan and get flipped to a good plan will be happy once they
> realize what has changed, of course, but that's not really enough to
> make up for the panicked calls from customers whose stuff falls over
> when they try to apply the critical security update.
> 
> I think the basic thing you are trying to accomplish is sensible,
> though.  I haven't reviewed the patch.

This patch applies cleanly and compiles at cccbdde.  I agree with Robert
that back patching would likely not be a good idea.

Anyone familiar with the planner available to review this patch?

It also seems like this would be a (relatively) simple patch to start
with for anyone that is interested in learning more about the planner.

-- 
-David
david@pgmasters.net



Re: [HACKERS] Making clausesel.c Smarter

From
Tom Lane
Date:
David Steele <david@pgmasters.net> writes:
> Anyone familiar with the planner available to review this patch?

FWIW, it's on my radar but I don't expect to get to it real soon,
as there's other stuff I deem higher priority.  In the meantime,
I don't want to stand in the way of someone else looking at it.
        regards, tom lane



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> I've stumbled over an interesting query out in the wild where the
> query was testing whether a nullable timestamptz column was earlier
> than some point in time, and also checking that the column IS NOT NULL.
>
> The use pattern here was that records which required processing had
> this timestamp set to something other than NULL, a worker would come
> along and process those, and UPDATE the record to NULL to mark the
> fact that it was now processed. So what we are left with was a table
> with a small number of rows with a value in this timestamp column, and
> an ever-growing number of rows with a NULL value.
>
> A highly simplified version of the query was checking for records that
> required processing before some date, say:
>
> SELECT * FROM a WHERE ts < '2017-02-24' AND ts IS NOT NULL;
>
> Of course, the ts IS NOT NULL is not required here, but I can
> understand how someone might make the mistake of adding this. The
> simple solution to the problem was to have that null test removed, but
> that seemingly innocent little mistake caused some pain due to very
> slow-running queries which stuck with a nested loop plan for 33 times
> longer than they should have. Basically, what was happening here is
> that clauselist_selectivity() was applying the selectivity with the
> ~0.97 null_fact from pg_statistic on top of the already-applied
> estimate of the number of rows below the constant timestamp.
>
> Since the diagnosis of this problem was not instant, and some amount
> of pain was suffered before the solution was found,  I wondered if it
> might be worth smartening up the planner a bit in this area.
>
> We're already pretty good at handling conditions like: SELECT * FROM a
> WHERE x < 10 and x < 1; where we'll effectively ignore the x < 10
> estimate since x < 1 is more restrictive, so if we just build on that
> ability a bit we could get enough code to cover the above case.
>
> I've attached a draft patch which improves the situation here.

I thought I'd take a quick look at this patch. I'll probably take a
deeper look later, but some initial comments:

-typedef struct RangeQueryClause
+typedef struct CachedSelectivityClause
 {
-    struct RangeQueryClause *next;        /* next in linked list */
+    struct CachedSelectivityClause *next;        /* next in linked list */
     Node       *var;            /* The common variable of the clauses */
-    bool        have_lobound;    /* found a low-bound clause yet? */
-    bool        have_hibound;    /* found a high-bound clause yet? */
+    int            selmask;        /* Bitmask of which sel types are stored */
     Selectivity lobound;        /* Selectivity of a var > something clause */
     Selectivity hibound;        /* Selectivity of a var < something clause */
-} RangeQueryClause;


As seems customary in other code, perhaps you should define some
HAS_X() macros for dealing with the selmask.

Something like

#define SELMASK_HAS_LOBOUND(selmask) (((selmask) & CACHESEL_LOBOUND) != 0)
etc..

+static void addCachedSelectivityRangeVar(CachedSelectivityClause **cslist, Node *var,
                bool varonleft, bool isLTsel, Selectivity s2);

You're changing clause -> var throughout the code when dealing with
nodes, but looking at their use, they hold neither. All those
addCachedSelectivity functions are usually passed expression nodes. If
you're renaming, perhaps you should rename to "expr".


On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Now one thing I was unsure of in the patch was this "Other clauses"
> concept that I've come up with. In the patch we have:
>
> default:
> addCachedSelectivityOtherClause(&cslist, var, s2);
> break;
>
> I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
> imagination here won't stretch far enough to imagine an OpExpr which
> passes with a NULL operand. If one did, my logic would be broken, and
> if that's the case then I think limiting "Others" to mean F_EQSEL and
> F_NEQSEL would be the workaround.

While not specifically applicable only to "Others", something needs
consideration here:

User-defined functions can be nonstrict. An OpExpr involving such a
user-defined function could possibly pass on null.

I would suggest avoiding making the assumption that it doesn't unless
the operator is strict.
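What "strict" buys us here can be modelled in Python, with None standing
in for SQL NULL (the function names are just for illustration; maskcheck
mirrors the non-strict && operator defined in the setup below):

```python
def strict_eq(a, b):
    # A strict operator returns NULL whenever any input is NULL, so a row
    # with a NULL operand can never pass the qual.
    if a is None or b is None:
        return None
    return a == b

def maskcheck(a, b):
    # A non-strict operator, like the && defined below: NULLs are coalesced
    # to 0, so the qual can pass even with a NULL operand.
    return ((a if a is not None else 0) & (b if b is not None else 0)) == 0

print(strict_eq(None, 1))  # None: "x = 1" filters out NULL rows
print(maskcheck(None, 1))  # True: "x && 1" does not
```

So a strict OpExpr on a column implies the column IS NOT NULL, while a
non-strict one implies nothing about NULLs, which is exactly why the
op_strict() test is needed before caching the clause.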

One could argue that such an operator should provide its own
selectivity estimator, but the strictness check shouldn't be too
difficult to add, and I think it's a better choice.

So you'd have to check that:

default:
if (op_strict(expr->opno) && func_strict(expr->opfuncid))
    addCachedSelectivityOtherClause(&cslist, var, s2);
else
    s1 = s1 * s2;
break;

So, I went ahead and did that.

Considering this setup:

createdb pgtest
cat <<SQL | psql pgtest
CREATE TABLE testdata AS select null::integer as value from
generate_series(1, 100000);
VACUUM FREEZE testdata;
CREATE FUNCTION maskcheck(integer, integer) RETURNS boolean AS \$\$
BEGIN RETURN (coalesce(\$1,0) & coalesce(\$2,0)) = 0 ; END \$\$
LANGUAGE PLPGSQL;
CREATE OPERATOR && ( PROCEDURE = maskcheck, LEFTARG = integer,
RIGHTARG = integer, COMMUTATOR = && );
SQL

The following query:

EXPLAIN ANALYZE SELECT * FROM testdata WHERE value && 1 AND value is null;

The original patch yields:
                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Seq Scan on testdata  (cost=0.00..26344.00 rows=1 width=4) (actual time=0.367..84.861 rows=100000 loops=1)
   Filter: ((value IS NULL) AND (value && 1))
 Planning time: 0.371 ms
 Execution time: 88.156 ms
(4 rows)


After the change, however, it yields:
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on testdata  (cost=0.00..26344.00 rows=50000 width=4) (actual time=0.207..89.668 rows=100000 loops=1)
   Filter: ((value IS NULL) AND (value && 1))
 Planning time: 0.186 ms
 Execution time: 92.978 ms
(4 rows)


Which seems much better.



Re: Making clausesel.c Smarter

From
David Rowley
Date:
On 1 April 2017 at 16:42, Claudio Freire <klaussfreire@gmail.com> wrote:
>
> I thought I'd take a quick look at this patch. I'll probably take a
> deeper look later, but some initial comments:
>
> -typedef struct RangeQueryClause
> +typedef struct CachedSelectivityClause
>  {
> -    struct RangeQueryClause *next;        /* next in linked list */
> +    struct CachedSelectivityClause *next;        /* next in linked list */
>      Node       *var;            /* The common variable of the clauses */
> -    bool        have_lobound;    /* found a low-bound clause yet? */
> -    bool        have_hibound;    /* found a high-bound clause yet? */
> +    int            selmask;        /* Bitmask of which sel types are stored */
>      Selectivity lobound;        /* Selectivity of a var > something clause */
>      Selectivity hibound;        /* Selectivity of a var < something clause */
> -} RangeQueryClause;
>
>
> As seems customary in other code, perhaps you should define some
> HAS_X() macros for dealing with the selmask.
>
> Something like
>
> #define SELMASK_HAS_LOBOUND(selmask) (((selmask) & CACHESEL_LOBOUND) != 0)
> etc..

Thanks for looking at the patch.

The problem with that is that some of the tests are doing more than
just checking a single flag is enabled.

Consider:

if ((cslist->selmask & (CACHESEL_LOBOUND | CACHESEL_HIBOUND)) ==
(CACHESEL_LOBOUND | CACHESEL_HIBOUND))

Of course, those could be Macro'd too, but it seems I might need to be
inventive with the names, or the meaning might not be all that clear.

It does not seem overly complex and it is isolated to a single file,
so perhaps it might be OK as is?

>
> +static void addCachedSelectivityRangeVar(CachedSelectivityClause
> **cslist, Node *var,
>                 bool varonleft, bool isLTsel, Selectivity s2);
>
> You're changing clause -> var throughout the code when dealing with
> nodes, but looking at their use, they hold neither. All those
> addCachedSelectivity functions are usually passed expression nodes. If
> you're renaming, perhaps you should rename to "expr".

hmm, this is true. I kind of followed the lead of the name of the
variable in the old RangeQueryClause struct. I have changed the names
of these to be expr in the attached, but I didn't change the name of
the Var in the new CachedSelectivityClause struct. It seems like a
change not related to this patch.

Do you think I should change that too?

>
> On Fri, Feb 24, 2017 at 7:00 AM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
> > Now one thing I was unsure of in the patch was this "Other clauses"
> > concept that I've come up with. In the patch we have:
> >
> > default:
> > addCachedSelectivityOtherClause(&cslist, var, s2);
> > break;
> >
> > I was unsure if this should only apply to F_EQSEL and F_NEQSEL. My
> > imagination here won't stretch far enough to imagine an OpExpr which
> > passes with a NULL operand. If one did, my logic would be broken, and
> > if that's the case then I think limiting "Others" to mean F_EQSEL and
> > F_NEQSEL would be the workaround.
>
> While not specifically applicable only to "Others", something needs
> consideration here:
>
> User-defined functions can be nonstrict. An OpExpr involving such a
> user-defined function could possibly pass on null.

Good point. I overlooked this.

> I would suggest avoiding making the assumption that it doesn't unless
> the operator is strict.
>
> One could argue that such an operator should provide its own
> selectivity estimator, but the strictness check shouldn't be too
> difficult to add, and I think it's a better choice.
>
> So you'd have to check that:
>
> default:
> if (op_strict(expr->opno) && func_strict(expr->opfuncid))
>     addCachedSelectivityOtherClause(&cslist, var, s2);
> else
>     s1 = s1 * s2;
> break;
>

I've changed it to something along those lines. I don't think the
func_strict here is required, though, so I've gone with just
op_strict().

Updated patch attached.

Thanks for reviewing it.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

Re: Making clausesel.c Smarter

From
Andres Freund
Date:
On 2017-04-03 20:59:42 +1200, David Rowley wrote:
> Updated patch attached.
> 
> Thanks for reviewing it.

Given the time in the release cycle I'm afraid it's too late to get
this into v10.  Does anybody disagree?  If not, it should be moved to
the next CF.

Greetings,

Andres Freund



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Mon, Apr 3, 2017 at 5:24 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-03 20:59:42 +1200, David Rowley wrote:
>> Updated patch attached.
>>
>> Thanks for reviewing it.
>
> Given the time in the release cycle I'm afraid it's too late to get
> this into v10.  Does anybody disagree?  If not, it should be moved to
> the next CF.
>
> Greetings,
>
> Andres Freund

While personally I'd like to see this patch make it (I got bitten by
this more than once), I can't argue with that logic with the feature
freeze almost upon us, especially given the stage the review is at
(ie: just beginning).

I'm still going to review it though.



Re: Making clausesel.c Smarter

From
David Rowley
Date:
On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:
> On 2017-04-03 20:59:42 +1200, David Rowley wrote:
>> Updated patch attached.
>>
>> Thanks for reviewing it.
>
> Given the time in the release cycle I'm afraid it's too late to get
> this into v10.  Does anybody disagree?  If not, it should be moved to
> the next CF.

I strongly disagree. The time in the release cycle is the final
commitfest. This is when patches are committed to the repository. The
exception to this is that no large invasive patches should arrive new
in the final commitfest. This is not one of those.

Tom has already mentioned he'd like to look at this. If you're not
willing, then please just don't look at it, but please also don't
remove other people's opportunity to do so.

Please explain your logic for thinking otherwise. Have I somehow
misunderstood what the 1 week extension means?


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making clausesel.c Smarter

From
Andres Freund
Date:
On 2017-04-04 09:22:07 +1200, David Rowley wrote:
> On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-04-03 20:59:42 +1200, David Rowley wrote:
> >> Updated patch attached.
> >>
> >> Thanks for reviewing it.
> >
> > Given the time in the release cycle I'm afraid it's too late to get
> > this into v10.  Does anybody disagree?  If not, it should be moved to
> > the next CF.
> 
> I strongly disagree. The time in the release cycle is the final
> commitfest. This is when patches are committed to the repository. The
> exception to this is that no large invasive patches should arrive new
> in the final commitfest. This is not one of those.

Right.


> Tom has already mentioned he'd like to look at this.

He also said that it'll be a while, and he hadn't yet by the time of the
original freeze date.


> Please explain your logic for thinking otherwise. Have I somehow
> misunderstood what the 1 week extension means?

Well, we have a lot of pending work, some of that has been pending for
years essentially.  And Tom hadn't looked at it yet, and he wasn't at
pgconf.us, so he wasn't actually delayed due to that...

I'm fine with not punting right now, but we need to reduce the number of
patches and focus on stuff that's realistic to get in. There's 4 days
and ~60 open CF entries.  Please help whip other patches into shape...

- Andres



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Mon, Apr 3, 2017 at 6:22 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 4 April 2017 at 08:24, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-04-03 20:59:42 +1200, David Rowley wrote:
>>> Updated patch attached.
>>>
>>> Thanks for reviewing it.
>>
>> Given the time in the release cycle I'm afraid it's too late to get
>> this into v10.  Does anybody disagree?  If not, it should be moved to
>> the next CF.
>
> I strongly disagree. The time in the release cycle is the final
> commitfest. This is when patches are committed to the repository. The
> exception to this is that no large invasive patches should arrive new
> in the final commitfest. This is not one of those.

Selectivity misestimations can have a huge impact on query
performance, and that could be a big deal if there's a big regression
on some queries.

But, the beta period could be enough to detect any issues there. I'll
leave that decision to committers or commitfest managers I guess.

>
> Tom has already mentioned he'd like to look at this. If you're not
> willing, then please just don't look at it, but please also don't
> remove other peoples opportunity for doing so.
>
> Please explain your logic for thinking otherwise. Have I somehow
> misunderstood what the 1 week extension means?

But Tom hasn't yet, and even if he does, we'd have to be done with the
review within this week.

In any case, I intend to go on with the review later today, and at
first look the patch seems in good shape (though it'll take a deeper
look before I make that my official review). Most other patches in
need of review are a bit complex for my knowledge of the code base,
and I already started reviewing this one. We could perhaps be done by
the end of the week.

We could defer that decision to within 4 days. If we're done, we're
done. If we're not, move to next CF.



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Mon, Apr 3, 2017 at 5:59 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> +static void addCachedSelectivityRangeVar(CachedSelectivityClause
>> **cslist, Node *var,
>>                 bool varonleft, bool isLTsel, Selectivity s2);
>>
>> You're changing clause -> var throughout the code when dealing with
>> nodes, but looking at their use, they hold neither. All those
>> addCachedSelectivity functions are usually passed expression nodes. If
>> you're renaming, perhaps you should rename to "expr".
>
> hmm, this is true. I kind of followed the lead of the name of the
> variable in the old RangeQueryClause struct. I have changed the names
> of these to be expr in the attached, but I didn't change the name of
> the Var in the new CachedSelectivityClause struct. It seems like a
> change not related to this patch.
>
> Do you think I should change that too?

I'd prefer it if all occurrences of the concept were changed, to
maintain consistency.
That would include variables used to hold expressions that refer to
these as well, as in the case of:

+                Node       *var;
+
+                if (varonleft)
+                    var = leftexpr;
+                else
+                    var = rightexpr;
+


>> I would suggest avoiding making the assumption that it doesn't unless
>> the operator is strict.
>>
>> One could argue that such an operator should provide its own
>> selectivity estimator, but the strictness check shouldn't be too
>> difficult to add, and I think it's a better choice.
>>
>> So you'd have to check that:
>>
>> default:
>> if (op_strict(expr->opno) && func_strict(expr->opfuncid))
>>     addCachedSelectivityOtherClause(&cslist, var, s2);
>> else
>>     s1 = s1 * s2;
>> break;
>>
>
> I've changed it to something along those lines. I don't think the
> func_strict here is required, though, so I've gone with just
> op_strict().

Indeed, I realized right after typing it, I thought I had corrected it
before I hit send. Seems I hadn't. Good thing you noticed.

One last observation:

+        /*
+         * An IS NOT NULL test is a no-op if there's any other strict quals,
+         * so if that's the case, then we'll only apply this, otherwise we'll
+         * ignore it.
+         */
+        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
+            s1 *= cslist->notnullsel;

In some paths, the range selectivity estimation code punts and returns
DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
present, it should still be applied. It could make a big difference if
the selectivity for the nulltest is high.



Re: Making clausesel.c Smarter

From
David Rowley
Date:
On 4 April 2017 at 11:35, Claudio Freire <klaussfreire@gmail.com> wrote:
> I'd prefer it if all occurrences of the concept were changed, to
> maintain consistency.
> That would include variables used to hold expressions that refer to
> these as well, as in the case of:
>
> +                Node       *var;
> +
> +                if (varonleft)
> +                    var = leftexpr;
> +                else
> +                    var = rightexpr;
> +

Thanks for looking again.
I didn't change that one as there's already a variable named "expr" in
the scope. I thought changing that would mean code churn that I don't
really want to add to the patch. Someone else might come along and ask
me why I'm renaming this unrelated variable.  I kind of think that if
it was var before when it meant expr, then it's not the business of
this patch to clean that up. I didn't rename the struct member, for
example, as the meaning is no different than before.

> One last observation:
>
> +        /*
> +         * An IS NOT NULL test is a no-op if there's any other strict quals,
> +         * so if that's the case, then we'll only apply this, otherwise we'll
> +         * ignore it.
> +         */
> +        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
> +            s1 *= cslist->notnullsel;
>
> In some paths, the range selectivity estimation code punts and returns
> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
> present, it should still be applied. It could make a big difference if
> the selectivity for the nulltest is high.

I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
should apply the nullfrac. Seems wrong to rely on an IS NOT NULL test
existing in order to estimate that properly. I don't think that needs
to be done as part of this patch. I'd rather limit the scope, since
"returned with feedback" is already knocking at the door of this patch.



-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> One last observation:
>>
>> +        /*
>> +         * An IS NOT NULL test is a no-op if there's any other strict quals,
>> +         * so if that's the case, then we'll only apply this, otherwise we'll
>> +         * ignore it.
>> +         */
>> +        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
>> +            s1 *= cslist->notnullsel;
>>
>> In some paths, the range selectivity estimation code punts and returns
>> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
>> present, it should still be applied. It could make a big difference if
>> the selectivity for the nulltest is high.
>
> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
> should apply the nullfrac. Seems wrong to rely on an IS NOT NULL test
> existing in order to estimate that properly. I don't think that needs
> to be done as part of this patch. I'd rather limit the scope, since
> "returned with feedback" is already knocking at the door of this patch.

I agree, but this patch regresses for those cases if the user
compensated with a not null test, leaving little recourse, so I'd fix
it in this patch to avoid the regression.

Maybe you're right that the null fraction should be applied regardless
of whether there's an explicit null check, though.



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>>> One last observation:
>>>
>>> +        /*
>>> +         * An IS NOT NULL test is a no-op if there's any other strict quals,
>>> +         * so if that's the case, then we'll only apply this, otherwise we'll
>>> +         * ignore it.
>>> +         */
>>> +        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
>>> +            s1 *= cslist->notnullsel;
>>>
>>> In some paths, the range selectivity estimation code punts and returns
>>> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
>>> present, it should still be applied. It could make a big difference if
>>> the selectivity for the nulltest is high.
>>
>> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
>> should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
>> to exists to estimate that properly. I don't think that needs done as
>> part of this patch. I'd rather limit the scope since "returned with
>> feedback" is already knocking at the door of this patch.
>
> I agree, but this patch regresses for those cases if the user
> compensated with a not null test, leaving little recourse, so I'd fix
> it in this patch to avoid the regression.
>
> Maybe you're right that the null fraction should be applied regardless
> of whether there's an explicit null check, though.

The more I read that code, the more I'm convinced there's no way to
get a DEFAULT_RANGE_INEQ_SEL when (rqlist->hibound != DEFAULT_INEQ_SEL
&& rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
clauses, a case the current code handles very poorly: it returns
DEFAULT_RANGE_INEQ_SEL and ignores nullfrac, which could give way-off
estimates if the table had lots of nulls.

Say, consider if "value" was mostly null and you write:

SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
50 AND 60;

All other cases I can think of would end with either hibound or
lobound being equal to DEFAULT_INEQ_SEL.

It seems to me, doing a git blame, that the checks in the else branch
were left only as a precaution, one that seems unnecessary.

If you ask me, I'd just leave:

+ if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == DEFAULT_INEQ_SEL)
+ {
+     /* No point in checking null selectivity, DEFAULT_INEQ_SEL implies we have no stats */
+     s2 = DEFAULT_RANGE_INEQ_SEL;
+ }
+ else
+ {
...
+   s2 = rqlist->hibound + rqlist->lobound - 1.0;
+   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
+   notnullsel = 1.0 - nullsel;
+
+   /* Adjust for double-exclusion of NULLs */
+   s2 += nullsel;
+   if (s2 <= 0.0) {
+      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
+      {
+           /* Most likely contradictory clauses found */
+           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
+      }
+      else
+      {
+           /* Could be a rounding error */
+           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
+      }
+   }
+ }

Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
cautious) estimation of the amount of rounding error that could be
there with 32-bit floats.

The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
an estimate based on stats, maybe approximate, but one that makes
sense (i.e. preserves the monotonicity of the CDF), and as such
negative values are a sign of either a contradiction or a rounding
error. The previous code allowed for negatives coming out of default
selectivities creeping into non-DEFAULT_INEQ_SEL estimates, but that
doesn't seem possible coming out of scalarineqsel.
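
To make the combination concrete, here's a standalone Python sketch of the
logic above. The constants mirror clausesel.c's names, but nullsel is passed
in as a plain parameter rather than computed via nulltestsel(), so this is
illustrative arithmetic only, not the planner code:

```python
# Illustrative sketch of the proposed range-selectivity combination.
# DEFAULT_INEQ_SEL / DEFAULT_RANGE_INEQ_SEL mirror clausesel.c; nullsel
# is a parameter here (the real code calls nulltestsel()).
DEFAULT_INEQ_SEL = 0.3333333333333333
DEFAULT_RANGE_INEQ_SEL = 0.005

def range_sel(hibound, lobound, nullsel):
    """Combine sel(var < hi) and sel(var > lo) into a range selectivity."""
    if hibound == DEFAULT_INEQ_SEL or lobound == DEFAULT_INEQ_SEL:
        # No stats for one side: punt to the default range selectivity.
        return DEFAULT_RANGE_INEQ_SEL
    notnullsel = 1.0 - nullsel
    s2 = hibound + lobound - 1.0
    s2 += nullsel  # adjust for double-exclusion of NULLs
    if s2 <= 0.0:
        if s2 <= (-1.0e-4 * notnullsel - 1.0e-6):
            # Most likely contradictory clauses found
            s2 = min(1.0e-10, notnullsel)
        else:
            # Could be a rounding error
            s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel
    return s2
```

With a mostly-null column, e.g. nullsel = 0.97 and two contradictory range
bounds each estimated near 0.01, s2 lands clearly below the rounding-error
threshold, so the contradiction branch fires and returns 1.0e-10 instead of
DEFAULT_RANGE_INEQ_SEL.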

But I'd like it if Tom could comment on that, since he's the one that
did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
in 2004).

Barring that, I'd just change the

s2 = DEFAULT_RANGE_INEQ_SEL;

To

s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;

Which makes a lot more sense.



Re: Making clausesel.c Smarter

From
David Rowley
Date:
On 3 April 2017 at 20:59, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Updated patch attached.

I did a little benchmarking on this to learn how planning time is affected.

Master = 9fa6e08d4a16f9b0461743cff35781e16308c106
Patched = 9fa6e08d4a16f9b0461743cff35781e16308c106 +
smarter_clausesel_2017-04-03.patch

Config:

All standard

Setup:
create table abcd (a int, b int, c int, d int, arr int[]);

Tests:

Test1: explain select * from abcd where a = 1 and b = 1 and c = 1 and d = 1;
Test2: explain select * from abcd where a >= 1 and a <= 10 and b = 1 and c = 1;
Test3: explain select * from abcd a1 inner join abcd a2 on a1.a = a2.a
where a1.b = 1 and a2.b = 1 and a1.c = 1 and a2.c = 1;
Test4: (output of) select 'explain select * from abcd where ' ||
string_agg('arr[' || x::Text || '] = 1',' AND ') from
generate_series(0,999) x;
Test5: (output of) select 'explain select * from abcd where ' ||
string_agg('arr[' || x::Text || '] > 1',' AND ') from
generate_series(0,999) x;
Test6: explain select * from abcd;

Tests were performed running pgbench -T 60 -n

Raw Results:

Master

Test 1
tps = 6993.677785 (excluding connections establishing)
tps = 7072.796544 (excluding connections establishing)
tps = 6877.428676 (excluding connections establishing)

Test2
tps = 6698.456888 (excluding connections establishing)
tps = 7117.481643 (excluding connections establishing)
tps = 7053.070973 (excluding connections establishing)

Test 3
tps = 5137.229119 (excluding connections establishing)
tps = 5091.548602 (excluding connections establishing)
tps = 5175.683112 (excluding connections establishing)

Test 4
tps = 23.816356 (excluding connections establishing)
tps = 27.069840 (excluding connections establishing)
tps = 27.398995 (excluding connections establishing)

Test 5
tps = 54.209078 (excluding connections establishing)
tps = 54.022850 (excluding connections establishing)
tps = 54.076590 (excluding connections establishing)

Test 6
tps = 9328.134781 (excluding connections establishing)
tps = 9181.898271 (excluding connections establishing)
tps = 9402.993637 (excluding connections establishing)

Patched

Test 1
tps = 6560.160644 (excluding connections establishing)
tps = 6714.294344 (excluding connections establishing)
tps = 6901.381552 (excluding connections establishing)

Test 2
tps = 6971.106747 (excluding connections establishing)
tps = 6591.363565 (excluding connections establishing)
tps = 6921.572060 (excluding connections establishing)

Test 3
tps = 4784.606871 (excluding connections establishing)
tps = 4980.609259 (excluding connections establishing)
tps = 4954.237649 (excluding connections establishing)

Test 4
tps = 19.374184 (excluding connections establishing)
tps = 19.836406 (excluding connections establishing)
tps = 19.047484 (excluding connections establishing)

(perf of test 4)
+   40.20%    40.20%  postgres  postgres            [.] equal.part.18
+   29.57%     0.00%  postgres  [unknown]           [.] 0xffffffff00000017
+   25.15%     0.00%  postgres  [unknown]           [.] 0x00000000ffffffff
+   20.29%     0.00%  postgres  [unknown]           [k] 0000000000000000
+   12.36%    12.36%  postgres  postgres            [.] _equalList
+   11.25%     0.00%  postgres  [unknown]           [.] 0x0000558ed98fb148
+    8.53%     8.53%  postgres  postgres            [.] _equalArrayRef
+    6.22%     6.22%  postgres  postgres            [.] process_equivalence
+    5.93%     5.93%  postgres  postgres            [.] get_eclass_for_sort_expr
+    2.75%     0.00%  postgres  [kernel.kallsyms]   [k] 0xffffffffaf06b772

Test 5
tps = 51.691502 (excluding connections establishing)
tps = 51.296724 (excluding connections establishing)
tps = 51.366643 (excluding connections establishing)

Test 6
tps = 9528.363239 (excluding connections establishing)
tps = 9478.202725 (excluding connections establishing)
tps = 9346.753395 (excluding connections establishing)

Result Comparison

          Master median tps   Patched median tps   Master/Patched
Test 1    6993.7              6714.3               104.16%
Test 2    7053.1              6921.6               101.90%
Test 3    5137.2              4954.2               103.69%
Test 4    27.1                19.4                 139.72%
Test 5    54.1                51.4                 105.28%
Test 6    9328.1              9478.2                98.42%

Results Analyzed:

Test 1 shows planning slowing down by 4.16%. There's quite a bit of
noise in the results, but I think this indicates there is some
overhead in adding items to the cslist and searching the cslist
when new quals are seen.

Test 2 has a lesser slowdown than test 1, as this test exercises
the existing rqlist caching in master too. Patched does a little more
work, adding the equality condition to the list as well.

Test 3 is similar to test 1.

Test 4 adds quite an overhead, performing about 0.5 million
comparisons to find the expressions in the cslist.

Test 5 shows less overhead than test 4, since the master code also
has to do the expression caching and searching.

Test 6 is a control test.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making clausesel.c Smarter

From
David Rowley
Date:
On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
>> <david.rowley@2ndquadrant.com> wrote:
>>>> One last observation:
>>>>
>>>> +        /*
>>>> +         * An IS NOT NULL test is a no-op if there's any other strict quals,
>>>> +         * so if that's the case, then we'll only apply this, otherwise we'll
>>>> +         * ignore it.
>>>> +         */
>>>> +        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
>>>> +            s1 *= cslist->notnullsel;
>>>>
>>>> In some paths, the range selectivity estimation code punts and returns
>>>> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
>>>> present, it should still be applied. It could make a big difference if
>>>> the selectivity for the nulltest is high.
>>>
>>> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
>>> should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
>>> to exists to estimate that properly. I don't think that needs done as
>>> part of this patch. I'd rather limit the scope since "returned with
>>> feedback" is already knocking at the door of this patch.
>>
>> I agree, but this patch regresses for those cases if the user
>> compensated with a not null test, leaving little recourse, so I'd fix
>> it in this patch to avoid the regression.
>>
>> Maybe you're right that the null fraction should be applied regardless
>> of whether there's an explicit null check, though.
>
> The more I read that code, the more I'm convinced there's no way to
> get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
> &&
>  rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
> clauses, a case the current code handles very poorly, returning
> DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
> give way-off estimates if the table had lots of nulls.
>
> Say, consider if "value" was mostly null and you write:
>
> SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
> 50 AND 60;
>
> All other cases I can think of would end with either hibound or
> lobound being equal to DEFAULT_INEQ_SEL.
>
> It seems to me, doing a git blame, that the checks in the else branch
> were left only as a precaution, one that seems unnecessary.
>
> If you ask me, I'd just leave:
>
> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
> DEFAULT_INEQ_SEL)
> + {
> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
> implies we have no stats */
> +     s2 = DEFAULT_RANGE_INEQ_SEL;
> + }
> + else
> + {
> ...
> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
> +   notnullsel = 1.0 - nullsel;
> +
> +   /* Adjust for double-exclusion of NULLs */
> +   s2 += nullsel;
> +   if (s2 <= 0.0) {
> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
> +      {
> +           /* Most likely contradictory clauses found */
> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
> +      }
> +      else
> +      {
> +           /* Could be a rounding error */
> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
> +      }
> +   }
> + }
>
> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
> cautious) estimation of the amount of rounding error that could be
> there with 32-bit floats.
>
> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
> an estimation based on stats, maybe approximate, but one that makes
> sense (ie: preserves the monotonicity of the CPF), and as such
> negative values are either sign of a contradiction or rounding error.
> The previous code left the possibility of negatives coming out of
> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
> but that doesn't seem possible coming out of scalarineqsel.
>
> But I'd like it if Tom could comment on that, since he's the one that
> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
> in 2004).
>
> Barring that, I'd just change the
>
> s2 = DEFAULT_RANGE_INEQ_SEL;
>
> To
>
> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>
> Which makes a lot more sense.

I think to fix this properly, the selfuncs would have to return the
estimate and nullfrac separately, so the nullfrac could be applied
just once per expression. There are already hacks in there to get rid
of the double null filtering. I'm not proposing that as something for
this patch; it would be pretty invasive to change.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Tue, Apr 4, 2017 at 8:12 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> Result Comparison
>
> Master median tps Patch median tps comparison
> Test 1 6993.7 6714.3 104.16%
> Test 2 7053.1 6921.6 101.90%
> Test 3 5137.2 4954.2 103.69%
> Test 4 27.1 19.4 139.72%
> Test 5 54.1 51.4 105.28%
> Test 6 9328.1 9478.2 98.42%
>
> Results Analyzed:
>
> Test 1 has caused planning to slow down 4.16%. There's quite a bit of
> noise from the results, but I think this indicates there is some
> overhead to having to add items to the cslist and searching the cslist
> when new quals are seen.
>
> Test 2 has a lesser slowdown than test 1, as this test will excercise
> the existing rqlist caching in master too. Patched does a little more
> work adding the equality condition to the list too.
>
> Test 3 similar to test 1
>
> Test 4 adds quite an overhead and causes 0.5 million comparisons to
> find the expressions in the cslist.
>
> Test 5 shows less overhead than test 4 since the Master code has to
> also do the expression caching and searching.
>
> Test 6 is a control test

That's consistent with the operation being quadratic.

I was about to point that out, but the old code was quadratic too, so
it sucks in both versions. This version only adds other opexprs to the
list, making the quadratic cost easier to run into, but it's nothing
new.

I don't think there's an easy fix for that. You'd have to build a hash
table of expression nodes or something of the sort to avoid the
quadratic cost. If you can do that, by all means, but that's a much
bigger patch.
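
For what it's worth, the idea can be sketched in a few lines. Names here
(lookup_or_add, make_entry) are invented for illustration, not from the
patch: key the cache on a hashable canonical rendering of the expression
node, so each qual does one O(1) expected lookup instead of an equal()
scan over the whole cslist:

```python
# Hypothetical sketch of a hash-keyed selectivity cache. In C this would
# hash the expression node tree; here expr_key stands in for that
# canonical, hashable rendering of the expression.
def make_entry():
    # Mirrors the cached fields discussed upthread.
    return {"selmask": 0, "lobound": 1.0, "hibound": 1.0, "notnullsel": 1.0}

cache = {}  # canonical expression key -> cached selectivity entry

def lookup_or_add(expr_key):
    """Return the cached entry for expr_key, creating it if absent.

    One dict lookup per qual, so processing n quals is O(n) expected
    rather than the O(n^2) of a linear scan per qual."""
    entry = cache.get(expr_key)
    if entry is None:
        entry = make_entry()
        cache[expr_key] = entry
    return entry
```

Each of the 1000 `arr[i] = 1` quals in Test 4 would then hit its own entry
directly instead of scanning past all the previous ones.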



Re: Making clausesel.c Smarter

From
Claudio Freire
Date:
On Tue, Apr 4, 2017 at 8:21 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
> On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
>>> <david.rowley@2ndquadrant.com> wrote:
>>>>> One last observation:
>>>>>
>>>>> +        /*
>>>>> +         * An IS NOT NULL test is a no-op if there's any other strict quals,
>>>>> +         * so if that's the case, then we'll only apply this, otherwise we'll
>>>>> +         * ignore it.
>>>>> +         */
>>>>> +        else if (cslist->selmask == CACHESEL_NOTNULLTEST)
>>>>> +            s1 *= cslist->notnullsel;
>>>>>
>>>>> In some paths, the range selectivity estimation code punts and returns
>>>>> DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was
>>>>> present, it should still be applied. It could make a big difference if
>>>>> the selectivity for the nulltest is high.
>>>>
>>>> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL
>>>> should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test
>>>> to exists to estimate that properly. I don't think that needs done as
>>>> part of this patch. I'd rather limit the scope since "returned with
>>>> feedback" is already knocking at the door of this patch.
>>>
>>> I agree, but this patch regresses for those cases if the user
>>> compensated with a not null test, leaving little recourse, so I'd fix
>>> it in this patch to avoid the regression.
>>>
>>> Maybe you're right that the null fraction should be applied regardless
>>> of whether there's an explicit null check, though.
>>
>> The more I read that code, the more I'm convinced there's no way to
>> get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
>> &&
>>  rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
>> clauses, a case the current code handles very poorly, returning
>> DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could
>> give way-off estimates if the table had lots of nulls.
>>
>> Say, consider if "value" was mostly null and you write:
>>
>> SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN
>> 50 AND 60;
>>
>> All other cases I can think of would end with either hibound or
>> lobound being equal to DEFAULT_INEQ_SEL.
>>
>> It seems to me, doing a git blame, that the checks in the else branch
>> were left only as a precaution, one that seems unnecessary.
>>
>> If you ask me, I'd just leave:
>>
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +      {
>> +           /* Most likely contradictory clauses found */
>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +      }
>> +      else
>> +      {
>> +           /* Could be a rounding error */
>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +      }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>>
>> But I'd like it if Tom could comment on that, since he's the one that
>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>> in 2004).
>>
>> Barring that, I'd just change the
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>
>> To
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>
>> Which makes a lot more sense.
>
> I think to fix this properly the selfuncs would have to return the
> estimate, and nullfrac separately, so the nullfrac could just be
> applied once per expression. There's already hacks in there to get rid
> of the double null filtering. I'm not proposing that as something for
> this patch. It would be pretty invasive to change this.

There's no need, you can compute the nullfrac with nulltestsel. While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.



Re: [HACKERS] Making clausesel.c Smarter

From
Claudio Freire
Date:
On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>> If you ask me, I'd just leave:
>>>
>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>> DEFAULT_INEQ_SEL)
>>> + {
>>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>> implies we have no stats */
>>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>>> + }
>>> + else
>>> + {
>>> ...
>>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>> +   notnullsel = 1.0 - nullsel;
>>> +
>>> +   /* Adjust for double-exclusion of NULLs */
>>> +   s2 += nullsel;
>>> +   if (s2 <= 0.0) {
>>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>> +      {
>>> +           /* Most likely contradictory clauses found */
>>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>> +      }
>>> +      else
>>> +      {
>>> +           /* Could be a rounding error */
>>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>> +      }
>>> +   }
>>> + }
>>>
>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>> cautious) estimation of the amount of rounding error that could be
>>> there with 32-bit floats.
>>>
>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>> an estimation based on stats, maybe approximate, but one that makes
>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>> negative values are either sign of a contradiction or rounding error.
>>> The previous code left the possibility of negatives coming out of
>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>> but that doesn't seem possible coming out of scalarineqsel.
>>>
>>> But I'd like it if Tom could comment on that, since he's the one that
>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>> in 2004).
>>>
>>> Barring that, I'd just change the
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>
>>> To
>>>
>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>
>>> Which makes a lot more sense.
>>
>> I think to fix this properly the selfuncs would have to return the
>> estimate, and nullfrac separately, so the nullfrac could just be
>> applied once per expression. There's already hacks in there to get rid
>> of the double null filtering. I'm not proposing that as something for
>> this patch. It would be pretty invasive to change this.
>
> There's no need, you can compute the nullfrac with nulltestsel. While
> there's some rework involved, it's not a big deal (it just reads the
> stats tuple), and it's a clean solution.

I'm marking this Waiting on Author until we figure out what to do with
those issues (the null issue quoted above, and the quadratic behavior).



Re: [HACKERS] Making clausesel.c Smarter

From
David Rowley
Date:
On 7 April 2017 at 09:05, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Apr 4, 2017 at 1:00 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
>>>> If you ask me, I'd just leave:
>>>>
>>>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>>>> DEFAULT_INEQ_SEL)
>>>> + {
>>>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>>>> implies we have no stats */
>>>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>>>> + }
>>>> + else
>>>> + {
>>>> ...
>>>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>>>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>>>> +   notnullsel = 1.0 - nullsel;
>>>> +
>>>> +   /* Adjust for double-exclusion of NULLs */
>>>> +   s2 += nullsel;
>>>> +   if (s2 <= 0.0) {
>>>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>>>> +      {
>>>> +           /* Most likely contradictory clauses found */
>>>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>>>> +      }
>>>> +      else
>>>> +      {
>>>> +           /* Could be a rounding error */
>>>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>> +      }
>>>> +   }
>>>> + }
>>>>
>>>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>>>> cautious) estimation of the amount of rounding error that could be
>>>> there with 32-bit floats.
>>>>
>>>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>>>> an estimation based on stats, maybe approximate, but one that makes
>>>> sense (ie: preserves the monotonicity of the CPF), and as such
>>>> negative values are either sign of a contradiction or rounding error.
>>>> The previous code left the possibility of negatives coming out of
>>>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>>>> but that doesn't seem possible coming out of scalarineqsel.
>>>>
>>>> But I'd like it if Tom could comment on that, since he's the one that
>>>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>>>> in 2004).
>>>>
>>>> Barring that, I'd just change the
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>>>
>>>> To
>>>>
>>>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>>>
>>>> Which makes a lot more sense.
>>>
>>> I think to fix this properly the selfuncs would have to return the
>>> estimate, and nullfrac separately, so the nullfrac could just be
>>> applied once per expression. There's already hacks in there to get rid
>>> of the double null filtering. I'm not proposing that as something for
>>> this patch. It would be pretty invasive to change this.
>>
>> There's no need, you can compute the nullfrac with nulltestsel. While
>> there's some rework involved, it's not a big deal (it just reads the
>> stats tuple), and it's a clean solution.
>
> I'm marking this Waiting on Author until we figure out what to do with
> those issues (the null issue quoted above, and the quadratic behavior)

I've attached a rebased patch as the existing one no longer applies.

I've not yet had time to wrap my head around the null handling stuff.
I'll try to get to that today.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Making clausesel.c Smarter

From
David Rowley
Date:
On 4 April 2017 at 13:35, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> If you ask me, I'd just leave:
>
> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
> DEFAULT_INEQ_SEL)
> + {
> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
> implies we have no stats */
> +     s2 = DEFAULT_RANGE_INEQ_SEL;
> + }
> + else
> + {
> ...
> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
> +   notnullsel = 1.0 - nullsel;
> +
> +   /* Adjust for double-exclusion of NULLs */
> +   s2 += nullsel;
> +   if (s2 <= 0.0) {
> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
> +      {
> +           /* Most likely contradictory clauses found */
> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
> +      }
> +      else
> +      {
> +           /* Could be a rounding error */
> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
> +      }
> +   }
> + }
>
> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
> cautious) estimation of the amount of rounding error that could be
> there with 32-bit floats.
>
> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
> an estimation based on stats, maybe approximate, but one that makes
> sense (ie: preserves the monotonicity of the CPF), and as such
> negative values are either sign of a contradiction or rounding error.
> The previous code left the possibility of negatives coming out of
> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
> but that doesn't seem possible coming out of scalarineqsel.

I notice this does change the estimates for contradictory clauses such as:

create table a (value int);
insert into a select x/100 from generate_Series(1,10000) x;
analyze a;
explain analyze select * from a where value between 101 and -1;

We now get:

                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
time=1.904..1.904 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.671 ms
 Execution time: 1.950 ms
(5 rows)

where before we'd get:

                                          QUERY PLAN
----------------------------------------------------------------------------------------------
 Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
time=0.903..0.903 rows=0 loops=1)
   Filter: ((value >= 101) AND (value <= '-1'::integer))
   Rows Removed by Filter: 10000
 Planning time: 0.162 ms
 Execution time: 0.925 ms
(5 rows)

The rows=50 estimate comes from tuples * DEFAULT_RANGE_INEQ_SEL,
i.e. 10000 * 0.005.
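
As a rough sketch, the arithmetic behind the two row estimates above
(using the 1.0e-10 clamp discussed upthread; the planner's actual
rounding lives in clamp_row_est, which this only approximates):

```python
import math

ntuples = 10000
DEFAULT_RANGE_INEQ_SEL = 0.005

# Before the patch: the contradictory range falls back to the default
# range selectivity, giving rows=50.
old_rows = math.ceil(ntuples * DEFAULT_RANGE_INEQ_SEL)

# After: the contradiction is detected, selectivity is clamped to
# 1.0e-10, and the row estimate is floored at 1.
new_rows = max(1, math.ceil(ntuples * 1.0e-10))
```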

I've attached a patch to this effect.


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Attachment

Re: [HACKERS] Making clausesel.c Smarter

From
Claudio Freire
Date:
On Fri, Apr 7, 2017 at 2:28 AM, David Rowley
<david.rowley@2ndquadrant.com> wrote:
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound ==
>> DEFAULT_INEQ_SEL)
>> + {
>> +     /* No point in checking null selectivity, DEFAULT_INEQ_SEL
>> implies we have no stats */
>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +   s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +   nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +   notnullsel = 1.0 - nullsel;
>> +
>> +   /* Adjust for double-exclusion of NULLs */
>> +   s2 += nullsel;
>> +   if (s2 <= 0.0) {
>> +      if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +      {
>> +           /* Most likely contradictory clauses found */
>> +           s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +      }
>> +      else
>> +      {
>> +           /* Could be a rounding error */
>> +           s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +      }
>> +   }
>> + }
>>
>> Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimation of the amount of rounding error that could be
>> there with 32-bit floats.
>>
>> The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is
>> an estimation based on stats, maybe approximate, but one that makes
>> sense (ie: preserves the monotonicity of the CPF), and as such
>> negative values are either sign of a contradiction or rounding error.
>> The previous code left the possibility of negatives coming out of
>> default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates,
>> but that doesn't seem possible coming out of scalarineqsel.
>
> I notice this does change the estimates for contradictory clauses such as:
>
> create table a (value int);
> insert into a select x/100 from generate_Series(1,10000) x;
> analyze a;
> explain analyze select * from a where value between 101 and -1;
>
> We now get:
>
>                                          QUERY PLAN
> ---------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=1 width=4) (actual
> time=1.904..1.904 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.671 ms
>  Execution time: 1.950 ms
> (5 rows)
>
> where before we'd get:
>
>                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------
>  Seq Scan on a  (cost=0.00..195.00 rows=50 width=4) (actual
> time=0.903..0.903 rows=0 loops=1)
>    Filter: ((value >= 101) AND (value <= '-1'::integer))
>    Rows Removed by Filter: 10000
>  Planning time: 0.162 ms
>  Execution time: 0.925 ms
> (5 rows)
>
> Which comes from tuples * DEFAULT_RANGE_INEQ_SEL, i.e. 10000 * 0.005 =
> 50 rows.
>
> I've attached a patch to this effect.

+                    /*
+                     * A zero or slightly negative s2 should be converted into
+                     * a small positive value; we probably are dealing with a
+                     * very tight range and got a bogus result due to roundoff
+                     * errors. However, if s2 is very negative, then we
+                     * probably have default selectivity estimates on one or
+                     * both sides of the range that we failed to recognize
+                     * above for some reason.
+                     */
+                    if (s2 <= 0.0)

That comment seems outdated.

Otherwise, the patch LGTM, but I'd like to solve the quadratic
behavior too... are you going to try? Otherwise I could take a stab at
it myself. It doesn't seem very difficult.

Also, can you add a test case? Cost values could make the test
fragile, so if that gives you trouble I'll understand, but it'd be
best to try and test this if possible.
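
[Editor's note: the clamping behavior discussed above can be sketched as a
standalone function. This is illustrative only, not the actual clausesel.c
code; the helper name clamp_selectivity, the 1.0e-10 contradiction value,
and the combination of the two quoted snippets into one function are
assumptions made for the sketch.]

```c
#include <assert.h>

/*
 * Illustrative sketch only -- not the actual clausesel.c code.
 * DEFAULT_RANGE_INEQ_SEL and the rounding-error tolerance
 * (-1.0e-4 * notnullsel - 1.0e-6) are taken from the snippets quoted
 * in this thread; clamp_selectivity and the 1.0e-10 near-zero value
 * are hypothetical.
 */
#define DEFAULT_RANGE_INEQ_SEL 0.005

static double
clamp_selectivity(double s2, double notnullsel)
{
	if (s2 < 0.0)
	{
		if (s2 < -1.0e-4 * notnullsel - 1.0e-6)
		{
			/*
			 * Far below zero: beyond any plausible roundoff, so treat
			 * the clauses as genuinely contradictory and estimate a
			 * near-zero selectivity (an arbitrary value for the sketch).
			 */
			s2 = 1.0e-10;
		}
		else
		{
			/*
			 * Slightly negative: could be a rounding error from a very
			 * tight range, so fall back to the default range estimate
			 * scaled by the fraction of non-NULL rows.
			 */
			s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
		}
	}
	return s2;
}
```

With notnullsel = 0.97, an input like -1.0e-9 falls inside the tolerance
and comes back as 0.005 * 0.97, while -0.5 is treated as a contradiction.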



Re: [HACKERS] Making clausesel.c Smarter

From
David Steele
Date:
On 4/7/17 5:33 PM, Claudio Freire wrote:
> 
> Also, can you add a test case? Cost values could make the test
> fragile, so if that gives you trouble I'll understand, but it'd be
> best to try and test this if possible.

This submission has been moved to CF 2017-07.

-- 
-David
david@pgmasters.net



Re: [HACKERS] Making clausesel.c Smarter

From
David Rowley
Date:
On 8 April 2017 at 09:33, Claudio Freire <klaussfreire@gmail.com> wrote:
> Otherwise, the patch LGTM, but I'd like to solve the quadratic
> behavior too... are you going to try? Otherwise I could take a stab at
> it myself. It doesn't seem very difficult.

I have some ideas in my head for solving this in a fairly generic way,
which I'd like to take care of in PG11.

Many thanks for your reviews and suggestions on this patch, it's much
appreciated.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Making clausesel.c Smarter

From
Daniel Gustafsson
Date:
> On 09 Apr 2017, at 12:14, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 8 April 2017 at 09:33, Claudio Freire <klaussfreire@gmail.com> wrote:
>> Otherwise, the patch LGTM, but I'd like to solve the quadratic
>> behavior too... are you going to try? Otherwise I could take a stab at
>> it myself. It doesn't seem very difficult.
>
> I have some ideas in my head for solving this in a fairly generic way,
> which I'd like to take care of in PG11.

This patch was moved to the currently open Commitfest.  Given the above
comment, is the last patch in this thread still up for review, or are you
working on a new version?  Just to avoid anyone reviewing an outdated version.

cheers ./daniel


Re: [HACKERS] Making clausesel.c Smarter

From
David Rowley
Date:
On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:
> This patch was moved to the currently open Commitfest.  Given the above
> comment, is the last patch in this thread still up for review, or are you
> working on a new version?  Just to avoid anyone reviewing an outdated version.

Hi Daniel,

I've not had time to work on a new version yet and I can't imagine
that I will for quite some time.

The idea I had to remove the quadratic search on the list was to add
to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
use that as a comparison function in a binary search tree. The Btree
would complement List as a way of storing Nodes in no particular
order, but in a way that a Node can be found quickly. There's likely
more than a handful of places in the planner that would benefit from
this already.
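
[Editor's note: the idea above can be sketched in miniature. This is a
hypothetical illustration only -- it uses strcmp() over plain strings as
a stand-in for a three-way version of equal() over planner Nodes -- to
show how a binary search tree keyed on such a comparator turns repeated
membership tests from O(n) list scans into O(log n) lookups.]

```c
#include <stdlib.h>
#include <string.h>
#include <assert.h>

/*
 * Sketch: a binary search tree driven by a three-way comparison, as the
 * proposed -1/0/+1 equal() would provide. Strings stand in for planner
 * Nodes; in the real patch the comparator would come from equalfuncs.c.
 */
typedef struct TreeNode
{
	const char *key;
	struct TreeNode *left;
	struct TreeNode *right;
} TreeNode;

static TreeNode *
tree_insert(TreeNode *root, const char *key)
{
	if (root == NULL)
	{
		TreeNode *n = malloc(sizeof(TreeNode));

		n->key = key;
		n->left = n->right = NULL;
		return n;
	}

	int cmp = strcmp(key, root->key);	/* stand-in for 3-way equal() */

	if (cmp < 0)
		root->left = tree_insert(root->left, key);
	else if (cmp > 0)
		root->right = tree_insert(root->right, key);
	/* cmp == 0: duplicate, ignore -- membership is all we track */
	return root;
}

static int
tree_contains(const TreeNode *root, const char *key)
{
	while (root != NULL)
	{
		int cmp = strcmp(key, root->key);

		if (cmp == 0)
			return 1;
		root = (cmp < 0) ? root->left : root->right;
	}
	return 0;
}
```

The payoff is that checking each of n clauses against the clauses already
seen costs O(n log n) overall instead of the quadratic list scan.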

It's not a 5-minute patch, and it would probably see some resistance,
so I won't have time to look at this soon.

If the possibility of this increasing planning time in corner cases is
going to be a problem, then it might be best to return this with
feedback for now and I'll resubmit if I get time later in the cycle.

-- 
David Rowley                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Making clausesel.c Smarter

From
Daniel Gustafsson
Date:
> On 06 Sep 2017, at 07:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:
>> This patch was moved to the currently open Commitfest.  Given the above
>> comment, is the last patch in this thread still up for review, or are you
>> working on a new version?  Just to avoid anyone reviewing an outdated version.
>
> Hi Daniel,
>
> I've not had time to work on a new version yet and I can't imagine
> that I will for quite some time.
>
> The idea I had to remove the quadratic search on the list was to add
> to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
> use that as a comparison function in a binary search tree. The Btree
> would complement List as a way of storing Nodes in no particular
> order, but in a way that a Node can be found quickly. There's likely
> more than a handful of places in the planner that would benefit from
> this already.
>
> It's not a 5-minute patch, and it would probably see some resistance,
> so I won't have time to look at this soon.
>
> If the possibility of this increasing planning time in corner cases is
> going to be a problem, then it might be best to return this with
> feedback for now and I'll resubmit if I get time later in the cycle.

For now I’ve marked this “Waiting on Author” awaiting a new patch for the
community to consider.  If there is no time to hack on this in the current CF
(which seems likely given your mail above) I’ll move it to the next CF as this
one draws to a close.

cheers ./daniel




Re: [HACKERS] Making clausesel.c Smarter

From
Daniel Gustafsson
Date:
> On 07 Sep 2017, at 09:30, Daniel Gustafsson <daniel@yesql.se> wrote:
>
>> On 06 Sep 2017, at 07:13, David Rowley <david.rowley@2ndquadrant.com> wrote:
>>
>> On 6 September 2017 at 00:43, Daniel Gustafsson <daniel@yesql.se> wrote:
>>> This patch was moved to the currently open Commitfest.  Given the above
>>> comment, is the last patch in this thread still up for review, or are you
>>> working on a new version?  Just to avoid anyone reviewing an outdated version.
>>
>> Hi Daniel,
>>
>> I've not had time to work on a new version yet and I can't imagine
>> that I will for quite some time.
>>
>> The idea I had to remove the quadratic search on the list was to add
>> to or modify equal() in equalfuncs.c to have it return -1, 0, +1 and
>> use that as a comparison function in a binary search tree. The Btree
>> would complement List as a way of storing Nodes in no particular
>> order, but in a way that a Node can be found quickly. There's likely
>> more than a handful of places in the planner that would benefit from
>> this already.
>>
>> It's not a 5-minute patch, and it would probably see some resistance,
>> so I won't have time to look at this soon.
>>
>> If the possibility of this increasing planning time in corner cases is
>> going to be a problem, then it might be best to return this with
>> feedback for now and I'll resubmit if I get time later in the cycle.
>
> For now I’ve marked this “Waiting on Author” awaiting a new patch for the
> community to consider.  If there is no time to hack on this in the current CF
> (which seems likely given your mail above) I’ll move it to the next CF as this
> one draws to a close.

Going back on my previous thinking of moving to the next CF, I am instead
marking this as returned with feedback.  When a new version of the patch is
ready, please re-submit to a future commitfest.

cheers ./daniel
