Re: [PATCH] Equivalence Class Filters - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [PATCH] Equivalence Class Filters
Date
Msg-id 11204.1449419889@sss.pgh.pa.us
Whole thread Raw
In response to Re: [PATCH] Equivalence Class Filters  (David Rowley <david.rowley@2ndquadrant.com>)
Responses Re: [PATCH] Equivalence Class Filters
Re: [PATCH] Equivalence Class Filters
List pgsql-hackers
David Rowley <david.rowley@2ndquadrant.com> writes:
> On 6 December 2015 at 06:07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Another issue that would need consideration is how to avoid skewing
>> planner selectivity estimates with derived clauses that are fully
>> redundant with other clauses.

> Could you give me an example of where this is a problem?

Using the regression database, try

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand;

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100;

explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100 and b.thousand <
100;

The first two give pretty accurate estimates for the join size, the third
not so much, because it thinks b.thousand < 100 is an independent
constraint.

> I've tried fooling
> the planner into giving me bad estimates, but I've not managed to.
> # explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and
> t1.id < 1000;

That particular case works because clausesel.c's AddRangeClause figures
out that the similar inequalities are redundant and drops all but one,
on its way to (not) identifying a range constraint for t1.id.  There's
nothing comparable for constraints on different variables, though,
especially not constraints on Vars of different relations, which will
never even appear in the same restriction list.

> if so, is the current eclass code not prone to the exact same problem?

No, and I already explained why not.

>> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
>> that much; it's certainly far less useful than an equality condition.

> I'd have thought that my link to a thread of a reported 1100 to 2200 times
> performance improvement might have made you think twice about claiming the
> uselessness of this idea.

I said "in most cases".  You can find example cases to support almost any
weird planner optimization no matter how expensive and single-purpose;
but that is the wrong way to think about it.  What you have to think about
is average cases, and in particular, not putting a drag on planning time
in cases where no benefit ensues.  We're not committing any patches that
give one uncommon case an 1100X speedup by penalizing every other query 10%,
or even 1%; especially not when there may be other ways to fix it.

The EC machinery wasn't designed in a vacuum.  It is interested in
deriving equality conditions because those are useful for merge and hash
joins.  It's also tightly tied in with the planner's understanding of sort
ordering and distinct-ness.  None of those considerations provide any
reason to build a similar thing for inequalities.

It may be that computing derived inequalities is something that's worth
adding, but we're going to have to take a very hard look at the costs and
benefits; it's not a slam dunk that such a patch would get committed.
        regards, tom lane



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: a word-choice question
Next
From: Konstantin Knizhnik
Date:
Subject: Re: Logical replication and multimaster