Re: [HACKERS] <> join selectivity estimate question - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: [HACKERS] <> join selectivity estimate question
Msg-id CAEepm=3=NHHko3oOzpik+ggLy17AO+px3rGYrg3x_x05+Br9-A@mail.gmail.com
In response to Re: [HACKERS] <> join selectivity estimate question  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [HACKERS] <> join selectivity estimate question
List pgsql-hackers
On Fri, Jul 21, 2017 at 8:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> writes:
>> On Thu, Jul 20, 2017 at 5:30 PM, Thomas Munro
>> <thomas.munro@enterprisedb.com> wrote:
>>> Does anyone know how to test a situation where the join is reversed according to
>>> get_join_variables, or "complicated cases where we can't tell for sure"?
>
>> explain select * from pg_class c right join pg_type t on (c.reltype =
>> t.oid); would end up with  *join_is_reversed = true; Is that what you
>> want? For a semi-join however I don't know how to induce that. AFAIU,
>> in a semi-join there is only one direction in which join can be
>> specified.
>
> You just have to flip the <> clause around, eg instead of
>
> explain analyze select * from tenk1 t
>   where exists (select 1 from int4_tbl i where t.ten <> i.f1);
>
> do
>
> explain analyze select * from tenk1 t
>   where exists (select 1 from int4_tbl i where i.f1 <> t.ten);
>
> No matter what the surrounding query is like exactly, one or the
> other of those should end up "join_is_reversed".

Ahh, I see.  Thanks for the explanation.

> This would be a bit harder to trigger for equality clauses, where you'd
> have to somehow defeat the EquivalenceClass logic's tendency to rip the
> clauses apart and reassemble them according to its own whims.  But for
> neqjoinsel that's not a problem.
>
>> I didn't get the part about "complicated cases where we can't tell for sure".
>
> You could force that with mixed relation membership on one or both sides
> of the <>, for instance "(a.b + b.y) <> a.c".  I don't think it's
> especially interesting for the present purpose though, since we're going
> to end up with 1.0 selectivity in any case where examine_variable can't
> find stats.

Thanks.  Bearing all that in mind, I ran through a series of test
scenarios and discovered that my handling for JOIN_ANTI was wrong: I
thought that I had to deal with inverting the result, but I now see
that that's handled elsewhere (calc_joinrel_size_estimate() I think).
So neqjoinsel should just treat JOIN_SEMI and JOIN_ANTI exactly the
same way.
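
The point about inversion can be sketched roughly as follows. This is a simplified illustration, not code from the attached patch or from PostgreSQL: the enum and function names are hypothetical, and the real calc_joinrel_size_estimate() takes further factors into account. It assumes the estimator reports, for both join types, the fraction of outer rows that have at least one match, and that the caller takes the complement for an anti join.

```c
/*
 * Hypothetical, simplified stand-in for the row-count step that consumes
 * a semi-join style selectivity.  Names are illustrative only.
 */
typedef enum
{
    SKETCH_JOIN_SEMI,
    SKETCH_JOIN_ANTI
} SketchJoinType;

/*
 * semi_selec is the estimated fraction of outer rows with at least one
 * matching inner row.  The estimator returns the same number for SEMI
 * and ANTI; the inversion for ANTI happens here, in the caller.
 */
static double
sketch_joinrel_rows(SketchJoinType jointype, double outer_rows,
                    double semi_selec)
{
    if (jointype == SKETCH_JOIN_ANTI)
        return outer_rows * (1.0 - semi_selec);

    return outer_rows * semi_selec;
}
```

Under that assumption, an estimator that also inverted its result for JOIN_ANTI would cause the complement to be applied twice.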

That just leaves the question of whether we should try to handle the
empty RHS and single-value RHS cases using statistics.  My intuition
is that we shouldn't, but I'll be happy to change my intuition and
code that up if that is the feedback from planner gurus.
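
To make the two special cases concrete, here is a brute-force sketch of the true semi-join fraction for a <> clause. It is an illustration only, with a hypothetical function name, plain int columns, and NULLs ignored. It is not how the planner computes anything.

```c
#include <stddef.h>

/*
 * Fraction of LHS values for which at least one RHS value differs,
 * i.e. the exact answer a semi-join <> estimate is trying to approximate.
 * Hypothetical helper; NULL handling is deliberately left out.
 */
static double
sketch_semi_ne_fraction(const int *lhs, size_t nlhs,
                        const int *rhs, size_t nrhs)
{
    size_t      matched = 0;

    if (nlhs == 0)
        return 0.0;

    for (size_t i = 0; i < nlhs; i++)
    {
        for (size_t j = 0; j < nrhs; j++)
        {
            if (lhs[i] != rhs[j])
            {
                matched++;      /* one differing RHS value is enough */
                break;
            }
        }
    }

    return (double) matched / (double) nlhs;
}
```

With an empty RHS the result is 0. With a single-value RHS it is the fraction of LHS rows that differ from that value. As soon as the RHS holds two distinct values, every non-NULL LHS row has a match and the result is 1. An estimate that ignores the RHS contents cannot distinguish the first two cases from the third.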

Please find attached a new version, and a test script I used, which
shows a bunch of interesting cases.  I'll add this to the commitfest.

-- 
Thomas Munro
http://www.enterprisedb.com

