Thread: Removing INNER JOINs

Removing INNER JOINs

From
Andreas Joseph Krogh
Date:
Hi all.
 
I came across this from Oracle: https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination
 
Needless to say, this would be very cool to have in PG:-)
 
It seems this has been discussed before [1], [2], [3], and the consesus at the time was that the proposted implementation introduced way too much planning-overhead to be worth it. Given that other RDBMS-vendors provides this, and it's on the "Cool feactures other have that we don't"-list [4], is anyone interessted in working on improving this?
 
 
1. Patch to support SEMI and ANTI join removal:
https://www.postgresql.org/message-id/flat/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com#CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
2. Patch to Remove INNER JOINs:
https://www.postgresql.org/message-id/CAApHDvp4SsyQq5r%2Bj5iUA1rF1SuWGD5QrhoVLqOqOxVXe%3DNjxw%40mail.gmail.com
3. Removing INNER JOINs
http://www.postgresql-archive.org/Removing-INNER-JOINs-td5828650i40.html
4. https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/#top3
 
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment

Re: Removing INNER JOINs

From
Jim Finnerty
Date:
FK constraint enforcement may be deferred, but only to the end of the DML
transaction.  Can someone provide an example where a SELECT statement that
is not in that DML transaction and that can view only committed data can see
a FK-side row that doesn't join with any PK-side row?

I don't think I understand why the volatile function scenario.  Can a
volatile function somehow circumvent FK enforcement?  If not, then why is
the volatile function scenario different than any other DML with respect to
the enforcement of FK constraints?

thank you,

    /Jim F






--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


Re: Removing INNER JOINs

From
David Rowley
Date:
On 11 December 2017 at 17:33, Jim Finnerty <jfinnert@amazon.com> wrote:
> FK constraint enforcement may be deferred, but only to the end of the DML
> transaction.  Can someone provide an example where a SELECT statement that
> is not in that DML transaction and that can view only committed data can see
> a FK-side row that doesn't join with any PK-side row?
>
> I don't think I understand why the volatile function scenario.  Can a
> volatile function somehow circumvent FK enforcement?  If not, then why is
> the volatile function scenario different than any other DML with respect to
> the enforcement of FK constraints?

It would have to be in the same transaction as the DML to get wrong
results, other transactions won't see the changes until the DML
transaction commits.

Here's an example of finding unmatched rows when a join is removed in
the same transaction as the DML.

create table t1 (id int primary key);
create table t2 (value int not null);

alter table t2 add constraint t2_t1_id_fkey foreign key (value)
references t1 match full on update cascasde on delete cascade;

create index on t2 (value);

create or replace function update_t1(p_id int, p_newid int) returns bigint as $$
declare c bigint;
begin
update t1 set id = p_newid where id = p_id;
-- count would always be 0 if foreign key was cascaded immediately
select count(*) into c from t2 where not exists(select 1 from t1 where
t1.id = t2.value);
return c;
end;
$$ language plpgsql volatile;

insert into t1 values(1),(2);
insert into t2 values(1),(2);

select *,update_t1(id, id + 2) from t1 order by id;


If we removed the anti-join in the count(*) into c query, we'd incorrectly get:

# select *,update_t1(id, id + 2) from t1 order by id;
 id | update_t1
----+-----------
  1 |         0
  2 |         0
(2 rows)

when we should get:

# select *,update_t1(id, id + 2) from t1 order by id;
 id | update_t1
----+-----------
  1 |         1
  2 |         2
(2 rows)

Yes, this is an ANTI-JOIN and not an INNER JOIN as per $subject, but
the same applies. This is just a more simple example to give.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Removing INNER JOINs

From
Jim Finnerty
Date:
Great example, David.  The planner can detect whether a SELECT statement
contains a volatile function, and can disable the proposed redundant
inner-join optimization in that case.  

If necessary, the planner could also check that the FK constraint is not
DEFERRED, but if there are no volatile functions and the SELECT statement
can't see an inconsistent state created by any other transaction, I think
that just checking for volatile functions and not being inside a DML
transaction would be sufficient.

A further opportunity would be to apply this to any SELECT statement in a
DML transaction, provided that there was no prior DML statement or statement
containing a volatile function in the same transaction.

We already have a redundant outer join optimization, and I've implemented
the redundant inner join optimization in two other products before, so
adding the additional logic to support the inner join case(s) sounds
straightforward to me.  Can anyone think of any other problem scenarios?

   /Jim F



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html


Re: Removing INNER JOINs

From
Tom Lane
Date:
Jim Finnerty <jfinnert@amazon.com> writes:
> Great example, David.  The planner can detect whether a SELECT statement
> contains a volatile function, and can disable the proposed redundant
> inner-join optimization in that case.  

> If necessary, the planner could also check that the FK constraint is not
> DEFERRED, but if there are no volatile functions and the SELECT statement
> can't see an inconsistent state created by any other transaction, I think
> that just checking for volatile functions and not being inside a DML
> transaction would be sufficient.

I don't think you're thinking nearly hard enough about what would break
this.  The planner does not have much insight into the context a
statement is being used in (e.g. whether we're inside some kind of
PL function).  Nor does it get to make assumptions about whether the plan
will be used inside a transaction block or not.

            regards, tom lane


Re: Removing INNER JOINs

From
David Rowley
Date:
On 12 December 2017 at 02:38, Jim Finnerty <jfinnert@amazon.com> wrote:
> If necessary, the planner could also check that the FK constraint is not
> DEFERRED, but if there are no volatile functions and the SELECT statement
> can't see an inconsistent state created by any other transaction, I think
> that just checking for volatile functions and not being inside a DML
> transaction would be sufficient.
>
> A further opportunity would be to apply this to any SELECT statement in a
> DML transaction, provided that there was no prior DML statement or statement
> containing a volatile function in the same transaction.
>
> We already have a redundant outer join optimization, and I've implemented
> the redundant inner join optimization in two other products before, so
> adding the additional logic to support the inner join case(s) sounds
> straightforward to me.  Can anyone think of any other problem scenarios?

You should read over [1]. This was my attempt at doing this over 3
years ago. The thread might save you from going down some of the dead
ends that I ended up going down.

[1]
https://www.postgresql.org/message-id/flat/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com#CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services