Thread: Removing INNER JOINs
Hi all.
I came across this from Oracle: https://oracle-base.com/articles/misc/join-elimination#basic-join-elimination
Needless to say, this would be very cool to have in PG:-)
It seems this has been discussed before [1], [2], [3], and the consesus at the time was that the proposted implementation introduced way too much planning-overhead to be worth it. Given that other RDBMS-vendors provides this, and it's on the "Cool feactures other have that we don't"-list [4], is anyone interessted in working on improving this?
1. Patch to support SEMI and ANTI join removal:
https://www.postgresql.org/message-id/flat/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com#CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com
2. Patch to Remove INNER JOINs:
https://www.postgresql.org/message-id/CAApHDvp4SsyQq5r%2Bj5iUA1rF1SuWGD5QrhoVLqOqOxVXe%3DNjxw%40mail.gmail.com
3. Removing INNER JOINs
http://www.postgresql-archive.org/Removing-INNER-JOINs-td5828650i40.html
4. https://blog.jooq.org/2017/09/28/10-cool-sql-optimisations-that-do-not-depend-on-the-cost-model/#top3
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
Attachment
FK constraint enforcement may be deferred, but only to the end of the DML transaction. Can someone provide an example where a SELECT statement that is not in that DML transaction and that can view only committed data can see a FK-side row that doesn't join with any PK-side row? I don't think I understand why the volatile function scenario. Can a volatile function somehow circumvent FK enforcement? If not, then why is the volatile function scenario different than any other DML with respect to the enforcement of FK constraints? thank you, /Jim F -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
On 11 December 2017 at 17:33, Jim Finnerty <jfinnert@amazon.com> wrote: > FK constraint enforcement may be deferred, but only to the end of the DML > transaction. Can someone provide an example where a SELECT statement that > is not in that DML transaction and that can view only committed data can see > a FK-side row that doesn't join with any PK-side row? > > I don't think I understand why the volatile function scenario. Can a > volatile function somehow circumvent FK enforcement? If not, then why is > the volatile function scenario different than any other DML with respect to > the enforcement of FK constraints? It would have to be in the same transaction as the DML to get wrong results, other transactions won't see the changes until the DML transaction commits. Here's an example of finding unmatched rows when a join is removed in the same transaction as the DML. create table t1 (id int primary key); create table t2 (value int not null); alter table t2 add constraint t2_t1_id_fkey foreign key (value) references t1 match full on update cascasde on delete cascade; create index on t2 (value); create or replace function update_t1(p_id int, p_newid int) returns bigint as $$ declare c bigint; begin update t1 set id = p_newid where id = p_id; -- count would always be 0 if foreign key was cascaded immediately select count(*) into c from t2 where not exists(select 1 from t1 where t1.id = t2.value); return c; end; $$ language plpgsql volatile; insert into t1 values(1),(2); insert into t2 values(1),(2); select *,update_t1(id, id + 2) from t1 order by id; If we removed the anti-join in the count(*) into c query, we'd incorrectly get: # select *,update_t1(id, id + 2) from t1 order by id; id | update_t1 ----+----------- 1 | 0 2 | 0 (2 rows) when we should get: # select *,update_t1(id, id + 2) from t1 order by id; id | update_t1 ----+----------- 1 | 1 2 | 2 (2 rows) Yes, this is an ANTI-JOIN and not an INNER JOIN as per $subject, but the same applies. This is just a more simple example to give. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Great example, David. The planner can detect whether a SELECT statement contains a volatile function, and can disable the proposed redundant inner-join optimization in that case. If necessary, the planner could also check that the FK constraint is not DEFERRED, but if there are no volatile functions and the SELECT statement can't see an inconsistent state created by any other transaction, I think that just checking for volatile functions and not being inside a DML transaction would be sufficient. A further opportunity would be to apply this to any SELECT statement in a DML transaction, provided that there was no prior DML statement or statement containing a volatile function in the same transaction. We already have a redundant outer join optimization, and I've implemented the redundant inner join optimization in two other products before, so adding the additional logic to support the inner join case(s) sounds straightforward to me. Can anyone think of any other problem scenarios? /Jim F -- Sent from: http://www.postgresql-archive.org/PostgreSQL-general-f1843780.html
Jim Finnerty <jfinnert@amazon.com> writes: > Great example, David. The planner can detect whether a SELECT statement > contains a volatile function, and can disable the proposed redundant > inner-join optimization in that case. > If necessary, the planner could also check that the FK constraint is not > DEFERRED, but if there are no volatile functions and the SELECT statement > can't see an inconsistent state created by any other transaction, I think > that just checking for volatile functions and not being inside a DML > transaction would be sufficient. I don't think you're thinking nearly hard enough about what would break this. The planner does not have much insight into the context a statement is being used in (e.g. whether we're inside some kind of PL function). Nor does it get to make assumptions about whether the plan will be used inside a transaction block or not. regards, tom lane
On 12 December 2017 at 02:38, Jim Finnerty <jfinnert@amazon.com> wrote: > If necessary, the planner could also check that the FK constraint is not > DEFERRED, but if there are no volatile functions and the SELECT statement > can't see an inconsistent state created by any other transaction, I think > that just checking for volatile functions and not being inside a DML > transaction would be sufficient. > > A further opportunity would be to apply this to any SELECT statement in a > DML transaction, provided that there was no prior DML statement or statement > containing a volatile function in the same transaction. > > We already have a redundant outer join optimization, and I've implemented > the redundant inner join optimization in two other products before, so > adding the additional logic to support the inner join case(s) sounds > straightforward to me. Can anyone think of any other problem scenarios? You should read over [1]. This was my attempt at doing this over 3 years ago. The thread might save you from going down some of the dead ends that I ended up going down. [1] https://www.postgresql.org/message-id/flat/CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com#CAApHDvpCBEfuc5tD=vniepAv0pU5m=q=fOQZcOdMHeei7OQPgQ@mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services