Re: Join optimization - Mailing list pgsql-general

From luis.roberto@siscobra.com.br
Subject Re: Join optimization
Date
Msg-id 601802839.11789890.1594587181511.JavaMail.zimbra@siscobra.com.br
Whole thread Raw
In response to Re: Join optimization  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-general
----- Mensagem original -----
De: "David Rowley" <dgrowleyml@gmail.com>
Para: "luis.roberto" <luis.roberto@siscobra.com.br>
Cc: "Fabrízio de Royes Mello" <fabrizio@timbira.com.br>, "pgsql-general" <pgsql-general@lists.postgresql.org>
Enviadas: Domingo, 12 de julho de 2020 5:29:08
Assunto: Re: Join optimization

On Sun, 12 Jul 2020 at 06:59, <luis.roberto@siscobra.com.br> wrote:
>
> I'm sorry for the bad example.
>
> Here is another, with some data on PG:
https://dbfiddle.uk/?rdbms=postgres_13&fiddle=ccfd1c4fa291e74a6db9db1772e2b5ac and Oracle:
https://dbfiddle.uk/?rdbms=oracle_18&fiddle=21a98f499065ad4e2c35ff4bd1487e14.

I believe what you're talking about is join removals.  It appears as
though Oracle is able to remove the inner join to the users table as
the join only serves to check the user record exists. No columns are
being selected.  The record must exist due to the foreign key
referencing users.

PostgreSQL currently can only remove left joins. Likely what you could
do here is just change your inner join into a left join. If you're
happy enough that the user record will always exist then that should
allow it to work.

The reason PostgreSQL does not currently support inner join is that by
default, foreign key constraints are only triggered at the end of the
query, (or if deferred, at the end of the transaction). WIth
PostgreSQL, it's possible for a SELECT query to see a violated foreign
key constraint.  This can happen if your SELECT query calls a function
which updates a referenced record.  The cascade of the foreign key
won't occur until the end of the statement, so the select may stumble
upon a violated foreign key.

Here's a quick example of this case:
drop table t1,t2;
create table t1 (a int primary key);
create table t2 (a int references t1 on update cascade);

insert into t1 values(1),(2);
insert into t2 values(2),(2);
create or replace function update_t1 (p_a int) returns int as $$ begin
update t1 set a = a + 1 where a = p_a; return p_a; end; $$ language
plpgsql volatile;

-- in theory, this should never return any rows as we're asking for
all rows that
-- don't exist in the referenced table. However, we do get a result
since the function
-- call updates t1 setting the row with a=2 to a=3. The cascade to t2
does not occur
-- until the end of the statement.
select update_t1(a+1),t1.a from t1 where not exists(select 1 from t2
where t1.a=t2.a);
 update_t1 | a
-----------+---
         2 | 1
(1 row)

If you're happy that you don't have any functions like that which
could momentarily cause the foreign key to appear violated, then there
shouldn't be any harm in changing the INNER JOIN on users to a LEFT
JOIN.  PostgreSQL will be able to remove the join in that case.

David

-----

Thanks for the reply David! I understand it better now.



pgsql-general by date:

Previous
From: Anto Aravinth
Date:
Subject:
Next
From: Rama Krishnan
Date:
Subject: Doubt in mvcc