Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: Removing unneeded self joins
Date
Msg-id CAPpHfdsQkY0Gu7bQ9Ekdyzo_5Xxz74TYKbwmk1on1WAASpxXcQ@mail.gmail.com
Whole thread Raw
In response to Re: Removing unneeded self joins  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: Removing unneeded self joins
Re: Removing unneeded self joins
List pgsql-hackers
On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >  > Relid replacement machinery is the most contradictory code here. We used
> >  > a utilitarian approach and implemented a simplistic variant.
> >
> >  > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >  > > they are not necessary.  [1] points that infrastructure from [2] might
> >  > > be useful.  The patchset from [2] seems committed mow.  However, I
> >  > > can't see it is directly helpful in this matter.  Could we just skip
> >  > > adding IS NOT NULL clause for the columns, that have
> >  > > pg_attribute.attnotnull set?
> >  > Thanks for the links, I will look into that case.
> To be more precise, in the attachment, you can find a diff to the main
> patch, which shows the volume of changes to achieve the desired behaviour.
> Some explains in regression tests shifted. So, I've made additional tests:
>
> DROP TABLE test CASCADE;
> CREATE TABLE test (a int, b int not null);
> CREATE UNIQUE INDEX abc ON test(b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> CREATE UNIQUE INDEX abc1 ON test(a,b);
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> DROP INDEX abc1;
> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
>
> We have almost the results we wanted to have. But in the last explain
> you can see that nothing happened with the OR clause. We should use the
> expression mutator instead of walker to handle such clauses. But It
> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> of this issue off for a while.

OK.  I think it doesn't worth to eliminate IS NULL quals with this
complexity (at least at this stage of work).

I made improvements over the code.  Mostly new comments, grammar
corrections of existing comments and small refactoring.

Also, I found that the  suggestion from David Rowley [1] to qsort
array of relations to faster find duplicates is still unaddressed.
I've implemented it.  That helps to evade quadratic complexity with
large number of relations.

Also I've incorporated improvements from Alena Rybakina except one for
skipping SJ removal when no SJ quals is found.  It's not yet clear for
me if this check fix some cases. But at least optimization got skipped
in some useful cases (as you can see in regression tests).

Links
1. https://www.postgresql.org/message-id/CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ%40mail.gmail.com
2. https://www.postgresql.org/message-id/96f66ae3-df10-4060-9844-4c9633062cd3%40yandex.ru

------
Regards,
Alexander Korotkov

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: pg_upgrade's interaction with pg_resetwal seems confusing
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [PoC] pg_upgrade: allow to upgrade publisher node