Re: Removing unneeded self joins - Mailing list pgsql-hackers

From David Rowley
Subject Re: Removing unneeded self joins
Date
Msg-id CAKJS1f8ySSsBfooH3bJK7OD3LBEbDb99d8J_FtqDd6w50p-eAQ@mail.gmail.com
Whole thread Raw
In response to Removing unneeded self joins  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Responses Re: Removing unneeded self joins  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
Re: Removing unneeded self joins  (Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru>)
List pgsql-hackers
On 17 May 2018 at 03:43, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote:
> I'd be glad to hear your thoughts on this.

(I only glanced at the patch)

I've thought and discussed this before on this list. I think the
arguments for and against it were much the same as you've received
already. If you trawl through the archives you'll see my argument for
matches quite closely to Robert regarding the nested-views. I
personally experienced this issue in my previous job, although it was
not with PostgreSQL.

I think it's worth doing this providing that we can fast-path out
quickly enough in cases where we can't possibly remove anything.
Likely the success of this patch depends on how quick that fast-path
is.

From my experience on join removals, I imagine all this can be done
just after the left join removal code has completed. I see your patch
does it much later, which I don't think is particularly great since
Paths have already been generated by that time. I think it makes sense
to do this as early as possible to save wasting planning work for
relations that will be removed.

I think all this can be done just after left joins are removed by
remove_useless_joins. You may like to move the code that exists in
that function today into a new static function named
remove_useless_left_joins, and put this new code in new static
function named remove_useless_self_joins:

1. Allocate an array root->simple_rel_array_size in size. Populate it
with a struct which is defined as struct { Index relid; Oid oid; }
2. Populate that array by looping over the simple_rel_array. Ignore
anything that's not a baserel with relkind = 'r'
3. qsort the array on Oid.
4. Make a pass over the array (up to its size - 1) looking for
elements where the current oid is the same as the next. Build a List
of RelIds containing all relids of Oids which are duplicated.
5. If no pairs. Abort.
6. Process each combination of pairs found in each Relids in the list
made in step 1. Probably start at the lowest relid.
7. For each pair:
  a. If there's a join condition, ensure all join OpExprs are equality
exprs with a mergejoinable opno (copy what left join removal check
with the opno used). Ensure Vars used in the OpExpr have the same
attrno on each side.
  b. For bonus points don't reject non-Vars used in the join
condition, but ensure they're equal and there are no non-immutable
functions inside.
  c. Ensure relation_has_unique_index_for returns true for the Vars
(and Exprs if doing b) used in the join condition.
  d. Choose the relation with the highest relid and rewrite the parse
changing the varno of all Vars to use the one of the relation with the
lowest relid.
  e. list_concat baserestictinfos from removed relation onto other relation.
  f. Check for Vars in the join condition that can contain NULLs and
lappend IS NOT NULLs into the baserestrictinfo. (Removing the join
could have removed NULL filtering)
  g. Mark highest relid relation as DEAD. (See what the left join
removal code does (you may need to do some extra work around
equivalence classes))


-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Removing unneeded self joins
Next
From: David Rowley
Date:
Subject: Re: Removing unneeded self joins