Re: Removing unneeded self joins - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Removing unneeded self joins
Date
Msg-id 16aa5d06-d8c6-416a-cb71-21663019944e@iki.fi
Whole thread Raw
In response to Re: Removing unneeded self joins  ("Andrey V. Lepikhov" <a.lepikhov@postgrespro.ru>)
Responses Re: Removing unneeded self joins  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-hackers
On 31/10/2020 11:26, Andrey V. Lepikhov wrote:
> +            /*
> +             * Process restrictlist to seperate out the self join quals from
> +             * the other quals. e.g x = x goes to selfjoinquals and a = b to
> +             * otherjoinquals.
> +             */
> +            split_selfjoin_quals(root, restrictlist, &selfjoinquals,
> +                                 &otherjoinquals);
> +
> +            if (list_length(selfjoinquals) == 0)
> +            {
> +                /*
> +                 * Have a chance to remove join if target list contains vars from
> +                 * the only one relation.
> +                 */
> +                if (list_length(otherjoinquals) == 0)
> +                {
> +                    /* Can't determine uniqueness without any quals. */
> +                    continue;
> +
> +                }
> +                else if (!tlist_contains_rel_exprs(root, joinrelids, inner))
> +                {
> +                    if (!innerrel_is_unique(root, joinrelids, outer->relids,
> +                                         inner, JOIN_INNER, otherjoinquals,
> +                                         false))
> +                        continue;
> +                }
> +                else
> +                    /*
> +                     * The target list contains vars from both inner and outer
> +                     * relations.
> +                     */
> +                    continue;
> +            }
> +
> +            /*
> +             * Determine if the inner table can duplicate outer rows.  We must
> +             * bypass the unique rel cache here since we're possibly using a
> +             * subset of join quals. We can use 'force_cache' = true when all
> +             * join quals are selfjoin quals.  Otherwise we could end up
> +             * putting false negatives in the cache.
> +             */
> +            else if (!innerrel_is_unique(root, joinrelids, outer->relids,
> +                                         inner, JOIN_INNER, selfjoinquals,
> +                                         list_length(otherjoinquals) == 0))
> +                continue;

I don't understand the logic here. If 'selfjoinquals' is empty, it means 
that there is no join qual between the two relations, right? How can we 
ever remove the join in that case? And how does the target list affect 
that? Can you give an example query of that?

> --- a/src/test/regress/expected/join.out
> +++ b/src/test/regress/expected/join.out
> @@ -4553,11 +4553,13 @@ explain (costs off)
>  select p.* from
>    (parent p left join child c on (p.k = c.k)) join parent x on p.k = x.k
>    where p.k = 1 and p.k = 2;
> -        QUERY PLAN        
> ---------------------------
> +                   QUERY PLAN                   
> +------------------------------------------------
>   Result
>     One-Time Filter: false
> -(2 rows)
> +   ->  Index Scan using parent_pkey on parent x
> +         Index Cond: (k = 1)
> +(4 rows)
>  
>  -- bug 5255: this is not optimizable by join removal
>  begin;

That doesn't seem like an improvement...

My general impression of this patch is that it's a lot of code to deal 
with a rare special case. At the beginning of this thread there was 
discussion on the overhead that this might add to planning queries that 
don't benefit, but adding a lot of code isn't nice either, even if the 
performance is acceptable. That's not necessarily a show-stopper, but it 
does make me much less excited about this. I'm not sure what to suggest 
to do about that, except a really vague "Can you make is simpler?"

- Heikki



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: A few new options for CHECKPOINT
Next
From: Alvaro Herrera
Date:
Subject: Re: remove spurious CREATE INDEX CONCURRENTLY wait