Re: [POC] Allow flattening of subquery with a link to upper query - Mailing list pgsql-hackers

From David Rowley
Subject Re: [POC] Allow flattening of subquery with a link to upper query
Date
Msg-id CAApHDvrZMEiQCUZkHgorVmxxOmoabCRQRC-BR2Snq0oLP6wyjg@mail.gmail.com
Whole thread Raw
In response to Re: [POC] Allow flattening of subquery with a link to upper query  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses Re: [POC] Allow flattening of subquery with a link to upper query  (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
List pgsql-hackers
On Tue, 20 Feb 2024 at 22:57, Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> explain (costs off)
> SELECT relname FROM pg_class c1
> WHERE relname = ANY (
>    SELECT a.amname from pg_am a WHERE a.oid=c1.oid GROUP BY a.amname
> );
>
> We see on master:
>   Nested Loop
>     ->  Seq Scan on pg_class c1
>     ->  Subquery Scan on "ANY_subquery"
>           Filter: (c1.relname = "ANY_subquery".amname)
>           ->  Group
>                 Group Key: a.amname
>                 ->  Sort
>                       Sort Key: a.amname
>                       ->  Seq Scan on pg_am a
>                             Filter: (oid = c1.oid)
>
> And with this patch:
>   Hash Join
>     Hash Cond: ((c1.relname = a.amname) AND (c1.oid = a.oid))
>     ->  Seq Scan on pg_class c1
>     ->  Hash
>           ->  HashAggregate
>                 Group Key: a.amname
>                 ->  Seq Scan on pg_am a

I've only glanced at the patch just so I could determine if you're
making a cost-based decision and doing this transformation only if the
de-correlation of the subquery is deemed the cheaper option. It looks
like since you're doing this in the same location that we do the other
semi / anti join transformations that there's no costing.

I agree that it would be nice to teach the planner how to do this, but
I think it just has to be a cost-based decision.  Imagine how the
transformed query would perform of pg_am had a billion rows and
pg_class had 1 row. That's quite a costly hash table build to be
probing it just once.

I didn't follow the patch, but there was a patch to push aggregate
function evaluation down [1].  I imagine this has the same problem as
if you just blindly pushed and aggregate function evaluation as deep
as you could evaluate all the aggregate's parameters and group by vars
then you may end up aggregating far more than you need to as some join
could eliminate the majority of the groups.  I think we'd need to come
up with some way to have the planner consider these types of
optimisations as alternatives to what happens today and only apply
them when we estimate that they're cheaper.  Right now a Path has no
ability to describe that it's performed GROUP BY.

David

[1] https://commitfest.postgresql.org/46/4019/



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: speed up a logical replica setup
Next
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: Have pg_basebackup write "dbname" in "primary_conninfo"?