Re: Problem about postponing gathering partial paths for topmost scan/join rel - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: Problem about postponing gathering partial paths for topmost scan/join rel
Date	July 14, 2022 14:02:58
Msg-id	2752.1657807378@antos.home
In response to	Re: Problem about postponing gathering partial paths for topmost scan/join rel (Richard Guo <guofenglinux@gmail.com>)
Responses	Re: Problem about postponing gathering partial paths for topmost scan/join rel
List	pgsql-hackers

Richard Guo <guofenglinux@gmail.com> wrote:

> On Wed, Jul 28, 2021 at 3:42 PM Richard Guo <guofenglinux@gmail.com> wrote:
>
>  To fix this problem, I'm thinking we can leverage 'root->all_baserels'
>  to tell if we are at the topmost scan/join rel, something like:
>
>  --- a/src/backend/optimizer/path/allpaths.c
>  +++ b/src/backend/optimizer/path/allpaths.c
>  @@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int levels_needed, List *initial_rels)
>                           * partial paths.  We'll do the same for the topmost scan/join rel
>                           * once we know the final targetlist (see grouping_planner).
>                           */
>  -                       if (lev < levels_needed)
>  +                       if (!bms_equal(rel->relids, root->all_baserels))
>                                  generate_useful_gather_paths(root, rel, false);
>
>  Any thoughts?
>
> Attached is a patch that includes the fix described upthread. Would appreciate
> any comments on this topic.
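
The two conditions in the quoted diff can be contrasted with a toy model. This is only a sketch: `ToyRelids`, `gather_by_level`, `gather_by_relids` and `demo_checks` are hypothetical names, a plain bitmask stands in for PostgreSQL's Bitmapset, and the level numbers assume that the {a, b} join is searched as its own sub-problem, so that `lev` equals `levels_needed` there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for Relids (a Bitmapset): one bit per base relation. */
typedef uint32_t ToyRelids;

enum { REL_A = 1u << 0, REL_B = 1u << 1, REL_C = 1u << 2 };

/* Old test: gather unless this is the last level of the current join search. */
static bool
gather_by_level(int lev, int levels_needed)
{
    return lev < levels_needed;
}

/* Proposed test: gather unless the rel covers every base relation of the query. */
static bool
gather_by_relids(ToyRelids relids, ToyRelids all_baserels)
{
    return relids != all_baserels;
}

static void
demo_checks(void)
{
    const ToyRelids all_baserels = REL_A | REL_B | REL_C;

    /* Sub-problem {a, b}, assumed to be searched with levels_needed == 2. */
    assert(!gather_by_level(2, 2));                        /* Gather skipped */
    assert(gather_by_relids(REL_A | REL_B, all_baserels)); /* Gather considered */

    /* Topmost scan/join rel {a, b, c}: both tests postpone Gather. */
    assert(!gather_by_level(2, 2));
    assert(!gather_by_relids(all_baserels, all_baserels));
}
```

Under these assumptions the level-based test treats the top of a sub-problem like the topmost scan/join rel, while the relids-based test postpones Gather only for the rel that covers all base relations.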


I think I understand the idea but I'm not sure about the regression test. I
suspect that in the plan

EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1 a JOIN tenk1 b ON a.two = b.two
  FULL JOIN tenk1 c ON b.two = c.two;
                         QUERY PLAN
------------------------------------------------------------
 Aggregate
   ->  Hash Full Join
         Hash Cond: (b.two = c.two)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Hash Join
                     Hash Cond: (a.two = b.two)
                     ->  Parallel Seq Scan on tenk1 a
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on tenk1 b
         ->  Hash
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Seq Scan on tenk1 c


the Gather node is located below the "Hash Full Join" node only because that
kind of join currently cannot be executed by parallel workers. If a parallel
"Hash Full Join" gets implemented (I've noticed the patch [1] but have not
checked it in detail), it might break this test.

I'd prefer a test that demonstrates that the Gather node at the top of the
"subproblem plan" is useful purely from the *cost* perspective, rather than
due to an executor limitation.


[1] https://commitfest.postgresql.org/38/2903/

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
