Re: Parallelize correlated subqueries that execute within each worker - Mailing list pgsql-hackers
From | James Coleman |
---|---|
Subject | Re: Parallelize correlated subqueries that execute within each worker |
Date | |
Msg-id | CAAaqYe8fF=qc2Z+ttmrBkCMhV976Sy83uS1w9CyXMV7=X7NSMg@mail.gmail.com Whole thread Raw |
In response to | Re: Parallelize correlated subqueries that execute within each worker (Antonin Houska <ah@cybertec.at>) |
Responses |
Re: Parallelize correlated subqueries that execute within each worker
|
List | pgsql-hackers |
On Mon, Feb 6, 2023 at 11:39 AM Antonin Houska <ah@cybertec.at> wrote: > > James Coleman <jtc331@gmail.com> wrote: > > Which this patch we do in fact now see (as expected) rels with > > non-empty lateral_relids showing up in generate_[useful_]gather_paths. > > And the partial paths can now have non-empty required outer rels. I'm > > not able to come up with a plan that would actually be caught by those > > checks; I theorize that because of the few places we actually call > > generate_[useful_]gather_paths we are in practice already excluding > > those, but for now I've left these as a conditional rather than an > > assertion because it seems like the kind of guard we'd want to ensure > > those methods are safe. > > Maybe we can later (in separate patches) relax the restrictions imposed on > partial path creation a little bit, so that more parameterized partial paths > are created. > > One particular case that should be rejected by your checks is a partial index > path, which can be parameterized, but I couldn't construct a query that makes > your checks fire. Maybe the reason is that a parameterized index path is > mostly used on the inner side of a NL join, however no partial path can be > used there. (The problem is that each worker evaluating the NL join would only > see a subset of the inner relation, which whould lead to incorrect results.) > > So I'd also choose conditions rather than assert statements. Thanks for confirming. > > Following are my (minor) findings: > > In generate_gather_paths() you added this test > > /* > * Delay gather path creation until the level in the join tree where all > * params used in a worker are generated within that worker. > */ > if (!bms_is_subset(required_outer, rel->relids)) > return; > > but I'm not sure if required_outer can contain anything of rel->relids. How > about using bms_is_empty(required) outer, or even this? > > if (required_outer) > return; > > Similarly, > > /* We can't pass params to workers. */ > if (!bms_is_subset(PATH_REQ_OUTER(cheapest_partial_path), rel->relids)) > > might look like > > if (!bms_is_empty(PATH_REQ_OUTER(cheapest_partial_path))) > > or > > if (PATH_REQ_OUTER(cheapest_partial_path)) I'm not sure about this change. Deciding is difficult given the fact that we don't seem to currently generate these paths, but I don't see a reason why lateral_relids can't be present on the rel, and if so, then we need to check to see if they're a subset of relids we can satisfy rather than checking that they don't exist. > In particular, build_index_paths() does the following when setting > outer_relids (which eventually becomes (path->param_info->ppi_req_outer): > > /* Enforce convention that outer_relids is exactly NULL if empty */ > if (bms_is_empty(outer_relids)) > outer_relids = NULL; > > > Another question is whether in this call > > simple_gather_path = (Path *) > create_gather_path(root, rel, cheapest_partial_path, rel->reltarget, > required_outer, rowsp); > > required_outer should be passed to create_gather_path(). Shouldn't it rather > be PATH_REQ_OUTER(cheapest_partial_path) that you test just above? Again, > build_index_paths() initializes outer_relids this way > > outer_relids = bms_copy(rel->lateral_relids); > > but then it may add some more relations: > > /* OK to include this clause */ > index_clauses = lappend(index_clauses, iclause); > outer_relids = bms_add_members(outer_relids, > rinfo->clause_relids); > > So I think that PATH_REQ_OUTER(cheapest_partial_path) in > generate_gather_paths() can eventually contain more relations than > required_outer, and therefore it's safer to check the first. Yes, this is a good catch. Originally I didn't know about PATH_REQ_OUTER, and I'd missed using it in these places. > > Similar comments might apply to generate_useful_gather_paths(). Here I also > suggest to move this test > > /* We can't pass params to workers. */ > if (!bms_is_subset(PATH_REQ_OUTER(subpath), rel->relids)) > continue; > > to the top of the loop because it's relatively cheap. Moved. Attached is v9. James Coleman
Attachment
pgsql-hackers by date: