Re: Parallelize correlated subqueries that execute within each worker - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Parallelize correlated subqueries that execute within each worker |
Date | |
Msg-id | 8459b9bd-6248-4a43-a281-e40adb537242@enterprisedb.com Whole thread Raw |
In response to | Re: Parallelize correlated subqueries that execute within each worker (James Coleman <jtc331@gmail.com>) |
List | pgsql-hackers |
Hi, I was going through the "needs review" patches with no recent messages, trying to figure out what is needed to move them forward, and this one caught my eye because I commented on it before. And it's also a bit sad example, because it started in 2021 and is moving at glacier speed :-( I read through the thread, to understand how the design changed over time, and I like the current approach (proposed by Robert) much more than the initial idea of adding new flag next to parallel_safe etc. And in general the patch looks reasonably simple and clean, but my knowledge of PARAM intricacies is pretty much nil, so I'm hesitant to claim the patch is correct. And I'm not sure what exactly needs to happen to validate the approach :-( The regression tests currently fail, due to a different plan for one of the new queries in select_parallel. I guess it's due to some committed patch, and it looks like a sensible change, but I haven't looked closely. Also, I do get this warning when building with GCC 12.2.0 on Debian: clauses.c: In function ‘max_parallel_hazard_walker’: clauses.c:961:49: warning: ‘save_safe_param_ids’ may be used uninitialized [-Wmaybe-uninitialized] 961 | context->safe_param_ids = save_safe_param_ids; | ~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~ clauses.c:943:29: note: ‘save_safe_param_ids’ was declared here 943 | List *save_safe_param_ids; | ^~~~~~~~~~~~~~~~~~~ It's harmless, the compiler simply does not realize the two blocks (where save_safe_param_ids is set and used) have exactly the same if conditions, but it's a bit confusing for people too. I was wondering if this could affect some queries in TPC-H, but the only query affected seems to be Q2 - where it helps, cutting the time in half, but Q2 is generally pretty fast / the expensive part was already parallelized quite well (i.e. the correlated subquery is fairly cheap). However, it's not difficult to construct a query where this helps a lot. If the correlated subquery does something expensive (e.g. aggregation of non-trivial amounts of data), this would help. So I wonder if e.g. TPC-DS would benefit from this more ... A couple review comments about the code: 1) new fields in max_parallel_hazard_context should have comments: + bool check_params; + Bitmapset **required_params; 2) Do we need both is_parallel_safe and is_parallel_safe_with_params? ISTM the main difference is the for() loop, so why not add an extra parameter to is_parallel_safe() and skip that loop if it's null? Or, if we worry about external code, keep is_parallel_safe_with_params() and define is_parallel_safe() as is_parallel_safe_with_params(...,NULL)? 3) Isn't it a bit weird that is_parallel_safe_with_params() actually sets check_params=false, which seems like it doesn't actually check parameters? I'm a bit confused / unsure if this is a bug or how it actually checks parameters. If it's correct, it may need a comment. 4) The only place setting check_params is max_parallel_hazard, which is called only for the whole Query from standard_planner(). But it does not set required_params - can't this be an issue if the pointer happens to be random garbage? 5) It probably needs a pgindent run, there's a bunch of rather long lines and those are likely to get wrapped and look weird. 6) Is the comment in max_parallel_hazard_walker() still accurate? It talks about PARAM_EXTERN and PARAM_EXEC, but the patch removes the PARAM_EXTERN check entirely. So maybe the comment needs updating? 7) I don't like the very complex if condition very much, it's hard to understand. I'd split that into two separate conditions, and add a short comment for each of them. I.e. instead of: if (param->paramkind != PARAM_EXEC || !(context->check_params || context->required_params != NULL)) return false; I'd do /* ... comment ... */ if (param->paramkind != PARAM_EXEC) return false; /* ... comment ... */ if (!(context->check_params || context->required_params != NULL)) return false; or something like that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: