Re: Fix BUG #17335: Duplicate result rows in Gather node - Mailing list pgsql-hackers
From | Yura Sokolov |
---|---|
Subject | Re: Fix BUG #17335: Duplicate result rows in Gather node |
Date | |
Msg-id | f26a2c2bd3926e15c96b832cf8d22fb066ea89f0.camel@postgrespro.ru |
In response to | Re: Fix BUG #17335: Duplicate result rows in Gather node (Yura Sokolov <y.sokolov@postgrespro.ru>) |
Responses | Re: Fix BUG #17335: Duplicate result rows in Gather node |
List | pgsql-hackers |
On Mon, 24/01/2022 at 16:24 +0300, Yura Sokolov wrote:
> On Sun, 23/01/2022 at 14:56 +0300, Yura Sokolov wrote:
> > On Thu, 20/01/2022 at 09:32 +1300, David Rowley wrote:
> > > On Fri, 31 Dec 2021 at 00:14, Yura Sokolov <y.sokolov@postgrespro.ru> wrote:
> > > > Suggested quick (and valid) fix in the patch attached:
> > > > - If Append has a single child, then copy its parallel awareness.
> > >
> > > I've been looking at this and I've gone through changing my mind about
> > > what's the right fix quite a number of times.
> > >
> > > My current thoughts are that I don't really like the fact that we can
> > > have plans in the following shape:
> > >
> > > Finalize Aggregate
> > >   ->  Gather
> > >         Workers Planned: 1
> > >         ->  Partial Aggregate
> > >               ->  Parallel Hash Left Join
> > >                     Hash Cond: (gather_append_1.fk = gather_append_2.fk)
> > >                     ->  Index Scan using gather_append_1_ix on gather_append_1
> > >                           Index Cond: (f = true)
> > >                     ->  Parallel Hash
> > >                           ->  Parallel Seq Scan on gather_append_2
> > >
> > > It's only made safe by the fact that Gather will only use 1 worker.
> > > To me, it just seems too fragile to assume that's always going to be
> > > the case. I feel like this fix just relies on the fact that
> > > create_gather_path() and create_gather_merge_path() do
> > > "pathnode->num_workers = subpath->parallel_workers;". If someone
> > > decided that was to work a different way, then we risk this breaking
> > > again. Additionally, today we have Gather and GatherMerge, but we may
> > > one day end up with more node types that gather results from parallel
> > > workers, or even a completely different way of executing plans.
> >
> > It seems strange that the parallel_aware and parallel_safe flags neither
> > affect execution nor are properly checked.
> >
> > The exception is that parallel_safe is checked in ExecSerializePlan, which
> > is called from ExecInitParallelPlan, which is called from ExecGather and
> > ExecGatherMerge. But it looks like that check doesn't affect execution
> > either.
> >
> > > I think a safer way to fix this is to just not remove the
> > > Append/MergeAppend node if the parallel_aware flag of the only-child
> > > and the Append/MergeAppend don't match. I've done that in the
> > > attached.
> > >
> > > I believe the code at the end of add_paths_to_append_rel() can remain as is.
> >
> > I found that clean_up_removed_plan_level is also called from
> > set_subqueryscan_references. Is there a need to patch it there as well?
> >
> > And there is a strange state:
> > - in the loop over subpaths, pathnode->path.parallel_safe is set to the AND
> >   of all its subpaths' parallel_safe
> >   (which is why my patch version had to copy it),
> > - that means our AppendPath can be parallel_aware but not parallel_safe,
> >   which is a bit absurd.
> >
> > It is also strange that an AppendPath can have more parallel_workers than
> > the sum of its children's parallel_workers.
> >
> > So it looks like the whole machinery around parallel_aware/parallel_safe
> > is not consistent enough.
> >
> > Either way, I attach your version of the fix with my tests as a new patch
> > version.
>
> It looks like the volatile "Memory Usage:" line in EXPLAIN output breaks
> 'make check' sporadically.
>
> I applied a replacement in the style of the memoize.sql test.
>
> Why is there no way to disable the "Buckets: %d  Batches: %d  Memory Usage: %dkB"
> output in show_hash_info?

And here is another attempt to fix the test volatility.
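For readers unfamiliar with the memoize.sql trick referenced above, a minimal sketch of that approach: wrap EXPLAIN ANALYZE in a plpgsql helper that masks the run-to-run volatile numbers with regexp_replace before returning each line, so the regression output stays stable. The function name explain_gather, the exact patterns, and the example query below are illustrative assumptions, not the contents of the attached patch.

```sql
-- Illustrative sketch (not the posted patch): stabilize EXPLAIN ANALYZE
-- output by masking numbers that vary between runs and machines, in the
-- style of the explain_memoize() helper used by the memoize regression test.
create or replace function explain_gather(query text) returns setof text
language plpgsql as
$$
declare
    ln text;
begin
    for ln in
        execute format('explain (analyze, costs off, summary off, timing off) %s',
                       query)
    loop
        -- Hash build details depend on work_mem and the machine.
        ln := regexp_replace(ln, 'Buckets: \d+', 'Buckets: N');
        ln := regexp_replace(ln, 'Batches: \d+', 'Batches: N');
        ln := regexp_replace(ln, 'Memory Usage: \d+kB', 'Memory Usage: NkB');
        -- Per-worker row counts can also vary.
        ln := regexp_replace(ln, 'actual rows=\d+ loops=\d+',
                             'actual rows=N loops=N');
        return next ln;
    end loop;
end;
$$;
```

A regression test would then call something like `select explain_gather('select count(*) from gather_append_1 left join gather_append_2 using (fk)');` and diff the masked plan text in the expected file, rather than matching the raw EXPLAIN output.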