Re: Gather Merge - Mailing list pgsql-hackers
From | Thomas Munro |
---|---|
Subject | Re: Gather Merge |
Date | |
Msg-id | CAEepm=3zn1bmzZqkB9YxZ5=wWYdYkKScHVrGv=X9=Pn2bR+E9g@mail.gmail.com Whole thread Raw |
In response to | Re: Gather Merge (Rushabh Lathia <rushabh.lathia@gmail.com>) |
Responses |
Re: Gather Merge
|
List | pgsql-hackers |
On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote: > On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <thomas.munro@enterprisedb.com> > wrote: >> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group >> + * Portions Copyright (c) 1994, Regents of the University of California >> >> Shouldn't this say just "(c) 2016, PostgreSQL Global Development >> Group"? > > Fixed. The year also needs updating to 2016 in nodeGatherMerge.h. >> + /* Per-tuple heap maintenance cost */ >> + run_cost += path->path.rows * comparison_cost * 2.0 * logN; >> >> Why multiply by two? The comment above this code says "about log2(N) >> comparisons to delete the top heap entry and another log2(N) >> comparisons to insert its successor". In fact gather_merge_getnext >> calls binaryheap_replace_first, which replaces the top element without >> any comparisons at all and then performs a sift-down in log2(N) >> comparisons to find its new position. There is no per-tuple "delete" >> involved. We "replace" the top element with the value it already had, >> just to trigger the sift-down, because we know that our comparator >> function might have a new opinion of the sort order of this element. >> Very clever! The comment and the 2.0 factor in cost_gather_merge seem >> to be wrong though -- or am I misreading the code? >> > See cost_merge_append. That just got tweaked in commit 34ca0905. > Looking at the plan I realize that this is happening because wrong costing > for Gather Merge. Here in the plan we can see the row estimated by > Gather Merge is wrong. This is because earlier patch GM was considering > rows = subpath->rows, which is not true as subpath is partial path. So > we need to multiple it with number of worker. Attached patch also fixed > this issues. I also run the TPC-H benchmark with the patch and results > are same as earlier. In create_grouping_paths: + double total_groups = gmpath->rows * gmpath->parallel_workers; This hides a variable of the same name in the enclosing scope. Maybe confusing? In some other places like create_ordered_paths: + double total_groups = path->rows * path->parallel_workers; Though it probably made sense to use this variable name in create_grouping_paths, wouldn't total_rows be better here? It feels weird to be working back to a total row count estimate from the partial one by simply multiplying by path->parallel_workers. Gather Merge will underestimate the total rows when parallel_workers < 4, if using partial row estimates ultimately from cost_seqscan which assume some leader contribution. I don't have a better idea though. Reversing cost_seqscan's logic certainly doesn't seem right. I don't know how to make them agree on the leader's contribution AND give principled answers, since there seems to be some kind of cyclic dependency in the costing logic (cost_seqscan really needs to be given a leader contribution estimate from its superpath which knows whether it will allow the leader to pull tuples greedily/fairly or not, but that superpath hasn't been created yet; cost_gather_merge needs the row count from its subpath). Or maybe I'm just confused. -- Thomas Munro http://www.enterprisedb.com
pgsql-hackers by date: