Home > mailing lists

Re: Gather Merge - Mailing list pgsql-hackers

From	Rushabh Lathia
Subject	Re: Gather Merge
Date	November 24, 2016 15:12:35
Msg-id	CAGPqQf37KFFW=Q3PGhTLmWD1sTovCs+KykNAP4PjS21z0Szkgg@mail.gmail.com Whole thread Raw
In response to	Re: Gather Merge (Rushabh Lathia <rushabh.lathia@gmail.com>)
Responses	Re: Gather Merge
List	pgsql-hackers

Tree view

On Wed, Nov 16, 2016 at 3:10 PM, Rushabh Lathia <rushabh.lathia@gmail.com> wrote:

On Mon, Nov 14, 2016 at 3:51 PM, Thomas Munro <thomas.munro@enterprisedb.com> wrote:
On Sat, Nov 12, 2016 at 1:56 AM, Rushabh Lathia
<rushabh.lathia@gmail.com> wrote:
> On Fri, Nov 4, 2016 at 8:30 AM, Thomas Munro <thomas.munro@enterprisedb.com>
> wrote:
>> + * Portions Copyright (c) 1996-2015, PostgreSQL Global Development Group
>> + * Portions Copyright (c) 1994, Regents of the University of California
>>
>> Shouldn't this say just "(c) 2016, PostgreSQL Global Development
>> Group"?
>
> Fixed.

The year also needs updating to 2016 in nodeGatherMerge.h.

Oops sorry, fixed now.

>> + /* Per-tuple heap maintenance cost */
>> + run_cost += path->path.rows * comparison_cost * 2.0 * logN;
>>
>> Why multiply by two? The comment above this code says "about log2(N)
>> comparisons to delete the top heap entry and another log2(N)
>> comparisons to insert its successor". In fact gather_merge_getnext
>> calls binaryheap_replace_first, which replaces the top element without
>> any comparisons at all and then performs a sift-down in log2(N)
>> comparisons to find its new position. There is no per-tuple "delete"
>> involved. We "replace" the top element with the value it already had,
>> just to trigger the sift-down, because we know that our comparator
>> function might have a new opinion of the sort order of this element.
>> Very clever! The comment and the 2.0 factor in cost_gather_merge seem
>> to be wrong though -- or am I misreading the code?
>>
> See cost_merge_append.

That just got tweaked in commit 34ca0905.

Fixed.

> Looking at the plan I realize that this is happening because wrong costing
> for Gather Merge. Here in the plan we can see the row estimated by
> Gather Merge is wrong. This is because earlier patch GM was considering
> rows = subpath->rows, which is not true as subpath is partial path. So
> we need to multiple it with number of worker. Attached patch also fixed
> this issues. I also run the TPC-H benchmark with the patch and results
> are same as earlier.

In create_grouping_paths:
+ double total_groups = gmpath->rows *
gmpath->parallel_workers;

This hides a variable of the same name in the enclosing scope. Maybe confusing?

In some other places like create_ordered_paths:
+ double total_groups = path->rows * path->parallel_workers;

Though it probably made sense to use this variable name in
create_grouping_paths, wouldn't total_rows be better here?

Initially I just copied from the other places. I agree with you that
create_ordered_paths - total_rows make more sense.

It feels weird to be working back to a total row count estimate from
the partial one by simply multiplying by path->parallel_workers.
Gather Merge will underestimate the total rows when parallel_workers <
4, if using partial row estimates ultimately from cost_seqscan which
assume some leader contribution. I don't have a better idea though.
Reversing cost_seqscan's logic certainly doesn't seem right. I don't
know how to make them agree on the leader's contribution AND give
principled answers, since there seems to be some kind of cyclic
dependency in the costing logic (cost_seqscan really needs to be given
a leader contribution estimate from its superpath which knows whether
it will allow the leader to pull tuples greedily/fairly or not, but
that superpath hasn't been created yet; cost_gather_merge needs the
row count from its subpath). Or maybe I'm just confused.

Yes, I agree with you. But we can't really do changes into cost_seqscan.
Another option I can think of is just calculate the rows for gather merge,
by using the reverse formula which been used into cost_seqscan. So
we can completely remote the rows argument from the create_gather_merge_path()
and then inside create_gather_merge_path() - calculate the total_rows
using same formula which been used into cost_seqscan. This is working
fine - but not quite sure about the approach. So I attached that part of changes
as separate patch. Any suggestions?

With offline discussion with Thomas, I realized that this won't work. It will

work only if the subplan is seqscan - so this logic won't be enough to

estimate the rows. I guess as Thomas told earlier, this is not problem

with GatherMerge implementation as such - so we will keep this as separate.

Apart from this my colleague Neha Sharma, reported the server crash with the patch.
It was hitting below Assert into create_gather_merge_path().

Assert(pathkeys);

Basically when query is something like "select * from foo where a = 1 order by a;"

here query has sortclause but planner won't generate sort key because

where equality clause is on same column. Fix is about making sure of pathkeys
before calling create_gather_merge_path().

PFA latest patch with fix as well as few cosmetic changes.

--
Rushabh Lathia
www.EnterpriseDB.com

Rushabh Lathia

Attachment

gather_merge_v5.patch

pgsql-hackers by date:

From: Amit Langote
Date: 24 November 2016, 14:13:44
Subject: Re: Declarative partitioning - another take

From: Mithun Cy
Date: 24 November 2016, 15:16:55
Subject: Re: Patch: Implement failover on libpq connect level.

Re: Gather Merge - Mailing list pgsql-hackers

Attachment

Previous

Next