Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CA+TgmoZrjAB0bPzbtKgjP2uAP_6XAyuZenU54QuM7XGE_k2Q1g@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Amit Khandekar <amitdkhan.pg@gmail.com>)
List pgsql-hackers
On Fri, Mar 10, 2017 at 12:17 AM, Amit Khandekar <amitdkhan.pg@gmail.com> wrote:
> I agree that the two-lists approach will consume less memory than
> bitmapset. Keeping two lists will effectively have an extra pointer
> field which will add up to the AppendPath size, but this size will not
> grow with the number of subpaths, whereas the Bitmapset will grow.

Sure.  You'll use about one BIT of memory per subpath.  I'm kind of
baffled as to why we're treating this as an issue worth serious
discussion; the amount of memory involved is clearly very small.  Even
for an appendrel with 1000 children, that's 125 bytes of memory.
Considering the amount of memory we're going to spend planning that
appendrel overall, that's not significant.

However, Ashutosh's response made me think of something: one thing is
that we probably do want to group all of the non-partial plans at the
beginning of the Append so that they get workers first, and put the
partial plans afterward.  That's because the partial plans can always
be accelerated by adding more workers as they become available, but
the non-partial plans are just going to take as long as they take - so
we want to start them as soon as possible.  In fact, what we might
want to do is actually sort the non-partial paths in order of
decreasing cost, putting the most expensive one first and the others
in decreasing order after that - and then similarly afterward with the
partial paths.  If we did that, we wouldn't need to store a bitmapset
OR two separate lists.  We could just store the index of the first
partial plan in the list.  Then you can test whether a path is partial
by checking whether this_index >= first_partial_index.

One problem with that is that, since the leader has about a 4ms head
start on the other workers, it would tend to pick the most expensive
path to run locally before any other worker had a chance to make a
selection, and that's probably not what we want.  To fix that, let's
have the leader start at the end of the list of plans and work
backwards towards the beginning, so that it prefers cheaper and
partial plans over decisions that would force it to undertake a large
amount of work itself.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Stephen Frost
Date:
Subject: Re: [HACKERS] GSOC Introduction / Eliminate O(N^2) scaling fromrw-conflict tracking in serializable transactions
Next
From: "Sven R. Kunze"
Date:
Subject: Re: [HACKERS] SQL/JSON in PostgreSQL