Re: [HACKERS] Parallel Append implementation - Mailing list pgsql-hackers

From Robert Haas
Subject Re: [HACKERS] Parallel Append implementation
Date
Msg-id CA+TgmoaNXzX8eibigFhSU=TbEB+JEp=BTdQAqRHDRWQmq9vKbQ@mail.gmail.com
Whole thread Raw
In response to Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
Responses Re: [HACKERS] Parallel Append implementation  (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>)
List pgsql-hackers
On Thu, Mar 9, 2017 at 7:42 AM, Ashutosh Bapat
<ashutosh.bapat@enterprisedb.com> wrote:
>>
>> +        if (rel->partial_pathlist != NIL &&
>> +            (Path *) linitial(rel->partial_pathlist) == subpath)
>> +            partial_subplans_set = bms_add_member(partial_subplans_set, i);
>>
>> This seems like a scary way to figure this out.  What if we wanted to
>> build a parallel append subpath with some path other than the
>> cheapest, for some reason?  I think you ought to record the decision
>> that set_append_rel_pathlist makes about whether to use a partial path
>> or a parallel-safe path, and then just copy it over here.
>
> I agree that assuming that a subpath is non-partial path if it's not
> cheapest of the partial paths is risky. In fact, we can not assume
> that even when it's not one of the partial_paths since it could have
> been kicked out or was never added to the partial path list like
> reparameterized path. But if we have to save the information about
> which of the subpaths are partial paths and which are not in
> AppendPath, it would take some memory, noticeable for thousands of
> partitions, which will leak if the path doesn't make into the
> rel->pathlist.

True, but that's no different from the situation for any other Path
node that has substructure.  For example, an IndexPath has no fewer
than 5 list pointers in it.  Generally we assume that the number of
paths won't be large enough for the memory used to really matter, and
I think that will also be true here.  And an AppendPath has a list of
subpaths, and if I'm not mistaken, those list nodes consume more
memory than the tracking information we're thinking about here will.

I think you're thinking about this issue because you've been working
on partitionwise join where memory consumption is a big issue, but
there are a lot of cases where that isn't really a big deal.

> The purpose of that information is to make sure that we
> allocate only one worker to that plan. I suggested that we use
> path->parallel_workers for the same, but it seems that's not
> guaranteed to be reliable. The reasons were discussed upthread. Is
> there any way to infer whether we can allocate more than one workers
> to a plan by looking at the corresponding path?

I think it would be smarter to track it some other way.  Either keep
two lists of paths, one of which is the partial paths and the other of
which is the parallel-safe paths, or keep a bitmapset indicating which
paths fall into which category.  I am not going to say there's no way
we could make it work without either of those things -- looking at the
parallel_workers flag might be made to work, for example -- but the
design idea I had in mind when I put this stuff into place was that
you keep them separate in other ways, not by the data they store
inside them.  I think it will be more robust if we keep to that
principle.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: [HACKERS] Gather Merge
Next
From: Michael Paquier
Date:
Subject: Re: [HACKERS] CREATE/ALTER ROLE PASSWORD ('value' USING 'method')