Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Parallel Seq Scan
Date
Msg-id CA+Tgmobf_9hVOToSptM05oEe6cM5ohBt=GyNfb-o+MJt9nQSYw@mail.gmail.com
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
On Sat, Sep 26, 2015 at 10:16 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, Sep 26, 2015 at 8:38 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> QueryDesc's totaltime is for instrumentation information for plugin's
>>> like pg_stat_statements and we need only the total buffer usage
>>> of each worker to make it work as the other information is already
>>> collected in master backend, so I think that should work as I have
>>> written.
>>
>> I don't think that's right at all.  First, an extension can choose to
>> look at any part of the Instrumentation, not just the buffer usage.
>> Secondly, the buffer usage inside QueryDesc's totaltime isn't the same
>> as the global pgBufferUsage.
>
> Oh... but I'm wrong.  As long as our local pgBufferUsage gets updated
> correctly to incorporate the data from the other workers, the
> InstrStopNode(queryDesc->totaltime) will suck in those statistics.
> And the only other things getting updated are nTuples (which shouldn't
> count anything that the workers did), firsttuple (similarly), and
> counter (where the behavior is a bit more arguable, but just counting
> the master's wall-clock time is at least defensible).  So now I think
> you're right: this should be OK.

OK, so here's a patch extracted from your
parallel_seqscan_partialseqscan_v18.patch with a fairly substantial
amount of rework by me:

- I left out the Funnel node itself; this is just the infrastructure
portion of the patch.  I also left out the stop-the-executor early
stuff and the serialization of PARAM_EXEC values.  I want to have
those things, but I think they need more thought and study first.
- I reorganized the code a fair amount into a form that I thought
was clearer, and certainly is closer to what I did previously in
parallel.c.  Your version had lots of functions with lots of
parameters, which made the logic difficult to follow, at least for
me.  As part of that, I munged the interface a bit so that
execParallel.c returns a structure with a bunch of pointers in it
instead of separately returning each one as an out parameter.  I think
that's cleaner.  If we need to add more stuff in the future, that way
we don't break existing callers.
- I reworked the interface with instrument.c and tried to preserve
something of an abstraction boundary there.  I also changed the way
that stuff accumulated statistics to include more things; I couldn't
see any reason to make it as narrow as you had it.
- I did a bunch of cosmetic cleanup, including changing function names
and rewriting comments.
- I replaced your code for serializing and restoring a ParamListInfo
with my version.
- I fixed the code so that it can handle collecting instrumentation
data from multiple nodes, bringing all the data back to the leader and
associating it with the right plan node.  This involved giving every
plan node a unique ID, as discussed with Tom on another recent thread.

After I did all that, I wrote some test code, which is also attached
here, that adds a new GUC force_parallel_worker.  If you set that GUC,
when you run a query, it'll run the query in a parallel worker and
feed the results back to the master.  I've tested this and it seems to
work, at least on the queries where you'd expect it to work.  It's
just test code, so it doesn't have error checking or make any attempt
not to push down queries that will fail in parallel mode. But you can
use it to see what happens.  You can also run queries under EXPLAIN
ANALYZE this way and, lo and behold, the worker stats show up attached
to the correct plan nodes.

I intend to commit this patch (but not the crappy test code, of
course) pretty soon, and then I'm going to start working on the
portion of the patch that actually adds the Funnel node, which I think
you are working on renaming to Gather.  I think that getting that part
committed is likely to be pretty straightforward; it doesn't need to
do a lot more than call this stuff and tell it to go do its thing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

