Re: Parallel Seq Scan - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Parallel Seq Scan
Date
Msg-id CAA4eK1Lsrid=OGR155JY1wZkpJz8STZkPM7kOJsuWb+OguKqdw@mail.gmail.com
Whole thread Raw
In response to Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Parallel Seq Scan  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Parallel Seq Scan  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Sat, Sep 26, 2015 at 6:07 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, Sep 25, 2015 at 7:46 AM, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > I have a question here which is why this format doesn't have a similar
> > problem
> > as the current version, basically in current patch the second read of
> > SerializedParamExternData can be misaligned and for same reason in your
> > patch the second read of Oid could by misaligned?
>
> memcpy() can cope with unaligned data; structure member assignment can't.
>

So doesn't coping means, it anyways have to have to pay the performance
penality to make it equivalent to aligned address access.  Apart from that,
today I had read about memcpy's behaviour incase of unaligned address,
it seems from some of the information on net that it could be unsafe [1],[2]. 

> I've worked some of this code over fairly heavily today and I'm pretty
> happy with how my copy of execParallel.c looks now, but I've run into
> one issue where I wanted to check with you. There are three places
> where Instrumentation can be attached to a query: a ResultRelInfo's
> ri_TrigInstrument (which doesn't matter for us because we don't
> support parallel write queries, and triggers don't run on reads), a
> PlanState's instrument, and a QueryDesc's total time.
>
> Your patch makes provision to copy ONE Instrumentation structure per
> worker back to the parallel leader.  I assumed this must be the
> QueryDesc's totaltime, but it looks like it's actually the PlanState
> of the top node passed to the worker.  That's of course no good if we
> ever push more than one node down to the worker, which we may very
> well want to do in the initial version, and surely want to do
> eventually.  We can't just deal with the top node and forget all the
> others.  Is that really what's happening here, or am I confused?
>

Yes, you have figured out correctly, I was under impression that we
will have single node execution in worker for first version and then
will extend it later.

QueryDesc's totaltime is for instrumentation information for plugin's
like pg_stat_statements and we need only the total buffer usage
of each worker to make it work as the other information is already
collected in master backend, so I think that should work as I have
written.

> Assuming I'm not confused, I'm planning to see about fixing this...
>

Can't we just traverse the queryDesc->planstate tree and fetch/add
all the instrument information if there are multiple nodes?

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel Seq Scan
Next
From: Fabien COELHO
Date:
Subject: Re: pgbench stats per script & other stuff