Re: Parallel Sequence Scan doubts - Mailing list pgsql-hackers

From Craig Ringer
Subject Re: Parallel Sequence Scan doubts
Date
Msg-id 53F8AF3A.6010906@2ndquadrant.com
In response to Parallel Sequence Scan doubts  (Haribabu Kommi <kommi.haribabu@gmail.com>)
Responses Re: Parallel Sequence Scan doubts
List pgsql-hackers
On 08/21/2014 02:47 PM, Haribabu Kommi wrote:
> Corrected subject.
> 
> Hi Hackers,
> 
> Implementation of "Parallel Sequence Scan"

I think you mean "Parallel Sequential Scan".

Scanning a sequence in parallel doesn't make much sense.

> 1."Parallel Sequence Scan" can achieved by using the background
> workers doing the job of actual sequence scan including the
> qualification check also.

Only if the qualifiers are stable/immutable, I think.

Possibly not even for stable functions: consider functions that use the
fmgr/executor state contexts to carry information over between calls,
and what parallel execution would do to that.
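
To make that concern concrete, here is a minimal sketch in plain Python
(not PostgreSQL code; every name in it is hypothetical). It assumes a
qualifier that keeps a running total between calls, then compares one
scan over the whole table with two simulated workers that each start
with their own fresh copy of that state:

```python
# Illustrative only: plain Python standing in for a qualifier function that
# keeps state between calls. All names are hypothetical, not PostgreSQL APIs.

def make_running_total_qual(limit):
    """Return a qualifier that accepts rows while the running total <= limit."""
    state = {"total": 0}  # stands in for state carried over between calls

    def qual(row):
        state["total"] += row
        return state["total"] <= limit

    return qual


def scan(rows, qual):
    """A sequential scan: apply the qualifier to each row in order."""
    return [r for r in rows if qual(r)]


rows = [4, 3, 2, 5, 1, 6]

# One backend, one copy of the state.
serial_result = scan(rows, make_running_total_qual(10))

# Two simulated workers, each scanning half the table with its own state.
half = len(rows) // 2
parallel_result = (scan(rows[:half], make_running_total_qual(10))
                   + scan(rows[half:], make_running_total_qual(10)))

print(serial_result)
print(parallel_result)
```

The two results differ because the second worker never sees the total
accumulated by the first, which is the sort of behaviour change a
parallel plan would have to rule out.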

> 3. In the executor Init phase, Try to copy the necessary data required
> by the workers and start the workers.

Copy how?

Backends can only communicate with each other via shared memory,
signals, and sockets.

Presumably you'd use a dynamic shared memory segment, but it's going to
be "interesting" to copy that kind of state over. Some of the work
Robert has proposed to add support for data structures that are native
to dynamic shmem might help, I guess...
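
As a rough illustration of why pointer-heavy state is awkward here: a
segment may be mapped at a different address in each process, so links
inside it are usually stored as offsets from the start of the segment
rather than as raw pointers. The sketch below is plain Python, with a
bytearray standing in for the segment; the node layout and the function
names are assumptions made up for this example, not anything from the
PostgreSQL source.

```python
import struct

# Node layout inside the buffer: a signed 32-bit value followed by the
# signed 32-bit offset of the next node (-1 marks the end of the list).
NODE = struct.Struct("<ii")
END = -1


def flatten(values):
    """Pack a list of ints as a linked list inside one contiguous buffer."""
    buf = bytearray(NODE.size * len(values))
    for i, value in enumerate(values):
        next_off = (i + 1) * NODE.size if i + 1 < len(values) else END
        NODE.pack_into(buf, i * NODE.size, value, next_off)
    return buf


def walk(buf, head=0):
    """Follow the offsets starting at head and return the stored values."""
    out = []
    off = head if len(buf) else END
    while off != END:
        value, off = NODE.unpack_from(buf, off)
        out.append(value)
    return out


segment = flatten([10, 20, 30])

# Copying the bytes models another process seeing the same contents at a
# different address: offsets stay valid where raw pointers would not.
worker_view = bytes(segment)
print(walk(worker_view))
```

Every structure that is to be shared needs this kind of treatment, and
that is the easy part compared with deciding what the copied state means
in the other process.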

> 4. In the executor run phase, just get the tuples which are sent by
> the workers and process them further in the plan node execution.

Again, how do you propose to copy these back to the main backend?

That's been one of the things that's limited parallel scan when it's
been looked at before.

> 1. Data structures that are required to be copied from backend to
> worker are currentTransactionState, Snapshot, GUC, ComboCID, Estate
> and etc.

That's a big "etc". Huge, in fact.

Any function can reference any global variable. Even an immutable
function might use globals as a cache, and might, for example, set
them up using an executor start hook. You cannot really make any
assumptions about what functions access what memory.

So it's not possible, as far as I can understand, to define a simple
subset of state that must be copied.

Nor is it necessarily correct to discard the copied state at the end of
the run even if you can copy it. Code may well depend on that state
being updated and preserved across calls.
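
A small simulation of that last point, again in plain Python with
hypothetical names: the "worker" gets a private copy of the leader's
global state, a function updates the copy as a side effect, and the
leader's own state never sees the update. A deep copy stands in for
whatever mechanism would really move the state between processes.

```python
import copy

# Hypothetical backend-global state that some function fills as a side effect.
backend_globals = {"cache": {}, "calls": 0}


def cached_square(state, x):
    """Stand-in for a function that memoizes its results in global state."""
    state["calls"] += 1
    if x not in state["cache"]:
        state["cache"][x] = x * x
    return state["cache"][x]


def run_worker(leader_state, rows):
    """Simulate a worker that starts from a private copy of the leader's state."""
    private = copy.deepcopy(leader_state)
    results = [cached_square(private, r) for r in rows]
    return results, private


results, worker_state = run_worker(backend_globals, [2, 3, 2])

print(results)
print(worker_state["calls"], backend_globals["calls"])
```

The results come back, but the cache and the call counter the function
built up exist only in the worker's copy. Code in the leader that
expects them to be there afterwards would silently misbehave.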

> I see some problems in copying "Estate" data structure into the shared
> memory because it contains so many pointers. There is a need of some
> infrastructure to copy these data structures into the shared memory.

It's not just a matter of copying them into/via shmem.

It's about their meaning. Does it even make sense to copy the executor
state to another backend? If so, you can't copy it back, so what do you
do at the end of the scans?

> Any suggestions?

Before you try to design anything more on this, study the *large* amount
of discussion that has happened on this topic on this mailing list over
the last few years.

This is not a simple or easy task, and it's not one you should approach
without studying what's already been worked on, built, contemplated, etc.

-- 
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


