Thread: An idea for parallelizing COPY within one backend

An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
As far as I can see, the main difficulty in making COPY run faster (on
the server) is the rather involved conversion from plain-text lines
into tuples. Trying to get rid of this conversion by having the client
send something that resembles the data stored in on-disk tuples is not a
good answer either, because it ties the client too closely to
backend-version-specific implementation details.

But those problems only arise if the *client* needs to deal with the
binary format. What I envision is parallelizing that conversion step on
the server, controlled by a backend process, kind of like a filter
between the server and the client.

Upon reception of a COPY INTO command, a backend would
.) Retrieve all catalog information required to convert a plain-text
line into a tuple
.) Fork off a "dealer" and N "worker" processes that take over the
client connection. The "dealer" distributes lines received from the
client to the N workers, while the original backend receives them
as tuples back from the workers.

Neither the "dealer", nor the "workers" would need access to the either
the shared memory or the disk, thereby not messing with the "one backend
is one transaction is one session" dogma.
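
Just to make the shape of the idea concrete, here is a minimal sketch
using nothing but fork() and pipes. All of it is made up for
illustration: line_to_tuple() merely stands in for the expensive
text-to-tuple conversion, and error handling and fd hygiene are omitted.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NWORKERS 4

    /* Stand-in for the expensive plain-text -> tuple conversion. */
    static void line_to_tuple(const char *line, char *tuple, size_t len)
    {
        snprintf(tuple, len, "tuple(%s)", line);
    }

    int main(void)
    {
        int     to_worker[NWORKERS][2];   /* dealer -> worker  */
        int     to_backend[NWORKERS][2];  /* worker -> backend */
        char    line[1024];
        int     next = 0;

        for (int i = 0; i < NWORKERS; i++)
        {
            if (pipe(to_worker[i]) < 0 || pipe(to_backend[i]) < 0)
                exit(1);

            if (fork() == 0)              /* worker process */
            {
                char    tuple[1200];
                FILE   *in = fdopen(to_worker[i][0], "r");
                FILE   *out = fdopen(to_backend[i][1], "w");

                while (fgets(line, sizeof(line), in) != NULL)
                {
                    line_to_tuple(line, tuple, sizeof(tuple));
                    fputs(tuple, out);
                    fflush(out);
                }
                _exit(0);
            }
        }

        /*
         * "Dealer" role: read lines from the client (stdin here) and deal
         * them out round-robin.  The original backend would read finished
         * tuples from to_backend[i][0] and do the actual heap insertion,
         * so it alone ever touches shared memory or the disk.
         */
        while (fgets(line, sizeof(line), stdin) != NULL)
        {
            write(to_worker[next][1], line, strlen(line));
            next = (next + 1) % NWORKERS;
        }
        return 0;
    }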

Now I'm eagerly waiting to hear all the reasons why this idea is broken
as hell ;-)
regards, Florian Pflug


Re: An idea for parallelizing COPY within one backend

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> ...
> Neither the "dealer", nor the "workers" would need access to the either
> the shared memory or the disk, thereby not messing with the "one backend
> is one transaction is one session" dogma.
> ...

Unfortunately, this idea has far too narrow a view of what a datatype
input function might do.  Just for starters, consider "enum" input,
which certainly requires catalog access.  We have also explicitly
acknowledged the idea that datatype I/O functions might try to store
typmod-related data in some special catalog somewhere.
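
To make that concrete, an enum-style input function has to do something
like the sketch below (illustrative only; lookup_label_oid() is a
made-up stand-in for the syscache lookup, not a real backend function),
and that lookup is exactly what a catalog-less worker could not perform.

    /* Illustrative sketch only -- not the actual enum code. */
    #include "postgres.h"
    #include "fmgr.h"

    Datum
    myenum_in(PG_FUNCTION_ARGS)
    {
        char   *label = PG_GETARG_CSTRING(0);
        Oid     enumtypoid = PG_GETARG_OID(1);   /* the enum type's OID */
        Oid     valueoid;

        /* Hypothetical helper: maps the label text to the OID stored on
         * disk for this enum type, which requires a catalog lookup. */
        valueoid = lookup_label_oid(enumtypoid, label);

        if (!OidIsValid(valueoid))
            ereport(ERROR,
                    (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
                     errmsg("invalid input value for enum: \"%s\"", label)));

        PG_RETURN_OID(valueoid);
    }
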
        regards, tom lane


Re: An idea for parallelizing COPY within one backend

From
Dimitri Fontaine
Date:
Hi,

On Wednesday 27 February 2008, Florian G. Pflug wrote:
> Upon reception of a COPY INTO command, a backend would
> .) Fork off a "dealer" and N "worker" processes that take over the
> client connection. The "dealer" distributes lines received from the
> client to the N workers, while the original backend receives them
> as tuples back from the workers.

This looks so much like what pgloader does now (version 2.3.0~dev2, release
candidate) at the client side, when configured for it, that I can't help
answering the mail :)
 http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
  section_threads = N
  split_file_reading = False

Of course, the backends still have to parse the input given by pgloader, which
only pre-processes data. I'm not sure having the client prepare the data some
more (binary format or whatever) is a wise idea, as you mentioned and wrt
Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

Regards,
--
dim

Re: An idea for parallelizing COPY within one backend

From
Simon Riggs
Date:
On Wed, 2008-02-27 at 09:09 +0100, Dimitri Fontaine wrote:
> Hi,
> 
> On Wednesday 27 February 2008, Florian G. Pflug wrote:
> > Upon reception of a COPY INTO command, a backend would
> > .) Fork off a "dealer" and N "worker" processes that take over the
> > client connection. The "dealer" distributes lines received from the
> > client to the N workers, while the original backend receives them
> > as tuples back from the workers.
> 
> This looks so much like what pgloader does now (version 2.3.0~dev2, release 
> candidate) at the client side, when configured for it, that I can't help 
> answering the mail :)
>  http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading
>   section_threads = N
>   split_file_reading = False
> 
> Of course, the backends still have to parse the input given by pgloader, which 
> only pre-processes data. I'm not sure having the client prepare the data some 
> more (binary format or whatever) is a wise idea, as you mentioned and wrt 
> Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

ISTM the external parallelization approach is more likely to help us
avoid bottlenecks, so I support Dimitri's approach.

We also need error handling, which pgloader also has.

Writing error handling and parallelization into COPY isn't going to be
easy, and not very justifiable either if we already have both.

There might be a reason to re-write it in C one day, but that will be a
fairly easy task if we ever need to do it.

--
 Simon Riggs
 2ndQuadrant  http://www.2ndQuadrant.com



Re: An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> ...
>> Neither the "dealer", nor the "workers" would need access to the either
>> the shared memory or the disk, thereby not messing with the "one backend
>> is one transaction is one session" dogma.
>> ...
> 
> Unfortunately, this idea has far too narrow a view of what a datatype
> input function might do.  Just for starters, consider "enum" input,
> which certainly requires catalog access.  We have also explicitly
> acknowledged the idea that datatype I/O functions might try to store
> typmod-related data in some special catalog somewhere.

Hm... how many in-core datatypes are there which need catalog access in
their input or output functions? Maybe we could change the API for
I/O functions in a way that allows us to request all needed information
to be cached?

regards, Florian Pflug


Re: An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
Dimitri Fontaine wrote:
> Of course, the backends still have to parse the input given by pgloader, which 
> only pre-processes data. I'm not sure having the client prepare the data some 
> more (binary format or whatever) is a wise idea, as you mentioned and wrt 
> Tom's follow-up. But maybe I'm all wrong, so I'm all ears!

As far as I understand, pgloader starts N threads or processes that open 
up N individual connections to the server. In that case, moving the 
text->binary conversion from the backend into the loader won't give any 
additional performance, I'd say.

The reason that I'd love some within-one-backend solution is that it'd 
allow you to utilize more than one CPU for a restore within a *single* 
transaction. This is something that a client-side solution won't be able 
to deliver, unless major changes to the architecture of postgres happen 
first...

regards, Florian Pflug



Re: An idea for parallelizing COPY within one backend

From
"A.M."
Date:
On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:

> Dimitri Fontaine wrote:
>> Of course, the backends still have to parse the input given by  
>> pgloader, which only pre-processes data. I'm not sure having the  
>> client prepare the data some more (binary format or whatever) is a  
>> wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm  
>> all wrong, so I'm all ears!
>
> As far as I understand, pgloader starts N threads or processes that  
> open up N individual connections to the server. In that case, moving  
> the text->binary conversion from the backend into the loader won't  
> give any additional performance, I'd say.
>
> The reason that I'd love some within-one-backend solution is that  
> it'd allow you to utilize more than one CPU for a restore within a  
> *single* transaction. This is something that a client-side solution  
> won't be able to deliver, unless major changes to the architecture  
> of postgres happen first...

It seems like multiple backends should be able to take advantage of  
2PC for transaction safety.

Cheers,
M


Re: An idea for parallelizing COPY within one backend

From
Brian Hurt
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> ...
>> Neither the "dealer" nor the "workers" would need access to either
>> the shared memory or the disk, thereby not messing with the "one backend
>> is one transaction is one session" dogma.
>> ...
>
> Unfortunately, this idea has far too narrow a view of what a datatype
> input function might do.  Just for starters, consider "enum" input,
> which certainly requires catalog access.  We have also explicitly
> acknowledged the idea that datatype I/O functions might try to store
> typmod-related data in some special catalog somewhere.

Would it be possible to determine when the copy is starting that this 
case holds, and not use the parallel parsing idea in those cases?

I'm a big user of copy, generally into very simple tables- few indexes, 
simple column types (numeric, varchar, and int almost exclusively), no 
fancy features.  A parallel copy input in the "simple" cases would be of 
great advantage to me, even if it doesn't parallelize "complicated" cases.

Brian

Re: An idea for parallelizing COPY within one backend

From
Alvaro Herrera
Date:
A.M. wrote:
>
> On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:

>> The reason that I'd love some within-one-backend solution is that it'd 
>> allow you to utilize more than one CPU for a restore within a *single* 
>> transaction. This is something that a client-side solution won't be 
>> able to deliver, unless major changes to the architecture of postgres 
>> happen first...
>
> It seems like multiple backends should be able to take advantage of 2PC 
> for transaction safety.

Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
when the table was created in the same transaction.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: An idea for parallelizing COPY within one backend

From
"Heikki Linnakangas"
Date:
A.M. wrote:
> 
> On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote:
> 
>> Dimitri Fontaine wrote:
>>> Of course, the backends still have to parse the input given by 
>>> pgloader, which only pre-processes data. I'm not sure having the 
>>> client prepare the data some more (binary format or whatever) is a 
>>> wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm 
>>> all wrong, so I'm all ears!
>>
>> As far as I understand, pgloader starts N threads or processes that 
>> open up N individual connections to the server. In that case, moving 
>> the text->binary conversion from the backend into the loader won't 
>> give any additional performance, I'd say.
>>
>> The reason that I'd love some within-one-backend solution is that it'd 
>> allow you to utilize more than one CPU for a restore within a *single* 
>> transaction. This is something that a client-side solution won't be 
>> able to deliver, unless major changes to the architecture of postgres 
>> happen first...
> 
> It seems like multiple backends should be able to take advantage of 2PC 
> for transaction safety.

Yes, whatever is coordinating the multiple backends (a master backend? I 
haven't followed this thread closely) would then have to have logic to 
finish the prepared transactions if you crash after you've committed one 
but not all of them. IOW, it would need a mini transaction log of its own.
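
Roughly this shape, sketched with plain libpq (the connection string,
gid names and worker count are invented, the COPY step itself is elided,
and the server would need max_prepared_transactions > 0):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    #define NWORKERS 4

    static void run(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);

        if (res == NULL || PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s failed: %s", sql, PQerrorMessage(conn));
            exit(1);
        }
        PQclear(res);
    }

    int main(void)
    {
        PGconn *workers[NWORKERS];
        char    sql[128];
        int     i;

        for (i = 0; i < NWORKERS; i++)
        {
            workers[i] = PQconnectdb("dbname=restore_target");
            if (PQstatus(workers[i]) != CONNECTION_OK)
                exit(1);
            run(workers[i], "BEGIN");
            /* ... each connection COPYs its slice of the data here ... */
        }

        /* Phase one: prepare every worker transaction. */
        for (i = 0; i < NWORKERS; i++)
        {
            snprintf(sql, sizeof(sql), "PREPARE TRANSACTION 'restore_part_%d'", i);
            run(workers[i], sql);
        }

        /*
         * The "mini transaction log": the gids have to be recorded durably
         * *before* the commits start, so that a restarted coordinator can
         * later COMMIT PREPARED (or ROLLBACK PREPARED) whatever is left.
         */

        /* Phase two: commit them all. */
        for (i = 0; i < NWORKERS; i++)
        {
            snprintf(sql, sizeof(sql), "COMMIT PREPARED 'restore_part_%d'", i);
            run(workers[i], sql);
            PQfinish(workers[i]);
        }
        return 0;
    }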

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
Brian Hurt wrote:
> Tom Lane wrote:
>> "Florian G. Pflug" <fgp@phlo.org> writes:
>>> ...
>>> Neither the "dealer", nor the "workers" would need access to the either
>>> the shared memory or the disk, thereby not messing with the "one backend
>>> is one transaction is one session" dogma.
>>> ...
>> Unfortunately, this idea has far too narrow a view of what a datatype
>> input function might do.  Just for starters, consider "enum" input,
>> which certainly requires catalog access.  We have also explicitly
>> acknowledged the idea that datatype I/O functions might try to store
>> typmod-related data in some special catalog somewhere.

> Would it be possible to determine when the copy is starting that this 
> case holds, and not use the parallel parsing idea in those cases?

In theory, yes. In practice, I don't want to be the one who has to answer 
to an angry user who just suffered a major drop in COPY performance 
after adding an ENUM column to his table.

I was thinking more along the lines of letting a datatype specify a
function "void *ioprepare(typmod)" which returns some opaque object
specifying everything the input and output functions need to know.
We could then establish the rule that input/output functions may not 
access the catalog, and instead pass them a pointer to that opaque object.
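
Something roughly like this entirely hypothetical sketch (the enum-style
payload and all function names are made up, just to show the shape of
the API):

    /* Hypothetical "ioprepare" API -- nothing like this exists today. */
    #include "postgres.h"

    typedef struct MyEnumIOData        /* example payload for an enum-like type */
    {
        int     nlabels;
        char  **labels;                /* label text, in catalog order */
        Oid    *label_oids;            /* OID stored on disk for each label */
    } MyEnumIOData;

    /*
     * Runs in the original backend, with full catalog access, once per
     * COPY, and caches everything the I/O functions will need.
     */
    void *
    myenum_ioprepare(Oid typoid, int32 typmod)
    {
        MyEnumIOData *data = palloc(sizeof(MyEnumIOData));

        /* real code would scan the enum catalog entries for typoid here
         * and fill in labels[] / label_oids[]; omitted in this sketch */
        data->nlabels = 0;
        data->labels = NULL;
        data->label_oids = NULL;
        return data;
    }

    /*
     * Catalog-free input function: gets the prepared blob instead of
     * being allowed to do lookups of its own, so it could safely run
     * in a worker process.
     */
    Datum
    myenum_in_prepared(const void *prepared, const char *label)
    {
        const MyEnumIOData *data = prepared;

        for (int i = 0; i < data->nlabels; i++)
            if (strcmp(data->labels[i], label) == 0)
                return ObjectIdGetDatum(data->label_oids[i]);

        elog(ERROR, "invalid input value for enum: \"%s\"", label);
        return (Datum) 0;              /* keep the compiler happy */
    }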

All pretty pie-in-the-sky at the moment, though...

regards, Florian Pflug



Re: An idea for parallelizing COPY within one backend

From
"Tom Dunstan"
Date:
On Wed, Feb 27, 2008 at 9:26 PM, Florian G. Pflug <fgp@phlo.org> wrote:
>  I was thinking more along the line of letting a datatype specify a
>  function "void* ioprepare(typmod)" which returns some opaque object
>  specifying all that the input and output function needs to know.
>  We could than establish the rule that input/output functions may not
>  access the catalog, and instead pass them a pointer to that opaque object.

Callers of I/O functions don't always know which type they're dealing
with - we had to go to some lengths to pass type information along
with the enum value itself so that it could be looked up in the
syscache in the output function. I think the main culprits are the
P/Ls, but I think there was a security-related concern about passing
the type through to the I/O function as well. If you want to do
something like this, it would certainly be possible to cache the enum
info for a particular type, but you might want to have a separate set
of I/O functions just for this case.

On the plus side, if such a cache were to be used by I/O generally, we
could reimplement enums to just store the ordinal on disk and save a
couple of bytes, like I wanted but was unable to do the first time
around. :)

Enums are an easy case, though, as there's very little data to deal
with. I don't know about other UDTs out there - do any require more
extensive catalog access?

Cheers

Tom


Re: An idea for parallelizing COPY within one backend

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Yeah, but it wouldn't take advantage of, say, the hack to disable WAL
> when the table was created in the same transaction.

In the context of a parallelizing pg_restore this would be fairly easy
to get around.  We could probably teach the thing to combine table
creation and loading steps into one action (transaction) in most cases.
If that couldn't work because of some weird dependency chain, the
data loading transaction could be done as
BEGIN; TRUNCATE table; COPY table FROM stdin; ... COMMIT;

which I believe already invokes the no-WAL optimization, and could
certainly be made to do so if not.

Obviously, pg_restore would have to be aware of whether or not it had
created that table in the current run (else it mustn't TRUNCATE),
but it would be tracking more or less exactly that info anyway to handle
dependency ordering.
        regards, tom lane


Re: An idea for parallelizing COPY within one backend

From
Andrew Dunstan
Date:

Florian G. Pflug wrote:
>
>> Would it be possible to determine when the copy is starting that this 
>> case holds, and not use the parallel parsing idea in those cases?
>
> In theory, yes. In practice, I don't want to be the one who has to 
> answer to an angry user who just suffered a major drop in COPY 
> performance after adding an ENUM column to his table.
>
>

I am yet to be convinced that this is even theoretically a good path to 
follow. Any sufficiently large table could probably be partitioned and 
then we could use the parallelism that is being discussed for pg_restore 
without any modification to the backend at all. Similar tricks could be 
played by an external bulk loader for third party data sources.

cheers

andrew


Re: An idea for parallelizing COPY within one backend

From
Brian Hurt
Date:
Andrew Dunstan wrote:

>
>
> Florian G. Pflug wrote:
>
>>
>>> Would it be possible to determine when the copy is starting that 
>>> this case holds, and not use the parallel parsing idea in those cases?
>>
>>
>> In theory, yes. In practice, I don't want to be the one who has to 
>> answer to an angry user who just suffered a major drop in COPY 
>> performance after adding an ENUM column to his table.
>>
>>
>
> I am yet to be convinced that this is even theoretically a good path 
> to follow. Any sufficiently large table could probably be partitioned 
> and then we could use the parallelism that is being discussed for 
> pg_restore without any modification to the backend at all. Similar 
> tricks could be played by an external bulk loader for third party data 
> sources.
>

I was just floating this as an idea - I don't know enough about the 
backend to know whether it was a good idea or not; it sounds like "not".

Brian



Re: An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
Andrew Dunstan wrote:
> Florian G. Pflug wrote:
>>> Would it be possible to determine when the copy is starting that this 
>>> case holds, and not use the parallel parsing idea in those cases?
>>
>> In theory, yes. In practice, I don't want to be the one who has to 
>> answer to an angry user who just suffered a major drop in COPY 
>> performance after adding an ENUM column to his table.
>>
> I am yet to be convinced that this is even theoretically a good path to 
> follow. Any sufficiently large table could probably be partitioned and 
> then we could use the parallelism that is being discussed for pg_restore 
> without any modification to the backend at all. Similar tricks could be 
> played by an external bulk loader for third party data sources.

That assumes that some specific bulk loader like pg_restore, pgloader
or similar is used to perform the load. Plain libpq users would either 
need to duplicate the logic these loaders contain, or wouldn't be able 
to take advantage of fast loads.

Plus, I'd see this as a kind of testbed for gently introducing 
parallelism into postgres backends (especially thinking about sorting 
here). CPUs gain more and more cores, so in the long run I fear that we 
will have to find ways to utilize more than one of those to execute a 
single query.

But of course the architectural details need to be sorted out before any 
credible judgement about the feasibility of this idea can be made...

regards, Florian Pflug



Re: An idea for parallelizing COPY within one backend

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> Plus, I'd see this as a kind of testbed for gently introducing 
> parallelism into postgres backends (especially thinking about sorting 
> here).

This thinking is exactly what makes me scream loudly and run in the
other direction.  I don't want threads introduced into the backend,
whether "gently" or otherwise.  The portability and reliability hits
that we'll take are too daunting.  Threads that invoke user-defined
code (as anything involved with datatype-specific operations must)
are especially fearsome, as there is precisely 0 chance of that code
being thread-safe.
        regards, tom lane


Re: An idea for parallelizing COPY within one backend

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Plus, I'd see this as a kind of testbed for gently introducing 
>> parallelism into postgres backends (especially thinking about sorting 
>> here).
> 
> This thinking is exactly what makes me scream loudly and run in the
> other direction.  I don't want threads introduced into the backend,
> whether "gently" or otherwise.  The portability and reliability hits
> that we'll take are too daunting.  Threads that invoke user-defined
> code (as anything involved with datatype-specific operations must)
> are especially fearsome, as there is precisely 0 chance of that code
> being thread-safe.

Exactly my thinking. That is why I was looking for a way to introduce 
parallelism *without* threading. Though it's not so much the 
user-defined code that scares me, but rather the portability issues. The 
differences between NPTL and non-NPTL threads on Linux alone make me 
shudder...

What I was saying is that there might be a chance to get some parallelism 
without threading, by executing well-defined pieces of code with 
controlled dependencies in separate processes. COPY seemed like an ideal 
testbed for that idea, since the conversion of received lines into 
tuples seemed reasonably self-contained, with few outside 
dependencies. If the idea can't be made to work there, it probably won't 
work anywhere. If it turns out that it does (with an API change for 
input/output functions), however, then it *might* be possible to apply it 
to other relatively self-contained parts in the future...

To restate, I don't want threaded backends. Not in the foreseeable 
future at least. But I'd still love to see a single transaction using 
more than one core.

regards, Florian Pflug