Thread: An idea for parallelizing COPY within one backend
As far as I can see, the main difficulty in making COPY run faster (on the server) is that pretty involved conversion from plain-text lines into tuples. Trying to get rid of this conversion by having the client send something that resembles the data stored in on-disk tuples is not a good answer, either, because it ties the client too closely to backend-version-specific implementation details. But those problems only arise if the *client* needs to deal with the binary format. What I envision is parallelizing that conversion step on the server, controlled by a backend process, kind of like a filter between the server and the client. Upon reception of a COPY INTO command, a backend would
.) Retrieve all catalog information required to convert a plain-text line into a tuple
.) Fork off a "dealer" and N "worker" processes that take over the client connection. The "dealer" distributes lines received from the client to the N workers, while the original backend receives them as tuples back from the workers.
Neither the "dealer" nor the "workers" would need access to either the shared memory or the disk, thereby not messing with the "one backend is one transaction is one session" dogma. Now I'm eagerly waiting to hear all the reasons why this idea is broken as hell ;-) regards, Florian Pflug
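[Editorial note: for concreteness, here is a minimal sketch in plain C, not a patch, of the dealer/worker split described above, assuming the dealer hands raw text lines to the workers over pipes. The process layout, the use of stdin as a stand-in for the client connection, and all names are illustrative assumptions only; the return path by which workers would hand finished tuples back to the original backend is elided.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

#define NWORKERS 4

/* Worker: read whole lines from its pipe and "convert" them.  In the real
 * proposal this is where the datatype input functions would run, and the
 * resulting tuples would be passed back to the original backend. */
static void worker_loop(int in_fd)
{
    FILE *in = fdopen(in_fd, "r");
    char  line[8192];

    while (in != NULL && fgets(line, sizeof(line), in) != NULL)
        fprintf(stderr, "worker %d converted: %s", (int) getpid(), line);
    exit(0);
}

int main(void)
{
    int  pipes[NWORKERS][2];
    char line[8192];
    int  next = 0;

    for (int i = 0; i < NWORKERS; i++)
    {
        pid_t pid;

        if (pipe(pipes[i]) != 0) { perror("pipe"); exit(1); }
        pid = fork();
        if (pid < 0) { perror("fork"); exit(1); }
        if (pid == 0)
        {
            /* Child: close every write end it inherited, keep only its
             * own read end, and become a worker. */
            for (int j = 0; j <= i; j++)
                close(pipes[j][1]);
            worker_loop(pipes[i][0]);
        }
        close(pipes[i][0]);     /* parent keeps only the write ends */
    }

    /* The "dealer": stdin stands in for the client connection; distribute
     * incoming lines round-robin to the workers. */
    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        if (write(pipes[next][1], line, strlen(line)) < 0)
        { perror("write"); break; }
        next = (next + 1) % NWORKERS;
    }

    for (int i = 0; i < NWORKERS; i++)
        close(pipes[i][1]);     /* EOF tells the workers to exit */
    while (wait(NULL) > 0)
        ;
    return 0;
}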
"Florian G. Pflug" <fgp@phlo.org> writes: > ... > Neither the "dealer", nor the "workers" would need access to the either > the shared memory or the disk, thereby not messing with the "one backend > is one transaction is one session" dogma. > ... Unfortunately, this idea has far too narrow a view of what a datatype input function might do. Just for starters, consider "enum" input, which certainly requires catalog access. We have also explicitly acknowledged the idea that datatype I/O functions might try to store typmod-related data in some special catalog somewhere. regards, tom lane
Hi, On Wednesday, February 27, 2008, Florian G. Pflug wrote: > Upon reception of a COPY INTO command, a backend would > .) Fork off a "dealer" and N "worker" processes that take over the > client connection. The "dealer" distributes lines received from the > client to the N workers, while the original backend receives them > as tuples back from the workers. This looks so much like what pgloader does now (version 2.3.0~dev2, release candidate) at the client side, when configured for it, that I can't help answering the mail :) http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading section_threads = N split_file_reading = False Of course, the backends still have to parse the input given by pgloader, which only pre-processes data. I'm not sure having the client prepare the data some more (binary format or whatever) is a wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm all wrong, so I'm all ears! Regards, -- dim
On Wed, 2008-02-27 at 09:09 +0100, Dimitri Fontaine wrote: > Hi, > > On Wednesday, February 27, 2008, Florian G. Pflug wrote: > > Upon reception of a COPY INTO command, a backend would > > .) Fork off a "dealer" and N "worker" processes that take over the > > client connection. The "dealer" distributes lines received from the > > client to the N workers, while the original backend receives them > > as tuples back from the workers. > > This looks so much like what pgloader does now (version 2.3.0~dev2, release > candidate) at the client side, when configured for it, that I can't help > answering the mail :) > http://pgloader.projects.postgresql.org/dev/pgloader.1.html#_parallel_loading > section_threads = N > split_file_reading = False > > Of course, the backends still have to parse the input given by pgloader, which > only pre-processes data. I'm not sure having the client prepare the data some > more (binary format or whatever) is a wise idea, as you mentioned and wrt > Tom's follow-up. But maybe I'm all wrong, so I'm all ears! ISTM the external parallelization approach is more likely to help us avoid bottlenecks, so I support Dimitri's approach. We also need error handling, which pgloader also has. Writing error handling and parallelization into COPY isn't going to be easy, and not very justifiable either if we already have both. There might be a reason to re-write it in C one day, but that will be a fairly easy task if we ever need to do it. -- Simon Riggs 2ndQuadrant http://www.2ndQuadrant.com
Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> ... >> Neither the "dealer", nor the "workers" would need access to the either >> the shared memory or the disk, thereby not messing with the "one backend >> is one transaction is one session" dogma. >> ... > > Unfortunately, this idea has far too narrow a view of what a datatype > input function might do. Just for starters, consider "enum" input, > which certainly requires catalog access. We have also explicitly > acknowledged the idea that datatype I/O functions might try to store > typmod-related data in some special catalog somewhere. Hm... how many in-core datatypes are there which need catalog access in their input or output functions? Maybe we could change the API for i/o functions in a way that allows us to request all needed information to be cached? regards, Florian Pflug
Dimitri Fontaine wrote: > Of course, the backends still have to parse the input given by pgloader, which > only pre-processes data. I'm not sure having the client prepare the data some > more (binary format or whatever) is a wise idea, as you mentioned and wrt > Tom's follow-up. But maybe I'm all wrong, so I'm all ears! As far as I understand, pgloader starts N threads or processes that open up N individual connections to the server. In that case, moving the text->binary conversion from the backend into the loader won't give any additional performance, I'd say. The reason that I'd love some within-one-backend solution is that it'd allow you to utilize more than one CPU for a restore within a *single* transaction. This is something that a client-side solution won't be able to deliver, unless major changes to the architecture of postgres happen first... regards, Florian Pflug
On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote: > Dimitri Fontaine wrote: >> Of course, the backends still have to parse the input given by >> pgloader, which only pre-processes data. I'm not sure having the >> client prepare the data some more (binary format or whatever) is a >> wise idea, as you mentioned and wrt Tom's follow-up. But maybe I'm >> all wrong, so I'm all ears! > > As far as I understand, pgloader starts N threads or processes that > open up N individual connections to the server. In that case, moving > the text->binary conversion from the backend into the loader won't > give any > additional performance, I'd say. > > The reason that I'd love some within-one-backend solution is that > it'd allow you to utilize more than one CPU for a restore within a > *single* transaction. This is something that a client-side solution > won't be able to deliver, unless major changes to the architecture > of postgres happen first... It seems like multiple backends should be able to take advantage of 2PC for transaction safety. Cheers, M
Tom Lane wrote:<br /><blockquote cite="mid13568.1204087614@sss.pgh.pa.us" type="cite"><pre wrap="">"Florian G. Pflug" <aclass="moz-txt-link-rfc2396E" href="mailto:fgp@phlo.org"><fgp@phlo.org></a> writes: </pre><blockquote type="cite"><prewrap="">... Neither the "dealer", nor the "workers" would need access to the either the shared memory or the disk, thereby not messing with the "one backend is one transaction is one session" dogma. ... </pre></blockquote><pre wrap=""> Unfortunately, this idea has far too narrow a view of what a datatype input function might do. Just for starters, consider "enum" input, which certainly requires catalog access. We have also explicitly acknowledged the idea that datatype I/O functions might try to store typmod-related data in some special catalog somewhere. regards, tom lane </pre></blockquote> Would it be possible to determine when the copy is starting that this case holds, and not use the parallelparsing idea in those cases?<br /><br /> I'm a big user of copy, generally into very simple tables- few indexes,simple column types (numeric, varchar, and int almost exclusively), no fancy features. A parallel copy input inthe "simple" cases would be of great advantage to me, even if it doesn't parallelize "complicated" cases.<br /><br /> Brian<br/><br />
A.M. wrote: > > On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote: >> The reason that I'd love some within-one-backend solution is that I'd >> allow you to utilize more than one CPU for a restore within a *single* >> transaction. This is something that a client-side solution won't be >> able to deliver, unless major changes to the architecture of postgres >> happen first... > > It seems like multiple backends should be able to take advantage of 2PC > for transaction safety. Yeah, but it wouldn't take advantage of, say, the hack to disable WAL when the table was created in the same transaction. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
A.M. wrote: > > On Feb 27, 2008, at 9:11 AM, Florian G. Pflug wrote: > >> Dimitri Fontaine wrote: >>> Of course, the backends still have to parse the input given by >>> pgloader, which only pre-processes data. I'm not sure having the >>> client prepare the data some more (binary format or whatever) is a >>> wise idea, as you mentionned and wrt Tom's follow-up. But maybe I'm >>> all wrong, so I'm all ears! >> >> As far as I understand, pgloader starts N threads or processes that >> open up N individual connections to the server. In that case, moving >> then text->binary conversion from the backend into the loader won't >> give any >> additional performace I'd say. >> >> The reason that I'd love some within-one-backend solution is that I'd >> allow you to utilize more than one CPU for a restore within a *single* >> transaction. This is something that a client-side solution won't be >> able to deliver, unless major changes to the architecture of postgres >> happen first... > > It seems like multiple backends should be able to take advantage of 2PC > for transaction safety. Yes, whatever is coordinating the multiple backends (a master backend? i haven't followed this thread closely) would then have to have logic to finish the prepared transactions if you crash after you've committed one but not all of them. IOW, it would need a mini transaction log of its own. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Brian Hurt wrote: > Tom Lane wrote: >> "Florian G. Pflug" <fgp@phlo.org> writes: >>> ... >>> Neither the "dealer" nor the "workers" would need access to either the >>> shared memory or the disk, thereby not messing with the "one backend >>> is one transaction is one session" dogma. >>> ... >> Unfortunately, this idea has far too narrow a view of what a datatype >> input function might do. Just for starters, consider "enum" input, >> which certainly requires catalog access. We have also explicitly >> acknowledged the idea that datatype I/O functions might try to store >> typmod-related data in some special catalog somewhere. > Would it be possible to determine when the copy is starting that this > case holds, and not use the parallel parsing idea in those cases? In theory, yes. In practice, I don't want to be the one who has to answer to an angry user who just suffered a major drop in COPY performance after adding an ENUM column to his table. I was thinking more along the lines of letting a datatype specify a function "void* ioprepare(typmod)" which returns some opaque object specifying all that the input and output functions need to know. We could then establish the rule that input/output functions may not access the catalog, and instead pass them a pointer to that opaque object. All pretty pie-in-the-sky at the moment, though... regards, Florian Pflug
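[Editorial note: a standalone sketch of the hypothetical ioprepare idea, using an enum-like type as the example. None of these names exist in PostgreSQL, and a real API would deal in Datums, typmods, and memory contexts rather than plain C strings; the point is only that the catalog-dependent work happens once, up front, so the input function itself could run in a worker with no catalog access.]

#include <stdio.h>
#include <string.h>

typedef struct EnumIOCache
{
    int         nlabels;
    const char *labels[8];      /* label text, indexed by ordinal */
} EnumIOCache;

/* Stand-in for "void *ioprepare(typmod)": in a real backend this is where
 * the pg_enum rows for the type would be read and cached. */
static EnumIOCache *mood_ioprepare(void)
{
    static EnumIOCache cache = { 3, { "sad", "ok", "happy" } };
    return &cache;
}

/* The input function proper: no catalog access, only the prepared cache. */
static int enum_in_cached(const EnumIOCache *cache, const char *value)
{
    for (int i = 0; i < cache->nlabels; i++)
        if (strcmp(cache->labels[i], value) == 0)
            return i;           /* ordinal as the on-disk representation */
    return -1;                  /* invalid input */
}

int main(void)
{
    EnumIOCache *cache = mood_ioprepare();   /* done once, in the backend */

    /* Workers could now convert values without touching the catalogs. */
    printf("happy -> %d\n", enum_in_cached(cache, "happy"));
    printf("angry -> %d\n", enum_in_cached(cache, "angry"));
    return 0;
}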
On Wed, Feb 27, 2008 at 9:26 PM, Florian G. Pflug <fgp@phlo.org> wrote: > I was thinking more along the line of letting a datatype specify a > function "void* ioprepare(typmod)" which returns some opaque object > specifying all that the input and output function needs to know. > We could than establish the rule that input/output functions may not > access the catalog, and instead pass them a pointer to that opaque object. Callers of IO functions don't always know which type they're dealing with - we had to go to some lengths to pass type information along with the enum value itself so that it could be looked up in the syscache in the output function. I think the main culprits are the P/Ls, but I think there was a security related concern about passing the type through to the IO function as well. If you want to do something like this, it would certainly be possible to cache the enum info for a particular type, but you might want to have a separate set of io functions just for this case. On the plus side, if such a cache were to be used by IO generally, we could reimplement enums to just store the ordinal on disk and save a couple of bytes, like I wanted but was unable to do the first time around. :) Enums are an easy case, though, as there's very little data to deal with. I don't know about other UDTs out there - do any require more extensive catalog access? Cheers Tom
Alvaro Herrera <alvherre@commandprompt.com> writes: > Yeah, but it wouldn't take advantage of, say, the hack to disable WAL > when the table was created in the same transaction. In the context of a parallelizing pg_restore this would be fairly easy to get around. We could probably teach the thing to combine table creation and loading steps into one action (transaction) in most cases. If that couldn't work because of some weird dependency chain, the data loading transaction could be done as BEGIN;TRUNCATE table;COPY table FROM stdin;...COMMIT; which I believe already invokes the no-WAL optimization, and could certainly be made to do so if not. Obviously, pg_restore would have to be aware of whether or not it had created that table in the current run (else it mustn't TRUNCATE), but it would be tracking more or less exactly that info anyway to handle dependency ordering. regards, tom lane
Florian G. Pflug wrote: > >> Would it be possible to determine when the copy is starting that this >> case holds, and not use the parallel parsing idea in those cases? > > In theory, yes. In practice, I don't want to be the one who has to > answer to an angry user who just suffered a major drop in COPY > performance after adding an ENUM column to his table. > > I am yet to be convinced that this is even theoretically a good path to follow. Any sufficiently large table could probably be partitioned and then we could use the parallelism that is being discussed for pg_restore without any modification to the backend at all. Similar tricks could be played by an external bulk loader for third party data sources. cheers andrew
Andrew Dunstan wrote: > > > Florian G. Pflug wrote: > >> >>> Would it be possible to determine when the copy is starting that >>> this case holds, and not use the parallel parsing idea in those cases? >> >> >> In theory, yes. In practice, I don't want to be the one who has to >> answer to an angry user who just suffered a major drop in COPY >> performance after adding an ENUM column to his table. >> >> > > I am yet to be convinced that this is even theoretically a good path > to follow. Any sufficiently large table could probably be partitioned > and then we could use the parallelism that is being discussed for > pg_restore without any modification to the backend at all. Similar > tricks could be played by an external bulk loader for third party data > sources. > I was just floating this as an idea - I don't know enough about the backend to know if it was a good idea or not; it sounds like "not". Brian
Andrew Dunstan wrote: > Florian G. Pflug wrote: >>> Would it be possible to determine when the copy is starting that this >>> case holds, and not use the parallel parsing idea in those cases? >> >> In theory, yes. In practice, I don't want to be the one who has to >> answer to an angry user who just suffered a major drop in COPY >> performance after adding an ENUM column to his table. >> > I am yet to be convinced that this is even theoretically a good path to > follow. Any sufficiently large table could probably be partitioned and > then we could use the parallelism that is being discussed for pg_restore > without any modification to the backend at all. Similar tricks could be > played by an external bulk loader for third party data sources. That assumes that some specific bulk loader like pg_restore, pgloader or similar is used to perform the load. Plain libpq users would either need to duplicate the logic these loaders contain, or wouldn't be able to take advantage of fast loads. Plus, I'd see this as a kind of testbed for gently introducing parallelism into postgres backends (especially thinking about sorting here). CPUs gain more and more cores, so in the long run I fear that we will have to find ways to utilize more than one of those to execute a single query. But of course the architectural details need to be sorted out before any credible judgement about the feasibility of this idea can be made... regards, Florian Pflug
"Florian G. Pflug" <fgp@phlo.org> writes: > Plus, I'd see this as a kind of testbed for gently introducing > parallelism into postgres backends (especially thinking about sorting > here). This thinking is exactly what makes me scream loudly and run in the other direction. I don't want threads introduced into the backend, whether "gently" or otherwise. The portability and reliability hits that we'll take are too daunting. Threads that invoke user-defined code (as anything involved with datatype-specific operations must) are especially fearsome, as there is precisely 0 chance of that code being thread-safe. regards, tom lane
Tom Lane wrote: > "Florian G. Pflug" <fgp@phlo.org> writes: >> Plus, I'd see this as a kind of testbed for gently introducing >> parallelism into postgres backends (especially thinking about sorting >> here). > > This thinking is exactly what makes me scream loudly and run in the > other direction. I don't want threads introduced into the backend, > whether "gently" or otherwise. The portability and reliability hits > that we'll take are too daunting. Threads that invoke user-defined > code (as anything involved with datatype-specific operations must) > are especially fearsome, as there is precisely 0 chance of that code > being thread-safe. Exactly my thinking. That is why I was looking for a way to introduce parallelism *without* threading. Though it's not so much the user-defined code that scares me, but rather the portability issues. The differences between NPTL and non-NPTL threads on Linux alone make me shudder... What I was saying is that there might be a chance to get some parallelism without threading, by executing well-defined pieces of code with controlled dependencies in separate processes. COPY seemed like an ideal testbed for that idea, since the conversion of received lines into tuples seemed reasonably self-contained, with few outside dependencies. If the idea can't be made to work there, it probably won't work anywhere. If it turns out that it does (with an API change for input/output functions), however, then it *might* be possible to apply it to other relatively self-contained parts in the future... To restate, I don't want threaded backends. Not in the foreseeable future at least. But I'd still love to see a single transaction using more than one core. regards, Florian Pflug