Thread: parallel pg_restore
I am working on getting parallel pg_restore working. I'm currently getting all the scaffolding working, and hope to have a naive prototype posted within about a week.

The major question is how to choose the restoration order so as to maximize efficiency both on the server and in reading the archive. My thoughts are currently running something like this:

* when an item is completed, reduce the dependency count for each item that depends on it by 1.
* when an item has a dependency count of 0 it is available for execution, and gets moved to the head of the queue.
* when a new worker spot becomes available, if there is not currently a data load running then pick the first available data load, otherwise pick the first available item.

This would mean that loading a table would probably be immediately followed by creation of its indexes, including PK and UNIQUE constraints, thus taking possible advantage of synchronised scans, data in file system buffers, etc.

Another question is what we should do if the user supplies an explicit order with --use-list. I'm inclined to say we should stick strictly with the supplied order. Or maybe that should be an option.

Thoughts and comments welcome.

cheers

andrew
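A minimal sketch of the scheduling idea described above. The structures and ready-queue helpers here are hypothetical, not actual pg_restore code: each TOC item carries a count of unsatisfied dependencies, joins the ready queue when that count hits zero, and an idle worker prefers a data load whenever no data load is currently running.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct TocItem
    {
        int     ndeps_remaining;        /* dependencies not yet restored */
        bool    is_data_load;           /* table data, vs. index/constraint/etc. */
        int     ndependents;
        struct TocItem **dependents;    /* items that depend on this one */
    } TocItem;

    /* hypothetical ready-queue helpers */
    extern void     ready_queue_push_head(TocItem *item);
    extern TocItem *ready_queue_pop_head(void);
    extern TocItem *ready_queue_take_first_data_load(void);  /* NULL if none pending */

    static void
    item_completed(TocItem *item)
    {
        /* reduce the dependency count of everything that depends on this item */
        for (int i = 0; i < item->ndependents; i++)
        {
            TocItem *dep = item->dependents[i];

            if (--dep->ndeps_remaining == 0)
                ready_queue_push_head(dep);     /* now available for execution */
        }
    }

    static TocItem *
    next_item_for_worker(bool data_load_running)
    {
        /*
         * If no data load is running, prefer the first available data load so
         * that a table's indexes and constraints tend to follow its load
         * immediately; otherwise just take the first available item.
         */
        if (!data_load_running)
        {
            TocItem *data = ready_queue_take_first_data_load();

            if (data != NULL)
                return data;
        }
        return ready_queue_pop_head();
    }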
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.

Indeed, that's exactly what I'm doing. However, given that time for the 8.4 window is short, I thought it would be sensible to get people thinking about what the policy might be, while I get on with the mechanism.

>> Another question is what we should do if the user supplies an explicit
>> order with --use-list. I'm inclined to say we should stick strictly with
>> the supplied order. Or maybe that should be an option.
>
> Hmm. I think --use-list is used more for selecting a subset of items
> to restore than for forcing a nondefault restore order. Forcing the
> order used to be a major purpose, but that was years ago before we
> had the dependency-driven-restore-order code working. So I'd vote that
> the default behavior is to still allow parallel restore when this option
> is used, and we should provide an orthogonal option that disables use of
> parallel restore.
>
> You'd really want the latter anyway for some cases, ie, when you don't
> want the restore trying to hog the machine. Maybe the right form for
> the extra option is just a limit on how many connections to use. Set it
> to one to force the exact restore order, and to other values to throttle
> how much of the machine the restore tries to eat.

My intention is to have single-thread restore remain the default, at least for this go round, and have the user be able to choose --multi-thread=nn to specify the number of concurrent connections to use.

> One problem here though is that you'd need to be sure you behave sanely
> when there is a dependency chain passing through an object that's not to
> be restored. The ordering of the rest of the chain still ought to honor
> the dependencies I think.

Right. I think we'd need to fake doing a full restore and omit actually restoring items not on the passed in list. That should be simple enough.

cheers

andrew
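A small sketch (hypothetical names again) of that "fake a full restore" idea: every TOC entry is walked in dependency order, but only entries selected by --use-list are actually executed. Skipped entries still release their dependents, so a chain passing through an omitted object keeps its ordering.

    #include <stdbool.h>

    typedef struct TocItem TocItem;             /* as in the earlier sketch */

    extern void restore_item(TocItem *item);    /* hypothetical: run the SQL/COPY */
    extern void item_completed(TocItem *item);  /* decrement dependents, as above */

    static void
    process_entry(TocItem *item, bool selected_by_use_list)
    {
        if (selected_by_use_list)
            restore_item(item);
        /* else: restore nothing, but still count the item as done for ordering */

        item_completed(item);
    }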
Andrew Dunstan <andrew@dunslane.net> writes:
> I am working on getting parallel pg_restore working. I'm currently
> getting all the scaffolding working, and hope to have a naive prototype
> posted within about a week.

> The major question is how to choose the restoration order so as to
> maximize efficiency both on the server and in reading the archive.

One of the first software design principles I ever learned was to separate policy from mechanism. ISTM in this first cut you ought to concentrate on mechanism and let the policy just be something dumb (but coded separately from the infrastructure). We can refine it after that.

> Another question is what we should do if the user supplies an explicit
> order with --use-list. I'm inclined to say we should stick strictly with
> the supplied order. Or maybe that should be an option.

Hmm. I think --use-list is used more for selecting a subset of items to restore than for forcing a nondefault restore order. Forcing the order used to be a major purpose, but that was years ago before we had the dependency-driven-restore-order code working. So I'd vote that the default behavior is to still allow parallel restore when this option is used, and we should provide an orthogonal option that disables use of parallel restore.

You'd really want the latter anyway for some cases, ie, when you don't want the restore trying to hog the machine. Maybe the right form for the extra option is just a limit on how many connections to use. Set it to one to force the exact restore order, and to other values to throttle how much of the machine the restore tries to eat.

One problem here though is that you'd need to be sure you behave sanely when there is a dependency chain passing through an object that's not to be restored. The ordering of the rest of the chain still ought to honor the dependencies I think.

regards, tom lane
On Monday 22 September 2008, Andrew Dunstan wrote:
>> You'd really want the latter anyway for some cases, ie, when you don't
>> want the restore trying to hog the machine. Maybe the right form for
>> the extra option is just a limit on how many connections to use. Set it
>> to one to force the exact restore order, and to other values to throttle
>> how much of the machine the restore tries to eat.
>
> My intention is to have single-thread restore remain the default, at
> least for this go round, and have the user be able to choose
> --multi-thread=nn to specify the number of concurrent connections to use.

What about make's famous -j option?

  -j [jobs], --jobs[=jobs]
       Specifies the number of jobs (commands) to run simultaneously. If
       there is more than one -j option, the last one is effective. If
       the -j option is given without an argument, make will not limit
       the number of jobs that can run simultaneously.

Regards,
-- 
dim
On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>> My intention is to have single-thread restore remain the default, at
>> least for this go round, and have the user be able to choose
>> --multi-thread=nn to specify the number of concurrent connections to use.
>
> What about make's famous -j option?
>
>   -j [jobs], --jobs[=jobs]
>        Specifies the number of jobs (commands) to run simultaneously. If
>        there is more than one -j option, the last one is effective. If
>        the -j option is given without an argument, make will not limit
>        the number of jobs that can run simultaneously.

+1

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Sun, 2008-09-21 at 18:15 -0400, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I am working on getting parallel pg_restore working. I'm currently
>> getting all the scaffolding working, and hope to have a naive prototype
>> posted within about a week.
>
>> The major question is how to choose the restoration order so as to
>> maximize efficiency both on the server and in reading the archive.
>
> One of the first software design principles I ever learned was to
> separate policy from mechanism. ISTM in this first cut you ought to
> concentrate on mechanism and let the policy just be something dumb
> (but coded separately from the infrastructure). We can refine it after
> that.

Agreed. We mustn't make too many built-in assumptions about the best way to parallelise the restore. For example, running all CREATE INDEX at the same time may help I/O on the scan, but it may also swamp memory and force additional I/O as a result.

We might need a setting for total memory available, so pg_restore can try not to run a set of tasks that together would exceed it. Preferably this wouldn't be just a pg_restore setting.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>>> My intention is to have single-thread restore remain the default, at
>>> least for this go round, and have the user be able to choose
>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>
>> What about make's famous -j option?
>>
>>   -j [jobs], --jobs[=jobs]
>>        Specifies the number of jobs (commands) to run simultaneously. If
>>        there is more than one -j option, the last one is effective. If
>>        the -j option is given without an argument, make will not limit
>>        the number of jobs that can run simultaneously.
>
> +1

If that's the preferred name I have no problem. I'm not sure about the default argument part, though.

First, I'm not sure our getopt infrastructure actually provides for optional arguments, and I am not going to rework it just to get around such a problem in pg_restore, at least for now.

More importantly, I'm not convinced it's a good idea. It seems more like a footgun that will potentially try to launch thousands of simultaneous restore connections. I should have thought that optimal performance would be reached at some small multiple (say maybe 2?) of the number of CPUs on the server. You could achieve unlimited parallelism by saying something like --jobs=99999, but I'd rather that were done very explicitly instead of as the default value of the parameter.

cheers

andrew
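For reference, GNU-style getopt_long does express this with optional_argument, roughly as below; whether PostgreSQL's bundled getopt_long handles it the same way is exactly the open question raised above. The --jobs semantics here (no argument meaning "unlimited") are purely illustrative.

    #include <getopt.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(int argc, char **argv)
    {
        /* Hypothetical --jobs/-j with an optional argument, GNU-getopt style. */
        static struct option long_options[] = {
            {"jobs", optional_argument, NULL, 'j'},
            {NULL, 0, NULL, 0}
        };
        int c;
        int njobs = 1;          /* default: single-threaded restore */

        while ((c = getopt_long(argc, argv, "j::", long_options, NULL)) != -1)
        {
            if (c == 'j')
                /* no argument given: treat as "unlimited", as make does (0 here) */
                njobs = optarg ? atoi(optarg) : 0;
        }
        printf("jobs = %d\n", njobs);
        return 0;
    }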
On Mon, 2008-09-22 at 11:38 -0400, Andrew Dunstan wrote:
> Simon Riggs wrote:
>> On Mon, 2008-09-22 at 09:53 +0200, Dimitri Fontaine wrote:
>>>> My intention is to have single-thread restore remain the default, at
>>>> least for this go round, and have the user be able to choose
>>>> --multi-thread=nn to specify the number of concurrent connections to use.
>>>
>>> What about make's famous -j option?
>>>
>>>   -j [jobs], --jobs[=jobs]
>>>        Specifies the number of jobs (commands) to run simultaneously. If
>>>        there is more than one -j option, the last one is effective. If
>>>        the -j option is given without an argument, make will not limit
>>>        the number of jobs that can run simultaneously.
>>
>> +1
>
> If that's the preferred name I have no problem. I'm not sure about the
> default argument part, though.
>
> First, I'm not sure our getopt infrastructure actually provides for
> optional arguments, and I am not going to rework it just to get around
> such a problem in pg_restore, at least for now.
>
> More importantly, I'm not convinced it's a good idea. It seems more like
> a footgun that will potentially try to launch thousands of simultaneous
> restore connections. I should have thought that optimal performance
> would be reached at some small multiple (say maybe 2?) of the number of
> CPUs on the server. You could achieve unlimited parallelism by saying
> something like --jobs=99999, but I'd rather that were done very
> explicitly instead of as the default value of the parameter.

OK, sounds best.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Mon, 22 Sep 2008 17:24:28 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> More importantly, I'm not convinced it's a good idea. It seems more
>> like a footgun that will potentially try to launch thousands of
>> simultaneous restore connections. I should have thought that
>> optimal performance would be reached at some small multiple (say
>> maybe 2?) of the number of CPUs on the server. You could achieve
>> unlimited parallelism by saying something like --jobs=99999, but
>> I'd rather that were done very explicitly instead of as the default
>> value of the parameter.
>
> OK, sounds best.

I will not argue vehemently here but I will say that "jobs" doesn't seem correct. The term "workers" seems more appropriate.

Sincerely,

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
On Monday 22 September 2008, Joshua Drake wrote:
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Mmmm, it sounds like it depends on the implementation (and on whether all workers will share the same serializable transaction or just be independent jobs), but my point here is more about giving the user a name they are used to. Like in "oh, pg_restore -j, I see, thanks".

Now, if your argument is that the make concept of a job does not match the parallel pg_restore concept of workers, I'll simply bow to your choice: barring other "limits", English not being my native language makes it hard for me to follow there ;)

Regards,
-- 
dim
On Mon, 2008-09-22 at 09:30 -0700, Joshua Drake wrote:
> On Mon, 22 Sep 2008 17:24:28 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> More importantly, I'm not convinced it's a good idea. It seems more
>>> like a footgun that will potentially try to launch thousands of
>>> simultaneous restore connections. I should have thought that
>>> optimal performance would be reached at some small multiple (say
>>> maybe 2?) of the number of CPUs on the server. You could achieve
>>> unlimited parallelism by saying something like --jobs=99999, but
>>> I'd rather that were done very explicitly instead of as the default
>>> value of the parameter.
>>
>> OK, sounds best.
>
> I will not argue vehemently here but I will say that "jobs" doesn't
> seem correct. The term "workers" seems more appropriate.

Agreed, but most utilities have "j" free but not w, p, t or other letters that might be synonyms.

j is at least used for exactly this purpose in other tools.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Simon Riggs wrote:
>> I will not argue vehemently here but I will say that "jobs" doesn't
>> seem correct. The term "workers" seems more appropriate.
>
> Agreed, but most utilities have "j" free but not w, p, t or other
> letters that might be synonyms.
>
> j is at least used for exactly this purpose in other tools.

There are in fact very few letters available, as we've been fairly profligate in our use of option letters in the pg_dump suite.

j and m happen to be two of those that are available.

I honestly don't have a terribly strong opinion about what it should be called. I can live with jobs or multi-threads.

cheers

andrew
Andrew Dunstan wrote:
> There are in fact very few letters available, as we've been fairly
> profligate in our use of option letters in the pg_dump suite.
>
> j and m happen to be two of those that are available.

--max-workers

Max makes sense because the number of workers won't be constant; a worker may not have a job to do. It is also consistent with autovacuum_max_workers.

Sincerely,

Joshua D. Drake
Joshua D. Drake wrote:
> Andrew Dunstan wrote:
>> There are in fact very few letters available, as we've been fairly
>> profligate in our use of option letters in the pg_dump suite.
>> j and m happen to be two of those that are available.
>
> --max-workers

Perhaps, but please do not use that as justification for using -m. That would be equally silly as abbreviating "number of workers" to -n.
-- 
Sincerely,
Stephen R. van den Berg.

Experience is something you don't get until just after you need it.
On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
> j and m happen to be two of those that are available.
>
> I honestly don't have a terribly strong opinion about what it should be
> called. I can live with jobs or multi-threads.

Perhaps we can use -j for jobs and -m for memory, so we can set memory available across all threads with a single total value.

I can live with jobs or multi-threads also, whichever we decide. Neither one is confusing to explain.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 23 Sep 2008 09:14:33 +0200 "Stephen R. van den Berg" <srb@cuci.nl> wrote:
> Joshua D. Drake wrote:
>> Andrew Dunstan wrote:
>>> There are in fact very few letters available, as we've been fairly
>>> profligate in our use of option letters in the pg_dump suite.
>>> j and m happen to be two of those that are available.
>
>> --max-workers
>
> Perhaps, but please do not use that as justification for using -m.
> That would be equally silly as abbreviating "number of workers" to -n.

Actually I came up with it because it coincides with existing terminology. Autovacuum has the concept of max_workers.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>> j and m happen to be two of those that are available.
>>
>> I honestly don't have a terribly strong opinion about what it
>> should be called. I can live with jobs or multi-threads.
>
> Perhaps we can use -j for jobs and -m for memory, so we can set memory
> available across all threads with a single total value.
>
> I can live with jobs or multi-threads also, whichever we decide.
> Neither one is confusing to explain.

Memory? Where did that come from. Andrew is that in your spec?

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>> j and m happen to be two of those that are available.
>>>
>>> I honestly don't have a terribly strong opinion about what it
>>> should be called. I can live with jobs or multi-threads.
>>
>> Perhaps we can use -j for jobs and -m for memory, so we can set memory
>> available across all threads with a single total value.
>>
>> I can live with jobs or multi-threads also, whichever we decide.
>> Neither one is confusing to explain.
>
> Memory? Where did that come from. Andrew is that in your spec?

No, but it's in mine. As I said upthread, no point in making it more parallel than memory allows. Different operations need more/less memory than others, so we must think about that also. We can quickly work out how big a table is, so we can work out how much memory it will need to perform sorts for index builds and thus how many parallel builds can sensibly take place.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
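Purely to illustrate the kind of throttle being described here (names and the -m/--memory option are invented, not part of any patch): only admit a new task if its estimated working memory still fits in what remains of a user-supplied budget.

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct Task
    {
        size_t  est_mem_bytes;  /* e.g. derived from relation size for an index build */
    } Task;

    static size_t mem_budget = 0;   /* total allowed across all workers; 0 = no limit */
    static size_t mem_in_use = 0;   /* sum of estimates for currently running tasks */

    static bool
    can_start_task(const Task *t)
    {
        if (mem_budget == 0)
            return true;
        return mem_in_use + t->est_mem_bytes <= mem_budget;
    }

    static void
    task_started(const Task *t)  { mem_in_use += t->est_mem_bytes; }

    static void
    task_finished(const Task *t) { mem_in_use -= t->est_mem_bytes; }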
Simon Riggs wrote:
> On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
>> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>>> j and m happen to be two of those that are available.
>>>>
>>>> I honestly don't have a terribly strong opinion about what it
>>>> should be called. I can live with jobs or multi-threads.
>>>
>>> Perhaps we can use -j for jobs and -m for memory, so we can set memory
>>> available across all threads with a single total value.
>>>
>>> I can live with jobs or multi-threads also, whichever we decide.
>>> Neither one is confusing to explain.
>>
>> Memory? Where did that come from. Andrew is that in your spec?
>
> No, but it's in mine. As I said upthread, no point in making it more
> parallel than memory allows. Different operations need more/less memory
> than others, so we must think about that also. We can quickly work out
> how big a table is, so we can work out how much memory it will need to
> perform sorts for index builds and thus how many parallel builds can
> sensibly take place.

If that ever happens it will certainly not be in this go round.

In fact, we have some anecdotal evidence that the point of diminishing returns is not reached until a fairly high degree of parallelism is used (Joshua's and my client has been using 24 threads, I believe).

In any case, my agenda goes something like this:

* get it working with a basic selection algorithm on Unix (nearly done - keep your eyes open for a patch soon)
* start testing
* get it working on Windows
* improve the selection algorithm
* harden code

If we get all that done by November we'll have done well. And we know that in some cases just this much can lead to reductions in restore time of the order of 80%.

cheers

andrew
On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
> If we get all that done by November we'll have done well. And we know
> that in some cases just this much can lead to reductions in restore time
> of the order of 80%.

Agreed. Go for it.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
On Tue, 23 Sep 2008 16:50:43 -0400 Andrew Dunstan <andrew@dunslane.net> wrote:
> Simon Riggs wrote:
>> On Tue, 2008-09-23 at 12:43 -0700, Joshua Drake wrote:
>>> On Tue, 23 Sep 2008 08:44:19 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>>>> On Mon, 2008-09-22 at 15:05 -0400, Andrew Dunstan wrote:
>>>>> j and m happen to be two of those that are available.
>>>>>
>>>>> I honestly don't have a terribly strong opinion about what it
>>>>> should be called. I can live with jobs or multi-threads.
>>>>
>>>> Perhaps we can use -j for jobs and -m for memory, so we can set
>>>> memory available across all threads with a single total value.
>>>>
>>>> I can live with jobs or multi-threads also, whichever we decide.
>>>> Neither one is confusing to explain.
>>>
>>> Memory? Where did that come from. Andrew is that in your spec?
>>
>> No, but it's in mine. As I said upthread, no point in making it more
>> parallel than memory allows. Different operations need more/less
>> memory than others, so we must think about that also. We can
>> quickly work out how big a table is, so we can work out how much
>> memory it will need to perform sorts for index builds and thus how
>> many parallel builds can sensibly take place.
>
> If that ever happens it will certainly not be in this go round.
>
> In fact, we have some anecdotal evidence that the point of diminishing
> returns is not reached until a fairly high degree of parallelism is
> used (Joshua's and my client has been using 24 threads, I believe).

Against 8 cores, but yes.

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
Simon Riggs wrote:
> On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
>> If we get all that done by November we'll have done well. And we know
>> that in some cases just this much can lead to reductions in restore time
>> of the order of 80%.
>
> Agreed. Go for it.

Just as an FYI, by far the number one bottleneck on the multiple worker restores I was doing was CPU. RAM and IO were never the problem.

Sincerely,

Joshua D. Drake
On Tue, 2008-09-23 at 22:17 -0700, Joshua D. Drake wrote:
> Simon Riggs wrote:
>> On Tue, 2008-09-23 at 16:50 -0400, Andrew Dunstan wrote:
>>> If we get all that done by November we'll have done well. And we know
>>> that in some cases just this much can lead to reductions in restore time
>>> of the order of 80%.
>>
>> Agreed. Go for it.
>
> Just as an FYI, by far the number one bottleneck on the multiple worker
> restores I was doing was CPU. RAM and IO were never the problem.

It would be useful to see a full breakdown of those results. There's always a bottleneck on something for any particular task, and we shouldn't presume the problem is only on CPU, for all data on all systems.

CPU parallelism is the most pressing problem, I agree, but I think we will quickly hit problems without memory limits. But I agree with Andrew that this will be a nice problem to have and not everything is possible by Nov 1.

-- 
Simon Riggs
www.2ndQuadrant.com
PostgreSQL Training, Services and Support
Hi,

On Tuesday 23 September 2008, Andrew Dunstan wrote:
> In any case, my agenda goes something like this:
>
> * get it working with a basic selection algorithm on Unix (nearly
>   done - keep your eyes open for a patch soon)
> * start testing
> * get it working on Windows
> * improve the selection algorithm
> * harden code

I'm not sure whether your work will feature single-table restore splitting, but if it's the case, you could consider having a look at what I've done in pgloader. The parallel loading work there was asked for by Simon Riggs and Greg Smith, and you could test two different parallel algorithms there. The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose what to implement in pg_restore, so I still hope it'll prove useful someday :)

Regards,
-- 
dim
Dimitri Fontaine wrote:
> Hi,
>
> On Tuesday 23 September 2008, Andrew Dunstan wrote:
>> In any case, my agenda goes something like this:
>>
>> * get it working with a basic selection algorithm on Unix (nearly
>>   done - keep your eyes open for a patch soon)
>> * start testing
>> * get it working on Windows
>> * improve the selection algorithm
>> * harden code
>
> I'm not sure whether your work will feature single-table restore splitting,
> but if it's the case, you could consider having a look at what I've done in
> pgloader. The parallel loading work there was asked for by Simon Riggs and
> Greg Smith, and you could test two different parallel algorithms there.
> The aim was to have a "simple" testbed allowing PostgreSQL hackers to choose
> what to implement in pg_restore, so I still hope it'll prove useful someday :)

No. The proposal will perform exactly the same set of steps as single-threaded pg_restore, but in parallel. The individual steps won't be broken up.

Quite apart from anything else, parallel data loading of individual tables will defeat clustering, as well as making it impossible to avoid WAL logging of the load (which I have made provision for).

The fact that custom archives are compressed by default would in fact make parallel loading of individual tables' data difficult with the present format. We'd have to do something like expanding it on the client (which might not even have enough disk space) and then split it before loading it to the server. That's pretty yucky. Alternatively, each loader thread would need to start decompressing the data from the start and throw away data until it got to the point it wanted to start restoring from. Also pretty yucky.

Far better would be to provide for multiple data members in the archive and teach pg_dump to split large tables as it writes the archive. Then pg_restore would need comparatively little adjustment.

Also, of course, you can split tables yourself by partitioning them. That would buy you parallel data load with what I am doing now, with no extra work.

In any case, data loading is very far from being the only problem. One of my clients has long running restores where the data load takes about 20% or so of the time - the rest is in index creation and the like. No amount of table splitting will make a huge difference to them, but parallel processing will. As against that, if your problem is in loading one huge table, this won't help you much. However, this is not a pattern I see much - most of my clients seem to have several large tables plus a boatload of indexes. They will benefit a lot.

cheers

andrew
On Wed, 24 Sep 2008 07:52:52 +0100 Simon Riggs <simon@2ndQuadrant.com> wrote:
>> Just as an FYI, by far the number one bottleneck on the multiple worker
>> restores I was doing was CPU. RAM and IO were never the problem.
>
> It would be useful to see a full breakdown of those results.

It's in the archives. We had a ginormous discussion about it about 6 months back (-hackers).

Joshua D. Drake

-- 
The PostgreSQL Company since 1997: http://www.commandprompt.com/
PostgreSQL Community Conference: http://www.postgresqlconference.org/
United States PostgreSQL Association: http://www.postgresql.us/
On Wednesday 24 September 2008, Andrew Dunstan wrote:
> No. The proposal will perform exactly the same set of steps as
> single-threaded pg_restore, but in parallel. The individual steps won't
> be broken up.

Ok, good for a solid, trustworthy parallel restore, which is exactly what we want. Just out of curiosity, do you plan to use the Postgres-R helper backends infrastructure?

> Quite apart from anything else, parallel data loading of individual
> tables will defeat clustering, as well as making it impossible to avoid
> WAL logging of the load (which I have made provision for).

That depends on whether the different workers are able to work from the same transaction or not, I imagine. Some work has been done to allow multiple backends to be working in the exact same transaction (Simon's snapclone and the Postgres-R helper backend infrastructure), so one of them could TRUNCATE the table and give a go signal to the workers to fill the table, all in the same transaction. Ok, I need to wake up now... :)

> The fact that custom archives are compressed by default would in fact
> make parallel loading of individual tables' data difficult with the
> present format. We'd have to do something like expanding it on the
> client (which might not even have enough disk space) and then split it
> before loading it to the server. That's pretty yucky. Alternatively,
> each loader thread would need to start decompressing the data from the
> start and throw away data until it got to the point it wanted to start
> restoring from. Also pretty yucky.

Another alternative is the round-robin reader implemented in pgloader, where all the archive reading is done by a single worker, which then splits what it reads among any number of co-workers, filling the next queue(s) while the previous one(s) are busy COPYing to the server.

> Far better would be to provide for multiple data members in the archive
> and teach pg_dump to split large tables as it writes the archive. Then
> pg_restore would need comparatively little adjustment.

Well, that's another possibility, but I tend to prefer having the parallelism mechanics on the restore side of things. It may be only an illusion, but this way I have far more trust in my backups.

> Also, of course, you can split tables yourself by partitioning them.
> That would buy you parallel data load with what I am doing now, with no
> extra work.

And that's excellent :)

> In any case, data loading is very far from being the only problem. One
> of my clients has long running restores where the data load takes about
> 20% or so of the time - the rest is in index creation and the like. No
> amount of table splitting will make a huge difference to them, but
> parallel processing will.

Oh yes, I'm running into this too (not on the same level, but still). Parallel seqscan should help with creating indexes in parallel without having the disks go crazy with read - write - read - write sequences, and posix_fadvise should help a lot here too.

Does the dependency tracker in pg_restore allow considering FK creation as dependent on the matching PK already being there?

> As against that, if your problem is in loading
> one huge table, this won't help you much. However, this is not a pattern
> I see much - most of my clients seem to have several large tables plus a
> boatload of indexes. They will benefit a lot.

The use case given by Greg Smith at the time was loading a multi-terabyte table on a RAID array with a lot of spindles. It then becomes impossible for a single CPU to take full advantage of the available write bandwidth. No idea how common this situation is in the field, though.

Regards,
-- 
dim
Dimitri Fontaine wrote:
> On Wednesday 24 September 2008, Andrew Dunstan wrote:
>> No. The proposal will perform exactly the same set of steps as
>> single-threaded pg_restore, but in parallel. The individual steps won't
>> be broken up.
>
> Ok, good for a solid, trustworthy parallel restore, which is exactly what
> we want. Just out of curiosity, do you plan to use the Postgres-R helper
> backends infrastructure?

This is purely a patch to pg_restore. No backend changes at all (and if I did make any they would not use anything that isn't in core anyway).

>> Quite apart from anything else, parallel data loading of individual
>> tables will defeat clustering, as well as making it impossible to avoid
>> WAL logging of the load (which I have made provision for).
>
> That depends on whether the different workers are able to work from the
> same transaction or not, I imagine. Some work has been done to allow
> multiple backends to be working in the exact same transaction (Simon's
> snapclone and the Postgres-R helper backend infrastructure), so one of
> them could TRUNCATE the table and give a go signal to the workers to fill
> the table, all in the same transaction. Ok, I need to wake up now... :)

Again, I am not doing anything on the backend. I am following Tom's original suggestion of simply having pg_restore run steps in parallel, with no backend changes.

Also, you ignored the point about clustered data. Maybe that doesn't matter to some people, but it does to others. This is designed to provide the same result as a single-threaded pg_restore. Splitting data will break that.

>> The fact that custom archives are compressed by default would in fact
>> make parallel loading of individual tables' data difficult with the
>> present format. We'd have to do something like expanding it on the
>> client (which might not even have enough disk space) and then split it
>> before loading it to the server. That's pretty yucky. Alternatively,
>> each loader thread would need to start decompressing the data from the
>> start and throw away data until it got to the point it wanted to start
>> restoring from. Also pretty yucky.
>
> Another alternative is the round-robin reader implemented in pgloader,
> where all the archive reading is done by a single worker, which then
> splits what it reads among any number of co-workers, filling the next
> queue(s) while the previous one(s) are busy COPYing to the server.
>
>> Far better would be to provide for multiple data members in the archive
>> and teach pg_dump to split large tables as it writes the archive. Then
>> pg_restore would need comparatively little adjustment.
>
> Well, that's another possibility, but I tend to prefer having the
> parallelism mechanics on the restore side of things. It may be only an
> illusion, but this way I have far more trust in my backups.

Having pg_dump do the split would mean you get it for free, pretty much. Rejecting that for a solution that could well be a bottleneck at restore time would require lots more than just a feeling. I don't see how it would give you any less reason to trust your backups.

>> Also, of course, you can split tables yourself by partitioning them.
>> That would buy you parallel data load with what I am doing now, with no
>> extra work.
>
> And that's excellent :)
>
>> In any case, data loading is very far from being the only problem. One
>> of my clients has long running restores where the data load takes about
>> 20% or so of the time - the rest is in index creation and the like. No
>> amount of table splitting will make a huge difference to them, but
>> parallel processing will.
>
> Oh yes, I'm running into this too (not on the same level, but still).
> Parallel seqscan should help with creating indexes in parallel without
> having the disks go crazy with read - write - read - write sequences, and
> posix_fadvise should help a lot here too.
> Does the dependency tracker in pg_restore allow considering FK creation
> as dependent on the matching PK already being there?

I believe so. If not, that's a bug and we should fix it IMNSHO.

>> As against that, if your problem is in loading
>> one huge table, this won't help you much. However, this is not a pattern
>> I see much - most of my clients seem to have several large tables plus a
>> boatload of indexes. They will benefit a lot.
>
> The use case given by Greg Smith at the time was loading a multi-terabyte
> table on a RAID array with a lot of spindles. It then becomes impossible
> for a single CPU to take full advantage of the available write bandwidth.
> No idea how common this situation is in the field, though.

I still think the multiple data members of the archive approach would be best here - one that allowed you to tell pg_dump to split every nn rows, or every nn megabytes. Quite apart from any parallelism issues, that could help enormously when there is a data problem, as happens from time to time, and can get quite annoying if it's in the middle of a humungous data load.

cheers

andrew
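A rough sketch of what that dump-side splitting might look like, with the archive-writing functions stubbed out as placeholders (they are not pg_dump's real API): start a new data member for the same table every nn rows or nn bytes.

    #include <stddef.h>
    #include <string.h>

    #define SPLIT_EVERY_ROWS   1000000
    #define SPLIT_EVERY_BYTES  (256 * 1024 * 1024)

    /* placeholders standing in for the archive format's write routines */
    extern void archive_begin_data_member(const char *table, int segno);
    extern void archive_write(const void *buf, size_t len);
    extern void archive_end_data_member(void);
    extern const char *next_copy_row(void);    /* one COPY text row, or NULL at end */

    void
    dump_table_data_split(const char *table)
    {
        int         segno = 0;
        long        rows = 0;
        size_t      bytes = 0;
        const char *row;

        archive_begin_data_member(table, segno);
        while ((row = next_copy_row()) != NULL)
        {
            size_t len = strlen(row);

            if (rows >= SPLIT_EVERY_ROWS || bytes >= SPLIT_EVERY_BYTES)
            {
                /* close the current member and start the next segment */
                archive_end_data_member();
                archive_begin_data_member(table, ++segno);
                rows = 0;
                bytes = 0;
            }
            archive_write(row, len);
            rows++;
            bytes += len;
        }
        archive_end_data_member();
    }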
On 24 Sep 2008, at 18:56, Andrew Dunstan wrote:
> This is purely a patch to pg_restore. No backend changes at all (and if
> I did make any they would not use anything that isn't in core anyway).

Ok, good. I'm eager to see what -core hackers will want to do with the Postgres-R patches, but that shouldn't be a reason to distract them, sorry...

> Also, you ignored the point about clustered data. Maybe that doesn't
> matter to some people, but it does to others. This is designed to
> provide the same result as a single-threaded pg_restore. Splitting
> data will break that.

I'm not sure I understand what you mean by "clustered data" here, in fact...

> Having pg_dump do the split would mean you get it for free, pretty
> much. Rejecting that for a solution that could well be a bottleneck
> at restore time would require lots more than just a feeling. I don't
> see how it would give you any less reason to trust your backups.

Well, when pg_restore's COPY fails, the table is not loaded and you get an ERROR, and if you're running with the -1 option, the restore stops there and you get a nice ROLLBACK. With the latter option, even if pg_dump did split your tables, the ROLLBACK still happens.

Now, what happens when only one part of the data cannot be restored but you didn't run pg_restore -1? I guess you're simply left with a partially restored table. How will you know which part contains the error? How will you replay the restore of this part only? If the answer is to play with the restore catalogue, ok; if that's not it, I feel the dumps are less trustworthy with the split option than they were before.

Of course all this remains hypothetical, as your work does not include such a feature, which as we see is yet to be designed.

> I still think the multiple data members of the archive approach
> would be best here - one that allowed you to tell pg_dump to split
> every nn rows, or every nn megabytes. Quite apart from any
> parallelism issues, that could help enormously when there is a data
> problem, as happens from time to time, and can get quite annoying if
> it's in the middle of a humungous data load.

Agreed, but it depends a lot on the ways to control which part failed, IMVHO. And I think we'd prefer to have a version of COPY FROM with the capability to continue loading after a failure...

Regards,
-- 
dim