Re: patch for parallel pg_dump - Mailing list pgsql-hackers
From | Robert Haas
---|---
Subject | Re: patch for parallel pg_dump
Date |
Msg-id | CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tFZVqAwKEqgQ@mail.gmail.com
In response to | Re: patch for parallel pg_dump ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List | pgsql-hackers
On Tue, Mar 13, 2012 at 9:59 AM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> (I'm also unconvinced that sorting by relation size is a good idea
>> anyway. Anything that makes the dump order less predictable gets
>> push-back, IME.)
>
> Given that people often use diff on files from pg_dump,
> unpredictable ordering can be a bad thing. On the other hand, that
> is not something you would probably want to do with the output of a
> *parallel* dump, so if it only affects that, it probably makes sense.
> It seems like a reasonable heuristic to avoid having all but some
> big table done, and having to wait for that while the other
> processors are sitting idle.

Yeah, I think it's a good heuristic. Finishing the dump in the minimum possible time is sufficiently similar to the knapsack problem as to make me suspect that there is no easy way to be certain of getting the optimal dump order (and we don't even have perfect information, since we know little about the characteristics of the underlying storage). But dumping tables in descending size order figures to get many easy cases right - e.g. suppose there are 200 tables of size X and 1 table of size 100*X, and we have 3 workers to play with. If we dump the tables in an essentially random order (relative to size), then the overall time gets longer the more little tables we dump before we start the big one.

Now, if we have tables of sizes 10*X, 9*X, 8*X, 6*X, and 5*X and two workers, then the first worker will get the 10*X table and the second worker will get the 9*X table; the second worker will then start the 8*X table, and the first worker will get the 6*X and 5*X tables. Assuming dump time is a uniform function of table size, we'll finish after 21 time units. Had we been smarter, we could have assigned the 9*X, 6*X, and 5*X tables to one worker and the 10*X and 8*X tables to the other and finished in just 20 time units. There's probably a way to construct a more extreme example of this, but I think in practice any loss due to this kind of effect will be small, and descending-size order certainly seems more likely to be right than leaving it to chance.

A bigger problem is that dumping relations A and B at the same time might involve a lot more I/O contention than dumping relations A and C at the same time if, say, A and B are in the same tablespace and C is not. I have no idea what to do about that in general, but for a first version of this feature I think it's fine to punt.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
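To make the scheduling argument above concrete, here is a minimal sketch of the greedy rule under discussion: sort tables by descending size and hand each one to whichever worker becomes free first. This is an illustration only, not code from the parallel pg_dump patch; the `greedy_makespan` function name and the use of raw sizes as dump times are assumptions made for the example.

```python
# Minimal sketch (illustrative, not from the pg_dump patch) of the
# descending-size heuristic: assign each table, largest first, to the
# worker that becomes free soonest, and report the total elapsed time.
import heapq

def greedy_makespan(sizes, workers):
    """Return the finish time when tables are dumped largest-first,
    each going to the earliest-free worker. Sizes stand in for dump times."""
    finish_times = [0] * workers          # one entry per worker
    heapq.heapify(finish_times)
    for size in sorted(sizes, reverse=True):
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + size)
    return max(finish_times)

if __name__ == "__main__":
    # The example from the message: two workers, tables of 10, 9, 8, 6, 5 units.
    print(greedy_makespan([10, 9, 8, 6, 5], 2))   # prints 21
    # The hand-built split {10, 8} / {9, 6, 5} finishes in 20 units, so the
    # greedy order is close to, but not always, optimal, as noted above.
```

Running this reproduces the 21-unit finish from the message, versus the 20-unit optimal split, which is the kind of small gap the heuristic is expected to leave.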