Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch for parallel pg_dump
Date
Msg-id CA+TgmobwxqsagXKtyQ1S8+gMpqxF_MLXv=4350tFZVqAwKEqgQ@mail.gmail.com
In response to Re: patch for parallel pg_dump  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
On Tue, Mar 13, 2012 at 9:59 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> (I'm also unconvinced that sorting by relation size is a good idea
>> anyway.  Anything that makes the dump order less predictable gets
>> push-back, IME.)
>
> Given that people often use diff on files from pg_dump,
> unpredictable ordering can be a bad thing.  On the other hand, that
> is not something you would probably want to do with the output of a
> *parallel* dump, so if it only affect that, it probably makes sense.
> It seems like a reasonable heuristic to avoid having all but some
> big table done, and having to wait for that while the other
> processors are sitting idle.

Yeah, I think it's a good heuristic.  Finishing the dump in the
minimum possible time is sufficiently similar to the knapsack problem
as to make me suspect that there is no easy way to be certain of
getting the optimal dump order (and we don't even have perfect
information, since we know little about the characteristics of the
underlying storage).  But dumping tables in descending size order
figures to get many easy cases right - e.g., suppose there are 200 tables of
size X and 1 table of size 100*X, and we have 3 workers to play with.
If we dump the tables in an essentially random order (relative to
size) then the overall time will get longer the more little tables we
dump before we start the big one.
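
To make that intuition concrete, here is a toy scheduling sketch (not
pg_dump code; the "dump time equals table size" cost model, the worker
count, and the greedy first-idle-worker dispatch are all simplifying
assumptions made up for illustration):

    # Toy model of a parallel dump: each worker grabs the next table in
    # the queue as soon as it goes idle.  Sizes and cost model are made up.

    import heapq

    def makespan(table_sizes, n_workers):
        # finish times of the workers, kept in a min-heap
        workers = [0] * n_workers
        heapq.heapify(workers)
        for size in table_sizes:
            idle_at = heapq.heappop(workers)      # worker that frees up first
            heapq.heappush(workers, idle_at + size)
        return max(workers)

    tables = [1] * 200 + [100]    # 200 tables of size X, one of size 100*X

    print(makespan(sorted(tables, reverse=True), 3))  # big table first -> 100
    print(makespan(tables, 3))                        # big table last  -> 166

With the big table dispatched first, the small tables fill in around it
and everything finishes in about 100 time units; dispatch it last and
the run stretches to roughly 166.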

Now, if we have tables of sizes 10*X, 9*X, 8*X, 6*X, and 5*X and two
workers, then the first worker will get the 10*X table, the second
worker will get the 9*X table, then the second worker will start the
8*X table, then the first worker will get the 6*X and 5*X tables and,
assuming dump time is proportional to table size, we'll finish
after 21 time units.  Had we been smarter, we could have assigned the
9*X, 6*X, and 5*X tables to one worker and the 10*X and 8*X tables to
the other and finished in just 20 time units.  There's probably a way
to construct a more extreme example of this, but I think in practice
if there's any loss due to this kind of effect it will be small, and
descending-size order certainly seems more likely to be right than
leaving it to chance.
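
Checking that arithmetic under the same "time equals size" toy model
(the two assignments below are just the schedules described above, not
anything pg_dump actually computes):

    # Greedy descending-size order: worker 1 dumps 10*X, 6*X, 5*X;
    # worker 2 dumps 9*X, 8*X.
    print(max(10 + 6 + 5, 9 + 8))    # -> 21 time units

    # The smarter split: 9*X, 6*X, 5*X on one worker; 10*X, 8*X on the other.
    print(max(9 + 6 + 5, 10 + 8))    # -> 20 time units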

A bigger problem is that dumping relations A and B at the same time might
involve a lot more I/O contention than dumping relations A and C at
the same time if, say, A and B are on the same tablespace and C is
not.  I have no idea what to do about that in general, but for a first
version of this feature I think it's fine to punt.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

