Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison - Mailing list pgsql-hackers

From Tom Lane
Subject Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison
Date
Msg-id 10396.1204070637@sss.pgh.pa.us
In response to Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison  ("Joshua D. Drake" <jd@commandprompt.com>)
Responses Re: multi-worker pg_restore was: 8.3 / 8.2.6 restore comparison  ("Joshua D. Drake" <jd@commandprompt.com>)
List pgsql-hackers
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> How exactly are you allocating tasks to threads in this prototype,
>> anyway?

> Right, there is no balance here. Let me explain what I did. I performed
> a pg_restore -l to get the TOC file. I then broke that file up into
> five other files.

> prefix = schema (minus indexes, constraints)
> data = data
> pk = primary keys 
> index = indexes
> triggers_constraints = well, triggers and constraints (foreign keys in
> this instance)

> The first step of the script loads prefix. It then splits the data
> file into -n- number of files and launches -n- number of
> pg_restore processes with -L.

> It runs through all data, then starts on pk in the exact same manner
> and then index, etc.
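
For readers following along, the splitting step described above might look
roughly like the sketch below. It is not the actual prototype script: the
round-robin split, the helper names split_toc and restore_command, and the
sample TOC entries are all assumptions made for illustration. The only
things taken from the description are the use of a pg_restore -l listing
and of pg_restore -L to restore a subset of it.

```python
def split_toc(lines, n):
    """Deal non-comment TOC entries round-robin into n lists."""
    # In a pg_restore list file, a line starting with ';' is a comment.
    entries = [ln for ln in lines
               if ln.strip() and not ln.lstrip().startswith(";")]
    parts = [[] for _ in range(n)]
    for i, entry in enumerate(entries):
        parts[i % n].append(entry)
    return parts


def restore_command(list_file, dump_file, dbname):
    """Build (but do not run) a pg_restore call limited to one list file."""
    return ["pg_restore", "-L", list_file, "-d", dbname, dump_file]


# Made-up entries standing in for the "data" portion of a pg_restore -l listing.
data_toc = [
    "; hypothetical TOC excerpt",
    "101; 0 16385 TABLE DATA public orders postgres",
    "102; 0 16390 TABLE DATA public customers postgres",
    "103; 0 16395 TABLE DATA public items postgres",
]
parts = split_toc(data_toc, 2)
```

Each element of parts would be written to its own list file and handed to
one pg_restore process. Nothing in this split looks at table sizes, so one
process can easily end up with most of the work.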

So you have four serialization points not just one; at each one the
slowest subtask forces everyone else to wait, even if there's work that
could potentially be done on other tables.  This is fine for a
quick-and-dirty proof of concept but it's certainly not how we'd want to
implement the real thing.  But I doubt you can get much further without
putting some actual dependency awareness into it.
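
To make the cost of a serialization point concrete, here is a small
simulation sketch. It is only an illustration under stated assumptions: the
task names, durations, and the greedy "first ready task in list order"
policy are invented, and only one barrier (data before indexes) is modeled
rather than four. It compares a phased run, where every index waits for all
data loads, with a dependency-aware run, where each index waits only for
its own table's data.

```python
import heapq


def makespan(durations, deps, workers):
    """Simulate greedy scheduling on `workers` workers; return total time."""
    done, started = set(), set()
    running = []  # min-heap of (finish_time, task)
    now = 0
    while len(done) < len(durations):
        # Start ready tasks, in insertion order, while a worker is idle.
        for task in durations:
            if len(running) >= workers:
                break
            if task not in started and deps.get(task, set()) <= done:
                started.add(task)
                heapq.heappush(running, (now + durations[task], task))
        if not running:
            raise ValueError("dependency cycle or unknown dependency")
        # Advance the clock to the next completion(s).
        now, finished = heapq.heappop(running)
        done.add(finished)
        while running and running[0][0] == now:
            done.add(heapq.heappop(running)[1])
    return now


# Hypothetical workload: one big table and two small ones (arbitrary units).
durations = {
    "data_big": 10, "data_a": 2, "data_b": 2,
    "idx_big": 4, "idx_a": 6, "idx_b": 6,
}
all_data = {"data_big", "data_a", "data_b"}

# Phased: every index waits for *all* data loads (one serialization point).
phased = {t: all_data for t in durations if t.startswith("idx_")}
# Dependency-aware: each index waits only for its own table's data.
per_table = {"idx_big": {"data_big"}, "idx_a": {"data_a"}, "idx_b": {"data_b"}}

phased_time = makespan(durations, phased, workers=2)
aware_time = makespan(durations, per_table, workers=2)
print(phased_time, aware_time)
```

With these made-up numbers and two workers, the phased run takes 20 time
units and the dependency-aware run takes 16, because the index on a small
table can be built while the big table is still loading. A real
implementation would need the actual dependency information from the dump
rather than a hand-written table like per_table.
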
        regards, tom lane