parallel pg_restore - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	parallel pg_restore
Date	September 21, 2008 16:29:05
Msg-id	48D6A076.90107@dunslane.net
Responses	Re: parallel pg_restore
List	pgsql-hackers

I am working on parallel pg_restore. I'm currently getting all the
scaffolding in place, and hope to have a naive prototype posted within
about a week.

The major question is how to choose the restoration order so as to 
maximize efficiency both on the server and in reading the archive. My 
thoughts are currently running something like this:
   * when an item is completed, reduce the dependency count for each
     item that depends on it by 1.
   * when an item has a dependency count of 0 it is available for
     execution, and gets moved to the head of the queue.
   * when a new worker spot becomes available, if there is not currently
     a data load running then pick the first available data load,
     otherwise pick the first available item.
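
The three rules above can be sketched roughly as follows. This is only an
illustration in Python, not pg_restore code: the names Item, Scheduler,
next_item, complete and run_serial are all made up for the example. It also
assumes that when no data load is running and none is available, we fall
back to the first available item, which the rules above do not spell out.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Item:
    """One archive entry (hypothetical shape, not the real TOC entry)."""
    name: str
    is_data_load: bool = False
    deps: tuple = ()  # names of items this one depends on


class Scheduler:
    def __init__(self, items):
        self.items = {it.name: it for it in items}
        # Number of unfinished dependencies per item.
        self.dep_count = {it.name: len(it.deps) for it in items}
        # Reverse map: item name -> names of items that depend on it.
        self.dependents = {it.name: [] for it in items}
        for it in items:
            for dep in it.deps:
                self.dependents[dep].append(it.name)
        # Items with no dependencies start out available, in archive order.
        self.ready = deque(it.name for it in items if not it.deps)
        self.running = set()

    def complete(self, name):
        """Rules 1 and 2: decrement dependents; newly free items go to the head."""
        self.running.discard(name)
        for dependent in self.dependents[name]:
            self.dep_count[dependent] -= 1
            if self.dep_count[dependent] == 0:
                self.ready.appendleft(dependent)

    def next_item(self):
        """Rule 3: called when a worker slot becomes available."""
        if not self.ready:
            return None
        data_running = any(self.items[n].is_data_load for n in self.running)
        choice = None
        if not data_running:
            # Prefer the first available data load.
            choice = next(
                (n for n in self.ready if self.items[n].is_data_load), None
            )
        if choice is None:
            # Assumed fallback: take whatever is first in the queue.
            choice = self.ready[0]
        self.ready.remove(choice)
        self.running.add(choice)
        return choice


def run_serial(sched):
    """Drive the scheduler with a single worker and return the order used."""
    order = []
    while (name := sched.next_item()) is not None:
        order.append(name)
        sched.complete(name)
    return order


def demo_items():
    # Two tables, their data, and one index on the first table.
    return [
        Item("TABLE a"),
        Item("TABLE b"),
        Item("DATA a", is_data_load=True, deps=("TABLE a",)),
        Item("DATA b", is_data_load=True, deps=("TABLE b",)),
        Item("INDEX a_idx", deps=("DATA a",)),
    ]
```

With a single worker, run_serial(Scheduler(demo_items())) gives the order
TABLE a, DATA a, INDEX a_idx, TABLE b, DATA b, so the index build lands
directly after the data load it depends on.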
 

This would mean that loading a table would probably be immediately 
followed by creation of its indexes, including PK and UNIQUE 
constraints, thus taking possible advantage of synchronised scans, data 
in file system buffers, etc.

Another question is what we should do if the user supplies an explicit 
order with --use-list. I'm inclined to say we should stick strictly with 
the supplied order. Or maybe that should be an option.

Thoughts and comments welcome.

cheers

andrew
