Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Joachim Wieland
Subject Re: patch for parallel pg_dump
Msg-id CACw0+12F6c642g1Uy52VGecDfcHeWLaxCN7yrwv8WF5sDrbYXA@mail.gmail.com
In response to Re: patch for parallel pg_dump  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: patch for parallel pg_dump  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> But the immediate problem is that pg_dump.c is heavily reliant on
> global variables, which isn't going to fly if we want this code to use
> threads on Windows (or anywhere else).  It's also bad style.

Technically, since most of pg_dump.c dumps the catalog, and that is
done only in the master process and not in parallel, most functions
need not be changed for the parallel dump. Only those that are called
from the worker threads need to be changed; this has been done in,
e.g., dumpBlobs(), the change that you quoted upthread.
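To illustrate the principle (not the actual patch): a function that runs inside a worker takes all of its mutable state through a handle argument instead of reading file-level globals, so two workers with separate handles never share that state. A minimal sketch, assuming a hypothetical WorkerHandle struct and a hypothetical dump_item() function; neither is real pg_dump code, and the real dumpBlobs() signature is not shown here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-worker state; the real pg_dump structures differ. */
typedef struct WorkerHandle
{
    int         remoteVersion;  /* server version seen by this worker's connection */
    char        buf[64];        /* scratch buffer owned by this worker only */
} WorkerHandle;

/*
 * Hypothetical worker routine. Everything it reads or writes comes from
 * its handle argument rather than from globals, so concurrent calls with
 * different handles do not interfere with each other.
 */
static int
dump_item(WorkerHandle *wh, const char *name)
{
    assert(wh != NULL && name != NULL);
    return snprintf(wh->buf, sizeof(wh->buf), "%s (server %d)",
                    name, wh->remoteVersion);
}
```

Functions that run only in the master process can keep using globals for now; only the worker-side call paths need this treatment.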

> But it
> seems possible that we might someday want to dump from one database
> and restore into another database at the same time, so maybe we ought
> to play it safe and use different variables for those things.

Actually, I tried that, but in the end concluded that it's best to
have at most one database connection per ArchiveHandle unless you
want to do a lot more refactoring. Besides the normal connection
parameters like host, port, ... there are also std_strings, encoding,
savedPassword, currUser/currSchema, lo_buf, remoteVersion ... and it
wouldn't be obvious which connection each of them belongs to.

Speaking of refactoring, I'd also like to throw in the idea of making
dump and restore more symmetrical than they are now. I kinda disliked
RestoreOptions* being a member of the ArchiveHandle without anything
similar for the dump. Ideally, I'd say there should be a DumpOptions
and a RestoreOptions field (each containing a "struct Connection" that
holds all the different connection parameters). They could be a union
if you wanted to allow only one connection, or separate fields if you
want more than one.

