Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch for parallel pg_dump
Date
Msg-id CA+TgmoZhcHMk9WHb0Py5-GsRXLzinRO-boqZN3oTnx+=mk_XXQ@mail.gmail.com
Whole thread Raw
In response to Re: patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
Responses Re: patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
List pgsql-hackers
On Tue, Jan 31, 2012 at 12:55 AM, Joachim Wieland <joe@mcknight.de> wrote:
> On Mon, Jan 30, 2012 at 12:20 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> But the immediate problem is that pg_dump.c is heavily reliant on
>> global variables, which isn't going to fly if we want this code to use
>> threads on Windows (or anywhere else).  It's also bad style.
>
> Technically, since most of pg_dump.c dumps the catalog and since this
> isn't done in parallel but only in the master process, most functions
> need not be changed for the parallel restore. Only those that are
> called from the worker threads need to be changed, this has been done
> in e.g. dumpBlobs(), the change that you quoted upthread.

If we're going to go to the trouble of cleaning this up, I'd prefer to
rationalize it rather than doing just the absolute bare minimum amount
of stuff needed to make it appear to work.

>> But it
>> seems possible that we might someday want to dump from one database
>> and restore into another database at the same time, so maybe we ought
>> to play it safe and use different variables for those things.
>
> Actually I've tried that but in the end concluded that it's best to
> have at most one database connection in an ArchiveHandle if you don't
> want to do a lot more refactoring. Besides the normal connection
> parameters like host, port, ... there's also std_strings, encoding,
> savedPassword, currUser/currSchema, lo_buf, remoteVersion ... that
> wouldn't be obvious where they belonged to.

And just for added fun and excitement, they all have inconsistent
naming conventions and inadequate documentation.

I think if we need more refactoring in order to support multiple
database connections, we should go do that refactoring.  The current
situation is not serving anyone well.

> Speaking about refactoring, I'm happy to also throw in the idea to
> make the dump and restore more symmetrical than they are now. I kinda
> disliked RestoreOptions* being a member of the ArchiveHandle without
> having something similar for the dump. Ideally I'd say there should be
> a DumpOptions and a RestoreOptions field (with a "struct Connection"
> being part of them containing all the different connection
> parameters). They could be a union if you wanted to allow only one
> connection, or not if you want more than one.

I like the idea of a struct Connection.  I think that could make a lot of sense.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: foreign key locks, 2nd attempt
Next
From: Yeb Havinga
Date:
Subject: Re: [v9.2] Add GUC sepgsql.client_label