Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch for parallel pg_dump
Date
Msg-id CA+TgmoY3T3apag-7Gw6i-VOpaUwOuGneVQECkwNYBumNcUGJ2A@mail.gmail.com
Whole thread Raw
In response to Re: patch for parallel pg_dump  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: patch for parallel pg_dump  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Jan 27, 2012 at 10:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe@mcknight.de> wrote:
>> So this is the parallel pg_dump patch, generalizing the existing
>> parallel restore and allowing parallel dumps for the directory archive
>> format, the patch works on Windows and Unix.
>
> This patch introduces a large amount of notational churn, as perhaps
> well-illustrated by this hunk:
>
>  static int
> ! dumpBlobs(Archive *AHX, void *arg)
>  {
> +       /*
> +        * This is a data dumper routine, executed in a child for
> parallel backup,
> +        * so it must not access the global g_conn but AH->connection instead.
> +        */
> +       ArchiveHandle *AH = (ArchiveHandle *) AHX;
>
> It seems pretty grotty to have a static function that gets an argument
> of the wrong type, and then just immediately turns around and casts it
> to something else.  It's not exactly obvious that that's even safe,
> especially if you don't know that ArchiveHandle is a struct whose
> first element is an Archive.  But even if you do know that subclassing
> is intended, that doesn't prove that the particular Archive object is
> always going to be an ArchiveHandle under the hood.  If it is, why not
> just pass it as an ArchiveHandle to begin with?  I think this needs to
> be revised in some way.  At least in the few cases I checked, the only
> point of getting at the ArchiveHandle this way was to find
> AH->connection, which suggests to me either that AH->connection should
> be in the "public" section, inside Archive rather than ArchiveHandle,
> or else that we ought to just pass the connection object to this
> function (and all of its friends who have similar issues) as a
> separate argument.  Either way, I think that would make this patch
> both cleaner and less-invasive.  In fact we might want to pull out
> just that change and commit it separately to simplify review of the
> remaining work.
>
> It's not clear to me why fmtQualifiedId needs to move to dumputils.c.
> The way you have it, fmtQualifiedId() is now with fmtId(), but no
> longer with fmtCopyColumnList(), the only other similarly named
> function in that directory.  That may be more logical, or it may not,
> but rearranging the code like this makes it a lot harder to review,
> and I would prefer that we make such changes as separate commits if
> we're going to do them, so that diff can do something sensible with
> the changes that are integral to the patch.

Woops, hit send a little too soon there.  I'll try to make some more
substantive comments after looking at this more.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: patch for parallel pg_dump
Next
From: karavelov@mail.bg
Date:
Subject: TS: Limited cover density ranking