Re: patch for parallel pg_dump - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: patch for parallel pg_dump |
Date | |
Msg-id | CA+TgmobwKLtWDCDdvTZ1AbnqDPhA6+WewpDMzpvcEFufDvsTgg@mail.gmail.com Whole thread Raw |
In response to | patch for parallel pg_dump (Joachim Wieland <joe@mcknight.de>) |
Responses |
Re: patch for parallel pg_dump
Re: patch for parallel pg_dump |
List | pgsql-hackers |
On Sun, Jan 15, 2012 at 1:01 PM, Joachim Wieland <joe@mcknight.de> wrote: > So this is the parallel pg_dump patch, generalizing the existing > parallel restore and allowing parallel dumps for the directory archive > format, the patch works on Windows and Unix. This patch introduces a large amount of notational churn, as perhaps well-illustrated by this hunk: static int ! dumpBlobs(Archive *AHX, void *arg) { + /* + * This is a data dumper routine, executed in a child for parallel backup, + * so it must not access the global g_conn but AH->connection instead. + */ + ArchiveHandle *AH = (ArchiveHandle *) AHX; It seems pretty grotty to have a static function that gets an argument of the wrong type, and then just immediately turns around and casts it to something else. It's not exactly obvious that that's even safe, especially if you don't know that ArchiveHandle is a struct whose first element is an Archive. But even if you do know that subclassing is intended, that doesn't prove that the particular Archive object is always going to be an ArchiveHandle under the hood. If it is, why not just pass it as an ArchiveHandle to begin with? I think this needs to be revised in some way. At least in the few cases I checked, the only point of getting at the ArchiveHandle this way was to find AH->connection, which suggests to me either that AH->connection should be in the "public" section, inside Archive rather than ArchiveHandle, or else that we ought to just pass the connection object to this function (and all of its friends who have similar issues) as a separate argument. Either way, I think that would make this patch both cleaner and less-invasive. In fact we might want to pull out just that change and commit it separately to simplify review of the remaining work. It's not clear to me why fmtQualifiedId needs to move to dumputils.c. The way you have it, fmtQualifiedId() is now with fmtId(), but no longer with fmtCopyColumnList(), the only other similarly named function in that directory. That may be more logical, or it may not, but rearranging the code like this makes it a lot harder to review, and I would prefer that we make such changes as separate commits if we're going to do them, so that diff can do something sensible with the changes that are integral to the patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: