Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch for parallel pg_dump
Date
Msg-id CA+TgmoZ_KzWnthyZhbe6pa4pdRe8pDq3dP2SZCKyQV6RnHmtRw@mail.gmail.com
Whole thread Raw
In response to Re: patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
Responses Re: patch for parallel pg_dump
List pgsql-hackers
On Sun, Apr 1, 2012 at 12:35 PM, Joachim Wieland <joe@mcknight.de> wrote:
> On Wed, Mar 28, 2012 at 2:20 PM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> My main comment about the current patch is that it looks like it's
>> touching pg_restore parallel code by moving some stuff into parallel.c.
>> If that's really the case and its voluminous, maybe this patch would
>> shrink a bit if we could do the code moving in a first patch.  That
>> would be mostly mechanical.  Then the interesting stuff would apply on
>> top of that.  That would make review easier.
>
> Unfortunately this is not really the case. What is being moved out of
> pg_backup_archiver.c and into parallel.c is either the shutdown logic
> that has been applied only a few days ago or is necessary to change
> the parallel restore logic from one-thread-per-dump-object to the
> message passing framework where a worker starts in the beginning and
> then receives a new dump object from the master every time it's idle.

Hmm.  It looks to me like the part-two patch still contains a bunch of
code rearrangement.  For example, the current code for
pg_backup_archiver.c patch contains this:

typedef struct ParallelState
{       int                     numWorkers;       ParallelStateEntry *pse;
} ParallelState;

In the revised patch, that's removed, and parallel.c instead contains this:

typedef struct _parallel_state
{     int                     numWorkers;     ParallelSlot *parallelSlot;
} ParallelState;

Perhaps I am missing something, but it looks to me like that's the
same code, except that for some reason the identifiers have been
renamed.  I see little point in renaming the struct from ParallelState
to _parallel_state (in fact, I like the new name less) or changing
ParallelStateEntry to ParallelSlot (which is no worse, but it's not
better either).

On a similar note, what's the point of changing struct Archive to have
int numWorkers instead of int number_of_jobs, and furthermore
shuffling the declaration around to a different part of the struct?
If that's really an improvement, it should be a separate patch, but my
guess is that it is just rearranging deck chairs.  It gives rise to
subsidiary diff hunks like this:
       /*        * If we're going to do parallel restore, there are some restrictions.        */
!       parallel_mode = (ropt->number_of_jobs > 1 && ropt->useDB);       if (parallel_mode)       {               /* We
haven'tgot round to making this work for all 
archive formats */
--- 271,277 ----       /*        * If we're going to do parallel restore, there are some restrictions.        */
!       parallel_mode = (AH->public.numWorkers > 1 && ropt->useDB);       if (parallel_mode)       {               /*
Wehaven't got round to making this work for all 
archive formats */

On another note, I am not sure that I like the messaging protocol
you've designed.  It seems to me that this has little chance of being
reliable:

+ void (*on_exit_msg_func)(const char *modulename, const char *fmt,
va_list ap) = vwrite_msg;

I believe the idea here is that you're going to capture the dying gasp
of the worker thread and send it to the master instead of printing it
out.  But that doesn't seem very reliable.  There's code all over the
place (and, in particular, in pg_dump.c) that assumes that we may
write messages at any time, and in particular that we may emit
multiple messages just before croaking.  Even if you were to hook
vwrite_msg, I'm not sure that would do it, because there are certainly
situations in which libpq can print out errors directly, for example,
or a system library might cough something up.  I'm thinking that the
boss process should really probably be capturing stdout and stderr and
reading messages from there, and interpreting any messages that it
sees as non-fatal (but still reporting them back to the user) unless
the worker exits unexpectedly (which the master can detect by noticing
that the connection has been closed).

Also, I like the idea of making it possible to use assertions in
front-end code.  But I think that if we're going to do that, we should
make it work in general, not just for things that include
pg_backup_archiver.h.  It seems to me that the way to do this would be
to move the definitions of Assert(), AssertMacro(), AssertArg(),
AssertState(), Trap(), TrapMacro(), and ExceptionalCondition() out of
postgres.h and into a new header file, say, pg_assert.h.  postgres.h
can include that automatically, and people writing front-end code can
include it if they're so inclined.  The only difficulty is where and
how to define ExceptionalCondition().  The obvious place seems to be
dumputils.c, which seems to be our unofficial place to stick stuff
that may be needed by a variety of frontend code.  Really, I think we
ought to think about creating a library that is more explicitly for
that purpose (libpgfrontend?) or else figuring out how to incorporate
it into libpq, but that's a project for another day.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: "Albe Laurenz"
Date:
Subject: Re: pgsql_fdw, FDW for PostgreSQL server
Next
From: Joachim Wieland
Date:
Subject: Re: patch for parallel pg_dump