Re: patch for parallel pg_dump - Mailing list pgsql-hackers

From Robert Haas
Subject Re: patch for parallel pg_dump
Date
Msg-id CA+Tgmoa9=9yRvRZBLq1BdFzVL-K_3FC7uFNDuMoxBhdywd9Ltg@mail.gmail.com
Whole thread Raw
In response to Re: patch for parallel pg_dump  (Joachim Wieland <joe@mcknight.de>)
Responses Re: patch for parallel pg_dump
Re: patch for parallel pg_dump
List pgsql-hackers
On Tue, Apr 3, 2012 at 10:40 AM, Joachim Wieland <joe@mcknight.de> wrote:
>> On a similar note, what's the point of changing struct Archive to have
>> int numWorkers instead of int number_of_jobs, and furthermore
>> shuffling the declaration around to a different part of the struct?
>
> number_of_jobs was in the struct RestoreOptions before, a structure
> that is not used when doing a dump. I moved it to the Archive as it is
> used by both dump and restore and since all other code talks about
> "workers" I changed the name to "numWorkers".

Gah.  Somehow I feel that splitting up this patch into two pieces
hasn't made anything any better.

>> On another note, I am not sure that I like the messaging protocol
>> you've designed.  It seems to me that this has little chance of being
>> reliable:
>>
>> + void (*on_exit_msg_func)(const char *modulename, const char *fmt,
>> va_list ap) = vwrite_msg;
>>
>> I believe the idea here is that you're going to capture the dying gasp
>> of the worker thread and send it to the master instead of printing it
>> out.  But that doesn't seem very reliable.  There's code all over the
>> place (and, in particular, in pg_dump.c) that assumes that we may
>> write messages at any time, and in particular that we may emit
>> multiple messages just before croaking.
>
> I guess you're talking about the code in pg_dump that reads in the
> database schema and the details of all the different objects in the
> schema. This code is run before forking off workers and is always
> executed in the master.

OK, but it seems like a pretty fragile assumption that none of the
workers will ever manage to emit any other error messages.  We don't
rely on this kind of assumption in the backend, which is a lot
better-structured and less spaghetti-like than the pg_dump code.

>> Also, I like the idea of making it possible to use assertions in
>> front-end code.  But I think that if we're going to do that, we should
>> make it work in general, not just for things that include
>> pg_backup_archiver.h.
>
> I completely agree. Assertions helped a lot dealing with concurrent
> code. How do you want to tackle this for now? Want me to create a
> separate header pg_assert.h as part of my patch? Or is it okay to
> factor it out later and include it from the general header then?

I'll just go do it, barring objections.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: patch for parallel pg_dump
Next
From: Robert Haas
Date:
Subject: invalid search_path complaints