Re: patch for parallel pg_dump - Mailing list pgsql-hackers
From | Joachim Wieland |
---|---|
Subject | Re: patch for parallel pg_dump |
Date | |
Msg-id | CACw0+13zHdG+HRhe+sKPtS3GJOWtLQWaKT7NiB3-xVFF0_MKag@mail.gmail.com Whole thread Raw |
In response to | Re: patch for parallel pg_dump (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: patch for parallel pg_dump
Re: patch for parallel pg_dump |
List | pgsql-hackers |
On Sat, Mar 10, 2012 at 9:51 AM, Robert Haas <robertmhaas@gmail.com> wrote: > - const char *owner, bool withOids, > + const char *owner, > + unsigned long int relpages, bool withOids, > > The new argument to ArchiveEntry() is unused. Removing it would > declutter things a good bit. How do you mean it's unused? pg_dump_sort.c uses relpages to dump the largest tables first. What you don't want to see in a parallel dump is a worker starting to dump a large table while everybody else is already idle... > The backend can have a wrapper function around this that calls ereport > using the error_string and error_code, and any front-end code that > wants to use this can do so directly. I tried this actually (patch attached) but then I wanted to test it and couldn't find anything that used pgpipe() on Windows. pg_basebackup/pg_basebackup.c is using it but it's in an #ifndef WIN32 and the same is true for postmaster/syslogger.c. Am I missing something or has this Windows implementation become stale by now? I'll append the patch but haven't adapted the pg_dump patch yet to use it. Should we still go forward the way you proposed? > +/* > + * The parallel error handler is called for any die_horribly() in a > child or master process. > + * It then takes control over shutting down the rest of the gang. > + */ > > I think this needs to be revised to take control in exit_nicely(), > maybe by using on_exit_nicely(). Trapping die_horribly() won't catch > everything. It's actually not designed to catch everything. This whole error handler thing is only there to report a single error to the user which is hopefully the root cause of why everybody is shutting down. Assume for example that we cannot get a lock on one table in a worker. Then the worker would die_horribly() saying that it cannot get a lock. The master would receive that message and shut down. Shutting down for the master means killing all the other workers. The master terminates because a worker died. And all the other workers die because the master killed them. Yet the root cause for the termination was the fact that one of the workers couldn't get a lock, and this is the one and only message that the user should see. If a child terminates without leaving a message, the master will still detect it and just say "a worker process died unexpectedly" (this part was actually broken, but now it's fixed :-) ) > Actually, looking > further, it seems that you already have such logic in > getMessageFromWorker(), so I'm unclear why do_wait needs to be passed > down to readMessageFromPipe() at all. That was a very good catch! Thanks! > +extern const char *fmtQualifiedId(struct Archive *fout, > + > const char *schema, const char *id); > > I don't think we want to expose struct Archive to dumputils.h. Can we > find a different place to put this? Right now it's there because fmtId() is there. fmtQualifiedId is used by dumputils.c, parallel.c and pg_dump.c, making the headers dumputils.h, parallel.h and pg_dump.h obvious candidates for the prototype. parallel.h doesn't make much sense. We can put it in pg_dump.h if you think that's better, but then we should also move fmtId() and fmtQualifiedId() into pg_dump.c... Or we change fmtQualifiedId to take an int and then we always pass the archive version instead of the Archive* ? > +enum escrow_action { GET, SET }; > +static void > +parallel_error_handler_escrow_data(enum escrow_action act, > ParallelState *pstate) > +{ > + static ParallelState *s_pstate = NULL; > + > + if (act == SET) > + s_pstate = pstate; > + else > + *pstate = *s_pstate; > +} > > This seems like a mighty complicated way to implement a global variable. Well, we talked about that before, when you complained that you couldn't get rid of the global g_conn because of the exit handler. You're right that in fact it is an indirect global variable here but it's clearly limited to the use of the error handler and you can be sure that nobody other than this function writes to it or accesses it without calling this function. If you want to make the ParallelState global then the immediate next question is why would you still pass it around as an argument to the various functions and not just access the global variable instead from everywhere... (I have accepted and implemented all other proposals that I didn't comment on here) Joachim
Attachment
pgsql-hackers by date: