Re: patch for parallel pg_dump - Mailing list pgsql-hackers
From | Joachim Wieland |
---|---|
Subject | Re: patch for parallel pg_dump |
Date | |
Msg-id | CACw0+108i3FPXkJiNNK_BB5CwY4My6kpkN3nPJfc1LiuGsufzA@mail.gmail.com Whole thread Raw |
In response to | Re: patch for parallel pg_dump (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: patch for parallel pg_dump
Re: patch for parallel pg_dump |
List | pgsql-hackers |
On Tue, Apr 3, 2012 at 9:34 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Sun, Apr 1, 2012 at 12:35 PM, Joachim Wieland <joe@mcknight.de> wrote: >> Unfortunately this is not really the case. What is being moved out of >> pg_backup_archiver.c and into parallel.c is either the shutdown logic >> that has been applied only a few days ago or is necessary to change >> the parallel restore logic from one-thread-per-dump-object to the >> message passing framework where a worker starts in the beginning and >> then receives a new dump object from the master every time it's idle. > > Hmm. It looks to me like the part-two patch still contains a bunch of > code rearrangement. For example, the current code for > pg_backup_archiver.c patch contains this: > > typedef struct ParallelState > { > int numWorkers; > ParallelStateEntry *pse; > } ParallelState; > > In the revised patch, that's removed, and parallel.c instead contains this: > > typedef struct _parallel_state > { > int numWorkers; > ParallelSlot *parallelSlot; > } ParallelState; Yes, this is what I referred to as the part of the shutdown logic that we only applied a few days ago. I basically backported what I had in parallel.c into pg_backup_archiver.c which is where all the parallel logic lives at the moment. Moving it out of pg_backup_archiver.c and into a new parallel.c file means that I'd either have to move the declaration to a header or create accessor functions and declare these in a header. I actually tried it and both solutions created more lines than they would save later on, especially with the lines that will remove this temporary arrangement again... The current parallel restore engine already has a "ParallelSlot" structure but uses it in a slightly different way. That's why I created the one in the shutdown logic as "ParallelStateEntry" for now. This will be gone with the final patch and at the end there will only be a "ParallelSlot" left that will serve both purposes. That's why you see this renaming (and the removal of the current ParallelSlot structure). "struct _parallel_state" won't be used anywhere, except for forward declarations in headers. I just used it because that seemed to be the naming scheme, other structures are called similarly, e.g.: $ grep "struct _" pg_backup_archiver.c typedef struct _restore_args typedef struct _parallel_slot typedef struct _outputContext I'm fine with any name, just let me know what you prefer. > On a similar note, what's the point of changing struct Archive to have > int numWorkers instead of int number_of_jobs, and furthermore > shuffling the declaration around to a different part of the struct? number_of_jobs was in the struct RestoreOptions before, a structure that is not used when doing a dump. I moved it to the Archive as it is used by both dump and restore and since all other code talks about "workers" I changed the name to "numWorkers". > On another note, I am not sure that I like the messaging protocol > you've designed. It seems to me that this has little chance of being > reliable: > > + void (*on_exit_msg_func)(const char *modulename, const char *fmt, > va_list ap) = vwrite_msg; > > I believe the idea here is that you're going to capture the dying gasp > of the worker thread and send it to the master instead of printing it > out. But that doesn't seem very reliable. There's code all over the > place (and, in particular, in pg_dump.c) that assumes that we may > write messages at any time, and in particular that we may emit > multiple messages just before croaking. I guess you're talking about the code in pg_dump that reads in the database schema and the details of all the different objects in the schema. This code is run before forking off workers and is always executed in the master. pg_dump only forks when all the catalog data has been read so only actual TABLE DATA and BLOBs are dumped from the workers. I claim that in at least 90% the functions involved here use exit_horribly() and output a clear message about why they're dying. If they don't but just die, the master will say "worker died unexpectedly". As you said a few mails before, any function exiting at this stage should rather call exit_horribly() to properly clean up after itself. The advantage of using the message passing system for the last error message is that you get exactly one message and it's very likely that it accurately describes what happened to a worker to make it stop. > Also, I like the idea of making it possible to use assertions in > front-end code. But I think that if we're going to do that, we should > make it work in general, not just for things that include > pg_backup_archiver.h. I completely agree. Assertions helped a lot dealing with concurrent code. How do you want to tackle this for now? Want me to create a separate header pg_assert.h as part of my patch? Or is it okay to factor it out later and include it from the general header then?
pgsql-hackers by date: