Re: [HACKERS] Proposal : For Auto-Prewarm. - Mailing list pgsql-hackers
From | Thom Brown |
---|---|
Subject | Re: [HACKERS] Proposal : For Auto-Prewarm. |
Date | |
Msg-id | CAA-aLv73Y1jaN20g7VBBpsx-YZoQov+8z3Sg3N2ouR2hhZZp-A@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Proposal : For Auto-Prewarm. (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: [HACKERS] Proposal : For Auto-Prewarm.
|
List | pgsql-hackers |
On 22 June 2017 at 22:52, Robert Haas <robertmhaas@gmail.com> wrote: > On Thu, Jun 15, 2017 at 12:35 AM, Mithun Cy <mithun.cy@enterprisedb.com> wrote: >> [ new patch ] > > I think this is looking better. I have some suggestions: > > * I suggest renaming launch_autoprewarm_dump() to > autoprewarm_start_worker(). I think that will be clearer. Remember > that user-visible names, internal names, and the documentation should > all match. +1 I like related functions and GUCs to be similarly named so that they have the same prefix. > > * I think the GUC could be pg_prewarm.autoprewarm rather than > pg_prewarm.enable_autoprewarm. It's shorter and, I think, no less > clear. +1 I also think pg_prewarm.dump_interval should be renamed to pg_prewarm.autoprewarm_interval. > > * In the documentation, don't say "This is a SQL callable function > to....". This is a list of SQL-callable functions, so each thing in > the list is one. Just delete this from the beginning of each > sentence. I've made a pass at the documentation and ended up removing those intros. I haven't made any GUC/function renaming changes, but I have rewritten some paragraphs for clarity. Updated patch attached. One thing I couldn't quite make sense of is: "The autoprewarm process will start loading blocks recorded in $PGDATA/autoprewarm.blocks until there is a free buffer left in the buffer pool." Is this saying "until there is a single free buffer remaining in shared buffers"? I haven't corrected or clarified this as I don't understand it. Also, I find it a bit messy that launch_autoprewarm_dump() doesn't detect an autoprewarm process already running. I'd want this to return NULL or an error if called for a 2nd time. > > * The reason for the AT_PWARM_* naming is not very obvious. Does AT > mean "at" or "auto" or something else? How about > AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY, > AUTOPREWARM_INTERVAL_DEFAULT? > > * Instead of defining apw_sigusr1_handler, I think you could just use > procsignal_sigusr1_handler. Instead of defining apw_sigterm_handler, > perhaps you could just use die(). got_sigterm would go away, and > you'd just CHECK_FOR_INTERRUPTS(). > > * The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse > reset_apw_state(), which might be better named detach_apw_shmem(). > Similarly, init_apw_state() could be init_apw_shmem(). > > * Instead of load_one_database(), I suggest > autoprewarm_database_main(). That is more parallel to > autoprewarm_main(), which you have elsewhere, and makes it more > obvious that it's the main entrypoint for some background worker. > > * Instead of launch_and_wait_for_per_database_worker(), I suggest > autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I > suggest autoprewarm_buffers(). The motivation for changing prewarm > to autoprewarm is that we want the names here to be clearly distinct > from the other parts of pg_prewarm that are not related to > autoprewarm. The motivation for changing buffer_pool to buffers is > just that it's a little shorter. Personally I also like the sound it > of it better, but YMMV. > > * prewarm_buffer_pool() ends with a useless return statement. I > suggest removing it. > > * Instead of creating our own buffering system via buffer_file_write() > and buffer_file_flush(), why not just use the facilities provided by > the operating system? fopen() et. al. provide buffering, and we have > AllocateFile() to provide a FILE *; it's just like > OpenTransientFile(), which you are using, but you'll get the buffering > stuff for free. Maybe there's some reason why this won't work out > nicely, but off-hand it seems like it might. It looks like you are > already using AllocateFile() to read the dump, so using it to write > the dump as well seems like it would be logical. > > * I think that it would be cool if, when autoprewarm completed, it > printed a message at LOG rather than DEBUG1, and with a few more > details, like "autoprewarm successfully prewarmed %d of %d > previously-loaded blocks". This would require some additional > tracking that you haven't got right now; you'd have to keep track not > only of the number of blocks read from the file but how many of those > some worker actually loaded. You could do that with an extra counter > in the shared memory area that gets incremented by the per-database > workers. > > * dump_block_info_periodically() calls ResetLatch() immediately before > WaitLatch; that's backwards. See the commit message for commit > 887feefe87b9099eeeec2967ec31ce20df4dfa9b and the comments it added to > the top of latch.h for details on how to do this correctly. > > * dump_block_info_periodically()'s main loop is a bit confusing. I > think that after calling dump_now(true) it should just "continue", > which will automatically retest got_sigterm. You could rightly object > to that plan on the grounds that we then wouldn't recheck got_sighup > promptly, but you can fix that by moving the got_sighup test to the > top of the loop, which is a good idea anyway for at least two other > reasons. First, you probably want to check for a pending SIGHUP on > initially entering this function, because something might have changed > during the prewarm phase, and second, see the previous comment about > using the "another valid coding pattern" from latch.h, which puts the > ResetLatch() at the bottom of the loop. > > * I think that launch_autoprewarm_dump() should ereport(ERROR, ...) > rather than just return NULL if the feature is disabled. Maybe > something like ... ERROR: pg_prewarm.dump_interval must be > non-negative in order to launch worker > > * Not sure about this one, but maybe we should consider just getting > rid of pg_prewarm.dump_interval = -1 altogether and make the minimum > value 0. If pg_prewarm.autoprewarm = on, then we start the worker and > dump according to the dump interval; if pg_prewarm.autoprewarm = off > then we don't start the worker automatically, but we still let you > start it manually. If you do, it respects the configured > dump_interval. With this design, we don't need the error suggested in > the previous item at all, and the code can be simplified in various > places --- all the checks for AT_PWARM_OFF go away. And I don't see > that we're really losing anything. There's not much sense in dumping > but not prewarming or prewarming but not dumping, so having > pg_prewarm.autoprewarm configure whether the worker is started > automatically rather than whether it prewarms (with a separate control > for whether it dumps) seems to make sense. The one time when you want > to do one without the other is when you first install the extension -- > during the first server lifetime, you'll want to dump, so that after > the next restart you have something to preload. But this design would > allow that. -- Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: