Thread: Re: CHECKPOINT unlogged data
Hi, On 2025-05-30 18:17:45 +0200, Christoph Berg wrote: > A customer reported to use CHECKPOINT before shutdowns to make the > shutdowns themselves faster and asked if it was possible to make > CHECKPOINT optionally also write out unlogged table data for that > purpose. I think the idea makes sense, so there's the patch. I've seen the need for this quite a few times, fwiw. So +1 to the idea. Greetings, Andres Freund
Hi, On May 30, 2025 12:55:28 PM EDT, Christoph Berg <myon@debian.org> wrote: >Re: Andres Freund >> Hi, >> >> On 2025-05-30 18:17:45 +0200, Christoph Berg wrote: >> > A customer reported to use CHECKPOINT before shutdowns to make the >> > shutdowns themselves faster and asked if it was possible to make >> > CHECKPOINT optionally also write out unlogged table data for that >> > purpose. I think the idea makes sense, so there's the patch. >> >> I've seen the need for this quite a few times, fwiw. So +1 to the idea. > >Do we want to officially mention this use case in checkpoint.sgml? I've >already replaced the "is not intended for use during normal operation" >wording by "not required during normal operation", but we might go one >step further and endorse it. Yes, I think it's good to mention what it is useful for. Greetings, Andres -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
On Fri, May 30, 2025 at 12:39:02PM -0400, Andres Freund wrote: > On 2025-05-30 18:17:45 +0200, Christoph Berg wrote: >> A customer reported to use CHECKPOINT before shutdowns to make the >> shutdowns themselves faster and asked if it was possible to make >> CHECKPOINT optionally also write out unlogged table data for that >> purpose. I think the idea makes sense, so there's the patch. > > I've seen the need for this quite a few times, fwiw. So +1 to the idea. This patch also adds an IMMEDIATE option, which I proposed some time ago [0]. I ended up withdrawing it due to general skepticism about its usefulness. FWIW I have no concerns about adding a few retail options to CHECKPOINT, but others might balk at options without solid use-cases. The unlogged table one seems reasonable enough. [0] https://postgr.es/m/17A03557-CF5C-4D4B-B719-A1D98DD75E75%40amazon.com -- nathan
Re: Nathan Bossart > This patch also adds an IMMEDIATE option, which I proposed some time ago > [0]. I ended up withdrawing it due to general skepticism about its Thanks for the pointer, I did not go that far back when looking for older threads. When writing the patch, I was also thinking about naming the option "fast" or "spread" but ultimately went with "immediate" because that's what the log message is using: =# checkpoint; 2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait SQL command "(options)" tend to be booleans, hence "immediate {on|off}". Introducing two separate keywords "fast" and "spread" seemed confusing, and there is no precedent for "fast=on" in other tools or the replication protocol. > usefulness. FWIW I have no concerns about adding a few retail options to > CHECKPOINT, but others might balk at options without solid use-cases. The > unlogged table one seems reasonable enough. I think the two options immediate and flush_all are actually useful in combination for the shutdown case. If operation is to continue normally until just before the shutdown, it might make sense to run these 3 commands (or just #1 and #3): checkpoint (flush_all, immediate false); checkpoint (flush_all); pg_ctl stop (I also thought about a VERBOSE option, but since the checkpoint messages are generated by a different process, it's probably harder than I initially thought.) Christoph
On 2025-05-30 19:23:04 +0200, Christoph Berg wrote: > Re: Nathan Bossart > > This patch also adds an IMMEDIATE option, which I proposed some time ago > > [0]. I ended up withdrawing it due to general skepticism about its > > Thanks for the pointer, I did not go that far back when looking for > older threads. > > When writing the patch, I was also thinking about naming the option > "fast" or "spread" but ultimately went with "immediate" because that's > what the log message is using: > > =# checkpoint; > 2025-05-30 18:23:17.433 CEST [579834] LOG: Checkpoint beginnt: immediate force wait > > SQL command "(options)" tend to be booleans, hence "immediate {on|off}". > Introducing two separate keywords "fast" and "spread" seemed > confusing, and there is no precedent for "fast=on" in other tools or > the replication protocol. I'd add a 'mode' that can be set to an arbitrary string, which then can be validated in C code. That seems more future proof.
Re: Andres Freund > I'd add a 'mode' that can be set to an arbitrary string, which then can be > validated in C code. That seems more future proof. Changed in the attached v2, thanks. Christoph
Attachment
On 2025/06/06 19:03, Christoph Berg wrote: > Re: Andres Freund >> I'd add a 'mode' that can be set to an arbitrary string, which then can be >> validated in C code. That seems more future proof. > > Changed in the attached v2, thanks. When I applied the patch and compiled it, I got the following warnings: utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions] 946 | CheckPointStmt *stmt = (CheckPointStmt *) parsetree; | ^ utility.c:947:16: warning: mixing declarations and code is incompatible with standards before C99 [-Wdeclaration-after-statement] 947 | ListCell *lc; | ^ 2 warnings generated. RequestCheckpoint(CHECKPOINT_WAIT | + (immediate ? CHECKPOINT_IMMEDIATE : 0) | + (flush_all ? CHECKPOINT_FLUSH_ALL : 0) | Some users might want to trigger a spread checkpoint but not wait for it to finish, since it could take a long time? If that's a valid use case, maybe we should add a WAIT option to let users choose whether to wait for the checkpoint to complete or not? Regards, -- Fujii Masao NTT DATA Japan Corporation
Re: Fujii Masao > utility.c:946:4: warning: label followed by a declaration is a C23 extension [-Wc23-extensions] Thanks, my compiler didn't throw that. { } block added in v3. > RequestCheckpoint(CHECKPOINT_WAIT | > + (immediate ? CHECKPOINT_IMMEDIATE : 0) | > + (flush_all ? CHECKPOINT_FLUSH_ALL : 0) | > > Some users might want to trigger a spread checkpoint but not wait for > it to finish, since it could take a long time? If that's a valid use case, > maybe we should add a WAIT option to let users choose whether to wait for > the checkpoint to complete or not? Do we want that? The checkpoint is only effective when it's finished, and running `psql -c "checkpoint (wait false)"` might make people shoot themselves into the foot. Christoph
Attachment
On Fri, Jun 06, 2025 at 04:26:56PM +0200, Christoph Berg wrote: > Re: Fujii Masao >> Some users might want to trigger a spread checkpoint but not wait for >> it to finish, since it could take a long time? If that's a valid use case, >> maybe we should add a WAIT option to let users choose whether to wait for >> the checkpoint to complete or not? > > Do we want that? The checkpoint is only effective when it's finished, > and running `psql -c "checkpoint (wait false)"` might make people > shoot themselves into the foot. *shrug* I imagine the documentation will pretty clearly indicate that setting WAIT to "false" will cause CHECKPOINT to not wait for it to finish. > + <term><literal>MODE</literal></term> > + <listitem> > + <para> > + The <command>CHECKPOINT</command> command forces an immediate > + checkpoint by default when the command is issued, without waiting for a > + regular checkpoint scheduled by the system. <literal>FAST</literal> is a > + synonym for <literal>IMMEDIATE</literal>. > + </para> > + <para> > + A <literal>SPREAD</literal> checkpoint will instead spread out the write load > + as determined by the <xref linkend="guc-checkpoint-completion-target"/> > + setting, like the system-scheduled checkpoints. > + </para> > + </listitem> > + </varlistentry> I don't understand why we need to add both FAST and IMMEDIATE. > + <varlistentry> > + <term><literal>FLUSH_ALL</literal></term> > + <listitem> > + <para> > + Requests the checkpoint to also flush data of <literal>UNLOGGED</literal> > + relations. Defaults to off. > + </para> > + </listitem> > + </varlistentry> Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED? IMHO that's more descriptive. My attempt at this patch back in 2020 included the following note, which seems relevant here: + Note that the server may consolidate concurrently requested checkpoints or + restartpoints. Such consolidated requests will contain a combined set of + options. For example, if one session requested an immediate checkpoint and + another session requested a non-immediate checkpoint, the server may combine + these requests and perform one immediate checkpoint. We might also want to make sure it's clear that CHECKPOINT does nothing if there's been no database activity since the last one (or, in the case of a restartpoint, if there hasn't been a checkpoint record). -- nathan
Re: Nathan Bossart > I imagine the documentation will pretty clearly indicate that setting WAIT > to "false" will cause CHECKPOINT to not wait for it to finish. I can add it, it's easy enough... > I don't understand why we need to add both FAST and IMMEDIATE. We have both: =# checkpoint ; 2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait pg_basebackup --checkpoint=fast Could we settle for one official name for that? Then we could use that name in all contexts. > > + <term><literal>FLUSH_ALL</literal></term> > > Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED? > IMHO that's more descriptive. That's again coming from what the log message is saying: =# checkpoint (flush_all); 2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all I think we should be consistent there. #define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those * belonging to unlogged tables */ Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit? > My attempt at this patch back in 2020 included the following note, which > seems relevant here: > > + Note that the server may consolidate concurrently requested checkpoints or > + restartpoints. Such consolidated requests will contain a combined set of > + options. For example, if one session requested an immediate checkpoint and > + another session requested a non-immediate checkpoint, the server may combine > + these requests and perform one immediate checkpoint. The CHECKPOINT documentation links to `28.5. WAL Configuration`, should this be mentioned there instead? > We might also want to make sure it's clear that CHECKPOINT does nothing if > there's been no database activity since the last one (or, in the case of a > restartpoint, if there hasn't been a checkpoint record). That's taken care of by "force": #define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */ Christoph
On Fri, Jun 06, 2025 at 06:20:21PM +0200, Christoph Berg wrote: > Re: Nathan Bossart >> I don't understand why we need to add both FAST and IMMEDIATE. > > We have both: > > =# checkpoint ; > 2025-06-06 18:09:25.743 CEST [872379] LOG: checkpoint starting: immediate force wait > > pg_basebackup --checkpoint=fast > > Could we settle for one official name for that? Then we could use that > name in all contexts. That seems like a good idea to me. I'm tempted to say that "fast" more accurately describes what's happening than "immediate." "Immediate" sounds like it happens instantaneously, but it's actually just happening "fast," i.e., as fast as possible. >> > + <term><literal>FLUSH_ALL</literal></term> >> >> Could we rename this to something like FLUSH_UNLOGGED or INCLUDE_UNLOGGED? >> IMHO that's more descriptive. > > That's again coming from what the log message is saying: > > =# checkpoint (flush_all); > 2025-06-06 18:12:46.298 CEST [873436] LOG: checkpoint starting: immediate force wait flush-all > > I think we should be consistent there. > > #define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those > * belonging to unlogged tables */ > > Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit? WFM. >> My attempt at this patch back in 2020 included the following note, which >> seems relevant here: >> >> + Note that the server may consolidate concurrently requested checkpoints or >> + restartpoints. Such consolidated requests will contain a combined set of >> + options. For example, if one session requested an immediate checkpoint and >> + another session requested a non-immediate checkpoint, the server may combine >> + these requests and perform one immediate checkpoint. > > The CHECKPOINT documentation links to `28.5. WAL Configuration`, > should this be mentioned there instead? I thought it would make sense to put it closer to where these options are described, since it'll be most evident for manually-initiated checkpoints. >> We might also want to make sure it's clear that CHECKPOINT does nothing if >> there's been no database activity since the last one (or, in the case of a >> restartpoint, if there hasn't been a checkpoint record). > > That's taken care of by "force": > > #define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */ Oh, I see that we always specify that for CHECKPOINT commands, except for restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a restartpoint, it'll have no effect. It's proably worth mentioning that case, at least. -- nathan
Re: Nathan Bossart > That seems like a good idea to me. I'm tempted to say that "fast" more > accurately describes what's happening than "immediate." "Immediate" sounds > like it happens instantaneously, but it's actually just happening "fast," > i.e., as fast as possible. Ack. > > #define CHECKPOINT_FLUSH_ALL 0x0010 /* Flush all pages, including those > > * belonging to unlogged tables */ > > > > Maybe CHECKPOINT_FLUSH_UNLOGGED would be more explicit? > > WFM. Do we want to change the checkpoint log message (and the new options) only, or include the CHECKPOINT_* flags? (I would guess there aren't many external users of these flags, but mmmv.) > I thought it would make sense to put it closer to where these options are > described, since it'll be most evident for manually-initiated checkpoints. Ack, I'll add that. > >> We might also want to make sure it's clear that CHECKPOINT does nothing if > >> there's been no database activity since the last one (or, in the case of a > >> restartpoint, if there hasn't been a checkpoint record). > > > > That's taken care of by "force": > > > > #define CHECKPOINT_FORCE 0x0008 /* Force even if no activity */ > > Oh, I see that we always specify that for CHECKPOINT commands, except for > restartpoints. IIRC even if you do specify CHECKPOINT_FORCE for a > restartpoint, it'll have no effect. It's proably worth mentioning that > case, at least. Right, will do. Christoph
On Wed, Jun 11, 2025 at 03:45:46PM +0200, Christoph Berg wrote: > Do we want to change the checkpoint log message (and the new options) > only, or include the CHECKPOINT_* flags? (I would guess there aren't > many external users of these flags, but mmmv.) IMO we should try to make the terminology consistent everywhere. I'd suggest putting the renaming stuff in separate prerequisite patches for your new CHECKPOINT option. -- nathan
On Mon, Jun 16, 2025 at 04:36:59PM +0200, Christoph Berg wrote: > I spent some time digging through the code, but I'm still not entirely > sure what's happening. There are several parts to it: > > 1) the list of buffers to flush is determined at the beginning of the > checkpoint, so running a 2nd FLUSH_UNLOGGED checkpoint will not make > the running checkpoint write these > > 2) running CHECKPOINT updates the checkpoint flags in shared memory so > I think the currently running checkpoint picks "MODE FAST" up and > speeds up. (But I'm not entirely sure, the call stack is quite deep > there.) > > 3) running CHECKPOINT (at least when waiting for it) seems to actually > start a new checkpoint, so FLUSH_UNLOGGED should still be effective. > (See the code arount "start_cv" in checkpointer.c) > > Admittedly, adding these points together raises some question marks > about the flag handling, so I would welcome clarification by someone > more knowledgeable in this area. I think you've got it right. With CHECKPOINT_WAIT set, RequestCheckpoint() will wait for a new checkpoint to start, at which point we know that the new flags have been seen by the checkpointer. If an immediate checkpoint is pending, CheckpointWriteDelay() will skip sleeping in the currently-running one, so the current checkpoint will be "upgraded" to immediate in some sense, but IIUC there will still be another immediate checkpoint after it completes. But AFAICT it doesn't pick up FLUSH_UNLOGGED until the next checkpoint begins. Another thing to note is what I mentioned earlier: + Note that the server may consolidate concurrently requested checkpoints or + restartpoints. Such consolidated requests will contain a combined set of + options. For example, if one session requested an immediate checkpoint and + another session requested a non-immediate checkpoint, the server may combine + these requests and perform one immediate checkpoint. -- nathan