At Tue, 15 Mar 2022 12:44:49 -0400, Robert Haas <robertmhaas@gmail.com> wrote in
> On Wed, Jan 26, 2022 at 3:25 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> > The attached is the fixed version and it surely works with the repro.
>
> Hi,
>
> I spent the morning working on this patch and came up with the
> attached version. I wrote substantial comments in RelationTruncate(),
> where I tried to make it more clear exactly what the bug is here, and
> also in storage/proc.h, where I tried to clarify both the use of the
> DELAY_CHKPT_* flags in general terms. If nobody is too sad about this
> version, I plan to commit it.

Thanks for taking this on and for your time. The additional comments
seem to describe the flags more clearly.

storage.c:
+ * Make sure that a concurrent checkpoint can't complete while truncation
+ * is in progress.
+ *
+ * The truncation operation might drop buffers that the checkpoint
+ * otherwise would have flushed. If it does, then it's essential that
+ * the files actually get truncated on disk before the checkpoint record
+ * is written. Otherwise, if reply begins from that checkpoint, the
+ * to-be-truncated buffers might still exist on disk but have older
+ * contents than expected, which can cause replay to fail. It's OK for
+ * the buffers to not exist on disk at all, but not for them to have the
+ * wrong contents.

FWIW, this seems to slightly conflate a buffer with its contents. I
can read it correctly, so I don't mind if it sounds natural enough.

Otherwise all the added/revised comments look fine. Thanks for the
work.

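For reference, the ordering that the storage.c comment describes can be
modeled roughly as below. This is only a toy sketch under my own
assumptions, not code from the patch: ToyBackend, TOY_DELAY_CHKPT_COMPLETE
and all the toy_* functions are hypothetical stand-ins, and the real flag
handling in proc.h and RelationTruncate() may differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a "delay checkpoint completion" flag. */
#define TOY_DELAY_CHKPT_COMPLETE 0x2

/* Toy model of one backend performing a truncation (illustration only). */
typedef struct ToyBackend
{
    int  delay_flags;      /* flags a checkpointer would have to wait on */
    bool buffers_dropped;  /* dirty buffers were discarded without a flush */
    bool file_truncated;   /* the on-disk file has actually been shortened */
} ToyBackend;

/* Checkpointer side: may the checkpoint record be written right now? */
static bool
toy_checkpoint_can_complete(const ToyBackend *b)
{
    return (b->delay_flags & TOY_DELAY_CHKPT_COMPLETE) == 0;
}

/* The hazard: old blocks still on disk whose newer contents were dropped. */
static bool
toy_stale_blocks_on_disk(const ToyBackend *b)
{
    return b->buffers_dropped && !b->file_truncated;
}

/* Step 1: forbid checkpoint completion before touching any buffer. */
static void
toy_truncate_begin(ToyBackend *b)
{
    assert((b->delay_flags & TOY_DELAY_CHKPT_COMPLETE) == 0);
    b->delay_flags |= TOY_DELAY_CHKPT_COMPLETE;
}

/* Step 2: drop buffers the checkpoint would otherwise have flushed. */
static void
toy_drop_buffers(ToyBackend *b)
{
    b->buffers_dropped = true;
}

/* Step 3: truncate the file on disk, which closes the hazard window. */
static void
toy_truncate_file(ToyBackend *b)
{
    b->file_truncated = true;
}

/* Step 4: only now allow a concurrent checkpoint to complete. */
static void
toy_truncate_end(ToyBackend *b)
{
    b->delay_flags &= ~TOY_DELAY_CHKPT_COMPLETE;
}
```

In this model a checkpoint can never complete while
toy_stale_blocks_on_disk() is true, which is the property the comment
asks for.
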
> I think it should be back-patched, too, but that looks like a bit of a
> pain. I think every back-branch will require different adjustments.

I'll try that. If you are already working on it, please let me know.
(It is more than likely already too late..)

regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center