Re: [HACKERS] WAL logging problem in 9.4.3? - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [HACKERS] WAL logging problem in 9.4.3?
Msg-id 20190514.135910.258194307.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [HACKERS] WAL logging problem in 9.4.3?  (Noah Misch <noah@leadboat.com>)
Responses Re: [HACKERS] WAL logging problem in 9.4.3?  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
Hello.

At Sun, 12 May 2019 17:37:05 -0700, Noah Misch <noah@leadboat.com> wrote in
<20190513003705.GA1202614@rfd.leadboat.com>
> On Sun, Mar 31, 2019 at 03:31:58PM -0700, Noah Misch wrote:
> > On Sun, Mar 10, 2019 at 07:27:08PM -0700, Noah Misch wrote:
> > > I also liked the design in the https://postgr.es/m/559FA0BA.3080808@iki.fi
> > > last paragraph, and I suspect it would have been no harder to back-patch.  I
> > > wonder if it would have been simpler and better, but I'm not asking anyone to
> > > investigate that.
> > 
> > Now I am asking for that.  Would anyone like to try implementing that other
> > design, to see how much simpler it would be?

Yeah, I think it is a bit too complex for the value it provides.
But I think it is the best way as long as we keep reusing the
same file when the whole file is truncated.

> Anyone?  I've been deferring review of v10 and v11 in hopes of seeing the
> above-described patch first.

A significant portion of the complexity in this patch comes
from the need to behave differently per block, according to the
remembered logged and truncated block numbers.

0005:
+ * NB: after WAL-logging has been skipped for a block, we must not WAL-log
+ * any subsequent actions on the same block either. Replaying the WAL record
+ * of the subsequent action might fail otherwise, as the "before" state of
+ * the block might not match, as the earlier actions were not WAL-logged.
+ * Likewise, after we have WAL-logged an operation for a block, we must
+ * WAL-log any subsequent operations on the same page as well. Replaying
+ * a possible full-page-image from the earlier WAL record would otherwise
+ * revert the page to the old state, even if we sync the relation at end
+ * of transaction.
+ *
+ * If a relation is truncated (without creating a new relfilenode), and we
+ * emit a WAL record of the truncation, we can't skip WAL-logging for any
+ * of the truncated blocks anymore, as replaying the truncation record will
+ * destroy all the data inserted after that. But if we have already decided
+ * to skip WAL-logging changes to a relation, and the relation is truncated,
+ * we don't need to WAL-log the truncation either.

If this consideration holds, then given the optimizations for
WAL-skipping and truncation, there is no way to avoid the
per-block behavior as long as we allow a mixture of WAL-logged
modifications and WAL-skipped COPY on the same relation within a
transaction.
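
To make the per-block behavior concrete, here is a minimal sketch of
the decision. It is not code from the patch: RelWalState, skip_above,
truncated_to and block_needs_wal are hypothetical names, and it
assumes that the remembered state is just two block numbers per
relation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t BlockNo;
#define NO_BLOCK ((BlockNo) 0xFFFFFFFF)

/* Hypothetical per-relation bookkeeping kept for one transaction. */
typedef struct RelWalState
{
    BlockNo skip_above;   /* WAL was skipped for blocks >= this; NO_BLOCK if none */
    BlockNo truncated_to; /* length after a WAL-logged truncation; NO_BLOCK if none */
} RelWalState;

/* Decide whether a change to block blk must be WAL-logged. */
static inline bool
block_needs_wal(const RelWalState *st, BlockNo blk)
{
    /* No skipping decided yet, or the block lies below the skipped range. */
    if (st->skip_above == NO_BLOCK || blk < st->skip_above)
        return true;

    /* Replaying a logged truncation would destroy unlogged data here. */
    if (st->truncated_to != NO_BLOCK && blk >= st->truncated_to)
        return true;

    /* WAL was already skipped for this block, so keep skipping. */
    return false;
}
```

Every caller that modifies a page would have to consult such a
function, which is where the per-block complexity comes from.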

We could avoid the per-block behavior change by making WAL
inhibition work on a per-relation basis. That would reduce the
patch size by the amount of the BufferNeedsWAL calls and
log_heap_update, but that is not a large reduction.

 Inhibit WAL-skipping after any WAL-logged modification in the relation.
 Inhibit WAL-logging after any WAL-skipped modification in the relation.
 WAL-skipped relations are synced at commit time.
 Truncation of a WAL-skipped relation creates a new relfilenode.
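
The four rules above could be sketched as a small per-relation state
machine. This is only an illustration under assumptions: RelState,
rel_modify and rel_truncate are hypothetical names, the first
modification in the transaction is assumed to fix the mode, and
truncation of a relation that is not WAL-skipped is assumed to be
WAL-logged in place. WAL for creating the new relfilenode is not
modeled.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum RelWalMode
{
    REL_UNDECIDED,   /* no modification yet in this transaction */
    REL_WAL_LOGGED,  /* every modification is WAL-logged */
    REL_WAL_SKIPPED  /* no modification is WAL-logged */
} RelWalMode;

typedef struct RelState
{
    RelWalMode mode;
    bool       sync_at_commit; /* set once WAL has been skipped */
    unsigned   relfilenode;
} RelState;

/*
 * Record a modification.  want_skip says whether the caller would like
 * to skip WAL (e.g. COPY).  Returns true if WAL must be written.
 */
static inline bool
rel_modify(RelState *rel, bool want_skip)
{
    if (rel->mode == REL_UNDECIDED)
    {
        /* Assumption: the first modification decides for the whole relation. */
        rel->mode = want_skip ? REL_WAL_SKIPPED : REL_WAL_LOGGED;
        rel->sync_at_commit = want_skip;
    }
    return rel->mode == REL_WAL_LOGGED;
}

/*
 * Truncate the relation.  Returns true if the truncation must be
 * WAL-logged in place; a WAL-skipped relation gets a new relfilenode.
 */
static inline bool
rel_truncate(RelState *rel, unsigned new_relfilenode)
{
    if (rel->mode == REL_WAL_SKIPPED)
    {
        rel->relfilenode = new_relfilenode;
        return false;
    }
    rel->mode = REL_WAL_LOGGED;
    return true;
}
```

With this shape the decision is made once per relation, so no block
numbers need to be remembered.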

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



