Re: In-placre persistance change of a relation - Mailing list pgsql-hackers

From Kyotaro Horiguchi
Subject Re: In-placre persistance change of a relation
Msg-id 20230317.151634.1038632016265639446.horikyota.ntt@gmail.com
In response to Re: In-placre persistance change of a relation  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Responses Re: In-placre persistance change of a relation  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-hackers
At Fri, 03 Mar 2023 18:03:53 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> To be precise, there are three parts. The attached patch is the first
> part: the storage mark files, which identify storage files that have
> not been committed and should be removed during the next startup.
> This feature resolves the issue of orphaned storage files that may
> result from a crash during a transaction that creates a new table.
>
> I'll post all three parts shortly.

Mmm. It took longer than I said, but here is the patch set that
includes all three parts.

1. "Mark files", which prevent the storage files of relations created
  within a transaction from being orphaned by a crash.

2. In-place persistence change: ALTER TABLE SET LOGGED/UNLOGGED with
  wal_level = minimal, and ALTER TABLE SET UNLOGGED with any other
  wal_level, no longer copy the relation's storage files. ALTER TABLE
  SET LOGGED with a non-minimal wal_level emits bulk FPIs (full-page
  images) instead of a large number of individual INSERT records.

3. An extension to ALTER TABLE SET (UN)LOGGED that can handle all
  tables in a tablespace at once.
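Item 2 amounts to a small decision table. The sketch below encodes it; the `WalLevel` enum and `persistence_change_plan()` are hypothetical names, and the function only restates the cases listed in item 2 rather than modeling the patch's actual code paths.

```python
from enum import Enum


class WalLevel(Enum):
    MINIMAL = "minimal"
    REPLICA = "replica"
    LOGICAL = "logical"


def persistence_change_plan(set_logged: bool, wal_level: WalLevel) -> str:
    """Hypothetical summary of item 2 for ALTER TABLE SET LOGGED/UNLOGGED."""
    if wal_level is WalLevel.MINIMAL or not set_logged:
        # SET LOGGED/UNLOGGED at wal_level=minimal, or SET UNLOGGED at any
        # wal_level: the relation's files are kept, with no storage copy.
        return "no file copy"
    # SET LOGGED at replica/logical: the relation's existing contents must
    # reach the WAL, so the patch emits bulk full-page images (FPIs)
    # rather than one INSERT record per row.
    return "bulk FPIs"
```

Only the SET LOGGED case at a non-minimal wal_level has to put the table's data into the WAL; every other combination is a metadata-level change.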


As a side note, let me quickly go over the behavior of the mark files
introduced by the first patch, particularly what happens when deletion
fails.

(1) The mark file for the MAIN fork ("<oid>.u") covers all forks,
    while the mark file for the INIT fork ("<oid>_init.u") covers the
    INIT fork alone.

(2) The mark file is created just before the corresponding storage
    file is created. This creation is always WAL-logged.

(3) During commit or rollback, the mark file is deleted after the
    corresponding storage file is removed. This deletion is WAL-logged,
    too. If the deletion fails, an ERROR is raised and the transaction
    aborts.

(4) If a crash leaves a mark file behind, the server tries to delete
    it during the subsequent startup that performs recovery, after
    successfully removing the corresponding storage file. If the
    deletion fails, the server leaves the mark file in place and emits
    a WARNING. (Non-mark files are handled the same way.)

(5) If the deletion of a mark file fails, the leftover mark file
    prevents the corresponding storage file from being created again
    (raising an ERROR). Thanks to this behavior, a leftover mark file
    never causes the wrong files to be removed.

(6) The mark file for an INIT fork is created only when ALTER TABLE
    SET UNLOGGED is executed (not for CREATE UNLOGGED TABLE), to signal
    the crash-cleanup code to remove the INIT fork. (Otherwise the
    cleanup code would remove the main fork instead. This is the main
    reason for introducing the mark files.)
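Rules (1) to (6) can be modeled with a small in-memory sketch. Everything in it is hypothetical: the `Storage` class, `mark_name()`, the string WAL records, the fork suffixes used as a stand-in for PostgreSQL's file naming, and the reading of rule (3) that a committed relation keeps its new storage while a rolled-back one loses it. The final loop is a simplified stand-in for the ordinary unlogged-table reset, which is what rule (6) guards against.

```python
from dataclasses import dataclass, field

# Hypothetical model of the mark-file rules (1)-(6); not PostgreSQL code.
FORK_SUFFIXES = {"main": "", "fsm": "_fsm", "vm": "_vm", "init": "_init"}


def storage_name(oid: int, fork: str) -> str:
    return f"{oid}{FORK_SUFFIXES[fork]}"


def mark_name(oid: int, fork: str) -> str:
    # Rule (1): only MAIN ("<oid>.u") and INIT ("<oid>_init.u") get marks.
    if fork not in ("main", "init"):
        raise ValueError("mark files exist only for the MAIN and INIT forks")
    return storage_name(oid, fork) + ".u"


@dataclass
class Storage:
    files: dict = field(default_factory=dict)     # file name -> contents
    wal: list = field(default_factory=list)       # WAL records as strings
    warnings: list = field(default_factory=list)
    stuck: set = field(default_factory=set)       # names whose unlink fails

    def _unlink(self, name: str) -> bool:
        if name in self.stuck:
            return False
        self.files.pop(name, None)
        return True

    def create(self, oid: int, fork: str) -> None:
        mark = mark_name(oid, fork)
        if mark in self.files:                    # rule (5): leftover mark
            raise RuntimeError(f"ERROR: mark file {mark} already exists")
        self.wal.append(f"CREATE MARK {mark}")    # rule (2): always WAL-logged
        self.files[mark] = ""                     # mark first ...
        self.files[storage_name(oid, fork)] = ""  # ... then the storage file

    def end_xact(self, oid: int, fork: str, commit: bool) -> None:
        # Assumption: rollback removes the new storage, commit keeps it.
        if not commit:
            self._unlink(storage_name(oid, fork))
        mark = mark_name(oid, fork)
        self.wal.append(f"DELETE MARK {mark}")    # rule (3): WAL-logged
        if not self._unlink(mark):
            raise RuntimeError(f"ERROR: could not remove {mark}")

    def startup_cleanup(self) -> None:
        # Rule (4): remove the storage a leftover mark covers, then the mark.
        for mark in sorted(n for n in self.files if n.endswith(".u")):
            base = mark[: -len(".u")]             # "<oid>" or "<oid>_init"
            if base.endswith("_init"):            # rule (6): INIT fork only
                targets = [base]
            else:                                 # rule (1): every fork
                targets = [base + sfx for sfx in FORK_SUFFIXES.values()]
            removed = [self._unlink(t) for t in targets if t in self.files]
            if not all(removed):
                self.warnings.append(f"WARNING: could not remove storage for {mark}")
            elif not self._unlink(mark):
                self.warnings.append(f"WARNING: could not remove {mark}")
        # Simplified unlogged-table reset: a relation that still has an INIT
        # fork gets its MAIN fork removed and recreated empty from INIT.
        for name in list(self.files):
            if name.endswith("_init"):
                self.files[name[: -len("_init")]] = ""
```

For example, a crash during ALTER TABLE SET UNLOGGED on a committed table leaves `100_init.u` behind; startup removes only `100_init`, so the MAIN fork keeps its rows. Without that mark, the reset loop would treat the table as unlogged and empty its MAIN fork.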

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
