Re: In-placre persistance change of a relation - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: In-placre persistance change of a relation
Msg-id: 6f9828f5-e582-4eeb-a781-2d8dc92dff04@iki.fi
In response to: Re: In-placre persistance change of a relation  (Thom Brown <thom@linux.com>)
List: pgsql-hackers

On 05/04/2025 00:29, Thom Brown wrote:
> On Fri, 27 Dec 2024 at 08:26, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>>
>> Hello. This is the updated version.
>>
>> (Sorry for the delay; I've been a little swamped.)
>>
>> - Undo logs are primarily stored in a fixed number of fixed-length
>>    slots and are spilled into files under some conditions.
>>
>>    The number of slots is 32 (ULOG_SLOT_NUM), and the buffer length is
>>    1024 (ULOG_SLOT_BUF_LEN). Both are currently non-configurable.
>>
>> - Undo logs are now used only during recovery and no longer involved
>>    in transaction ends for normal backends. Pending deletes for aborts
>>    have been restored.
>>
>> - Undo logs are stored on a per-Top-XID basis.
>>
>> - RelationPreserverStorate() is no longer modified.
>>
>> In this version, in the part following the introduction of orphan
>> storage prevention, the restriction on prepared transactions
>> persisting beyond server crashes (i.e., the prohibition) has been
>> removed. This is because handling for such cases has been reverted to
>> pendingDeletes.
>>
>> Let me know if you have any questions or concerns.
> 
> I just went to give this a test drive, but HEAD has drifted too far,
> at least for 0017 to apply. Could you please rebase and make the
> necessary modifications?

I had a quick look at this latest version now, up to 
"v36-0005-Prevent-orphan-storage-files-after-server-crash.patch" 
(because I'm very interested in that, but not in the rest of the 
patches). Sorry I haven't gotten around to it earlier.

Overall I'm pretty happy with the design. The main thing that's now 
missing is documentation. The main SGML docs should surely have a 
section on the UNDO log. A new README to describe how modules should use 
the undo log etc. would probably also be in order.

Off the top of my head, some subtle high-level things that should be 
explained somewhere:

- The UNDO log is only used to clean up relation creations that were 
interrupted by a crash. It is *not* used for aborting transactions or 
for crash recovery of data, as it is on most systems. As a result, it's 
not as performance critical as you might think.

- The UNDO log is not a single sequential log like on many other 
systems. One way to think about it is that it's a per-transaction file, 
with a cache in shared memory for performance.

- The UNDO log is not used to handle controlled aborts, only for cleanup 
after a crash.

- What happens if you fail to process the UNDO log for some reason? Some 
storage files are leaked. Is that still considered OK, i.e. is the UNDO 
log a nice-to-have, or are there some more serious consequences?

- The interaction between REDO and UNDO. Every record inserted into the 
UNDO log of a transaction is WAL-logged in the REDO log. The undo log is 
like a data file in that sense. Writing to the undo log follows the usual 
"WAL-before-write" rule: the WAL is flushed before the corresponding 
undo log entry is written to disk. (Is that true? I'm not 100% sure)

- When a new relation is created, do you flush the WAL before creating 
the file? Or is there still a small window where it can leak, if the 
file creation makes it to disk before crash but the undo log (or the WAL 
record of the undo log entry) does not?
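
To make the ordering question in the last two points concrete, here is a 
toy model of the sequence I would expect. It is only a sketch and not what 
the patch necessarily does. Every name in it (ToyState, toy_wal_insert, 
toy_wal_flush, toy_undo_write, toy_create_file, toy_would_leak) is made up 
for illustration and is not a PostgreSQL or patch API, and LSNs are 
simplified to plain counters. It assumes that both the undo log write and 
the file creation wait until the WAL record of the undo entry is flushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model; none of these names exist in PostgreSQL. */
typedef uint64_t Lsn;

typedef struct ToyState
{
    Lsn  wal_insert_lsn;   /* end of WAL generated so far (in memory) */
    Lsn  wal_flush_lsn;    /* end of WAL known to be durable */
    Lsn  undo_durable_lsn; /* LSN of newest undo entry on disk, 0 if none */
    bool file_on_disk;     /* has the relation storage file been created? */
} ToyState;

/* WAL-log an undo entry; returns the end LSN of its WAL record. */
static Lsn
toy_wal_insert(ToyState *s)
{
    return ++s->wal_insert_lsn;
}

/* Make WAL durable up to 'upto'. */
static void
toy_wal_flush(ToyState *s, Lsn upto)
{
    assert(upto <= s->wal_insert_lsn);
    if (s->wal_flush_lsn < upto)
        s->wal_flush_lsn = upto;
}

/* WAL-before-write: refuse to persist an undo entry ahead of its WAL. */
static bool
toy_undo_write(ToyState *s, Lsn entry_lsn)
{
    if (entry_lsn > s->wal_flush_lsn)
        return false;
    s->undo_durable_lsn = entry_lsn;
    return true;
}

/* Create the storage file only once its creation is recoverable. */
static bool
toy_create_file(ToyState *s, Lsn entry_lsn)
{
    if (entry_lsn > s->wal_flush_lsn)
        return false;   /* creating it now would open the leak window */
    s->file_on_disk = true;
    return true;
}

/*
 * Would a crash right now leak the file?  It leaks if the file exists
 * but neither the durable WAL nor the durable undo log remembers it.
 */
static bool
toy_would_leak(const ToyState *s, Lsn entry_lsn)
{
    return s->file_on_disk &&
           s->wal_flush_lsn < entry_lsn &&
           s->undo_durable_lsn < entry_lsn;
}
```

With this ordering there is no point at which toy_would_leak() returns 
true. If the file were created before the flush instead, it would, and 
that is exactly the window I'm asking about.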

Have you done any performance testing of this? By "this" I mean the 
overhead of the undo-logging on create/drop table.

-- 
Heikki Linnakangas
Neon (https://neon.tech)


