Re: POC: Cleaning up orphaned files using undo logs - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: POC: Cleaning up orphaned files using undo logs |
| Date | |
| Msg-id | CA+hUKGKni7EEU4FT71vZCCwPeaGb2PQOeKOFjQJavKnD577UMQ@mail.gmail.com |
| In response to | Re: POC: Cleaning up orphaned files using undo logs (Robert Haas <robertmhaas@gmail.com>) |
| Responses | Re: POC: Cleaning up orphaned files using undo logs (×7) |
| List | pgsql-hackers |
On Fri, Jun 28, 2019 at 6:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I happened to open up 0001 from this series, which is from Thomas, and
> I do not think that the pg_buffercache changes are correct.  The idea
> here is that the customer might install version 1.3 or any prior
> version on an old release, then upgrade to PostgreSQL 13.  When they
> do, they will be running with the old SQL definitions and the new
> binaries.  At that point, it sure looks to me like the code in
> pg_buffercache_pages.c is going to do the Wrong Thing.  [...]

Yep, that was completely wrong.  Here's a new version.  I tested that I can install 1.3 in an older release, then pg_upgrade to master, then look at the view without the new column, then upgrade the extension to 1.4, and then the new column appears.

Other new stuff in this tarball (and also at https://github.com/EnterpriseDB/zheap/tree/undo):

Based on hallway track discussions at PGCon, I have made a few modifications to the undo log storage and record layer to support "shared" record sets.  They are groups of records that can be used as temporary storage space for anything that needs to outlive a whole set of transactions.  The intended usage is extra transaction slots for updaters and lockers when there isn't enough space on a zheap (or other AM) page.  The idea is to avoid the need to have in-heap overflow pages for transient transaction management data, and instead put that stuff on the conveyor belt of perfectly timed doom[1] along with old tuple versions.  "Shared" undo records are never executed (that is, they don't really represent rollback actions); they are just used for storage space that is eventually discarded.  (I experimented with a way to use these also to perform rollback actions to clean up stuff like the junk left behind by aborted CREATE INDEX CONCURRENTLY commands, which seemed promising, but it turned out to be quite tricky, so I abandoned that for now.)

Details:

1.  Renamed UndoPersistence to UndoLogCategory everywhere, and added a fourth category UNDO_SHARED where transactions can write 'out of band' data that relates to more than one transaction.

2.  Introduced a new RMGR callback rm_undo_status.  It is used to decide when record sets in the UNDO_SHARED category should be discarded (instead of the usual single xid-based rules).  The possible answers are "discard me now!", "ask me again when a given XID is all visible", and "ask me again when a given XID is no longer running".

3.  Recognised UNDO_SHARED record set boundaries differently.  Whereas undolog.c recognises transaction boundaries automatically for the other categories (UNDO_PERMANENT, UNDO_UNLOGGED, UNDO_TEMP), for UNDO_SHARED the

4.  Added some quick-and-dirty throw-away test stuff to demonstrate that.  SELECT test_multixact([1234, 2345]) will create a new record set that will survive until the given array of transactions is no longer running, and then it'll be discarded.  You can see that with SELECT * FROM undoinspect('shared'), or look at the pg_stat_undo_logs view.  This test simply writes all the xids into its payload, and then has an rm_undo_status function that returns the first xid it finds in the list that is still running, or, if none are running, returns UNDO_STATUS_DISCARD (see the sketch after this list).  Currently you can only return UNDO_STATUS_WAIT_XMIN, that is, wait for an xid to be older than the oldest xmin; presumably it'd be useful to be able to discard as soon as an xid is no longer active, which could be a bit sooner.
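To make the rm_undo_status idea above a little more concrete, here is a rough sketch of what the test_multixact status callback might look like, under the assumption that the callback is handed the record set's payload and reports an XID to wait for.  The callback name, signature, and everything other than the two status values named in the mail are hypothetical illustrations, not the code in the attached patches.

```c
#include "postgres.h"

#include "storage/procarray.h"		/* TransactionIdIsInProgress() */

/*
 * Hypothetical status values for an rm_undo_status callback.  Only the two
 * names below appear in the mail; the real patch presumably defines more,
 * e.g. for "ask me again when a given XID is all visible".
 */
typedef enum UndoStatus
{
	UNDO_STATUS_DISCARD,		/* "discard me now!" */
	UNDO_STATUS_WAIT_XMIN		/* ask again when *wait_xid is older than
								 * the oldest xmin */
} UndoStatus;

/*
 * Sketch of the test_multixact status callback: the payload is just an
 * array of xids, and we report the first one that is still running.
 */
static UndoStatus
test_multixact_undo_status(const void *payload, Size size,
						   TransactionId *wait_xid)
{
	const TransactionId *xids = (const TransactionId *) payload;
	int			nxids = size / sizeof(TransactionId);

	for (int i = 0; i < nxids; i++)
	{
		if (TransactionIdIsInProgress(xids[i]))
		{
			/* Keep the record set until this xid is old enough. */
			*wait_xid = xids[i];
			return UNDO_STATUS_WAIT_XMIN;
		}
	}

	/* None of the listed transactions is running; the set can go. */
	return UNDO_STATUS_DISCARD;
}
```

In the mail's terms this only ever answers "discard" or "wait until this xid is older than the oldest xmin"; discarding as soon as an xid is no longer active would, as noted above, be a further refinement.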
Another small change: several people commented that UndoLogIsDiscarded(ptr) ought to have some kind of fast path that doesn't acquire locks, since it'll surely be hammered.  Here's an attempt at that: an inlined function that uses a per-backend recent_discard value to avoid doing more work in the (hopefully) common case that you mostly encounter discarded undo pointers (see the sketch at the end of this message).  I hope this change will show up in profiles of some zheap workloads, but that hasn't been tested yet.

Another small change/review: the function UndoLogGetNextInsertPtr() previously took a transaction ID, but I'm not sure that made sense; I need to think about it some more.

I pulled in the latest patches from the "undoprocessing" branch as of late last week, and most of the above is implemented as fixup commits on top of that.

Next I'm working on DBA facilities for forcing undo records to be discarded (which consists mostly of sorting out the interlocking to make that work safely), and also testing facilities for simulating undo log switching (when you fill up one log and move to another, which exercises rarely run code paths, so we need a good way to make them not rare).

[1] https://speakerdeck.com/macdice/transactions-in-postgresql-and-other-animals?slide=23

--
Thomas Munro
https://enterprisedb.com
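For reference, the per-backend recent_discard fast path described in the mail could look roughly like the following sketch.  The UndoRecPtr representation, the InvalidUndoRecPtr initializer, and the slow-path helper name are assumptions made for illustration, not the actual patch.

```c
#include "postgres.h"

/*
 * Hypothetical sketch of a lock-free fast path for UndoLogIsDiscarded().
 * Each backend remembers the most advanced discard pointer it has seen
 * (recent_discard); anything behind that is certainly discarded, so the
 * common case avoids touching shared memory or taking locks.
 */
typedef uint64 UndoRecPtr;				/* assumed representation */
#define InvalidUndoRecPtr ((UndoRecPtr) 0)

static UndoRecPtr recent_discard = InvalidUndoRecPtr;	/* per-backend cache */

/* Slow path: consult shared memory under lock and refresh the cache. */
extern bool UndoLogIsDiscardedSlow(UndoRecPtr ptr, UndoRecPtr *recent);

static inline bool
UndoLogIsDiscarded(UndoRecPtr ptr)
{
	/* Fast path: behind a discard horizon this backend has already seen. */
	if (ptr < recent_discard)
		return true;

	return UndoLogIsDiscardedSlow(ptr, &recent_discard);
}
```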