Re: should there be a hard-limit on the number of transactionspending undo? - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: should there be a hard-limit on the number of transactionspending undo? |
Date | |
Msg-id | CAH2-WznLvY+Ljj0s+S0rf+Mk4_bPnXiqA3c58Vr7wLKqVcr1kA@mail.gmail.com Whole thread Raw |
In response to | Re: should there be a hard-limit on the number of transactionspending undo? (Thomas Munro <thomas.munro@gmail.com>) |
Responses |
Re: should there be a hard-limit on the number of transactionspending undo?
|
List | pgsql-hackers |
On Mon, Jul 22, 2019 at 4:15 AM Thomas Munro <thomas.munro@gmail.com> wrote: > I had a similar thought: you might regret that choice if you were > wanting to implement an AM with lock table-based concurrency control > (meaning that there are lock ordering concerns for row and page locks, > for DML statements, not just DDL). That seemed a bit too far fetched > to mention before, but are you saying the same sort of concerns might > come up with indexes that support true undo (as opposed to indexes > that still need VACUUM)? Yes. It doesn't really make any difference with B-Trees, because the locks there are very similar to row locks (you still need forwarding UNDO metadata in index pages, probably for checking the visibility of index tuples that have their ghost bit set). But when you need to undo changes to an indexes with coarse grained index tuples (e.g. in a GIN index), the transaction needs to roll back the index tuple as a whole, necessitating that locks be held. Heap TIDs need to be completely stable to avoid a VACUUM-like mechanism -- you cannot just create a new HOT chain. You even have to be willing to store a single heap row across two heap pages in extreme cases where an UPDATE makes it impossible to fit a new row on the same heap page as the original -- this is called row forwarding. Once heap TIDs are guaranteed to be associated with a logical row for the lifetime of that row, and once you lock index entries, you're always able to cleanly undo the changes in the index (i.e. remove new tuples on abort). Then you have indexes that don't need VACUUMING, and that have cheap index-only scans. > For comparison, ARIES[1] has no-deadlock rollbacks as a basic property > and reacquires locks during restart before new transactions are allow > to execute. In its model, the locks in question can be on things like > rows and pages. We don't even use our lock table for those (except > for non-blocking SIREAD locks, irrelevant here). Right. ARIES has plenty to say about concurrency control, even though we often think of it as something that is only concerned with crash recovery. The undo phase is tied to how concurrency control works in general in ARIES. There is something called ARIES/KVL, and something else called ARIES/IM [1]. > After crash > recovery, if zheap encounters a row with pending rollback from an > aborted transaction, as usual it either needs to read an older version > from an undo log (for reads) or help execute the rollback before > updating (for writes). That only requires page-at-a-time LWLocks > ("latching"), so it's deadlock-free. The only deadlock risk comes > from the need to acquire heavyweight locks on relations which > typically only conflict when you run DDL, so yeah, it's tempting to > worry a lot less about those than the fine grained lock traffic from > DML statements that DB2 and others have to deal with. I think that DB2 index deletes are synchronous, and immediately remove space from a leaf page. Rollbacks will re-insert the deleted tuple. Systems that use a limited form of MVCC based on 2PL [2] set a ghost bit instead of physically removing the tuple immediately. But I don't think that that's actually very different to the DB2 classic 2PL approach, since there is forwarding undo information that makes it possible to reclaim tuples with the ghost bit set at the earliest possible opportunity. And because you can immediately do an in-place update of an index tuple's heap TID in the case of unique indexes, which can be optimized as a special case. Queries like "UPDATE tab set tab_pk = tab_pk + 1" work per the SQL standard (no duplicate violation), and don't even bloat the index, because the changes in the index can happen almost entirely in-place. > I might as well put the quote marks on now: "Perhaps we could > implement A later." I don't claim to have any real answers here. I don't claim to understand how much of a problem this is. [1] https://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe.pdf [2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -- See "6.7 Standard Practice" -- Peter Geoghegan
pgsql-hackers by date: