Re: should there be a hard-limit on the number of transactionspending undo? - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: should there be a hard-limit on the number of transactionspending undo?
Date
Msg-id CAH2-WznLvY+Ljj0s+S0rf+Mk4_bPnXiqA3c58Vr7wLKqVcr1kA@mail.gmail.com
Whole thread Raw
In response to Re: should there be a hard-limit on the number of transactionspending undo?  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: should there be a hard-limit on the number of transactionspending undo?
List pgsql-hackers
On Mon, Jul 22, 2019 at 4:15 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> I had a similar thought: you might regret that choice if you were
> wanting to implement an AM with lock table-based concurrency control
> (meaning that there are lock ordering concerns for row and page locks,
> for DML statements, not just DDL).  That seemed a bit too far fetched
> to mention before, but are you saying the same sort of concerns might
> come up with indexes that support true undo (as opposed to indexes
> that still need VACUUM)?

Yes. It doesn't really make any difference with B-Trees, because the
locks there are very similar to row locks (you still need forwarding
UNDO metadata in index pages, probably for checking the visibility of
index tuples that have their ghost bit set). But when you need to undo
changes to an indexes with coarse grained index tuples (e.g. in a GIN
index), the transaction needs to roll back the index tuple as a whole,
necessitating that locks be held. Heap TIDs need to be completely
stable to avoid a VACUUM-like mechanism -- you cannot just create a
new HOT chain. You even have to be willing to store a single heap row
across two heap pages in extreme cases where an UPDATE makes it
impossible to fit a new row on the same heap page as the original --
this is called row forwarding.

Once heap TIDs are guaranteed to be associated with a logical row for
the lifetime of that row, and once you lock index entries, you're
always able to cleanly undo the changes in the index (i.e. remove new
tuples on abort). Then you have indexes that don't need VACUUMING, and
that have cheap index-only scans.

> For comparison, ARIES[1] has no-deadlock rollbacks as a basic property
> and reacquires locks during restart before new transactions are allow
> to execute.  In its model, the locks in question can be on things like
> rows and pages.  We don't even use our lock table for those (except
> for non-blocking SIREAD locks, irrelevant here).

Right. ARIES has plenty to say about concurrency control, even though
we often think of it as something that is only concerned with crash
recovery. The undo phase is tied to how concurrency control works in
general in ARIES. There is something called ARIES/KVL, and something
else called ARIES/IM [1].

> After crash
> recovery, if zheap encounters a row with pending rollback from an
> aborted transaction, as usual it either needs to read an older version
> from an undo log (for reads) or help execute the rollback before
> updating (for writes).  That only requires page-at-a-time LWLocks
> ("latching"), so it's deadlock-free.  The only deadlock risk comes
> from the need to acquire heavyweight locks on relations which
> typically only conflict when you run DDL, so yeah, it's tempting to
> worry a lot less about those than the fine grained lock traffic from
> DML statements that DB2 and others have to deal with.

I think that DB2 index deletes are synchronous, and immediately remove
space from a leaf page. Rollbacks will re-insert the deleted tuple.
Systems that use a limited form of MVCC based on 2PL [2] set a ghost
bit instead of physically removing the tuple immediately. But I don't
think that that's actually very different to the DB2 classic 2PL
approach, since there is forwarding undo information that makes it
possible to reclaim tuples with the ghost bit set at the earliest
possible opportunity. And because you can immediately do an in-place
update of an index tuple's heap TID in the case of unique indexes,
which can be optimized as a special case. Queries like "UPDATE tab set
tab_pk = tab_pk + 1" work per the SQL standard (no duplicate
violation), and don't even bloat the index, because the changes in the
index can happen almost entirely in-place.

> I might as well put the quote marks on now:  "Perhaps we could
> implement A later."

I don't claim to have any real answers here. I don't claim to
understand how much of a problem this is.

[1] https://15721.courses.cs.cmu.edu/spring2016/papers/a16-graefe.pdf
[2] http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf -- See
"6.7 Standard Practice"
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Jeevan Ladhe
Date:
Subject: concerns around pg_lsn
Next
From: Tom Lane
Date:
Subject: Re: Proposal for Signal Detection Refactoring