Re: Vacuum dead tuples that are "between" transactions - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Vacuum dead tuples that are "between" transactions |
Date | |
Msg-id | 10856.1141108339@sss.pgh.pa.us Whole thread Raw |
In response to | Vacuum dead tuples that are "between" transactions (Paul Tillotson <spam1011@adelphia.net>) |
Responses |
Re: Vacuum dead tuples that are "between" transactions
Re: Vacuum dead tuples that are "between" transactions |
List | pgsql-hackers |
Paul Tillotson <spam1011@adelphia.net> writes: > The topic of improving vacuum for use in heavy-update environments seems > to come up frequently on the list. Has anyone weighed the costs of > allowing VACUUM to reclaim tuples that are not older than the oldest > transaction but are nonetheless invisible to all running transactions? > It seems that it's not that hard.... It's not that easy either --- you are assuming that every process advertises far more of its internal state than it actually does. > Suppose that in the system, we have a serializable transaction with xid > 1000 and a read committed transaction with xid 1001. Other than these > two, the oldest running xid is 2000. > Suppose we consider a tuple with xmin 1200 and xmax 1201. We will > assume that xid 1201 committed before xid 2000 began to run. > So: > (A) This tuple is invisible to the serializable transaction, since its > snapshot can't ever advance. Wrong --- you can't assume that simply from the transaction numbering, even assuming that you know that 1000 is serializable. 1000 might not have set its snapshot until quite some time after it started. (This is even pretty likely, if it waited for some locks before setting the snapshot.) You'd need access to the snapshot 1000 is actually using to be sure which "later" transactions are invisible to it. While advertising whole snapshots (rather than just xmin) in shared memory is at least theoretically possible, the costs of doing that seem nontrivial to me ... and they'd have to be paid whether any savings ensued or not. > (B) The read committed transaction might be able to see it. However, if > its current command started AFTER xid 1201 committed, it can't. Another issue is that there's not just "one single snapshot" to worry about per backend. Cursors for instance capture their own snaps. So a backend would have to somehow keep track of the oldest live snapshot it has internally. > The read committed ones are a more difficult matter, but I think you can > treat a tuple as dead if it was inserted after the read committed > transaction started to run AND the tuple was deleted before the > transaction's currently running command started to run. To do that requires not just that you have access to a backend's oldest snapshot, but that you have access to *all* its active snapshots; because such a transient tuple might be visible in some newer snap even though it's too new for the oldest snap. Doing that will create very significant problems of shared memory management, as well as performance and locking issues. There's been some talk of distinguishing "global" and "within database" xmin values, so that a long-lived transaction wouldn't interfere with vacuuming tables in other databases that that xact couldn't possibly access. That seems doable to me, but I think any finer-grained analysis is probably going to be a net loss because of the distributed overhead it'd impose on every transaction. regards, tom lane
pgsql-hackers by date: