Re: Deriving Recovery Snapshots - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Deriving Recovery Snapshots |
Date | |
Msg-id | 1224676704.27145.249.camel@ebony.2ndQuadrant |
In response to | Re: Deriving Recovery Snapshots (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>) |
Responses | Re: Deriving Recovery Snapshots; Re: Deriving Recovery Snapshots |
List | pgsql-hackers |
On Wed, 2008-10-22 at 12:29 +0300, Heikki Linnakangas wrote:

> How about:
>
> 1. Keep all transactions and subtransactions in UnobservedXids.
> 2. If it fills up, remove all subtransactions from it, that the startup
> process knows to be subtransactions and knows the parents, and update
> subtrans. Mark the array as overflowed.
>
> To take a snapshot, a backend simply copies UnobservedXids array and the
> flag. If it hasn't overflowed, a transaction is considered to be in
> progress if it's in the array. If it has overflowed, and the xid is not
> in the array, check subtrans

We can't check subtrans. We do not have any record of what the parent is
for an unobserved transaction id. So the complete list of unobserved
xids *must* be added to the snapshot. If that makes the snapshot overflow,
we have a big problem: we would be forced to say "sorry, snapshot cannot be
issued at this time, please wait". Ugh!

> Note that the startup process sees all WAL records, so it can do
> arbitrarily complex bookkeeping in backend-private memory, and only
> expose the necessary parts in shared mem. For example, it can keep track
> of the parent-child relationships of the xids in UnobservedXids, but the
> backends taking snapshots don't need to know about that. For step 2 to
> work, that's exactly what the startup process needs to keep track of.

> For the startup process to know about the parent-child relationships,
> we'll need something like WAL changes you suggested. I'm not too
> thrilled about adding a new field to all WAL records. Seems simpler to
> just rely on the new WAL records on AssignTransactionId(), and we can
> only do it, say, every 100 subtransactions, if we make the
> UnobservedXids array big enough (100*max_connections).

Yes, we can make the UnobservedXids array bigger, but only to the point
where it will all fit within a snapshot.

The WAL changes proposed use space that was previously wasted, so there
is no increase in the amount of data going to disk. The additional time to
derive that data is very small when those fields are unused, and that logic
is executed before we take WALInsertLock. So overall, very low overhead.

Every new subxid needs to specify its parent's xid. We must supply that
information somehow: either via an XLOG_XACT_ASSIGNMENT record or, as I have
done in most cases, by tucking it into the wasted space on the xlrec.
Writing a WAL record every 100 subtransactions will not work: we need to
write to subtrans *before* that xid appears anywhere on disk, so that
visibility tests can determine the status of the transaction.

The approach I have come up with is very finely balanced. It's the
*only* approach that I've come up with that covers all requirements;
there were very few technical choices to make. If it wasn't for
subtransactions, transactions disappearing because of FATAL errors, and
unobserved xids, it would be much simpler. But having said that, the code
isn't excessively complex; I wrote it in about 3 days.

> This isn't actually that different from your proposal. The big
> difference is that instead of PROC entries and UnobservedXids, all
> transactions are tracked in UnobservedXids, and instead of caching
> subtransactions in the subxids array in PROC entries, they're cached in
> UnobservedXids as well.

> Another, completely different approach, would be to forget about xid
> arrays altogether, and change the way snapshots are taken: just do a
> full memcpy of the clog between xmin and xmax. That might be pretty slow
> if xmax-xmin is big, though.

--
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support
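
For readers following the thread, the snapshot scheme quoted in this message can be sketched in a few lines of C. This is an illustration only, not code from the thread or from PostgreSQL: the struct, the fixed capacity, and the names `UnobservedXidsSketch`, `take_snapshot` and `snapshot_xid_status` are all hypothetical. The third return value marks the case the reply objects to: once the array has overflowed, an xid missing from it can only be resolved through a parent lookup in subtrans, and the reply's point is that no parent is recorded for an unobserved xid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TransactionId;     /* stand-in for PostgreSQL's type */

#define MAX_UNOBSERVED 8            /* hypothetical capacity, for illustration */

/* Hypothetical shared-memory array of xids plus the overflow flag. */
typedef struct
{
    TransactionId xids[MAX_UNOBSERVED];
    size_t        nxids;
    bool          overflowed;
} UnobservedXidsSketch;

typedef enum
{
    XID_IN_PROGRESS,          /* xid is listed in the copied array */
    XID_NOT_IN_PROGRESS,      /* array is complete and xid is absent */
    XID_NEEDS_PARENT_LOOKUP   /* array overflowed: would need subtrans */
} XidStatus;

/* "A backend simply copies UnobservedXids array and the flag." */
static void
take_snapshot(const UnobservedXidsSketch *shared, UnobservedXidsSketch *snap)
{
    assert(shared->nxids <= MAX_UNOBSERVED);
    *snap = *shared;          /* plain struct copy; locking is omitted here */
}

/* Classify one xid against a previously taken snapshot. */
static XidStatus
snapshot_xid_status(const UnobservedXidsSketch *snap, TransactionId xid)
{
    for (size_t i = 0; i < snap->nxids; i++)
    {
        if (snap->xids[i] == xid)
            return XID_IN_PROGRESS;
    }

    /* Not in the array: the answer depends on whether entries were dropped. */
    return snap->overflowed ? XID_NEEDS_PARENT_LOOKUP : XID_NOT_IN_PROGRESS;
}
```

With an array holding xids 100, 101 and 105 and the flag clear, xid 103 is classified as not in progress. With the flag set, the same xid falls into the parent-lookup case, which is where the two positions in this exchange diverge.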