Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Proposal for CSN based snapshots |
Date | |
Msg-id | 5370E267.2020904@vmware.com |
In response to | Re: Proposal for CSN based snapshots (Andres Freund <andres@2ndquadrant.com>) |
Responses | Re: Proposal for CSN based snapshots; Re: Proposal for CSN based snapshots |
List | pgsql-hackers |
On 05/12/2014 05:41 PM, Andres Freund wrote:
> I haven't fully thought it through but I think it should make some of
> the decoding code simpler. And it should greatly simplify the hot
> standby code.

Cool. I was worried it might conflict with the logical decoding stuff in some fundamental way, as I'm not really familiar with it.

> Some of the stuff in here will be influence whether your freezing
> replacement patch gets in. Do you plan to further pursue that one?

Not sure. I got to the point where it seemed to work, but I got cold feet about proceeding with it. I used the page header's LSN field to define the "epoch" of the page, but I started to feel uneasy about it. I would be much more comfortable with an extra field in the page header, even though that uses more disk space and requires dealing with pg_upgrade.

>> The core of the design is to store the LSN of the commit record in pg_clog.
>> Currently, we only store 2 bits per transaction there, indicating if the
>> transaction committed or not, but the patch will expand it to 64 bits, to
>> store the LSN. To check the visibility of an XID in a snapshot, the XID's
>> commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.
>
> We'll continue to need some of the old states? You plan to use values
> that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
> etc?

Exactly.

Using 64 bits per XID instead of just 2 will obviously require a lot more disk space, so we might actually want to still support the old clog format too, as an "archive" format. The clog for old transactions could be converted to the more compact 2-bits per XID format (or even just 1 bit).

> How do you plan to deal with subtransactions?

pg_subtrans will stay unchanged. We could possibly merge it with pg_clog, reserving some 32-bit chunk of values that are not valid LSNs to mean an uncommitted subtransaction, with the parent XID.
That assumes that you never need to look up the parent of an already-committed subtransaction. I thought that was true at first, but I think the SSI code looks up the parent of a committed subtransaction, to find its predicate locks. Perhaps it could be changed, but it seems best to leave it alone for now; there will be a lot of code churn anyway.

I think we can get rid of the sub-XID array in PGPROC. It's currently used to speed up TransactionIdIsInProgress(), but with the patch it will no longer be necessary to call TransactionIdIsInProgress() every time you check the visibility of an XID, so it doesn't need to be so fast anymore.

With the new "commit-in-progress" status in clog, we won't need the sub-committed clog status anymore. The "commit-in-progress" status will achieve the same thing.

>> Currently, before consulting the clog for an XID's status, it is necessary
>> to first check if the transaction is still in progress by scanning the proc
>> array. To get rid of that requirement, just before writing the commit record
>> in the WAL, the backend will mark the clog slot with a magic value that says
>> "I'm just about to commit". After writing the commit record, it is replaced
>> with the record's actual LSN. If a backend sees the magic value in the clog,
>> it will wait for the transaction to finish the insertion, and then check
>> again to get the real LSN. I'm thinking of just using XactLockTableWait()
>> for that. This mechanism makes the insertion of a commit WAL record and
>> updating the clog appear atomic to the rest of the system.
>
> So it's quite possible that clog will become more of a contention point
> due to the doubled amount of writes.

Yeah. OTOH, each transaction will take more space in the clog, which will spread the contention across more pages. And I think there are ways to mitigate contention in clog, if it becomes a problem.
We could make the locking more fine-grained than one lock per page, use atomic 64-bit reads/writes on platforms that support it, etc.

>> In theory, we could use a snapshot LSN as the cutoff-point for
>> HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
>> that makes me feel uneasy.
>
> It'd possibly also end up being less efficient because you'd visit the
> clog for potentially quite some transactions to get the LSN.

True.

- Heikki
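
For readers following the thread, here is a minimal sketch of the clog lookup discussed above: a 64-bit slot per XID that holds either a reserved status value or the commit record's LSN. It is illustrative only and is not taken from the patch. All type, macro, and function names are hypothetical, the numbering of the reserved values (including the "about to commit" magic value) is an assumption, and so is the choice that a commit LSN less than or equal to the snapshot's LSN counts as visible. Locking and the actual wait (the message suggests XactLockTableWait()) are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit clog slot: a reserved status value or a commit LSN. */
typedef uint64_t CommitLsn;

/* Reserved values that can never be valid LSNs (numbering is assumed). */
#define CLOG_IN_PROGRESS     ((CommitLsn) 0)
#define CLOG_ABORTED         ((CommitLsn) 1)
#define CLOG_COMMITTING      ((CommitLsn) 2)  /* "I'm just about to commit" */
#define CLOG_FIRST_VALID_LSN ((CommitLsn) 3)

typedef enum
{
    XID_INVISIBLE,   /* in progress, aborted, or committed after the snapshot */
    XID_VISIBLE,     /* committed at or before the snapshot's LSN */
    XID_MUST_WAIT    /* commit record is being inserted; wait, then recheck */
} XidVisibility;

/* Step 1 of commit: mark the slot just before writing the commit record. */
static void
commit_begin(CommitLsn *slot)
{
    *slot = CLOG_COMMITTING;
}

/* Step 2 of commit: replace the magic value with the record's actual LSN. */
static void
commit_finish(CommitLsn *slot, CommitLsn commit_lsn)
{
    *slot = commit_lsn;
}

/* Decide visibility of one XID from its clog slot and the snapshot's LSN. */
static XidVisibility
xid_visible_in_snapshot(CommitLsn slot, CommitLsn snapshot_lsn)
{
    if (slot == CLOG_COMMITTING)
        return XID_MUST_WAIT;
    if (slot < CLOG_FIRST_VALID_LSN)
        return XID_INVISIBLE;           /* in progress or aborted */
    return (slot <= snapshot_lsn) ? XID_VISIBLE : XID_INVISIBLE;
}
```

A caller that gets XID_MUST_WAIT would block until the committing backend finishes and then read the slot again, which is what makes the WAL insertion and the clog update appear atomic to other backends.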