Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Proposal for CSN based snapshots
Msg-id 5370E267.2020904@vmware.com
In response to Re: Proposal for CSN based snapshots  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 05/12/2014 05:41 PM, Andres Freund wrote:
> I haven't fully thought it through but I think it should make some of
> the decoding code simpler. And it should greatly simplify the hot
> standby code.

Cool. I was worried it might conflict with the logical decoding stuff in 
some fundamental way, as I'm not really familiar with it.

> Some of the stuff in here will influence whether your freezing
> replacement patch gets in. Do you plan to further pursue that one?

Not sure. I got to the point where it seemed to work, but I got cold
feet about proceeding with it. I used the page header's LSN field to
define the "epoch" of the page, but I started to feel uneasy about it. I
would be much more comfortable with an extra field in the page header,
even though that uses more disk space and requires dealing with pg_upgrade.

>> The core of the design is to store the LSN of the commit record in pg_clog.
>> Currently, we only store 2 bits per transaction there, indicating if the
>> transaction committed or not, but the patch will expand it to 64 bits, to
>> store the LSN. To check the visibility of an XID in a snapshot, the XID's
>> commit LSN is looked up in pg_clog, and compared with the snapshot's LSN.
>
> We'll continue to need some of the old states? You plan to use values
> that can never be valid lsns for them? I.e. 0/0 IN_PROGRESS, 0/1 ABORTED
> etc?

Exactly.
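
To make the idea concrete, here is a minimal sketch in C of a 64-bit clog slot that holds either a reserved status value or a commit LSN. It is only an illustration, not code from the patch: the type name CommitLSN, the CLOG_* constants, the choice of 0/2 for the "about to commit" state, and the strict "<" comparison against the snapshot's LSN are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the 64-bit value stored per XID in the clog. */
typedef uint64_t CommitLSN;

/* Reserved values, assumed never to be the LSN of a real commit record. */
#define CLOG_IN_PROGRESS  ((CommitLSN) 0)   /* 0/0 */
#define CLOG_ABORTED      ((CommitLSN) 1)   /* 0/1 */
#define CLOG_COMMITTING   ((CommitLSN) 2)   /* 0/2, hypothetical "about to commit" */
#define CLOG_FIRST_VALID  ((CommitLSN) 3)   /* anything from here on is a commit LSN */

/* True if the slot holds a real commit LSN rather than a status value. */
static bool
clog_slot_is_committed(CommitLSN slot)
{
    return slot >= CLOG_FIRST_VALID;
}

/*
 * An XID is visible to a snapshot if it committed and its commit record
 * precedes the snapshot's LSN. The caller is assumed to have already
 * waited out the "about to commit" state.
 */
static bool
xid_visible_in_snapshot(CommitLSN slot, CommitLSN snapshot_lsn)
{
    assert(slot != CLOG_COMMITTING);
    return clog_slot_is_committed(slot) && slot < snapshot_lsn;
}
```

With this layout, a single 64-bit comparison answers the visibility question once the slot has been read; in-progress and aborted XIDs are simply never visible.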

Using 64 bits per XID instead of just 2 will obviously require a lot 
more disk space, so we might actually want to still support the old clog 
format too, as an "archive" format. The clog for old transactions could 
be converted to the more compact 2-bits per XID format (or even just 1 bit).
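
For a feel of the numbers, the following C helpers compute the clog footprint for a given number of bits per XID. The 8192-byte page size is the usual default block size and is an assumption here, as are the function names.

```c
#include <stdint.h>

/* Assumed page size in bytes (the common default block size). */
#define PAGE_SIZE_BYTES 8192

/* Bytes needed to store nxids entries of bits_per_xid bits, rounded up. */
static uint64_t
clog_bytes_for_xids(uint64_t nxids, unsigned bits_per_xid)
{
    return (nxids * bits_per_xid + 7) / 8;
}

/* Number of XIDs whose status fits on one page. */
static uint64_t
xids_per_page(unsigned bits_per_xid)
{
    return (uint64_t) PAGE_SIZE_BYTES * 8 / bits_per_xid;
}
```

Under these assumptions a page covers 32768 XIDs at 2 bits each but only 1024 XIDs at 64 bits each, and a million XIDs take 250000 bytes versus 8000000 bytes, a factor of 32.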

> How do you plan to deal with subtransactions?

pg_subtrans will stay unchanged. We could possibly merge it with 
pg_clog, reserving some 32-bit chunk of values that are not valid LSNs 
to mean an uncommitted subtransaction, with the parent XID. That assumes 
that you never need to look up the parent of an already-committed 
subtransaction. I thought that was true at first, but I think the SSI 
code looks up the parent of a committed subtransaction, to find its 
predicate locks. Perhaps it could be changed, but it seems best to leave it
alone for now; there will be a lot of code churn anyway.
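
As an illustration of the merged layout described above, the sketch below reserves one value of the upper 32 bits to mark "uncommitted subtransaction" and keeps the parent XID in the lower 32 bits. The tag value 0xFFFFFFFF and all function names are hypothetical; the sketch simply assumes no real LSN ever has that upper half.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical tag: an upper half assumed never to occur in a real LSN. */
#define CLOG_SUBXACT_TAG ((uint64_t) 0xFFFFFFFF)

/* Build a slot value meaning "uncommitted subtransaction of parent". */
static uint64_t
clog_encode_subxact_parent(TransactionId parent)
{
    return (CLOG_SUBXACT_TAG << 32) | (uint64_t) parent;
}

/* True if the slot currently records a parent XID rather than a status or LSN. */
static bool
clog_slot_is_subxact(uint64_t slot)
{
    return (slot >> 32) == CLOG_SUBXACT_TAG;
}

/* Extract the parent XID; only meaningful if clog_slot_is_subxact() is true. */
static TransactionId
clog_decode_subxact_parent(uint64_t slot)
{
    return (TransactionId) (slot & 0xFFFFFFFF);
}
```

The limitation is visible in the layout: once the slot is overwritten with the commit LSN, the parent XID is gone, which is exactly why a lookup of a committed subtransaction's parent would no longer work.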

I think we can get rid of the sub-XID array in PGPROC. It's currently 
used to speed up TransactionIdIsInProgress(), but with the patch it will 
no longer be necessary to call TransactionIdIsInProgress() every time 
you check the visibility of an XID, so it doesn't need to be so fast 
anymore.

With the new "commit-in-progress" status in clog, we won't need the 
sub-committed clog status anymore. The "commit-in-progress" status will 
achieve the same thing.

>> Currently, before consulting the clog for an XID's status, it is necessary
>> to first check if the transaction is still in progress by scanning the proc
>> array. To get rid of that requirement, just before writing the commit record
>> in the WAL, the backend will mark the clog slot with a magic value that says
>> "I'm just about to commit". After writing the commit record, it is replaced
>> with the record's actual LSN. If a backend sees the magic value in the clog,
>> it will wait for the transaction to finish the insertion, and then check
>> again to get the real LSN. I'm thinking of just using XactLockTableWait()
>> for that. This mechanism makes the insertion of a commit WAL record and
>> updating the clog appear atomic to the rest of the system.
>
> So it's quite possible that clog will become more of a contention point
> due to the doubled amount of writes.

Yeah. OTOH, each transaction will take more space in the clog, which 
will spread the contention across more pages. And I think there are ways 
to mitigate contention in clog, if it becomes a problem. We could make 
the locking more fine-grained than one lock per page, use atomic 64-bit 
reads/writes on platforms that support it, etc.
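
A rough sketch of the lock-free variant, using C11 <stdatomic.h> as a stand-in for whatever atomics the server would actually use. The names ClogSlot, CLOG_COMMITTING and the functions below are hypothetical, and the waiting step (XactLockTableWait() in the proposal) is not modeled; the reader just reports that it has to wait and retry.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reserved values; anything >= CLOG_FIRST_VALID is a commit LSN. */
#define CLOG_COMMITTING   ((uint64_t) 2)
#define CLOG_FIRST_VALID  ((uint64_t) 3)

typedef _Atomic uint64_t ClogSlot;

/* Step 1: just before writing the commit record, publish the magic value. */
static void
clog_mark_committing(ClogSlot *slot)
{
    atomic_store(slot, CLOG_COMMITTING);
}

/* Step 2: after the commit record is written, replace it with the real LSN. */
static void
clog_set_commit_lsn(ClogSlot *slot, uint64_t lsn)
{
    assert(lsn >= CLOG_FIRST_VALID);
    atomic_store(slot, lsn);
}

/*
 * Reader side: returns false if the transaction is mid-commit, in which
 * case the caller must wait for it to finish and then call this again.
 */
static bool
clog_try_read_final(ClogSlot *slot, uint64_t *value)
{
    uint64_t v = atomic_load(slot);

    if (v == CLOG_COMMITTING)
        return false;
    *value = v;
    return true;
}
```

Each slot is written twice per commit, which is the doubled write traffic mentioned above, but with a single atomic 64-bit store and load per access no page-level lock is needed just to read or set one slot.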

>> In theory, we could use a snapshot LSN as the cutoff-point for
>> HeapTupleSatisfiesVisibility(). Maybe it's just because this is new, but
>> that makes me feel uneasy.
>
> It'd possibly also end up being less efficient because you'd visit the
> clog for potentially quite some transactions to get the LSN.

True.

- Heikki


