Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Proposal for CSN based snapshots
Date
Msg-id 20140512152628.GC9619@awork2.anarazel.de
In response to Re: Proposal for CSN based snapshots  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Proposal for CSN based snapshots
List pgsql-hackers
On 2014-05-12 18:01:59 +0300, Heikki Linnakangas wrote:
> On 05/12/2014 05:41 PM, Andres Freund wrote:
> >I haven't fully thought it through but I think it should make some of
> >the decoding code simpler. And it should greatly simplify the hot
> >standby code.
> 
> Cool. I was worried it might conflict with the logical decoding stuff in
> some fundamental way, as I'm not really familiar with it.

My gut feeling is that it should be possible to make it work. I'm too
deep into the last week of a project to properly analyze it, but I am
sure we'll find a place to grab a drink and discuss it next week.

Essentially all it needs is to be able to represent snapshots from the
past including (and that's the hard part) a snapshot from somewhere in
the midst of an xact. The latter is done by logging cmin/cmax for all
catalog tuples and building a lookup table when looking inside an
xact. That shouldn't change much for CSN based snapshots. I think.
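
A rough sketch of that lookup-table idea, in C. This is not the actual
decoding code: every name here (CatTupleKey, CidEntry, cid_lookup,
tuple_visible_at, INVALID_CID) is made up for illustration, and it
assumes the logged records yield, per catalog tuple touched by the xact,
the command id that created it (cmin) and the one that deleted it (cmax).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t CommandId;

/* Hypothetical marker: "not set by the xact we are looking inside". */
#define INVALID_CID UINT32_MAX

/* Hypothetical key identifying one catalog tuple version. */
typedef struct {
    uint32_t relfilenode;
    uint32_t block;
    uint16_t offset;
} CatTupleKey;

/* One logged (cmin, cmax) pair for a catalog tuple. */
typedef struct {
    CatTupleKey key;
    CommandId   cmin;   /* command that created the tuple, or INVALID_CID */
    CommandId   cmax;   /* command that deleted the tuple, or INVALID_CID */
} CidEntry;

static bool key_eq(const CatTupleKey *a, const CatTupleKey *b)
{
    return a->relfilenode == b->relfilenode &&
           a->block == b->block &&
           a->offset == b->offset;
}

/* Linear search keeps the sketch short; a real table would be hashed. */
const CidEntry *cid_lookup(const CidEntry *tab, size_t n, const CatTupleKey *key)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (key_eq(&tab[i].key, key))
            return &tab[i];
    return NULL;
}

/* Is the tuple visible to a snapshot taken inside the xact at command curcid? */
bool tuple_visible_at(const CidEntry *e, CommandId curcid)
{
    if (e->cmin != INVALID_CID && e->cmin >= curcid)
        return false;   /* created by this command or a later one */
    if (e->cmax != INVALID_CID && e->cmax < curcid)
        return false;   /* already deleted by an earlier command */
    return true;
}
```

The point of the sketch is only that the table is keyed by tuple and
queried with a command id, which is independent of whether the
surrounding snapshot is XID based or CSN based.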

> >Some of the stuff in here will influence whether your freezing
> >replacement patch gets in. Do you plan to further pursue that one?
>
> Not sure. I got to the point where it seemed to work, but I got a bit of
> cold feet proceeding with it. I used the page header's LSN field to define
> the "epoch" of the page, but I started to feel uneasy about it.

Yea. I don't think the approach is fundamentally broken but it touches a
*lot* of arcane places... Or at least it needs to touch many and the
trick is finding them all :)

> I would be
> much more comfortable with an extra field in the page header, even though
> that uses more disk space. And requires dealing with pg_upgrade.

Maybe we can reclaim pagesize_version and prune_xid in some way? It
seems to me prune_xid could be represented as an LSN with CSN snapshots
combined with your freezing approach and we probably don't need the last
two bytes of the lsn for that purpose...
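
The arithmetic behind that: pd_pagesize_version is 2 bytes and
pd_prune_xid is 4 bytes, so together they give 6 bytes, which is exactly
a 64-bit LSN with its low two bytes dropped. A minimal C sketch of such
a packing follows. The names (PruneLsnSlot, prune_lsn_pack,
prune_lsn_unpack) and the big-endian byte order are assumptions for
illustration, not a proposed on-disk format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 6-byte slot standing in for pd_pagesize_version (2 bytes)
 * followed by pd_prune_xid (4 bytes). */
typedef struct {
    uint8_t bytes[6];
} PruneLsnSlot;

static_assert(sizeof(PruneLsnSlot) == 6, "slot must fit the reclaimed fields");

/* Keep the upper 48 bits of the LSN; the low 16 bits are discarded,
 * so the stored value has 64 KiB granularity. */
PruneLsnSlot prune_lsn_pack(uint64_t lsn)
{
    PruneLsnSlot s;
    uint64_t hi = lsn >> 16;

    for (int i = 0; i < 6; i++)
        s.bytes[i] = (uint8_t) (hi >> (8 * (5 - i)));   /* most significant first */
    return s;
}

/* Rebuild an LSN with the low 16 bits zeroed (never larger than the original). */
uint64_t prune_lsn_unpack(PruneLsnSlot s)
{
    uint64_t hi = 0;

    for (int i = 0; i < 6; i++)
        hi = (hi << 8) | s.bytes[i];
    return hi << 16;
}
```

Note that unpacking rounds the LSN down. Whether rounding down or up is
the safe direction for a prune hint is left open here.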

> Using 64 bits per XID instead of just 2 will obviously require a lot more
> disk space, so we might actually want to still support the old clog format
> too, as an "archive" format. The clog for old transactions could be
> converted to the more compact 2-bits per XID format (or even just 1 bit).

Wouldn't it make more sense to have two SLRUs then? An SLRU with dynamic
width doesn't seem easily doable.
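
To make the two-SLRU idea concrete, here is a small C sketch of how an
XID would map to a position in each of them, assuming the default 8192
byte block size. The names (SlruPos, csn_pos, archive_pos) are
hypothetical. Each SLRU keeps a fixed entry width, so the addressing
stays a simple divide and modulo in both cases.

```c
#include <assert.h>
#include <stdint.h>

#define BLCKSZ 8192u    /* default block size, assumed here */

typedef uint32_t TransactionId;

static_assert(BLCKSZ % sizeof(uint64_t) == 0, "page must hold whole 64-bit entries");

/* Position of one XID's entry inside an SLRU. */
typedef struct {
    uint32_t pageno;    /* SLRU page number */
    uint32_t byteno;    /* byte offset within the page */
    uint32_t bitshift;  /* bit offset within that byte (archive format only) */
} SlruPos;

/* Wide SLRU: one 64-bit value per XID, so BLCKSZ / 8 = 1024 XIDs per page. */
SlruPos csn_pos(TransactionId xid)
{
    const uint32_t per_page = BLCKSZ / (uint32_t) sizeof(uint64_t);
    SlruPos p = { xid / per_page,
                  (xid % per_page) * (uint32_t) sizeof(uint64_t),
                  0 };
    return p;
}

/* Archive SLRU: 2 bits per XID, so BLCKSZ * 4 = 32768 XIDs per page. */
SlruPos archive_pos(TransactionId xid)
{
    const uint32_t per_page = BLCKSZ * 4;
    const uint32_t idx = xid % per_page;
    SlruPos p = { xid / per_page, idx / 4, (idx % 4) * 2 };
    return p;
}
```

With these numbers the wide format needs 32 times as many pages as the
2-bit format for the same XID range.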

> >How do you plan to deal with subtransactions?
> 
> pg_subtrans will stay unchanged. We could possibly merge it with pg_clog,
> reserving some 32-bit chunk of values that are not valid LSNs to mean an
> uncommitted subtransaction, with the parent XID. That assumes that you never
> need to look up the parent of an already-committed subtransaction. I thought
> that was true at first, but I think the SSI code looks up the parent of a
> committed subtransaction, to find its predicate locks. Perhaps it could be
> changed, but seems best to leave it alone for now; there will be a lot of code
> churn anyway.
> 
> I think we can get rid of the sub-XID array in PGPROC. It's currently used
> to speed up TransactionIdIsInProgress(), but with the patch it will no
> longer be necessary to call TransactionIdIsInProgress() every time you check
> the visibility of an XID, so it doesn't need to be so fast anymore.

Whether it can be removed depends on how the whole hot standby stuff is
dealt with... Also, there are some other callsites that do
TransactionIdIsInProgress() at some frequency. Just think about all the
multixact business :(

> With the new "commit-in-progress" status in clog, we won't need the
> sub-committed clog status anymore. The "commit-in-progress" status will
> achieve the same thing.

Wouldn't that cause many spurious waits? Because commit-in-progress
needs to be waited on, but a sub-committed xact surely does not?

> >So it's quite possible that clog will become more of a contention point
> >due to the doubled amount of writes.
> 
> Yeah. OTOH, each transaction will take more space in the clog, which will
> spread the contention across more pages. And I think there are ways to
> mitigate contention in clog, if it becomes a problem.

I am not opposed, more wondering if you'd thought about it.

I don't think spreading the contention works very well with the current
implementation of slru.c. It's already very prone to throwing away the
wrong page. Widening it will just make that worse.

> We could make the
> locking more fine-grained than one lock per page, use atomic 64-bit
> reads/writes on platforms that support it, etc.

We *really* need an atomics abstraction layer... There's more and more
stuff coming that needs it.
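
As a sketch of what the thinnest possible version of such a layer could
look like for the 64-bit case, here is a wrapper over C11
<stdatomic.h>. It assumes a C11 compiler, which is itself a big
assumption for portability, and the xa_* names are invented for this
example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wrapper type so callers cannot touch the value non-atomically. */
typedef struct {
    _Atomic uint64_t value;
} xa_u64;

static inline void xa_init_u64(xa_u64 *p, uint64_t v)
{
    assert(p != NULL);
    atomic_init(&p->value, v);
}

/* Acquire load: later reads cannot be reordered before it. */
static inline uint64_t xa_read_u64(xa_u64 *p)
{
    return atomic_load_explicit(&p->value, memory_order_acquire);
}

/* Release store: earlier writes are visible before the new value is. */
static inline void xa_write_u64(xa_u64 *p, uint64_t v)
{
    atomic_store_explicit(&p->value, v, memory_order_release);
}

/* Compare-and-swap; on failure *expected is updated to the current value. */
static inline bool xa_cas_u64(xa_u64 *p, uint64_t *expected, uint64_t desired)
{
    return atomic_compare_exchange_strong(&p->value, expected, desired);
}
```

On platforms without native 64-bit atomics the compiler may fall back to
a lock internally, which is the kind of detail an abstraction layer
would need to expose or hide deliberately.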

This is going to be a *large* patch.

Greetings,

Andres Freund


