Re: Proposal for CSN based snapshots - Mailing list pgsql-hackers

From: Simon Riggs
Subject: Re: Proposal for CSN based snapshots
Msg-id: CANP8+jLyhL92eJi7RVX_yvbKjxm9=OBKcmNnodaYmeE+o2i1+Q@mail.gmail.com
In response to: Re: Proposal for CSN based snapshots (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Proposal for CSN based snapshots
List: pgsql-hackers
On 24 July 2015 at 19:21, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jul 24, 2015 at 1:00 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> It depends on the exact design we use to get that. Certainly we do not want
>> them if they cause a significant performance regression.
>
> Yeah.  I think the performance worries expressed so far are:
>
> - Currently, if you see an XID that is between the XMIN and XMAX of
> the snapshot, you hit CLOG only on first access.  After that, the
> tuple is hinted.  With this approach, the hint bit doesn't avoid
> needing to hit CLOG anymore, because it's not enough to know whether
> or not the tuple committed; you have to know the CSN at which it
> committed, which means you have to look that up in CLOG (or whatever
> SLRU stores this data).  Heikki mentioned adding some caching to
> ameliorate this problem, but it sounds like he was worried that the
> impact might still be significant.

This seems like the heart of the problem. Changing a snapshot from a list of xids into one number is easy. Making XidInMVCCSnapshot() work is the hard part, because it needs a translation/lookup from the CSN to determine whether the snapshot contains the xid.

That turns the CSN into a reference to a cached snapshot, or a reference from which a snapshot can be derived on demand.
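To make the cost concrete, here is a minimal toy sketch of a CSN-based visibility check, assuming a snapshot is just one CSN and that an xid is visible when its commit CSN is strictly smaller than the snapshot's. CsnLog, CsnCache and xid_visible_in_csn_snapshot are hypothetical names, not PostgreSQL code: CsnLog stands in for the SLRU that would store commit CSNs, and CsnCache for the kind of backend-local caching Heikki suggested. The point it illustrates is that, unlike today's hinted tuples, every check must still find the xid's commit CSN.

```python
from typing import Dict, Optional


class CsnLog:
    """Hypothetical stand-in for the SLRU mapping xid -> commit CSN."""

    def __init__(self) -> None:
        self._commit_csn: Dict[int, int] = {}  # absent: in progress or aborted
        self.lookups = 0  # counts simulated SLRU accesses

    def set_committed(self, xid: int, csn: int) -> None:
        self._commit_csn[xid] = csn

    def get_commit_csn(self, xid: int) -> Optional[int]:
        self.lookups += 1
        return self._commit_csn.get(xid)


class CsnCache:
    """Hypothetical backend-local cache in front of CsnLog."""

    def __init__(self, log: CsnLog) -> None:
        self._log = log
        self._cache: Dict[int, int] = {}

    def commit_csn(self, xid: int) -> Optional[int]:
        if xid in self._cache:
            return self._cache[xid]
        csn = self._log.get_commit_csn(xid)
        if csn is not None:
            # A commit CSN never changes once assigned, so caching it is safe;
            # "not committed yet" is not cached because it can still change.
            self._cache[xid] = csn
        return csn


def xid_visible_in_csn_snapshot(xid: int, snapshot_csn: int, cache: CsnCache) -> bool:
    # A hint bit would only say that the xid committed, not at which CSN,
    # so the commit CSN must still be found (here via the cache) every time.
    csn = cache.commit_csn(xid)
    if csn is None:
        return False  # in progress or aborted in this toy model
    # Assumed convention: a snapshot sees commits with a strictly smaller CSN.
    return csn < snapshot_csn


log = CsnLog()
log.set_committed(100, 5)
log.set_committed(101, 9)
cache = CsnCache(log)
print(xid_visible_in_csn_snapshot(100, 7, cache))  # True
print(xid_visible_in_csn_snapshot(101, 7, cache))  # False
print(log.lookups)  # 2
```

In this model the cache hit rate, not the hint bit, decides how often the SLRU is touched, which is the performance worry raised above.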
 
> - Mixing synchronous_commit=on and synchronous_commit=off won't work
> as well, because if the LSN ordering of commit records matches the
> order in which transactions become visible, then an async-commit
> transaction can't become visible before a later sync-commit
> transaction.  I expect we might just decide we can live with this, but
> it's worth discussing in case people feel otherwise.

Using the commit LSN as the CSN seems interesting, but it is not the only choice.

The commit LSN does not give the precise order in which commits become visible, because of the race between writing the commit record to WAL and marking the commit in clog. Mixing async and sync commits accentuates that problem, but it is not the only source of races.
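The race can be shown with a small toy model, assuming two transactions whose commit records land in WAL in one order but whose clog updates happen in the other. Txn, visible_by_clog, visible_by_lsn and race_scenario are hypothetical names, and the rule that the snapshot's CSN is the highest commit LSN already marked in clog is an assumption made only for this illustration. It needs no async commit to produce a disagreement.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Txn:
    xid: int
    commit_lsn: Optional[int] = None  # set when the commit record is in WAL
    clog_committed: bool = False      # set when the commit is marked in clog


def visible_by_clog(txn: Txn) -> bool:
    # What a snapshot taken "now" sees if visibility follows clog.
    return txn.clog_committed


def visible_by_lsn(txn: Txn, snapshot_lsn: int) -> bool:
    # Hypothetical rule if the commit LSN were used directly as the CSN.
    return txn.commit_lsn is not None and txn.commit_lsn <= snapshot_lsn


def race_scenario() -> Dict[int, Tuple[bool, bool]]:
    t1, t2 = Txn(xid=1), Txn(xid=2)
    # Both commit records reach WAL, T1 first.
    t1.commit_lsn = 100
    t2.commit_lsn = 200
    # T2 wins the race to mark clog; T1 has not done so yet.
    t2.clog_committed = True
    # Assumed: the snapshot's CSN is the highest commit LSN already visible.
    snapshot_lsn = max(t.commit_lsn for t in (t1, t2) if t.clog_committed)
    # For each xid: (visible per clog, visible per commit-LSN CSN).
    return {t.xid: (visible_by_clog(t), visible_by_lsn(t, snapshot_lsn)) for t in (t1, t2)}


print(race_scenario())  # {1: (False, True), 2: (True, True)}
```

The two rules disagree about T1, so using the commit LSN as the CSN would require closing this window (or choosing a different CSN source) rather than just reading the LSN off the commit record.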

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
