Re: cheaper snapshots - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: cheaper snapshots
Msg-id 1311869151.3117.1527.camel@hvost
In response to cheaper snapshots  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: cheaper snapshots
List pgsql-hackers
On Wed, 2011-07-27 at 22:51 -0400, Robert Haas wrote:
> On Wed, Oct 20, 2010 at 10:07 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I wonder whether we could do something involving WAL properties --- the
> > current tuple visibility logic was designed before WAL existed, so it's
> > not exploiting that resource at all.  I'm imagining that the kernel of a
> > snapshot is just a WAL position, ie the end of WAL as of the time you
> > take the snapshot (easy to get in O(1) time).  Visibility tests then
> > reduce to "did this transaction commit with a WAL record located before
> > the specified position?".  You'd need some index datastructure that made
> > it reasonably cheap to find out the commit locations of recently
> > committed transactions, where "recent" means "back to recentGlobalXmin".
> > That seems possibly do-able, though I don't have a concrete design in
> > mind.
> 
> I was mulling this idea over some more (the same ideas keep floating
> back to the top...).  I don't think an LSN can actually work, because
> there's no guarantee that the order in which the WAL records are
> emitted is the same order in which the effects of the transactions
> become visible to new snapshots.  For example:
> 
> 1. Transaction A inserts its commit record, flushes WAL, and begins
> waiting for sync rep.
> 2. A moment later, transaction B sets synchronous_commit=off, inserts
> its commit record, requests a background WAL flush, and removes itself
> from the ProcArray.
> 3. Transaction C takes a snapshot.

It is Transaction A here that is behaving badly: it should also remove
itself from the ProcArray right after it inserts its commit record, since
for everybody except the client app of transaction A it is committed at
this point. It just can't report back to the client before getting
confirmation that the commit has actually been sync-replicated (or
locally written to stable storage).

At least as far as consistent snapshots are concerned, the right
sequence should be:

1) insert commit record into WAL
2) remove yourself from ProcArray (or use some other means to declare
that your transaction is no longer running)
3) if so configured, wait for WAL flush to stable storage and/or sync rep
confirmation
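
To make the ordering concrete, here is a minimal C sketch of the three
steps above. It is only an illustration: every function name in it is a
hypothetical stand-in (none of them are PostgreSQL functions), and each
step merely records itself in a trace so the order can be inspected.

```c
#include <assert.h>
#include <stdbool.h>

/* The steps of a commit, in the order proposed in the text. */
enum commit_step {
    STEP_INSERT_COMMIT_RECORD,
    STEP_LEAVE_PROCARRAY,
    STEP_WAIT_DURABLE,
    STEP_REPORT_TO_CLIENT
};

#define MAX_STEPS 8

struct commit_trace {
    enum commit_step steps[MAX_STEPS];
    int n;
};

/* Append one step to the trace. */
static void trace_step(struct commit_trace *t, enum commit_step s)
{
    assert(t->n < MAX_STEPS);
    t->steps[t->n++] = s;
}

/* Hypothetical stand-ins for the real work; each only records itself. */
void insert_commit_record(struct commit_trace *t) { trace_step(t, STEP_INSERT_COMMIT_RECORD); }
void leave_proc_array(struct commit_trace *t)     { trace_step(t, STEP_LEAVE_PROCARRAY); }
void wait_until_durable(struct commit_trace *t)   { trace_step(t, STEP_WAIT_DURABLE); }
void report_to_client(struct commit_trace *t)     { trace_step(t, STEP_REPORT_TO_CLIENT); }

/*
 * The transaction stops counting as running for new snapshots (step 2)
 * before any wait for local flush or sync rep (step 3); only the reply
 * to its own client is held back until after that wait.
 */
void commit_transaction(struct commit_trace *t, bool wait_for_durability)
{
    insert_commit_record(t);
    leave_proc_array(t);
    if (wait_for_durability)
        wait_until_durable(t);
    report_to_client(t);
}
```

With wait_for_durability set to false (roughly the synchronous_commit=off
case) step 3 is skipped, but steps 1 and 2 keep the same relative order.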

Based on this, let me suggest a simple snapshot cache mechanism.

A simple snapshot cache mechanism
=================================

Have an array of running transactions, with one slot per backend:

txid running_transactions[max_connections];

There are exactly 3 operations on this array:

1. insert backend's running transaction id
------------------------------------------

this is done at the moment of acquiring your transaction id from the
system, and synchronized by the same mechanism as getting the
transaction id

running_transactions[my_backend] = current_transaction_id

2. remove backend's running transaction id
------------------------------------------

this is done at the moment of committing or aborting the transaction,
again synchronized by the same mechanism as writing the commit record.

running_transactions[my_backend] = NULL

this should be the first thing done after inserting the WAL commit record

3. getting a snapshot
---------------------

memcpy() running_transactions to local memory, then construct a snapshot


It may be that you need to protect all 3 operations with a single
spinlock; if so, then I'd propose the same spinlock used when getting
your transaction id (and placing the array near where the latest
transaction id is stored so they share a cache line).
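
As a rough illustration, the three operations with a single lock around
all of them might look like the C sketch below. Several things in it are
assumptions rather than part of the proposal: the spinlock is a plain C11
atomic_flag, 0 plays the role of NULL for an empty slot, the snapshot
also records the next unassigned id as "xmax", and the function names,
the snapshot struct and the slot count of 8 are made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t txid;

#define MAX_CONNECTIONS 8        /* illustrative stand-in for max_connections */
#define INVALID_TXID ((txid) 0)  /* plays the role of NULL in the text */

/* One spinlock guards both the id counter and the array (assumption). */
static atomic_flag xid_lock = ATOMIC_FLAG_INIT;
static txid next_txid = 1;
static txid running_transactions[MAX_CONNECTIONS];

typedef struct {
    txid xmax;                      /* first id not yet assigned (assumption) */
    txid running[MAX_CONNECTIONS];  /* private copy of the shared array */
} snapshot;

static void lock_xid(void)
{
    while (atomic_flag_test_and_set_explicit(&xid_lock, memory_order_acquire))
        ;  /* spin until the flag is released */
}

static void unlock_xid(void)
{
    atomic_flag_clear_explicit(&xid_lock, memory_order_release);
}

/* Operation 1: take a new transaction id and publish it in our slot. */
txid begin_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_CONNECTIONS);
    lock_xid();
    txid xid = next_txid++;
    running_transactions[my_backend] = xid;
    unlock_xid();
    return xid;
}

/* Operation 2: clear our slot right after the commit/abort record. */
void end_transaction(int my_backend)
{
    assert(my_backend >= 0 && my_backend < MAX_CONNECTIONS);
    lock_xid();
    running_transactions[my_backend] = INVALID_TXID;
    unlock_xid();
}

/* Operation 3: memcpy() the array to local memory under the lock. */
void get_snapshot(snapshot *snap)
{
    lock_xid();
    memcpy(snap->running, running_transactions, sizeof running_transactions);
    snap->xmax = next_txid;
    unlock_xid();
}

/* Returns 1 if the snapshot must treat xid as still running. */
int xid_in_progress(const snapshot *snap, txid xid)
{
    if (xid == INVALID_TXID)
        return 0;
    if (xid >= snap->xmax)
        return 1;               /* started after the snapshot was taken */
    for (int i = 0; i < MAX_CONNECTIONS; i++)
        if (snap->running[i] == xid)
            return 1;
    return 0;
}
```

A transaction that has ended is not in the copied array and has an id
below xmax, so xid_in_progress() returns 0 for it. Whether it committed
or aborted would still have to be looked up elsewhere; this sketch does
not cover that.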

But it is also possible that you can get logically consistent snapshots
by protecting only some of the operations. For example, if you protect
only insert and get snapshot, then the worst that can happen is that you
get a snapshot that is a few commits older than what you'd get with full
locking, and that may well be OK for all real uses.
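
A minimal sketch of that variant, where only insert and get snapshot
take the lock and remove is a plain atomic store, could look as follows.
It is an illustration under assumptions, not a proof that the scheme is
safe: the names, the use of C11 atomics, the slot count and the value 0
for an empty slot are all made up for the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

typedef uint64_t txid;

#define N_BACKENDS 8   /* illustrative stand-in for max_connections */
#define NO_TXID 0      /* empty slot marker (assumption) */

static atomic_flag assign_lock = ATOMIC_FLAG_INIT;
static txid latest_txid = 0;
static _Atomic txid slots[N_BACKENDS];

static void lock_assign(void)
{
    while (atomic_flag_test_and_set_explicit(&assign_lock, memory_order_acquire))
        ;  /* spin */
}

static void unlock_assign(void)
{
    atomic_flag_clear_explicit(&assign_lock, memory_order_release);
}

/* Insert: locked, together with assigning the transaction id. */
txid slot_insert(int backend)
{
    assert(backend >= 0 && backend < N_BACKENDS);
    lock_assign();
    txid xid = ++latest_txid;
    atomic_store_explicit(&slots[backend], xid, memory_order_release);
    unlock_assign();
    return xid;
}

/* Remove: NOT locked, just an atomic store to our own slot. */
void slot_remove_unlocked(int backend)
{
    assert(backend >= 0 && backend < N_BACKENDS);
    atomic_store_explicit(&slots[backend], NO_TXID, memory_order_release);
}

/*
 * Get snapshot: locked against insert only. A remove racing with this
 * loop may or may not be seen, so the result can still list a
 * transaction that has just ended, i.e. the snapshot is slightly older.
 * Writes the running ids to out[] and returns how many there are.
 */
int slot_snapshot(txid out[N_BACKENDS])
{
    int n = 0;
    lock_assign();
    for (int i = 0; i < N_BACKENDS; i++) {
        txid x = atomic_load_explicit(&slots[i], memory_order_acquire);
        if (x != NO_TXID)
            out[n++] = x;
    }
    unlock_assign();
    return n;
}
```

Because insert and get snapshot still exclude each other, a snapshot
cannot miss a transaction whose id was assigned before the snapshot took
the lock; the only slack is in how promptly removals are noticed.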



-- 
-------
Hannu Krosing
PostgreSQL Infinite Scalability and Performance Consultant
PG Admin Book: http://www.2ndQuadrant.com/books/


