Re: cheaper snapshots - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: cheaper snapshots
Date
Msg-id 1311867300.3117.1472.camel@hvost
In response to Re: cheaper snapshots  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: cheaper snapshots
List pgsql-hackers
On Thu, 2011-07-28 at 17:10 +0200, Hannu Krosing wrote:
> On Thu, 2011-07-28 at 10:45 -0400, Tom Lane wrote:
> > Hannu Krosing <hannu@2ndQuadrant.com> writes:
> > > On Thu, 2011-07-28 at 10:23 -0400, Robert Haas wrote:
> > >> I'm confused by this, because I don't think any of this can be done
> > >> when we insert the commit record into the WAL stream.
> > 
> > > The update to stored snapshot needs to happen at the moment when the WAL
> > > record is considered to be "on stable storage", so the "current
> > > snapshot" update presumably can be done by the same process which forces
> > > it to stable storage, with the same contention pattern that applies to
> > > writing WAL records, no ?
> > 
> > No.  There is no reason to tie this to fsyncing WAL.  For purposes of
> > other currently-running transactions, the commit can be considered to
> > occur at the instant the commit record is inserted into WAL buffers.
> > If we crash before that makes it to disk, no problem, because nothing
> > those other transactions did will have made it to disk either. 
> 
> Agreed. Actually figured it out right after pushing send :)
> 
> > The
> > advantage of defining it that way is you don't have weirdly different
> > behaviors for sync and async transactions.
> 
> My main point was, that we already do synchronization when writing wal,
> why not piggyback on this to also update latest snapshot .

So the basic design could be "a sparse snapshot", consisting of xmin,
xmax and running_txids[numbackends], where each backend manages its own
slot in running_txids: it sets a txid when acquiring one and nulls it at
commit, possibly advancing xmin if xmin == mytxid. As an xmin update
requires a full scan of running_txids, it is also a good time to update
xmax. There is no need to advance xmax when "inserting" your next txid,
so you don't need to lock anything at insert time.

The valid xmax is still computed when getting the snapshot.
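
A minimal C sketch of the slot scheme described above, to make the idea
concrete. It is not PostgreSQL code: NUM_BACKENDS, INVALID_TXID,
next_txid, txn_begin, txn_commit and take_snapshot are all hypothetical
names, and using 0 as the "null" slot value is an assumption. Each
backend writes only its own slot, and xmin/xmax are computed by the
reader while scanning.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

#define NUM_BACKENDS 8   /* hypothetical fixed number of backends */
#define INVALID_TXID 0   /* assumption: 0 marks an empty ("nulled") slot */

typedef uint64_t txid_t;

/* One slot per backend; only the owning backend ever writes its slot. */
static _Atomic txid_t running_txids[NUM_BACKENDS];
static _Atomic txid_t next_txid = 1;

/* Backend `slot` acquires a txid and publishes it in its own slot.
 * Note the window between the fetch-add and the store: a concurrent
 * reader can miss this txid. That is exactly the kind of case the
 * lock-free variant still needs to be analysed for. */
txid_t txn_begin(int slot)
{
    assert(slot >= 0 && slot < NUM_BACKENDS);
    txid_t id = atomic_fetch_add(&next_txid, 1);
    atomic_store(&running_txids[slot], id);
    return id;
}

/* At commit the backend simply nulls its slot; no lock is taken. */
void txn_commit(int slot)
{
    assert(slot >= 0 && slot < NUM_BACKENDS);
    atomic_store(&running_txids[slot], INVALID_TXID);
}

typedef struct {
    txid_t xmin;                  /* oldest running txid, or xmax if none */
    txid_t xmax;                  /* first txid not assigned at scan start */
    int    nrunning;
    txid_t running[NUM_BACKENDS];
} sparse_snapshot;

/* xmax and xmin are computed at snapshot time rather than stored. */
void take_snapshot(sparse_snapshot *snap)
{
    snap->xmax = atomic_load(&next_txid);
    snap->xmin = snap->xmax;
    snap->nrunning = 0;
    for (int i = 0; i < NUM_BACKENDS; i++) {
        txid_t id = atomic_load(&running_txids[i]);
        if (id == INVALID_TXID || id >= snap->xmax)
            continue;             /* empty slot, or started after xmax */
        snap->running[snap->nrunning++] = id;
        if (id < snap->xmin)
            snap->xmin = id;
    }
}
```

With two backends in a transaction, take_snapshot reports both txids as
running; after the older one commits, xmin moves up to the remaining
one without anybody having stored it.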

Hmm, probably no need to store xmin and xmax at all.

It needs some further analysis to figure out whether doing it this way,
without any locks, can produce snapshots that are wrong in ways that
matter.

Maybe you still need one spinlock plus a memcpy of running_txids to
local memory to get a snapshot.
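
The spinlock + memcpy variant could look roughly like the sketch below.
This is only an illustration under assumptions: the names (txids_lock,
set_slot, copy_running_txids) are made up, the spinlock is a bare C11
atomic_flag rather than PostgreSQL's own spinlock, and writers are
assumed to take the same lock, which the text above leaves open.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define NUM_BACKENDS 8   /* hypothetical fixed number of backends */

typedef uint64_t txid_t;

static atomic_flag txids_lock = ATOMIC_FLAG_INIT;
static txid_t running_txids[NUM_BACKENDS];   /* 0 means "no txid" */

static void spin_acquire(void)
{
    /* Busy-wait until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&txids_lock,
                                             memory_order_acquire))
        ;
}

static void spin_release(void)
{
    atomic_flag_clear_explicit(&txids_lock, memory_order_release);
}

/* Assumption: writers also take the lock, so a copy never sees a
 * half-updated array. Pass 0 to null the slot at commit. */
void set_slot(int slot, txid_t id)
{
    assert(slot >= 0 && slot < NUM_BACKENDS);
    spin_acquire();
    running_txids[slot] = id;
    spin_release();
}

/* Hold the lock only for the memcpy; xmin/xmax can then be computed
 * from the private copy without blocking anyone. */
void copy_running_txids(txid_t out[NUM_BACKENDS])
{
    spin_acquire();
    memcpy(out, running_txids, sizeof running_txids);
    spin_release();
}
```

The lock is held for one memcpy of numbackends entries, so the cost of
a snapshot is bounded by the array size, not by what the caller does
with the copy afterwards.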

Also, as the running_txids array is global, it may need to be made even
sparser to minimise cache-line collisions. That needs to be a tuning
decision between cache conflicts and the speed of memcpy.
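
To put numbers on that trade-off, here is a small C sketch comparing a
dense array with one padded to a slot per cache line. The 64-byte line
size, the 8-backend count and the type names are assumptions chosen for
illustration only.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64   /* assumption: common on x86-64 */
#define NUM_BACKENDS    8    /* hypothetical fixed number of backends */

typedef uint64_t txid_t;

/* Dense layout: eight 8-byte slots share a single cache line, so every
 * backend's begin/commit invalidates the same line for all others. */
typedef struct { txid_t txid; } dense_slot;

/* Sparse layout: the alignment pads each slot out to a full line, so
 * backends never write to each other's lines. */
typedef struct { alignas(CACHE_LINE_SIZE) txid_t txid; } padded_slot;

static_assert(sizeof(padded_slot) == CACHE_LINE_SIZE,
              "padded slot should occupy exactly one cache line");

static dense_slot  dense[NUM_BACKENDS];
static padded_slot padded[NUM_BACKENDS];

/* Bytes a snapshot memcpy would have to move for each layout. */
size_t dense_bytes(void)  { return sizeof dense; }
size_t padded_bytes(void) { return sizeof padded; }
```

Under these assumptions the padded array is eight times larger, so the
memcpy for a snapshot moves eight times as many bytes in exchange for
removing false sharing between writers. Intermediate strides (two or
four slots per line) sit between the two extremes.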

> 
> 
> -- 
> -------
> Hannu Krosing
> PostgreSQL (Infinite) Scalability and Performance Consultant
> PG Admin Book: http://www.2ndQuadrant.com/books/
> 
> 



