Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers

From Hannu Krosing
Subject Re: Transaction Snapshots and Hot Standby
Date
Msg-id 1221219702.7026.41.camel@huvostro
In response to Re: Transaction Snapshots and Hot Standby  (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses Re: Transaction Snapshots and Hot Standby  (Csaba Nagy <nagy@ecircle-ag.com>)
List pgsql-hackers
On Fri, 2008-09-12 at 12:31 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 09:45 +0100, Simon Riggs wrote:
> > On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> > > Gregory Stark wrote:
> > > > b) vacuum on the server which cleans up a tuple the slave has in scope has to
> > > > >    block WAL replay on the slave (which I suppose defeats the purpose of having
> > > >    a live standby for users concerned more with fail-over latency).
> > > 
> > > One problem with this, BTW, is that if there's a continuous stream of 
> > > medium-length transaction in the slave, each new snapshot taken will 
> > > prevent progress in the WAL replay, so the WAL replay will advance in 
> > > "baby steps", and can fall behind indefinitely. As soon as there's a 
> > > moment that there's no active snapshot, it can catch up, but if the 
> > > slave is seriously busy, that might never happen.
> > 
> > It should be possible to do mixed mode.
> > 
> > Stall WAL apply for up to X seconds, then cancel queries. Some people
> > may want X=0 or low, others might find X = very high acceptable (Merlin
> > et al).
> 
> Or even milder version.
> 
> * Stall WAL apply for up to X seconds, 
> * then stall new queries, let old ones run to completion (with optional
> fallback to canceling after Y sec), 
> * apply WAL. 
> * Repeat.

Now that I have thought a little more about delegating the keeping of old
versions to filesystem-level (ZFS, XFS+LVM) snapshots, I'd like to
propose the following:
0. Run queries and apply WAL freely until WAL application would remove
   old rows.
1. Stall applying WAL for up to N seconds.
2. Stall starting new queries for up to M seconds.
3. If some backends are still running long queries, then
   3.1. make a filesystem-level snapshot (FS snapshot),
   3.2. mount the FS snapshot somewhere (maybe as data.at.OldestXmin,
        in parallel to $PGDATA), and
   3.3. hand this mounted FS snapshot over to those backends.
4. Apply WAL.
5. Go to 0.
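
To make the control flow concrete, here is a rough Python sketch of one pass
through steps 1 to 4. It is only a toy model, not PostgreSQL code: the
Backend class, the "remaining" runtime field and resolve_conflict() are
names made up for illustration, and the function only records what would be
done instead of touching a real filesystem.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Backend:
    name: str
    remaining: float                   # seconds of query runtime left (hypothetical)
    fs_snapshot: Optional[str] = None  # mount point of the FS snapshot in use, if any


def resolve_conflict(backends: List[Backend], n_wait: float, m_wait: float,
                     oldest_xmin: int) -> List[str]:
    """Model steps 1-4 for a WAL record that would remove old rows."""
    actions = []
    # Step 1: hold back WAL apply; new queries may still start.
    actions.append(f"stall WAL apply for up to {n_wait}s")
    # Step 2: additionally hold back new queries.
    actions.append(f"stall new queries for up to {m_wait}s")
    # Backends that would still be running after both waits.
    stragglers = [b for b in backends if b.remaining > n_wait + m_wait]
    # Step 3: only stragglers without an FS snapshot need a new one.
    needing = [b for b in stragglers if b.fs_snapshot is None]
    if needing:
        mount = f"data.at.{oldest_xmin}"
        actions.append(f"create and mount FS snapshot {mount}")
        for b in needing:
            b.fs_snapshot = mount
            actions.append(f"hand {mount} to {b.name}")
    # Step 4: nothing blocks WAL apply any more.
    actions.append("apply WAL")
    return actions


if __name__ == "__main__":
    backends = [Backend("short", 3.0), Backend("long", 120.0)]
    for line in resolve_conflict(backends, n_wait=5, m_wait=5, oldest_xmin=1000):
        print(line)
```

Step 5 would simply call this again the next time WAL apply hits a record
that removes rows still visible to some backend.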

Of course, we need to take the filesystem-level snapshot in step 3 only if
the long-running queries don't already have one. Or maybe also if they are
running in READ COMMITTED mode and have acquired a new PG snapshot since
they got their FS snapshot; in that case they need a new one.
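
Written as a predicate, the rule above could look like the following Python
sketch. The function name and its three arguments are hypothetical; in
particular, how a backend would know that its PG snapshot is newer than its
FS snapshot is left open here.

```python
def needs_new_fs_snapshot(has_fs_snapshot: bool,
                          isolation: str,
                          pg_snapshot_newer_than_fs: bool) -> bool:
    """Decide whether a long-running backend needs a fresh FS snapshot."""
    if not has_fs_snapshot:
        # No FS snapshot yet: step 3 always has to provide one.
        return True
    # READ COMMITTED takes a new PG snapshot per statement, so an FS
    # snapshot handed out earlier may be too old for the current statement.
    return isolation == "read committed" and pg_snapshot_newer_than_fs
```

A SERIALIZABLE transaction keeps its PG snapshot for its whole lifetime, so
under this rule it never needs a second FS snapshot.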

Also, snapshots need to be reference counted, so we can unmount and
destroy them once all their users have finished.
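
A minimal sketch of such reference counting, again in Python and with
made-up names (SnapshotRegistry, acquire, release). The destroy callback
stands in for whatever would really unmount and drop the FS snapshot; that
part is an assumption and is not shown.

```python
from typing import Callable, Dict


class SnapshotRegistry:
    """Count the users of each mounted FS snapshot."""

    def __init__(self, destroy: Callable[[str], None]):
        self._refs: Dict[str, int] = {}
        self._destroy = destroy  # called with the mount point at refcount 0

    def acquire(self, mount: str) -> None:
        # A backend starts using the snapshot mounted at `mount`.
        self._refs[mount] = self._refs.get(mount, 0) + 1

    def release(self, mount: str) -> None:
        # A backend has finished; drop the snapshot after the last user.
        count = self._refs[mount] - 1
        if count == 0:
            del self._refs[mount]
            self._destroy(mount)
        else:
            self._refs[mount] = count


if __name__ == "__main__":
    registry = SnapshotRegistry(destroy=lambda m: print("unmount and destroy", m))
    registry.acquire("data.at.1000")
    registry.acquire("data.at.1000")
    registry.release("data.at.1000")  # one user left, nothing happens
    registry.release("data.at.1000")  # last user gone, snapshot is destroyed
```

In a real server the counter would have to live in shared memory and be
updated under a lock, since the users are separate backend processes.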

I think that enabling long-running queries this way is both low-hanging
fruit (or at least medium-height-hanging ;) ) and consistent with the
PostgreSQL philosophy of not duplicating effort. As an example, we trust
the OS's file system cache and don't try to write our own.

----------------
Hannu










