
Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers

From	Hannu Krosing
Subject	Re: Transaction Snapshots and Hot Standby
Date	September 12, 2008 08:42:02
Msg-id	1221219702.7026.41.camel@huvostro
In response to	Re: Transaction Snapshots and Hot Standby (Hannu Krosing <hannu@2ndQuadrant.com>)
Responses	Re: Transaction Snapshots and Hot Standby
List	pgsql-hackers


On Fri, 2008-09-12 at 12:31 +0300, Hannu Krosing wrote:
> On Fri, 2008-09-12 at 09:45 +0100, Simon Riggs wrote:
> > On Thu, 2008-09-11 at 15:42 +0300, Heikki Linnakangas wrote:
> > > Gregory Stark wrote:
> > > > b) vacuum on the server which cleans up a tuple the slave has in scope has to
> > > >    block WAL replay on the slave (which I suppose defeats the purpose of having
> > > >    a live standby for users concerned more with fail-over latency).
> > > 
> > > One problem with this, BTW, is that if there's a continuous stream of 
> > > medium-length transactions in the slave, each new snapshot taken will 
> > > prevent progress in the WAL replay, so the WAL replay will advance in 
> > > "baby steps", and can fall behind indefinitely. As soon as there's a 
> > > moment that there's no active snapshot, it can catch up, but if the 
> > > slave is seriously busy, that might never happen.
> > 
> > It should be possible to do mixed mode.
> > 
> > Stall WAL apply for up to X seconds, then cancel queries. Some people
> > may want X=0 or low, others might find X = very high acceptable (Merlin
> > et al).
> 
> Or an even milder version:
> 
> * Stall WAL apply for up to X seconds, 
> * then stall new queries, let old ones run to completion (with optional
> fallback to canceling after Y sec), 
> * apply WAL. 
> * Repeat.
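The behaviors discussed above can be compared with a toy simulation. The first is stalling replay indefinitely, which is Heikki's concern. The second is Simon's mixed mode: stall for X, then cancel. The third is the milder variant: stall, stop admitting new queries, and optionally cancel after Y.

This is a minimal Python sketch built on stated assumptions, not how PostgreSQL replays WAL:
- one WAL record arrives per tick;
- every fifth record is a cleanup record that conflicts with any running query;
- a new query arrives every 3 ticks and runs for 5.

`Policy`, `simulate`, `stall_x`, `block_new` and `cancel_y` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Policy:
    stall_x: Optional[int] = None  # X: ticks to stall WAL apply before acting (None = stall forever)
    block_new: bool = False        # after X, stop admitting new queries (the milder version)
    cancel_y: Optional[int] = 0    # Y: extra ticks after X before cancelling (None = never cancel)


def simulate(policy: Policy, ticks: int = 60, query_every: int = 3,
             query_len: int = 5, cleanup_every: int = 5) -> List[int]:
    """Return the replay lag (records received but not yet applied) after each tick."""
    # Record i arrives at tick i; every cleanup_every-th record removes old rows.
    wal = [i % cleanup_every == cleanup_every - 1 for i in range(ticks)]
    running: List[int] = []  # end tick of each query currently running on the standby
    applied = waited = 0
    admission_blocked = False
    lags: List[int] = []
    for t in range(ticks):
        running = [end for end in running if end > t]  # finished queries leave
        if t % query_every == 0 and not admission_blocked:
            running.append(t + query_len)               # a new query takes a snapshot
        received = t + 1
        while applied < received:
            if wal[applied] and running:                # cleanup conflicts with live snapshots
                if policy.stall_x is not None and waited >= policy.stall_x:
                    admission_blocked = policy.block_new
                    if policy.cancel_y is not None and waited >= policy.stall_x + policy.cancel_y:
                        running.clear()                 # cancel the conflicting queries
                        continue                        # retry the record, now conflict-free
                break                                   # stall WAL apply for this tick
            applied += 1
            waited, admission_blocked = 0, False        # progress made: reset the stall clock
        if applied < received:
            waited += 1
        lags.append(received - applied)
    return lags


if __name__ == "__main__":
    print(simulate(Policy())[-1])                                           # stall forever
    print(max(simulate(Policy(stall_x=3, cancel_y=0))))                     # mixed mode, X = 3
    print(max(simulate(Policy(stall_x=2, block_new=True, cancel_y=None))))  # milder, X = 2
```

With these made-up parameters, queries always overlap. Stalling forever therefore lets the lag grow on every tick, which is the "baby steps" problem. Stalling for X ticks and then cancelling caps the lag at X. The milder variant caps it at X plus the run time of the last query admitted before new queries were stopped.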

Now that I have thought a little more about delegating the keeping of old
row versions to filesystem-level snapshots (ZFS, XFS+LVM), I'd like to
propose the following:
0. run queries and apply WAL freely until WAL application would remove old rows.
1. stall applying WAL for up to N seconds.
2. stall starting new queries for up to M seconds.
3. if some backends are still running long queries, then
   3.1. make a filesystem-level snapshot (FS snapshot),
   3.2. mount the FS snapshot somewhere (maybe as data.at.OldestXmin,
        in parallel to $PGDATA), and
   3.3. hand this mounted FS snapshot over to those backends.
4. apply WAL.
5. go to 0.
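The cycle above can be sketched for a single cleanup conflict. This minimal Python sketch assumes that steps 1 and 2 have already narrowed things down to the backends still running. The names `Backend`, `FakeSnapshotter` and `replay_cycle`, and the `data.at.<xmin>` mount-point naming, are hypothetical. `FakeSnapshotter` does not call any real ZFS or XFS+LVM tool; it only records what such a wrapper would be asked to do. Only backends that do not already hold an FS snapshot are given one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Backend:
    pid: int
    fs_snapshot: Optional[str] = None  # mount point handed to this backend, if any


@dataclass
class FakeSnapshotter:
    """Hypothetical stand-in for ZFS or XFS+LVM tooling: it only records requests."""
    actions: List[str] = field(default_factory=list)

    def snapshot_and_mount(self, oldest_xmin: int) -> str:
        mount_point = f"data.at.{oldest_xmin}"                # e.g. next to $PGDATA
        self.actions.append(f"snapshot+mount {mount_point}")  # steps 3.1 and 3.2
        return mount_point


def replay_cycle(still_running: List[Backend], oldest_xmin: int,
                 fs: FakeSnapshotter, apply_wal: Callable[[], None]) -> None:
    """Steps 3 and 4; the N- and M-second stalls are assumed to have produced still_running."""
    needy = [b for b in still_running if b.fs_snapshot is None]
    if needy:  # step 3 only when some long query still lacks an FS snapshot
        mount_point = fs.snapshot_and_mount(oldest_xmin)
        for b in needy:  # step 3.3: these backends now read old rows from the snapshot
            b.fs_snapshot = mount_point
    apply_wal()  # step 4: removing old rows from $PGDATA no longer hurts them
```

For example, given backends `Backend(101)` and `Backend(102, fs_snapshot="data.at.500")`, one cycle gives only backend 101 a new snapshot. Both backends then keep running while WAL is applied.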

Of course, we need to take the filesystem-level snapshot in step 3 only if
the long-running queries don't already have one. Maybe we also need a new
one for a query that is running in READ COMMITTED mode and has acquired a
new PG snapshot since it got its FS snapshot.
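The rule for when a backend needs a new FS snapshot could look like the minimal sketch below. The field names, and the use of timestamps to order FS and PG snapshots, are assumptions for illustration only. The READ COMMITTED branch encodes the tentative "maybe also" case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackendState:
    isolation: str                  # "READ COMMITTED" or "SERIALIZABLE"
    fs_snapshot_at: Optional[float] # when its FS snapshot was taken; None if it has none
    last_pg_snapshot_at: float      # when it last acquired a PG snapshot


def needs_new_fs_snapshot(b: BackendState) -> bool:
    """Hypothetical check run at step 3 for each still-running backend."""
    if b.fs_snapshot_at is None:
        return True  # never given an FS snapshot
    if b.isolation == "READ COMMITTED" and b.last_pg_snapshot_at > b.fs_snapshot_at:
        # A per-statement PG snapshot newer than the FS snapshot may expect rows
        # that the older FS snapshot does not contain, so take a fresh one.
        return True
    return False  # e.g. SERIALIZABLE keeps a single PG snapshot for the transaction
```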

Also, snapshots need to be reference counted, so we can unmount and
destroy them once all their users have finished.
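Reference counting could be as simple as the minimal sketch below. It assumes a hypothetical `destroy` callback that would unmount and delete the snapshot through whatever filesystem tooling is in use. `SnapshotRegistry` is an illustrative name, not an existing PostgreSQL structure.

```python
from typing import Callable, Dict, List


class SnapshotRegistry:
    """Reference-counts mounted FS snapshots by mount point."""

    def __init__(self, destroy: Callable[[str], None]):
        self._refs: Dict[str, int] = {}
        self._destroy = destroy  # hypothetical: unmount and destroy one snapshot

    def acquire(self, mount_point: str) -> None:
        self._refs[mount_point] = self._refs.get(mount_point, 0) + 1

    def release(self, mount_point: str) -> None:
        count = self._refs[mount_point] - 1  # KeyError if never acquired
        if count == 0:
            del self._refs[mount_point]
            self._destroy(mount_point)  # last user finished: unmount and destroy
        else:
            self._refs[mount_point] = count

    def in_use(self) -> List[str]:
        return sorted(self._refs)
```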

I think that enabling long-running queries this way is both low-hanging
fruit (or at least medium-height-hanging ;) ) and consistent with the
PostgreSQL philosophy of not duplicating effort. For example, we trust the
OS's file system cache and don't try to write our own.

----------------
Hannu

