Re: Transaction Snapshots and Hot Standby - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Transaction Snapshots and Hot Standby
Date
Msg-id 1221208721.3913.942.camel@ebony.2ndQuadrant
In response to Re: Transaction Snapshots and Hot Standby  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Thu, 2008-09-11 at 17:58 +0300, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > So part of the handshake between
> > primary and standby must be "what is your recentxmin?". The primary will
> > then use the lower/earliest of the two.
> 
> Even then, the master might already have vacuumed away tuples that are 
> visible to an already running transaction in the slave, before the slave 
> connects. Presumably the master doesn't wait for the slave to connect 
> before starting to accept new connections.

Yep, OK.

> >> As you mentioned, the options there are to defer applying WAL, or cancel 
> >> queries. I think both options need the same ability to detect when 
> >> you're about to remove a tuple that's still visible to some snapshot, 
> >> just the action is different. We should probably provide a GUC to 
> >> control which you want.
> > 
> > I don't see any practical way of telling whether a tuple removal will
> > affect a snapshot or not. Each removed row would need to be checked
> > against each standby snapshot. Even if those were available, it would be
> > too costly. 
> 
> How about using the same method as we use in HeapTupleSatisfiesVacuum? 
> Before replaying a vacuum record, look at the xmax of the tuple 
> (assuming it committed). If it's < slave's OldestXmin, it can be 
> removed. Otherwise not. Like HeapTupleSatisfiesVacuum, it's 
> conservative, but doesn't require any extra bookkeeping.
> 
> And vice versa: if we implement the more precise book-keeping, with all 
> snapshots in shared memory or something, we might as well use it in 
> HeapTupleSatisfiesVacuum. That has been discussed before, but it's a 
> separate project.

Tuple removals earlier than the slave's OldestXmin are easy, that's true.
I'm not sure what you had in mind for "Otherwise not"? 

Maybe you mean "stop applying WAL until slave's OldestXmin is > tuple
removal xid". I'm not sure, having read the other subthreads of this post.
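
To make the question concrete, here is a minimal sketch of the conservative
check as I understand it. The names (xid_precedes, check_tuple_removal,
ReplayDecision) are hypothetical and not existing backend functions. The
wraparound-aware comparison and the value of the first normal xid are meant
to mirror what the backend does for normal xids. The sketch only reports a
conflict; whether the caller then defers WAL apply or cancels queries is
exactly the open question above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Xids below this value are reserved; only "normal" xids are compared
 * with modulo-2^32 arithmetic. */
#define FIRST_NORMAL_XID ((TransactionId) 3)

/* Hypothetical helper: true if normal xid a is logically older than b,
 * taking wraparound into account. */
bool
xid_precedes(TransactionId a, TransactionId b)
{
    return (int32_t) (a - b) < 0;
}

/* Hypothetical result type: what replay may do with a removal record. */
typedef enum
{
    REPLAY_NOW,       /* no standby snapshot can still see the tuple */
    REPLAY_CONFLICT   /* some standby snapshot might still see it */
} ReplayDecision;

/*
 * Conservative test before replaying a tuple removal: the removal is
 * safe only if the (committed) xmax is older than the standby's
 * OldestXmin. Anything else is reported as a conflict.
 */
ReplayDecision
check_tuple_removal(TransactionId tuple_xmax,
                    TransactionId standby_oldest_xmin)
{
    assert(tuple_xmax >= FIRST_NORMAL_XID);
    assert(standby_oldest_xmin >= FIRST_NORMAL_XID);

    if (xid_precedes(tuple_xmax, standby_oldest_xmin))
        return REPLAY_NOW;
    return REPLAY_CONFLICT;
}
```

Like HeapTupleSatisfiesVacuum, this can report a conflict for a tuple that
no snapshot actually needs, but it requires no per-snapshot bookkeeping.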

I think it's possible to defer removal actions on specific blocks only,
but that is an optimisation that's best left for a while.

BTW, tuple removals would need a cleanup lock on a block, just as they
do on the master server. So WAL apply can be delayed momentarily by
pinholders anyway, whatever we do.

> > It was also suggested we might take the removed rows and put them in a
> > side table, but that makes me think of the earlier ideas for HOT and so
> > I've steered clear of that.
> 
> Yeah, that's non-trivial. Basically a whole new, different 
> implementation of MVCC, but without changing any on-disk formats.
> 
> BTW, we haven't talked about how to acquire a snapshot in the slave. 
> You'll somehow need to know which transactions have not yet committed, 
> but will in the future. In the master, we keep track of in-progress 
> transaction in the ProcArray, so I suppose we'll need to do the same in 
> the slave. Very similar to prepared transactions, actually. I believe 
> the Abort records, which are not actually needed for normal operation, 
> become critical here. The slave will need to put an entry to ProcArray 
> for any new XLogRecord.xl_xid it sees in the WAL, and remove the entry 
> at a Commit and Abort record. And clear them all at a shutdown record.

I wouldn't do it like that.

I was going to maintain a "current snapshot" in shared memory, away from
the PROCARRAY. Each time we see a TransactionId we check whether it's
already been seen; if not, we insert it. When a transaction commits or
aborts we remove the stated xid. If we see a shutdown checkpoint we
clear the array completely. When query backends want a snapshot they
just read the array. It doesn't matter whether transactions commit or
abort, since their changes can't be seen by queries until commit anyway.

The reason for doing it this way is that the PROCARRAY may be full of
query backends, so having dummy backends in there as well sounds confusing.
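
Roughly, the array I have in mind could look like the sketch below. This is
only an illustration under stated assumptions: the type and function names
(RunningXids, running_xids_note and so on) and the fixed capacity are
hypothetical, and the locking needed between the WAL apply process and the
query backends is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t TransactionId;

/* Assumed capacity, for illustration only. */
#define MAX_RUNNING_XIDS 64

/* Hypothetical "current snapshot" kept apart from the PROCARRAY. */
typedef struct
{
    size_t        nxids;
    TransactionId xids[MAX_RUNNING_XIDS];
} RunningXids;

/* WAL apply saw an xid: insert it unless it has already been seen.
 * Returns false only if the array is full. */
bool
running_xids_note(RunningXids *r, TransactionId xid)
{
    for (size_t i = 0; i < r->nxids; i++)
        if (r->xids[i] == xid)
            return true;
    if (r->nxids == MAX_RUNNING_XIDS)
        return false;
    r->xids[r->nxids++] = xid;
    return true;
}

/* Commit or abort record: remove the stated xid, if present. */
void
running_xids_end(RunningXids *r, TransactionId xid)
{
    for (size_t i = 0; i < r->nxids; i++)
    {
        if (r->xids[i] == xid)
        {
            /* Order does not matter, so fill the hole with the last entry. */
            r->xids[i] = r->xids[--r->nxids];
            return;
        }
    }
}

/* Shutdown checkpoint: nothing can still be running. */
void
running_xids_clear(RunningXids *r)
{
    r->nxids = 0;
}

/* Query backend wants a snapshot: copy the array out.
 * Returns the number of xids copied. */
size_t
running_xids_snapshot(const RunningXids *r, TransactionId *out,
                      size_t out_cap)
{
    assert(out_cap >= r->nxids);
    memcpy(out, r->xids, r->nxids * sizeof(TransactionId));
    return r->nxids;
}
```

The linear search is the simplest thing that works; a sorted array or a hash
would be an obvious later refinement if the number of in-progress xids gets
large.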

-- Simon Riggs           www.2ndQuadrant.com
PostgreSQL Training, Services and Support


