Re: Hot standby and b-tree killed items - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot standby and b-tree killed items
Date
Msg-id 1229683962.4793.504.camel@ebony.2ndQuadrant
In response to Re: Hot standby and b-tree killed items  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Hot standby and b-tree killed items  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Re: Hot standby and b-tree killed items  (Simon Riggs <simon@2ndQuadrant.com>)
Re: Hot standby and b-tree killed items  (Robert Treat <xzilla@users.sourceforge.net>)
List pgsql-hackers
On Fri, 2008-12-19 at 12:24 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > We have infrastructure in place to make this work correctly, just need
> > to add latestRemovedXid field to xl_btree_vacuum. So that part is easily
> > solved.
> 
> That's tricky because there's no xmin/xmax on index tuples.

Doh. 

>  You could 
> conservatively use OldestXmin as latestRemovedXid, but that could stall 
> the WAL redo a lot more than necessary. Or you could store 
> latestRemovedXid in the page header, but that would need to be 
> WAL-logged to ensure that it's valid after crash. Or you could look at 
> the heap to fetch the xmin/xmax, but that would be expensive.

Agreed. Probably need to use OldestXmin then.



If I were going to add anything to the btree page header, it would be
latestRemovedLSN, only set during recovery. That way we don't have to
explicitly kill queries; we can wait on OldestXmin, then let
them ERROR out when they find a page that has been modified.
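
To make the page-header idea concrete, here is a minimal sketch of how a standby query might be made to ERROR out on a modified page. It is a toy Python model, not PostgreSQL code: the names BtreePage, StandbyQuery, redo_btree_vacuum and read_page are hypothetical, and comparing latestRemovedLSN against the replay position at which the query took its snapshot is an assumption, since nothing above says what the query would compare against.

```python
from dataclasses import dataclass


class RecoveryConflict(Exception):
    """Raised when a standby query reads a page cleaned after its snapshot."""


@dataclass
class BtreePage:
    # Hypothetical header field: LSN of the last WAL record that removed
    # index tuples from this page during recovery (0 means never cleaned).
    latest_removed_lsn: int = 0


@dataclass
class StandbyQuery:
    # Assumption: the WAL replay position when the query took its snapshot.
    snapshot_lsn: int


def redo_btree_vacuum(page: BtreePage, record_lsn: int) -> None:
    """Replay a cleanup record: remember the LSN that removed tuples."""
    page.latest_removed_lsn = max(page.latest_removed_lsn, record_lsn)


def read_page(query: StandbyQuery, page: BtreePage) -> BtreePage:
    """Return the page, or fail if it was cleaned after the query's snapshot."""
    if page.latest_removed_lsn > query.snapshot_lsn:
        raise RecoveryConflict(
            f"page cleaned at LSN {page.latest_removed_lsn}, "
            f"snapshot taken at LSN {query.snapshot_lsn}"
        )
    return page
```

In this model only queries that actually touch a cleaned page fail; a query that never visits the page is unaffected, which is the point of not killing queries explicitly.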

I have a suspicion that we may need some modification of that solution
for all data blocks, so we don't kill too many queries.

Hmmm. I wonder if we can track latestRemovedLSN for all of
shared_buffers. That was initially rejected, but if we set the
latestRemovedLSN to be the block's LSN when we read it in, that would be
fairly useful. Either way we use 8 bytes RAM per buffer.
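
A rough sketch of the per-buffer variant, again as a toy Python model with hypothetical names (BufferLsnTracker, on_read_in, on_redo_cleanup, conflicts). It assumes the default 8 kB block size and one 8-byte LSN per buffer, and it treats seeding from the block's LSN as a conservative starting value: that LSN may come from a change that removed nothing, so it could report a conflict that is not real, but it should not miss one.

```python
BLOCK_SIZE = 8192  # assumed: PostgreSQL's default block size in bytes
LSN_BYTES = 8      # one 64-bit LSN tracked per buffer


def tracking_overhead_bytes(shared_buffers_bytes: int,
                            block_size: int = BLOCK_SIZE) -> int:
    """Extra RAM needed to keep one LSN for every buffer."""
    n_buffers = shared_buffers_bytes // block_size
    return n_buffers * LSN_BYTES


class BufferLsnTracker:
    """Toy model: one latestRemovedLSN slot per shared buffer."""

    def __init__(self, n_buffers: int) -> None:
        self.latest_removed_lsn = [0] * n_buffers

    def on_read_in(self, buf_id: int, page_lsn: int) -> None:
        # Seed with the block's own LSN when it is read into the buffer.
        self.latest_removed_lsn[buf_id] = page_lsn

    def on_redo_cleanup(self, buf_id: int, record_lsn: int) -> None:
        # A replayed cleanup record moves the tracked LSN forward.
        self.latest_removed_lsn[buf_id] = max(
            self.latest_removed_lsn[buf_id], record_lsn
        )

    def conflicts(self, buf_id: int, snapshot_lsn: int) -> bool:
        # Assumption: a query conflicts if the buffer changed after
        # the replay position at which its snapshot was taken.
        return self.latest_removed_lsn[buf_id] > snapshot_lsn
```

Under these assumptions, 1 GB of shared_buffers is 131072 buffers, so the tracking array costs 1 MB.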



BTW, I noticed the other day that Oracle 11g only allows you to have a
read only slave *or* allows you to continue replaying. You need to
manually switch back and forth between those modes. They can't do
*both*, as Postgres will be able to do. That's because their undo
information is stored off-block in the Undo Tablespace, so is not
available for standby queries. Nice one, Postgres.

-- 
 Simon Riggs           www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


