Re: Hot Standby dev build (v8) - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Hot Standby dev build (v8)
Msg-id 4974847A.5060906@enterprisedb.com
In response to Re: Hot Standby dev build (v8)  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Hot Standby dev build (v8)  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
Simon Riggs wrote:
> I suggest we handle this on the recovery side, not on the master, by
> deriving the xmin at the point the WAL record arrives. We would
> calculate it by looking at recovery procs only. That will likely give us
> a later value than we would get from the master, but that can't be
> helped.

Hmm, that's an interesting idea. It presumes that we see an abort/commit 
WAL record at the right moment for every transaction that we have a 
recovery proc for. We just concluded in the other thread that we do 
always emit abort records when the database is running normally; I 
think that's good enough for this purpose.
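
To make the recovery-side derivation concrete, here is a rough sketch in Python. It is not code from the patch: the class and method names are hypothetical, xids are plain integers, and wraparound is ignored. It assumes one recovery proc per transaction seen in WAL, removed when its commit or abort record is replayed.

```python
class RecoveryProcs:
    """Toy model of the standby's recovery procs (hypothetical names).

    One entry per xid that WAL replay has seen but for which no
    commit/abort record has been replayed yet.
    """

    def __init__(self):
        self.running = set()
        self.next_xid = 1  # one past the highest xid seen so far

    def on_xid_seen(self, xid):
        # Called when replay first meets a WAL record carrying this xid.
        self.running.add(xid)
        self.next_xid = max(self.next_xid, xid + 1)

    def on_commit_or_abort(self, xid):
        # Called when the commit or abort record for xid is replayed.
        self.running.discard(xid)

    def derive_xmin(self):
        # Oldest xid still running as far as recovery knows; if nothing
        # is running, every xid below next_xid has finished.
        return min(self.running) if self.running else self.next_xid
```

If an abort record were ever missing, the xid would stay in `running` and pin the derived xmin, which is why the presumption above matters.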

A few other random ideas I had:

- in btree delete redo, follow the index pointers, and look at the xids 
on the heap tuples. That requires some random I/O, but will give the 
exact value we need. Since it's quite expensive, I think we'd only want 
to do it after using a more conservative but quicker test to 
determine that there might be a conflict.

- Add latestRemovedXid to b-tree page header, and update it as tuples 
are killed. Need to tolerate the fact that tuple kills are not WAL-logged.
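
The first idea amounts to a two-stage check, which could be sketched roughly as follows. Everything here is an illustrative assumption: `heap` is a plain dict from heap tuple id to the xid that deleted the tuple, `conservative_xid` is some cheap upper bound on the removed xids, and a conflict is taken to mean "removed xid is not older than the standby's oldest snapshot xmin". Wraparound is ignored.

```python
def exact_latest_removed_xid(index_items, heap):
    """Follow each deleted index item's heap pointer (the expensive,
    random-I/O step) and return the newest deleting xid, or None."""
    xids = [heap[tid] for tid in index_items if tid in heap]
    return max(xids) if xids else None


def delete_conflicts(index_items, heap, conservative_xid, standby_xmin):
    """Return True if replaying this btree delete could conflict with a
    snapshot on the standby."""
    # Stage 1: cheap test. conservative_xid is an upper bound on the
    # removed xids, so if it is older than every standby snapshot there
    # cannot be a conflict and the heap is never touched.
    if conservative_xid < standby_xmin:
        return False
    # Stage 2: the cheap test says "maybe", so pay for the exact value.
    exact = exact_latest_removed_xid(index_items, heap)
    return exact is not None and exact >= standby_xmin
```

With `heap = {(1, 1): 90, (1, 2): 95}` and a standby xmin of 100, a conservative bound of 120 passes stage 1 but stage 2 finds the exact value 95 and reports no conflict.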

> Btree deletes were an important optimisation when it first went in, but
> now we have HOT it is much less important. 

If HOT is working well for your application, there won't be many btree 
deletes anyway, and the whole issue is moot.

> Another route might be to put
> an option to turn off btree delete on the master, default = on. We
> probably should consider turning it off entirely when it doesn't yield
> significant benefit.

I'd rather put in a generic mechanism to prevent vacuuming of recent 
tuples that might still be needed in the standby. Like always 
subtracting a fixed amount of xids from OldestXmin/RecentGlobalXmin, or 
having a feedback loop from the standby to the master, allowing the 
standby to tell the master what its oldest xmin is. But that's a fair amount of 
work; I'd rather leave that as a future enhancement, and just figure out 
something simple for this specific issue. We'll need to handle it 
gracefully even if we try to avoid it by retaining dead tuples longer.
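
For illustration only, the two generic mechanisms could be combined into one horizon calculation along these lines. The function and its parameters (`fixed_margin`, `standby_xmin`) are hypothetical, and xids are treated as plain integers without wraparound; the only real constant is 3, the first normal transaction id.

```python
FIRST_NORMAL_XID = 3  # xids below this are reserved in PostgreSQL


def vacuum_horizon(oldest_xmin, fixed_margin=0, standby_xmin=None):
    """Return the xid below which the master may remove dead tuples.

    fixed_margin: always hold back this many xids (first option).
    standby_xmin: oldest xmin reported by the standby, if a feedback
                  loop exists (second option); None if unknown.
    """
    horizon = oldest_xmin - fixed_margin
    if standby_xmin is not None:
        # Never vacuum past what the standby still needs.
        horizon = min(horizon, standby_xmin)
    # Do not go below the first normal xid.
    return max(horizon, FIRST_NORMAL_XID)
```

Neither option removes the need for conflict handling on the standby: a fixed margin can be exceeded by a long-running query, and feedback can arrive late.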

> Lots of scanning to remove the odd row is probably
> pretty wasteful and likely adds contention at the very point we don't
> want it - index splits.

Remember that if you can remove enough dead tuples from the index page, 
you've just made room on the page and don't need to split. Splitting is 
pretty expensive compared to scanning a few line pointers.
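
A toy model of that trade-off, assuming a page is just a list of `(key, is_dead)` pairs with a fixed capacity (key ordering and the real page layout are ignored, and the names are made up for the example):

```python
def insert_into_page(page, key, capacity):
    """Try to insert key into a toy index page.

    page is a list of (key, is_dead) pairs, modified in place.
    Returns 'inserted', 'deleted_then_inserted', or 'needs_split'.
    """
    if len(page) < capacity:
        page.append((key, False))
        return "inserted"
    # Page is full: scan the items once and drop the dead ones.
    live = [item for item in page if not item[1]]
    if len(live) < capacity:
        page[:] = live
        page.append((key, False))
        return "deleted_then_inserted"
    # No dead items to reclaim, so the expensive split is unavoidable.
    return "needs_split"
```

The scan costs one pass over the page's items and happens only when the page is already full, which is exactly the point where a split would otherwise be required.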

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

