Thread: Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
From
Heikki Linnakangas
Date:
Simon Riggs wrote:
> Introduce WAL records to log reuse of btree pages, allowing conflict
> resolution during Hot Standby. Page reuse interlock requested by Tom.
> Analysis and patch by me.

There's still a theoretical possibility for this to happen:

1. A page is marked as deleted by VACUUM, setting xact field in the opaque
2. Master crashes. WAL replay replays the XLOG_BTREE_DELETE_PAGE record.
   It resets the xact field to FrozenTransactionId
3. The page is recycled. This writes a XLOG_BTREE_REUSE_PAGE record with
   FrozenTransactionId as latestRemovedXid

When the standby replays that, it will call
ResolveRecoveryConflictWithSnapshot with FrozenTransactionId, not the
original xid that was used in the master when the page was deleted.

A straightforward way to fix that is to WAL-log the real xid in the
XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
FrozenTransactionId.

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
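The three steps above can be traced with a small toy model. This is only an illustrative sketch, not PostgreSQL source: every name starting with toy_ or Toy is hypothetical, and the has_xid flag stands in for the proposed fix of carrying the real xid in the delete record. The only value taken from PostgreSQL is FrozenTransactionId being XID 2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* PostgreSQL reserves XID 2 for "frozen"; all toy_* / Toy* names are hypothetical. */
#define FrozenTransactionId ((TransactionId) 2)

/* Toy stand-in for a btree page; only the deleted flag and xact field are modelled. */
typedef struct ToyPage
{
	bool		deleted;
	TransactionId xact;
} ToyPage;

/* Toy WAL record for a page deletion. */
typedef struct ToyDeleteRecord
{
	bool		has_xid;	/* true models the proposed fix */
	TransactionId xid;		/* meaningful only when has_xid is true */
} ToyDeleteRecord;

/* Step 1: VACUUM on the master marks the page deleted and stamps its xact field. */
ToyDeleteRecord
toy_delete_page(ToyPage *page, TransactionId xid, bool log_real_xid)
{
	ToyDeleteRecord rec;

	page->deleted = true;
	page->xact = xid;
	rec.has_xid = log_real_xid;
	rec.xid = log_real_xid ? xid : FrozenTransactionId;
	return rec;
}

/* Step 2: after a crash, replay rebuilds the page from the WAL record alone. */
void
toy_replay_delete(ToyPage *page, const ToyDeleteRecord *rec)
{
	page->deleted = true;
	page->xact = rec->has_xid ? rec->xid : FrozenTransactionId;
}

/* Step 3: recycling reports the page's xact field as latestRemovedXid. */
TransactionId
toy_reuse_page(const ToyPage *page)
{
	assert(page->deleted);
	return page->xact;
}

/* Run steps 1-3 with a crash in between; return the value the standby would see. */
TransactionId
toy_crash_scenario(TransactionId xid, bool log_real_xid)
{
	ToyPage		master_page = {false, 0};
	ToyPage		recovered_page = {false, 0};
	ToyDeleteRecord rec = toy_delete_page(&master_page, xid, log_real_xid);

	/* The master's copy of the page is lost; only the WAL record survives. */
	toy_replay_delete(&recovered_page, &rec);
	return toy_reuse_page(&recovered_page);
}
```

In this model, toy_crash_scenario(1000, false) yields FrozenTransactionId, which is the case described above, while toy_crash_scenario(1000, true) yields the original xid 1000.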
Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
From
Simon Riggs
Date:
On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Introduce WAL records to log reuse of btree pages, allowing conflict
> > resolution during Hot Standby. Page reuse interlock requested by Tom.
> > Analysis and patch by me.
>
> There's still a theoretical possibility for this to happen:
>
> 1. A page is marked as deleted by VACUUM, setting xact field in the opaque
> 2. Master crashes. WAL replay replays the XLOG_BTREE_DELETE_PAGE record.
>    It resets the xact field to FrozenTransactionId
> 3. The page is recycled. This writes a XLOG_BTREE_REUSE_PAGE record with
>    FrozenTransactionId as latestRemovedXid
>
> When the standby replays that, it will call
> ResolveRecoveryConflictWithSnapshot with FrozenTransactionId, not the
> original xid that was used in the master when the page was deleted.
>
> A straightforward way to fix that is to WAL-log the real xid in the
> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
> FrozenTransactionId.

An even simpler way would be to reset the value to latestCompletedXid
during btree_xlog_delete_page(). That touches less code. I doubt it will
make much difference to conflict recovery, since if pages are being
deleted then btree delete records are likely to be frequent and will
have already killed long running queries.

--
Simon Riggs www.2ndQuadrant.com
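To make the trade-off between the two proposals concrete, here is a toy comparison of which standby snapshots would be cancelled for different latestRemovedXid values. It is a sketch under stated assumptions: the conflict rule (a snapshot conflicts when its xmin is at or before latestRemovedXid) is a simplification assumed here, plain integer comparison is used so wraparound is ignored, and the toy_ functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* PostgreSQL reserves XID 2 for "frozen"; the toy_* functions are hypothetical. */
#define FrozenTransactionId ((TransactionId) 2)

/*
 * Assumed conflict rule: a standby snapshot conflicts when its xmin is at or
 * before latestRemovedXid.  Plain integer comparison; wraparound is ignored.
 */
bool
toy_snapshot_conflicts(TransactionId snapshot_xmin,
					   TransactionId latest_removed_xid)
{
	return snapshot_xmin <= latest_removed_xid;
}

/* Count how many of the given standby snapshots would be cancelled. */
int
toy_count_conflicts(const TransactionId *xmins, int n,
					TransactionId latest_removed_xid)
{
	int			count = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
	{
		if (toy_snapshot_conflicts(xmins[i], latest_removed_xid))
			count++;
	}
	return count;
}
```

With snapshots whose xmin values are 900 and 1200, a page deleted at xid 1000, and latestCompletedXid at 1500: logging the real xid cancels one snapshot, FrozenTransactionId cancels none (the missed conflict), and latestCompletedXid cancels both, which is safe but more aggressive.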
Re: Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
>> A straightforward way to fix that is to WAL-log the real xid in the
>> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
>> FrozenTransactionId.

> An even simpler way would be to reset the value to latestCompletedXid
> during btree_xlog_delete_page(). That touches less code. I doubt it will
> make much difference to conflict recovery, since if pages are being
> deleted then btree delete records are likely to be frequent and will
> have already killed long running queries.

I'm a bit concerned about XID wraparound if the value doesn't get reset
to FrozenTransactionId. There's no guarantee the page will get reused
promptly ...

regards, tom lane
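One reading of the wraparound concern can be sketched as follows. XIDs are 32-bit and compared modulo 2^32, so a normal xid left on a page for more than 2^31 transactions stops looking old, whereas FrozenTransactionId always sorts before every normal xid. The comparison below is modelled on PostgreSQL's XID ordering rule, but toy_xid_precedes is a hypothetical name and this is an illustration rather than the server's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

#define FrozenTransactionId      ((TransactionId) 2)
#define FirstNormalTransactionId ((TransactionId) 3)

/* Modelled on PostgreSQL's modulo-2^32 XID ordering; the name is hypothetical. */
bool
toy_xid_precedes(TransactionId a, TransactionId b)
{
	/* Reserved XIDs such as FrozenTransactionId sort before every normal XID. */
	if (a < FirstNormalTransactionId || b < FirstNormalTransactionId)
		return a < b;

	/*
	 * Normal XIDs are compared on a circle: a precedes b only if a is less
	 * than 2^31 transactions behind b.  The cast relies on two's-complement
	 * conversion, which common compilers provide.
	 */
	return (int32_t) (a - b) < 0;
}
```

For example, xid 1000 precedes xid 1100, but once the counter has advanced to 1000 + 0x80000001 the stale value 1000 no longer precedes the current xid and instead appears to be in the future. FrozenTransactionId precedes both.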
Re: Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
From
Simon Riggs
Date:
On Thu, 2010-02-18 at 14:17 -0500, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
> >> A straightforward way to fix that is to WAL-log the real xid in the
> >> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
> >> FrozenTransactionId.
>
> > An even simpler way would be to reset the value to latestCompletedXid
> > during btree_xlog_delete_page(). That touches less code. I doubt it will
> > make much difference to conflict recovery, since if pages are being
> > deleted then btree delete records are likely to be frequent and will
> > have already killed long running queries.
>
> I'm a bit concerned about XID wraparound if the value doesn't get reset
> to FrozenTransactionId. There's no guarantee the page will get reused
> promptly ...

I'd be very interested for you to have a look at Hot Standby from a
transaction wraparound perspective. There was some code in there to
handle anti-wraparound in RecordKnownAssignedTransactionId() but it was
removed, though I'm a little hazy on that myself. You've got the best
nose for corner cases and risks.

In this case, I don't see any problem. The xid after recovery will be
the same or a higher value than if the crash had never taken place, so I
can't see any risk that isn't already addressed.

Since we now have to handle cases where blocks have been touched in
pre-9.0 code and are in a state they could never get into in 9.0, we do
still have to handle a value of btpo.xact == FrozenTransactionId. I will
also add a special case to the handling of XLOG_BTREE_REUSE_PAGE records
to allow for that.

Reports of any similar theoretical issues would be most welcome.

--
Simon Riggs www.2ndQuadrant.com
Re: [COMMITTERS] pgsql: Introduce WAL records to log reuse of btree pages, allowing
From
Simon Riggs
Date:
On Thu, 2010-02-18 at 14:23 +0200, Heikki Linnakangas wrote:
> Simon Riggs wrote:
> > Introduce WAL records to log reuse of btree pages, allowing conflict
> > resolution during Hot Standby. Page reuse interlock requested by Tom.
> > Analysis and patch by me.
>
> There's still a theoretical possibility for this to happen:
>
> 1. A page is marked as deleted by VACUUM, setting xact field in the opaque
> 2. Master crashes. WAL replay replays the XLOG_BTREE_DELETE_PAGE record.
>    It resets the xact field to FrozenTransactionId
> 3. The page is recycled. This writes a XLOG_BTREE_REUSE_PAGE record with
>    FrozenTransactionId as latestRemovedXid
>
> When the standby replays that, it will call
> ResolveRecoveryConflictWithSnapshot with FrozenTransactionId, not the
> original xid that was used in the master when the page was deleted.
>
> A straightforward way to fix that is to WAL-log the real xid in the
> XLOG_BTREE_DELETE_PAGE records, instead of resetting it to
> FrozenTransactionId.

Bug accepted, proposal implemented and committed.

--
Simon Riggs www.2ndQuadrant.com