Re: Hot Standby b-tree delete records review - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Hot Standby b-tree delete records review
Date
Msg-id 1271923534.8305.30756.camel@ebony
Whole thread Raw
In response to Hot Standby b-tree delete records review  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Hot Standby b-tree delete records review  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
On Thu, 2010-04-22 at 10:24 +0300, Heikki Linnakangas wrote:
> btree_redo:
> >     case XLOG_BTREE_DELETE:
> > 
> >         /*
> >          * Btree delete records can conflict with standby queries. You
> >          * might think that vacuum records would conflict as well, but
> >          * we've handled that already. XLOG_HEAP2_CLEANUP_INFO records
> >          * provide the highest xid cleaned by the vacuum of the heap
> >          * and so we can resolve any conflicts just once when that
> >          * arrives. After that any we know that no conflicts exist
> >          * from individual btree vacuum records on that index.
> >          */
> >         {
> >             TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
> >             xl_btree_delete *xlrec = (xl_btree_delete *) XLogRecGetData(record);
> > 
> >             /*
> >              * XXX Currently we put everybody on death row, because
> >              * currently _bt_delitems() supplies InvalidTransactionId.
> >              * This can be fairly painful, so providing a better value
> >              * here is worth some thought and possibly some effort to
> >              * improve.
> >              */
> >             ResolveRecoveryConflictWithSnapshot(latestRemovedXid, xlrec->node);
> >         }
> >         break;
> 
> The XXX comment is out-of-date, the latestRemovedXid value is calculated
> by btree_xlog_delete_get_latestRemovedXid() nowadays.

Removed, thanks.

> If we're re-replaying the WAL record, for example after restarting the
> standby server, btree_xlog_delete_get_latestRemovedXid() won't find the
> deleted records and will return InvalidTransactionId. That's OK, because
> until we reach again the point in WAL where we were before the restart,
> we don't accept read-only connections so there's no-one to kill anyway,
> but you do get a useless "Invalid latestRemovedXid reported, using
> latestCompletedXid instead" message in the log (that shouldn't be
> capitalized, BTW).

> It would be nice to check if there's any potentially conflicting
> read-only queries before calling
> btree_xlog_delete_get_latestRemovedXid(), which can be quite expensive.

Good idea. You're welcome to add such tuning yourself, if you like.

> If the "Invalid latestRemovedXid reported, using latestCompletedXid
> instead" message is going to happen commonly, I think it should be
> downgraded to DEBUG1. If it's an unexpected scenario, it should be
> upgraded to WARNING.

Set to DEBUG because the above optimisation makes it return invalid much
more frequently, which we don't want reported.

> In btree_xlog_delete_get_latestRemovedXid:
> >     Assert(num_unused == 0);
> 
> Can't that happen as well in a re-replay scenario, if a heap item was
> vacuumed away later on?

OK, will remove. The re-replay gets me every time.

> >     /*
> >      * Note that if all heap tuples were LP_DEAD then we will be
> >      * returning InvalidTransactionId here. This seems very unlikely
> >      * in practice.
> >      */
> 
> If none of the removed heap tuples were present anymore, we currently
> return InvalidTransactionId, which kills/waits out all read-only
> queries. But if none of the tuples were present anymore, the read-only
> queries wouldn't have seen them anyway, so ISTM that we should treat
> InvalidTransactionId return value as "we don't need to kill anyone".

That's not the point. The tuples were not themselves the sole focus,
they indicated the latestRemovedXid of the backend on the primary that
had performed the deletion. So even if those tuples are no longer
present there may be others with similar xids that would conflict, so we
cannot ignore. Comment updated.

> Why does btree_xlog_delete_get_latestRemovedXid() keep the
> num_unused/num_dead/num_redirect counts, it doesn't actually do anything
> with them.

Probably a debug tool. Removed.

Changes committed.

Thanks for the review.

-- Simon Riggs           www.2ndQuadrant.com



pgsql-hackers by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: Assertion failure twophase.c (3) (testing HS/SR)
Next
From: Heikki Linnakangas
Date:
Subject: Re: Hot Standby b-tree delete records review