Re: On-the-fly index tuple deletion vs. hot_standby - Mailing list pgsql-hackers

From Noah Misch
Subject Re: On-the-fly index tuple deletion vs. hot_standby
Msg-id 20110612190144.GE21098@tornado.leadboat.com
In response to Re: On-the-fly index tuple deletion vs. hot_standby  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: On-the-fly index tuple deletion vs. hot_standby
List pgsql-hackers
On Sun, Jun 12, 2011 at 12:15:29AM -0400, Robert Haas wrote:
> On Sat, Jun 11, 2011 at 11:40 PM, Noah Misch <noah@leadboat.com> wrote:
> > We currently achieve that wait-free by first marking the page with the next
> > available xid and then reusing it when that mark (btpo.xact) predates the
> > oldest running xid (RecentXmin).  (At the moment, I'm failing to work out why
> > this is OK with scans from transactions that haven't allocated an xid, but I
> > vaguely recall convincing myself it was fine at one point.)  It would indeed
> > also be enough to call GetLockConflicts(locktag-of-index, AccessExclusiveLock)
> > and check whether any of the returned transactions have PGPROC.xmin below the
> > mark.  That's notably more expensive than just comparing RecentXmin, so I'm
> > not sure how well it would pay off overall.  However, it could only help us on
> > the master.  (Not strictly true, but any way I see to extend it to the standby
> > has critical flaws.)  On the master, we can see a conflicting transaction and
> > put off reusing the page.  By the time the record hits the standby, we have to
> > apply it, and we might have a running transaction that will hold a lock on the
> > index for the next, say, 72 hours.  At such times, vacuum_defer_cleanup_age or
> > hot_standby_feedback ought to prevent the recovery stall.
> >
> > This did lead me to realize that what we do in this regard on the standby can
> > be considerably independent from what we do on the master.  If fruitful, the
> > standby can prove the absence of a scan holding a right-link in a completely
> > different fashion.  So, we *could* take the cleanup-lock approach on the
> > standby without changing very much on the master.
> 
> Well, I'm generally in favor of trying to fix this problem without
> changing what the master does.  It's a weakness of our replication
> technology that the standby has no better way to cope with a cleanup
> operation on the master than to start killing queries, but then again
> it's a weakness of our MVCC technology that we don't reuse space
> quickly enough and end up with bloat.  I hear a lot more complaints
> about the second weakness than I do about the first.

I fully agree.  That said, if this works on the standby, we may as well also use
it opportunistically on the master, to throttle bloat.

> At any rate, if taking a cleanup lock on the right-linked page on the
> standby is sufficient to fix the problem, that seems like a far
> superior solution in any case.  Presumably the frequency of someone
> having a pin on that particular page will be far lower than any
> matching based on XID or heavyweight locks.  And the vast majority of
> such pins should disappear before the startup process feels obliged to
> get out its big hammer.

Yep; looks promising.

Does such a thing have a chance of being backpatchable?  I think the chances
start slim and fall almost to zero on account of the difficulty of avoiding a
WAL format change.  Assuming that conclusion, I do think it's worth starting
with something simple, even if it means additional bloat on the master in the
wal_level=hot_standby + vacuum_defer_cleanup_age / hot_standby_feedback case.
In choosing those settings, the administrator has taken constructive steps to
accept master-side bloat in exchange for delaying recovery conflict.  What's
your opinion?

Thanks,
nm

