Re: Summary and Plan for Hot Standby - Mailing list pgsql-hackers
From | Heikki Linnakangas
---|---
Subject | Re: Summary and Plan for Hot Standby
Date | |
Msg-id | 4B000B0C.70003@enterprisedb.com
In response to | Summary and Plan for Hot Standby (Simon Riggs <simon@2ndQuadrant.com>)
Responses | Re: Summary and Plan for Hot Standby (6 replies)
List | pgsql-hackers
Simon Riggs wrote:
> There are two remaining areas of significant thought/effort:

Here's a list of other TODO items I've collected so far. Some of them are just improvements or nice-to-have stuff, but some are more serious:

- If WAL recovery runs out of lock space while acquiring an AccessExclusiveLock on behalf of a transaction that ran in the master, it will FATAL and abort recovery, bringing down the standby. Seems like it should wait out or cancel queries instead.

- When switching from standby mode to normal operation, we momentarily hold all AccessExclusiveLocks held by prepared xacts twice, needing twice the lock space. You can run out of lock space at that point, causing failover to fail.

- When replaying b-tree deletions, we currently wait out/cancel all running (read-only) transactions. We take that ultra-conservative stance because we don't know how recent the tuples being deleted are. If we could store a better estimate of latestRemovedXid in the WAL record, we could make this less conservative (the first sketch after this list illustrates the idea).

- The assumption that b-tree vacuum records don't need conflict resolution, because we already did that with the additional cleanup-info record, works ATM, but it hinges on the fact that we don't delete any tuples marked as killed while we do the vacuum. Removing those killed tuples seems like low-hanging fruit that I'd actually like to pick now that I've spotted it, but doing so will then require fixing b-tree vacuum records accordingly. We'd probably need to do something about the previous item first to keep performance acceptable.

- There's the optimization to the replay of b-tree vacuum records that we discussed earlier: replay has to touch all leaf pages because of the interlock with heap scans, to ensure that we don't vacuum away a heap tuple that a concurrent index scan is about to visit. Instead of actually reading in and pinning all pages, replay could just check that the pages that need no other work done are not currently pinned in the buffer cache (second sketch below).

- Do we do the b-tree page pinning explained in the previous point correctly at the end of an index vacuum? ISTM we're not visiting any pages after the last page that had dead tuples on it.

- Code structure: I moved much of the added code to a new standby.c module that now takes care of replaying standby-related WAL records, but there's code elsewhere too. I'm not sure this is a good division, but it seems better than the original ad hoc arrangement where e.g. lock-related WAL handling was in inval.c.

- The "standby delay" is measured as the current timestamp minus the timestamp of the last replayed commit record. If there's little activity in the master, that can lead to surprising results. For example, imagine that max_standby_delay is set to 8 hours. The standby is fully up to date with the master, and there's no write activity in the master. After 10 hours, a long reporting query is started on the standby. Ten minutes later, a small transaction is executed in the master that conflicts with the reporting query. I would expect the reporting query to be canceled 8 hours after the conflicting transaction began, but it is in fact canceled immediately, because it's been over 8 hours since the last commit record was replayed (third sketch below).

- ResolveRecoveryConflictWithVirtualXIDs polls until the victim transactions have ended. It would be much nicer to sleep; we'd need a version of LockAcquire with a timeout (last sketch below). Hmm, IIRC someone submitted a patch for lock timeouts recently. Maybe we could borrow code from that?
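First sketch, on latestRemovedXid: with the newest removed xid recorded in the deletion's WAL record, replay would only have to cancel backends whose snapshots might still see the removed tuples. This is a minimal illustration with made-up names, not the actual recovery code; the real comparison would use TransactionIdPrecedes, which handles xid wraparound:

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t TransactionId;

    /*
     * Illustrative stand-in for TransactionIdPrecedes; ignores xid
     * wraparound, which the real macro handles.
     */
    static bool
    xid_precedes(TransactionId a, TransactionId b)
    {
        return a < b;
    }

    /*
     * A standby query conflicts with the deletion only if its xmin does
     * not follow latestRemovedXid, i.e. its snapshot might still consider
     * one of the removed tuples visible. Everyone else can keep running.
     */
    static bool
    conflicts_with_deletion(TransactionId backend_xmin,
                            TransactionId latest_removed_xid)
    {
        return !xid_precedes(latest_removed_xid, backend_xmin);
    }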
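Second sketch, the b-tree vacuum replay optimization: the interlock only requires that no index scan is stopped on a page, and a scan must hold a pin while it's on one, so an unpinned page needs no I/O at all. Roughly, with hypothetical helpers (this is not the actual buffer manager API, and a real check would have to be race-free against new pins being taken):

    #include <stdbool.h>
    #include <stdint.h>

    typedef uint32_t BlockNumber;

    /* Hypothetical helpers -- not the actual buffer manager API. */
    extern bool buffer_is_pinned(BlockNumber blkno);
    extern void read_and_take_cleanup_lock(BlockNumber blkno);

    static void
    replay_btree_vacuum_interlock(BlockNumber first, BlockNumber last)
    {
        BlockNumber blk;

        for (blk = first; blk <= last; blk++)
        {
            /*
             * A page with no tuples to remove matters only for the scan
             * interlock. If nobody holds a pin on it, no index scan can
             * be stopped on it, so we can skip reading it in entirely;
             * only pinned pages need the full wait for a cleanup lock.
             */
            if (buffer_is_pinned(blk))
                read_and_take_cleanup_lock(blk);
        }
    }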
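Third sketch, the standby-delay surprise made concrete. This is just the computation as described above, self-contained, using plain time_t in place of TimestampTz and illustrative names:

    #include <stdbool.h>
    #include <time.h>

    /* Updated each time a commit record is replayed. */
    static time_t last_replayed_commit_ts;

    /*
     * Current behaviour: the delay is measured from the last replayed
     * commit record, not from when this conflict started waiting. With
     * an idle master, last_replayed_commit_ts may already be hours in
     * the past when the first conflicting record arrives, so this
     * returns true immediately and the victim query is canceled at once.
     */
    static bool
    standby_delay_exceeded(int max_standby_delay_secs)
    {
        return difftime(time(NULL), last_replayed_commit_ts)
               > max_standby_delay_secs;
    }

Measuring the delay from the moment replay first starts waiting on the conflict would give the behavior I'd expect.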
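Last sketch, the polling in ResolveRecoveryConflictWithVirtualXIDs, for reference. The pattern is roughly this (victim_still_running is a stand-in for the real liveness check against each victim's virtual xid):

    #include <stdbool.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Stand-in for checking whether a victim transaction is still alive. */
    extern bool victim_still_running(uint32_t vxid);

    static void
    wait_for_victim(uint32_t vxid)
    {
        /*
         * Poll-and-sleep until the victim ends: simple, but it burns
         * cycles and adds up to 50 ms of latency per iteration. Sleeping
         * on the transaction's lock with a timeout (a LockAcquire variant
         * we don't have today) would wake us exactly when the victim ends
         * or the deadline passes.
         */
        while (victim_still_running(vxid))
            usleep(50000);      /* 50 ms */
    }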
--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com