Minimal logical decoding on standbys - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Minimal logical decoding on standbys |
Date | |
Msg-id | 20181212204154.nsxf3gzqv3gesl32@alap3.anarazel.de |
Responses | Re: Minimal logical decoding on standbys |
| Re: Minimal logical decoding on standbys |
List | pgsql-hackers |
Hi,

Craig previously worked on $subject, see thread [1]. A bunch of the prerequisite features from that and other related threads have been integrated into PG. What's missing is actually allowing logical decoding on a standby. The latest patch from that thread does that [2], but unfortunately hasn't been updated after slipping v10.

The biggest remaining issue is that the catalog xmin on the primary has to stay above the catalog xmin horizon of all slots on the standby. The patch in [2] ensures that by periodically logging a new record that announces the current catalog xmin horizon. Additionally, it checks that hot_standby_feedback is enabled when doing logical decoding from a standby.

I don't like the approach of managing the catalog horizon via those periodically logged catalog xmin announcements. I think we should instead build on top of the records we already have and use to compute snapshot conflicts. As of HEAD those records don't tell us whether the affected table is a catalog table, but that's just a bool we'd need to include in them - a basically immeasurable overhead given the size of those records.

I also don't think we should actually enforce hot_standby_feedback being enabled - there are use cases where that's not convenient, and it's not bulletproof anyway (it can be disabled and re-enabled without any logical decoding happening in between). I think when there's a conflict, the HINT should mention that hot_standby_feedback can be used to prevent such conflicts; that ought to be enough.

Attached is a rough draft patch. If we were to go for this approach, we'd obviously need to improve the actual conflict handling against slots - right now it just logs a WARNING and retries shortly after.
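As a minimal sketch, this is roughly the kind of error the improved conflict handling could raise; the errcode, message, and detail wording here are placeholder assumptions, not what the attached draft does:

    /* Sketch only: cancel decoding and point at hot_standby_feedback. */
    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("canceling logical decoding due to conflict with recovery"),
             errdetail("Catalog rows needed by the slot's catalog_xmin have been removed on the primary."),
             errhint("Enable hot_standby_feedback on this standby to prevent such conflicts.")));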
I think there's currently one hole in this approach. Nbtree (and other index types, which are pretty unlikely to matter here) has this logic to handle snapshot conflicts for single-page deletions:

    /*
     * If we have any conflict processing to do, it must happen before we
     * update the page.
     *
     * Btree delete records can conflict with standby queries.  You might
     * think that vacuum records would conflict as well, but we've handled
     * that already.  XLOG_HEAP2_CLEANUP_INFO records provide the highest xid
     * cleaned by the vacuum of the heap and so we can resolve any conflicts
     * just once when that arrives.  After that we know that no conflicts
     * exist from individual btree vacuum records on that index.
     */
    if (InHotStandby)
    {
        TransactionId latestRemovedXid = btree_xlog_delete_get_latestRemovedXid(record);
        RelFileNode rnode;

        XLogRecGetBlockTag(record, 0, &rnode, NULL, NULL);

        ResolveRecoveryConflictWithSnapshot(latestRemovedXid,
                                            xlrec->onCatalogTable, rnode);
    }

I.e. we get the latest removed xid from the heap, which has the following logic:

    /*
     * If there's nothing running on the standby we don't need to derive a
     * full latestRemovedXid value, so use a fast path out of here.  This
     * returns InvalidTransactionId, and so will conflict with all HS
     * transactions; but since we just worked out that that's zero people,
     * it's OK.
     *
     * XXX There is a race condition here, which is that a new backend might
     * start just after we look.  If so, it cannot need to conflict, but this
     * coding will result in throwing a conflict anyway.
     */
    if (CountDBBackends(InvalidOid) == 0)
        return latestRemovedXid;

    /*
     * In what follows, we have to examine the previous state of the index
     * page, as well as the heap page(s) it points to.  This is only valid if
     * WAL replay has reached a consistent database state; which means that
     * the preceding check is not just an optimization, but is *necessary*. We
     * won't have let in any user sessions before we reach consistency.
     */
    if (!reachedConsistency)
        elog(PANIC, "btree_xlog_delete_get_latestRemovedXid: cannot operate with inconsistent data");

So we wouldn't get a correct xid if nobody is connected to a database (and, by implication, when the standby is not yet consistent).

I'm wondering if it's time to move the latestRemovedXid computation for this type of record to the primary - it's likely to be cheaper there and it avoids this kind of complication. Secondarily, it'd have the advantage of making pluggable storage integration easier - there we have the problem that we don't know which type of relation we're dealing with during recovery, so such lookups make pluggability harder (zheap just adds extra flags to signal that, but that's not extensible).

Another alternative would be to just prevent such index deletions for catalog tables when wal_level = logical.

If we were to go with this approach, there'd be at least the following tasks:
- adapt the tests from [2]
- enforce hot standby to be enabled on the standby when logical slots are created, and at startup if a logical slot exists (see the sketch below)
- fix the issue around btree_xlog_delete_get_latestRemovedXid etc. mentioned above
- have nicer conflict handling than what I implemented here; Craig's approach deleted the slots, but I'm not sure I like that. Blocking seems more appropriate here - after all, it's likely that the replication topology would be broken afterwards.
- get_rel_logical_catalog() shouldn't be in lsyscache.[ch], and can be optimized (e.g. check wal_level before opening the rel, etc.)
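For the hot-standby enforcement item, a minimal sketch of the kind of check that could go into the slot creation path; RecoveryInProgress() and EnableHotStandby already exist, but the placement and the message wording are assumptions on my part:

    /* Sketch only: reject logical slot creation on a standby without hot standby. */
    if (RecoveryInProgress() && !EnableHotStandby)
        ereport(ERROR,
                (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                 errmsg("logical decoding on a standby requires hot_standby to be enabled")));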
Once we have this logic, it can be used to implement something like failover slots on top, by having a mechanism that occasionally forwards slots on standbys using pg_replication_slot_advance().

Greetings,

Andres Freund

[1] https://www.postgresql.org/message-id/CAMsr+YEVmBJ=dyLw=+kTihmUnGy5_EW4Mig5T0maieg_Zu=XCg@mail.gmail.com
[2] https://archives.postgresql.org/message-id/CAMsr%2BYEbS8ZZ%2Bw18j7OPM2MZEeDtGN9wDVF68%3DMzpeW%3DKRZZ9Q%40mail.gmail.com