Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys |
Date | |
Msg-id | 8d263a6b-a99f-0f9b-43ba-e8967c386f2b@amazon.com Whole thread Raw |
In response to | Re: Minimal logical decoding on standbys ("Drouvot, Bertrand" <bdrouvot@amazon.com>) |
Responses |
Re: Minimal logical decoding on standbys
|
List | pgsql-hackers |
Hi Andres, On 6/14/21 7:41 AM, Drouvot, Bertrand wrote: > Hi Andres, > > On 4/8/21 5:47 AM, Andres Freund wrote: >> Hi, >> >> On 2021-04-07 13:32:18 -0700, Andres Freund wrote: >>> While working on this I found a, somewhat substantial, issue: >>> >>> When the primary is idle, on the standby logical decoding via walsender >>> will typically not process the records until further WAL writes come in >>> from the primary, or until a 10s lapsed. >>> >>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to >>> increase, but gets woken up by walreceiver when new WAL has been >>> flushed. Which means that typically walsenders will get woken up at the >>> same time that the startup process will be - which means that by the >>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely >>> that the startup process already replayed the record and updated >>> XLogCtl->lastReplayedEndRecPtr. >>> >>> I think fixing this would require too invasive changes at this point. I >>> think we might be able to live with 10s delay issue for one release, >>> but >>> it sure is ugly :(. >> This is indeed pretty painful. It's a lot more regularly occuring if you >> either have a slot disk, or you switch around the order of >> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush(). >> >> - There's about which timeline to use. If you use pg_recvlogical and you >> restart the server, you'll see errors like: >> >> pg_recvlogical: error: unexpected termination of replication >> stream: ERROR: requested WAL segment 000000000000000000000003 has >> already been removed >> >> the real filename is 000000010000000000000003 - i.e. the timeline is >> 0. >> >> This isn't too hard to fix, but definitely needs fixing. > > Thanks, nice catch! > > From what I have seen, we are not going through InitXLOGAccess() on a > Standby and in some cases (like the one you mentioned) > StartLogicalReplication() is called without IdentifySystem() being > called previously: this lead to ThisTimeLineID still set to 0. > > I am proposing a fix in the attached v18 by adding a check in > StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved. > >> >> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially >> leading us to drop a slot that has been created since we signalled a >> recovery conflict. See >> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de >> for some very similar issues. > > I have rewritten this part by following the same logic as the one used > in 96540f80f8 (the commit linked to the thread you mentioned). > >> >> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to >> just drop the logical slots. Instead we should just mark them as >> invalid, like InvalidateObsoleteReplicationSlots(). > > Makes fully sense and done that way in the attached patch. > > I am setting the slot's data.xmin and data.catalog_xmin as > InvalidTransactionId to mark the slot(s) as invalid in case of conflict. > >> - There's no tests covering timeline switches, what happens if there's a >> promotion if logical decoding is currently ongoing. > > I'll now work on the tests. > >> >> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error >> message is not good (and I've complained about it before...). > > I changed it and made it more simple. > > I also removed the details around mentioning xmin or catalog xmin (as > I am not sure of the added value and they are currently also not > mentioned during standby recovery snapshot conflict). > >> >> Unfortunately I think the things I have found are too many for me to >> address within the given time. I'll send a version with a somewhat >> polished set of the changes I made in the next few days... > > Thanks for the review and feedback. > > Please find enclosed v18 with the changes I worked on. > > I still need to have a look on the tests. Please find enclosed v19 that also contains the changes related to your TAP tests remarks, mainly: - get rid of 024 and add more tests in 026 (025 has been used in the meantime) - test that logical decoding actually produces useful and correct results - test standby promotion and logical decoding behavior once done - useless "use" removal - check_confl_logicalslot() function removal - rewrote make_slot_active() to make use of poll_query_until() and timeout - remove the useless eval() - remove the "Catalog xmins should advance after standby logical slot fetches the changes" test One thing that's not clear to me is your remark "There's also no test for a recovery conflict due to row removal": Don't you think that the "vacuum full" conflict test is enough? if not, what kind of additional tests would you like to see? > > There is also the 10s delay to work on, do you already have an idea on > how we should handle it? > > Thanks > > Bertrand > Thanks Bertrand
Attachment
pgsql-hackers by date: