Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys
Date
Msg-id 8d263a6b-a99f-0f9b-43ba-e8967c386f2b@amazon.com
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Minimal logical decoding on standbys
List pgsql-hackers
Hi Andres,

On 6/14/21 7:41 AM, Drouvot, Bertrand wrote:
> Hi Andres,
>
> On 4/8/21 5:47 AM, Andres Freund wrote:
>> Hi,
>>
>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
>>> While working on this I found a, somewhat substantial, issue:
>>>
>>> When the primary is idle, on the standby logical decoding via walsender
>>> will typically not process the records until further WAL writes come in
>>> from the primary, or until a 10s lapsed.
>>>
>>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
>>> increase, but gets woken up by walreceiver when new WAL has been
>>> flushed. Which means that typically walsenders will get woken up at the
>>> same time that the startup process will be - which means that by the
>>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
>>> that the startup process already replayed the record and updated
>>> XLogCtl->lastReplayedEndRecPtr.
>>>
>>> I think fixing this would require too invasive changes at this point. I
>>> think we might be able to live with 10s delay issue for one release, 
>>> but
>>> it sure is ugly :(.
>> This is indeed pretty painful. It's a lot more regularly occuring if you
>> either have a slot disk, or you switch around the order of
>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
>>
>> - There's about which timeline to use. If you use pg_recvlogical and you
>>    restart the server, you'll see errors like:
>>
>>    pg_recvlogical: error: unexpected termination of replication 
>> stream: ERROR:  requested WAL segment 000000000000000000000003 has 
>> already been removed
>>
>>    the real filename is 000000010000000000000003 - i.e. the timeline is
>>    0.
>>
>>    This isn't too hard to fix, but definitely needs fixing.
>
> Thanks, nice catch!
>
> From what I have seen, we are not going through InitXLOGAccess() on a 
> Standby and in some cases (like the one you mentioned) 
> StartLogicalReplication() is called without IdentifySystem() being 
> called previously: this lead to ThisTimeLineID still set to 0.
>
> I am proposing a fix in the attached v18 by adding a check in 
> StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved.
>
>>
>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
>>    leading us to drop a slot that has been created since we signalled a
>>    recovery conflict.  See
>> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de
>>    for some very similar issues.
>
> I have rewritten this part by following the same logic as the one used 
> in 96540f80f8 (the commit linked to the thread you mentioned).
>
>>
>> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to
>>    just drop the logical slots. Instead we should just mark them as
>>    invalid, like InvalidateObsoleteReplicationSlots().
>
> Makes fully sense and done that way in the attached patch.
>
> I am setting the slot's data.xmin and data.catalog_xmin as 
> InvalidTransactionId to mark the slot(s) as invalid in case of conflict.
>
>> - There's no tests covering timeline switches, what happens if there's a
>>    promotion if logical decoding is currently ongoing.
>
> I'll now work on the tests.
>
>>
>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error
>>    message is not good (and I've complained about it before...).
>
> I changed it and made it more simple.
>
> I also removed the details around mentioning xmin or catalog xmin (as 
> I am not sure of the added value and they are currently also not 
> mentioned during standby recovery snapshot conflict).
>
>>
>> Unfortunately I think the things I have found are too many for me to
>> address within the given time. I'll send a version with a somewhat
>> polished set of the changes I made in the next few days...
>
> Thanks for the review and feedback.
>
> Please find enclosed v18 with the changes I worked on.
>
> I still need to have a look on the tests.

Please find enclosed v19 that also contains the changes related to your 
TAP tests remarks, mainly:

- get rid of 024 and add more tests in 026 (025 has been used in the 
meantime)

- test that logical decoding actually produces useful and correct results

- test standby promotion and logical decoding behavior once done

- useless "use" removal

- check_confl_logicalslot() function removal

- rewrote make_slot_active() to make use of poll_query_until() and timeout

- remove the useless eval()

- remove the "Catalog xmins should advance after standby logical slot 
fetches the changes" test

One thing that's not clear to me is your remark "There's also no test 
for a recovery conflict due to row removal": Don't you think that the 
"vacuum full" conflict test is enough? if not, what kind of additional 
tests would you like to see?

>
> There is also the 10s delay to work on, do you already have an idea on 
> how we should handle it?
>
> Thanks
>
> Bertrand
>
Thanks

Bertrand



Attachment

pgsql-hackers by date:

Previous
From: "houzj.fnst@fujitsu.com"
Date:
Subject: RE: [bug?] Missed parallel safety checks, and wrong parallel safety
Next
From: "Finnerty, Jim"
Date:
Subject: Re:disfavoring unparameterized nested loops