Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Drouvot, Bertrand
Subject Re: Minimal logical decoding on standbys
Date
Msg-id 2feee395-959b-6d53-369a-c00f8f2fa6c5@amazon.com
Whole thread Raw
In response to Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Minimal logical decoding on standbys  (Ibrar Ahmed <ibrar.ahmad@gmail.com>)
List pgsql-hackers
Hi Andres,

On 6/22/21 12:38 PM, Drouvot, Bertrand wrote:
> Hi Andres,
>
> On 6/14/21 7:41 AM, Drouvot, Bertrand wrote:
>> Hi Andres,
>>
>> On 4/8/21 5:47 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
>>>> While working on this I found a, somewhat substantial, issue:
>>>>
>>>> When the primary is idle, on the standby logical decoding via 
>>>> walsender
>>>> will typically not process the records until further WAL writes 
>>>> come in
>>>> from the primary, or until a 10s lapsed.
>>>>
>>>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
>>>> increase, but gets woken up by walreceiver when new WAL has been
>>>> flushed. Which means that typically walsenders will get woken up at 
>>>> the
>>>> same time that the startup process will be - which means that by the
>>>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
>>>> that the startup process already replayed the record and updated
>>>> XLogCtl->lastReplayedEndRecPtr.
>>>>
>>>> I think fixing this would require too invasive changes at this 
>>>> point. I
>>>> think we might be able to live with 10s delay issue for one 
>>>> release, but
>>>> it sure is ugly :(.
>>> This is indeed pretty painful. It's a lot more regularly occuring if 
>>> you
>>> either have a slot disk, or you switch around the order of
>>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
>>>
>>> - There's about which timeline to use. If you use pg_recvlogical and 
>>> you
>>>    restart the server, you'll see errors like:
>>>
>>>    pg_recvlogical: error: unexpected termination of replication 
>>> stream: ERROR:  requested WAL segment 000000000000000000000003 has 
>>> already been removed
>>>
>>>    the real filename is 000000010000000000000003 - i.e. the timeline is
>>>    0.
>>>
>>>    This isn't too hard to fix, but definitely needs fixing.
>>
>> Thanks, nice catch!
>>
>> From what I have seen, we are not going through InitXLOGAccess() on a 
>> Standby and in some cases (like the one you mentioned) 
>> StartLogicalReplication() is called without IdentifySystem() being 
>> called previously: this lead to ThisTimeLineID still set to 0.
>>
>> I am proposing a fix in the attached v18 by adding a check in 
>> StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved.
>>
>>>
>>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
>>>    leading us to drop a slot that has been created since we signalled a
>>>    recovery conflict.  See
>>> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de 
>>>
>>>    for some very similar issues.
>>
>> I have rewritten this part by following the same logic as the one 
>> used in 96540f80f8 (the commit linked to the thread you mentioned).
>>
>>>
>>> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to
>>>    just drop the logical slots. Instead we should just mark them as
>>>    invalid, like InvalidateObsoleteReplicationSlots().
>>
>> Makes fully sense and done that way in the attached patch.
>>
>> I am setting the slot's data.xmin and data.catalog_xmin as 
>> InvalidTransactionId to mark the slot(s) as invalid in case of conflict.
>>
>>> - There's no tests covering timeline switches, what happens if 
>>> there's a
>>>    promotion if logical decoding is currently ongoing.
>>
>> I'll now work on the tests.
>>
>>>
>>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error
>>>    message is not good (and I've complained about it before...).
>>
>> I changed it and made it more simple.
>>
>> I also removed the details around mentioning xmin or catalog xmin (as 
>> I am not sure of the added value and they are currently also not 
>> mentioned during standby recovery snapshot conflict).
>>
>>>
>>> Unfortunately I think the things I have found are too many for me to
>>> address within the given time. I'll send a version with a somewhat
>>> polished set of the changes I made in the next few days...
>>
>> Thanks for the review and feedback.
>>
>> Please find enclosed v18 with the changes I worked on.
>>
>> I still need to have a look on the tests.
>
> Please find enclosed v19 that also contains the changes related to 
> your TAP tests remarks, mainly:
>
> - get rid of 024 and add more tests in 026 (025 has been used in the 
> meantime)
>
> - test that logical decoding actually produces useful and correct results
>
> - test standby promotion and logical decoding behavior once done
>
> - useless "use" removal
>
> - check_confl_logicalslot() function removal
>
> - rewrote make_slot_active() to make use of poll_query_until() and 
> timeout
>
> - remove the useless eval()
>
> - remove the "Catalog xmins should advance after standby logical slot 
> fetches the changes" test
>
> One thing that's not clear to me is your remark "There's also no test 
> for a recovery conflict due to row removal": Don't you think that the 
> "vacuum full" conflict test is enough? if not, what kind of additional 
> tests would you like to see?
>
>>
>> There is also the 10s delay to work on, do you already have an idea 
>> on how we should handle it?
>>
>> Thanks
>>
>> Bertrand
>>
> Thanks
>
> Bertrand
>
Please find enclosed v20 a needed rebase (nothing serious worth 
mentioning) of v19.

FWIW, just to sum up that v19 (and so v20):

- contained the changes (see details above) related to your TAP tests 
remarks

- contained the changes (see details above) related to your code remarks

There is still the 10s delay thing that need work: do you already have 
an idea on how we should handle it?

And still one thing that's not clear to me is your remark "There's also 
no test for a recovery conflict due to row removal": Don't you think 
that the "vacuum full" conflict test is enough? if not, what kind of 
additional tests would you like to see?

Thanks

Bertrand





Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: A (but copied many) typo of char-mapping tables
Next
From: Tomas Vondra
Date:
Subject: Re: data corruption hazard in reorderbuffer.c