Re: Minimal logical decoding on standbys - Mailing list pgsql-hackers

From Ibrar Ahmed
Subject Re: Minimal logical decoding on standbys
Date
Msg-id CALtqXTfXsyq-G7x3Ti3bm4KgpepqZA_-ZukzJ5W5vFLjRrhGSg@mail.gmail.com
Whole thread Raw
In response to Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Responses Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
List pgsql-hackers


On Fri, Jul 16, 2021 at 1:07 PM Drouvot, Bertrand <bdrouvot@amazon.com> wrote:
Hi Andres,

On 6/22/21 12:38 PM, Drouvot, Bertrand wrote:
> Hi Andres,
>
> On 6/14/21 7:41 AM, Drouvot, Bertrand wrote:
>> Hi Andres,
>>
>> On 4/8/21 5:47 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
>>>> While working on this I found a, somewhat substantial, issue:
>>>>
>>>> When the primary is idle, on the standby logical decoding via
>>>> walsender
>>>> will typically not process the records until further WAL writes
>>>> come in
>>>> from the primary, or until a 10s lapsed.
>>>>
>>>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
>>>> increase, but gets woken up by walreceiver when new WAL has been
>>>> flushed. Which means that typically walsenders will get woken up at
>>>> the
>>>> same time that the startup process will be - which means that by the
>>>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
>>>> that the startup process already replayed the record and updated
>>>> XLogCtl->lastReplayedEndRecPtr.
>>>>
>>>> I think fixing this would require too invasive changes at this
>>>> point. I
>>>> think we might be able to live with 10s delay issue for one
>>>> release, but
>>>> it sure is ugly :(.
>>> This is indeed pretty painful. It's a lot more regularly occuring if
>>> you
>>> either have a slot disk, or you switch around the order of
>>> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
>>>
>>> - There's about which timeline to use. If you use pg_recvlogical and
>>> you
>>>    restart the server, you'll see errors like:
>>>
>>>    pg_recvlogical: error: unexpected termination of replication
>>> stream: ERROR:  requested WAL segment 000000000000000000000003 has
>>> already been removed
>>>
>>>    the real filename is 000000010000000000000003 - i.e. the timeline is
>>>    0.
>>>
>>>    This isn't too hard to fix, but definitely needs fixing.
>>
>> Thanks, nice catch!
>>
>> From what I have seen, we are not going through InitXLOGAccess() on a
>> Standby and in some cases (like the one you mentioned)
>> StartLogicalReplication() is called without IdentifySystem() being
>> called previously: this lead to ThisTimeLineID still set to 0.
>>
>> I am proposing a fix in the attached v18 by adding a check in
>> StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved.
>>
>>>
>>> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
>>>    leading us to drop a slot that has been created since we signalled a
>>>    recovery conflict.  See
>>> https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de
>>>
>>>    for some very similar issues.
>>
>> I have rewritten this part by following the same logic as the one
>> used in 96540f80f8 (the commit linked to the thread you mentioned).
>>
>>>
>>> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to
>>>    just drop the logical slots. Instead we should just mark them as
>>>    invalid, like InvalidateObsoleteReplicationSlots().
>>
>> Makes fully sense and done that way in the attached patch.
>>
>> I am setting the slot's data.xmin and data.catalog_xmin as
>> InvalidTransactionId to mark the slot(s) as invalid in case of conflict.
>>
>>> - There's no tests covering timeline switches, what happens if
>>> there's a
>>>    promotion if logical decoding is currently ongoing.
>>
>> I'll now work on the tests.
>>
>>>
>>> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error
>>>    message is not good (and I've complained about it before...).
>>
>> I changed it and made it more simple.
>>
>> I also removed the details around mentioning xmin or catalog xmin (as
>> I am not sure of the added value and they are currently also not
>> mentioned during standby recovery snapshot conflict).
>>
>>>
>>> Unfortunately I think the things I have found are too many for me to
>>> address within the given time. I'll send a version with a somewhat
>>> polished set of the changes I made in the next few days...
>>
>> Thanks for the review and feedback.
>>
>> Please find enclosed v18 with the changes I worked on.
>>
>> I still need to have a look on the tests.
>
> Please find enclosed v19 that also contains the changes related to
> your TAP tests remarks, mainly:
>
> - get rid of 024 and add more tests in 026 (025 has been used in the
> meantime)
>
> - test that logical decoding actually produces useful and correct results
>
> - test standby promotion and logical decoding behavior once done
>
> - useless "use" removal
>
> - check_confl_logicalslot() function removal
>
> - rewrote make_slot_active() to make use of poll_query_until() and
> timeout
>
> - remove the useless eval()
>
> - remove the "Catalog xmins should advance after standby logical slot
> fetches the changes" test
>
> One thing that's not clear to me is your remark "There's also no test
> for a recovery conflict due to row removal": Don't you think that the
> "vacuum full" conflict test is enough? if not, what kind of additional
> tests would you like to see?
>
>>
>> There is also the 10s delay to work on, do you already have an idea
>> on how we should handle it?
>>
>> Thanks
>>
>> Bertrand
>>
> Thanks
>
> Bertrand
>
Please find enclosed v20 a needed rebase (nothing serious worth
mentioning) of v19.

FWIW, just to sum up that v19 (and so v20):

- contained the changes (see details above) related to your TAP tests
remarks

- contained the changes (see details above) related to your code remarks

There is still the 10s delay thing that need work: do you already have
an idea on how we should handle it?

And still one thing that's not clear to me is your remark "There's also
no test for a recovery conflict due to row removal": Don't you think
that the "vacuum full" conflict test is enough? if not, what kind of
additional tests would you like to see?

Thanks

Bertrand





The patch does not apply and an updated patch is required.
patching file src/include/replication/slot.h
Hunk #1 FAILED at 214.
1 out of 2 hunks FAILED -- saving rejects to file src/include/replication/slot.h.rej



--
Ibrar Ahmed

pgsql-hackers by date:

Previous
From: Pavel Stehule
Date:
Subject: Re: badly calculated width of emoji in psql
Next
From: Ibrar Ahmed
Date:
Subject: Re: Re[3]: On login trigger: take three