Hi,
On 1/19/23 10:43 AM, Drouvot, Bertrand wrote:
> Hi,
>
> On 1/19/23 3:46 AM, Andres Freund wrote:
>> Hi,
>>
>> I mean a logical walsender that starts on a standby and continues across
>> promotion of the standby.
>>
>
> Got it, thanks, will do.
>
While working on it, I noticed that with V41, running:
pg_recvlogical -S active_slot -P test_decoding -d postgres -f - --start
on the standby is getting:
pg_recvlogical: error: unexpected termination of replication stream: ERROR: could not find record while sending logically-decoded data: invalid record length at 0/311C438: wanted 24, got 0
pg_recvlogical: disconnected; waiting 5 seconds to try again
when the standby gets promoted (the logical decoding is able to resume correctly after the error though).
This is fixed in V42 attached (no error anymore and logical decoding through the walsender works correctly after the
promotion).
The fix is in 0003, where logical_read_xlog_page() changes as follows (as compared to V41):
- We now call RecoveryInProgress() (instead of relying on am_cascading_walsender) to check whether the standby got
promoted
- Based on this, currTLI is retrieved with either GetXLogReplayRecPtr() or GetWALInsertionTimeLine() (so, with
GetWALInsertionTimeLine() after promotion)
- This currTLI is passed as an argument to WALRead() (instead of state->seg.ws_tli, which anyhow sounds weird, as it
ends up being compared with itself in the "tli != state->seg.ws_tli" check in WALRead()). That way WALRead()
discovers that the timeline changed and then opens the right WAL file.
Please find V42 attached.
I'll resume working on the TAP test comments.
Regards,
--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com