
RE: logical apply worker's lock waits in subscriber can stall checkpointer in publisher - Mailing list pgsql-hackers

From	Hayato Kuroda (Fujitsu)
Subject	RE: logical apply worker's lock waits in subscriber can stall checkpointer in publisher
Date	January 29 10:03:29
Msg-id	TY7PR01MB145549C44DB50705E0E3D3DCAF59EA@TY7PR01MB14554.jpnprd01.prod.outlook.com
In response to	logical apply worker's lock waits in subscriber can stall checkpointer in publisher (Fujii Masao <masao.fujii@gmail.com>)
Responses	Re: logical apply worker's lock waits in subscriber can stall checkpointer in publisher
List	pgsql-hackers


Dear Fujii-san,

> While reviewing the patch at [1], I noticed a case where lock waits on
> a logical apply worker in the subscriber can cause the checkpointer on
> the publisher to stall. This seems like unexpected behavior and may
> need to be addressed.
> 
> The issue can occur as follows:
> 
> 1. A logical apply worker on the subscriber blocks waiting for a lock.
> 2. Because the apply worker cannot receive further messages, the walsender's
>     send buffer on the publisher becomes full.
> 3. If the walsender then encounters a max_slot_wal_keep_size error,
>     it attempts to send an error message to the subscriber before exiting.
>     However, with a full send buffer, the walsender blocks while trying to
>     send this message.
> 4. The checkpointer on the publisher calls InvalidateObsoleteReplicationSlots()
>     and waits for the slot to be released. Since the walsender is stuck and
>     the slot is not released, the checkpointer also becomes stuck.

I confirmed that this can happen when max_slot_wal_keep_size is enabled
(in other words, when the value is not -1).
In my test, wal_sender_timeout did not help because the walsender is stuck at
a lower layer, but tcp_user_timeout did terminate the process. Could we mention
this workaround in the docs instead of fixing the code?

The workaround won't work for a Unix-domain socket connection, but such a
connection is not realistic in production.
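
To illustrate the "full send buffer" condition from steps 2 and 3, here is a
minimal sketch in Python. It is not PostgreSQL code: fill_send_buffer() and its
parameters are hypothetical names chosen for illustration. It writes to a peer
that never reads, using a non-blocking socket so that the point where a
blocking send would hang shows up as an exception instead.

```python
import socket


def fill_send_buffer(chunk_size=4096, max_bytes=64 * 1024 * 1024):
    """Write to a peer that never reads.

    Returns (bytes_queued, would_block). would_block is True when the kernel
    refused more data, which is where a blocking send() would wait forever.
    """
    sender, receiver = socket.socketpair()
    try:
        # Non-blocking mode turns "would hang" into BlockingIOError.
        sender.setblocking(False)
        chunk = b"x" * chunk_size
        queued = 0
        while queued < max_bytes:
            try:
                # send() may accept only part of the chunk near the limit.
                queued += sender.send(chunk)
            except BlockingIOError:
                # The buffers are full and the receiver is not draining them.
                return queued, True
        return queued, False
    finally:
        sender.close()
        receiver.close()


queued, would_block = fill_send_buffer()
print(f"queued {queued} bytes, further send would block: {would_block}")
```

Note that socket.socketpair() typically creates a Unix-domain pair, so this
only demonstrates the stall itself; it says nothing about tcp_user_timeout,
which applies to TCP connections only.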

Best regards,
Hayato Kuroda
FUJITSU LIMITED
