RE: Logical replication timeout - Mailing list pgsql-hackers

From Hayato Kuroda (Fujitsu)
Subject RE: Logical replication timeout
Date
Msg-id OSCPR01MB1496698CA14BD0DE49261819AF5022@OSCPR01MB14966.jpnprd01.prod.outlook.com
In response to Logical replication timeout  (RECHTÉ Marc <marc.rechte@meteo.fr>)
Responses Re: Logical replication timeout
List pgsql-hackers
Dear Marc,

> For some unknown reason (probably a very big transaction at the source), we
> experienced a logical decoding breakdown,
...
> When those timeout occurred, the sender was still busy deleting files from
> data/pg_replslot/bdcpb21_sene, accumulating more than 6 millions small
> ".spill" files. It seems this very long pause is at cleanup stage were PG is
> blindly trying to delete those files.

Thanks for reporting the issue! We will discuss it and provide a fix if possible.
Apart from the code fix, I have some comments from another perspective.

> The publisher is PostgreSQL 15.6
> The subscriber is PostgreSQL 14.5

Can you set the subscription parameter "streaming" to on on your system [1]? It
allows in-progress transactions to be streamed to the subscriber side. I think
this can avoid accumulating many .spill files on the publisher side.
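
For reference, a minimal sketch of how this could be enabled on an existing
subscription. The subscription name "my_sub" is hypothetical, so please
replace it with your own. The statements are run on the subscriber.

```sql
-- Run on the subscriber (PostgreSQL 14 or later).
-- "my_sub" is a hypothetical subscription name; substitute your own.
ALTER SUBSCRIPTION my_sub SET (streaming = on);

-- Check the result. In PostgreSQL 14 the substream column is a boolean
-- that is true when streaming of in-progress transactions is enabled.
SELECT subname, substream FROM pg_subscription;
```

With streaming enabled, changes that exceed logical_decoding_work_mem are sent
to the subscriber instead of being written to .spill files on the publisher.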

Another approach is to tune the logical_decoding_work_mem parameter [2].
It specifies the maximum amount of memory used by logical decoding; once the
decoded changes exceed this limit, some of them are spilled to disk. Naively,
raising this setting should reduce the number of .spill files.
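
As a rough sketch, the parameter could be raised on the publisher as follows.
The value '1GB' is only an illustration and not a recommendation; the
appropriate size depends on the memory available and on the workload.

```sql
-- Run on the publisher as a superuser.
-- Show the current value (the default is 64MB).
SHOW logical_decoding_work_mem;

-- '1GB' is an illustrative value only; choose one that fits your system.
ALTER SYSTEM SET logical_decoding_work_mem = '1GB';

-- The parameter does not require a restart; a reload is enough.
SELECT pg_reload_conf();
```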

I hope both settings can optimize your system.

[1]: https://www.postgresql.org/docs/14/sql-createsubscription.html
[2]: https://www.postgresql.org/docs/14/runtime-config-resource.html#GUC-LOGICAL-DECODING-WORK-MEM

Best regards,
Hayato Kuroda
FUJITSU LIMITED

