Dear Marc,
> For some unknown reason (probably a very big transaction at the source), we
> experienced a logical decoding breakdown,
...
> When those timeout occurred, the sender was still busy deleting files from
> data/pg_replslot/bdcpb21_sene, accumulating more than 6 millions small
> ".spill" files. It seems this very long pause is at cleanup stage were PG is
> blindly trying to delete those files.
Thanks for reporting the issue! We will discuss it and provide a fix if possible.
Apart from the code fix, I have some comments from another perspective.
> The publisher is PostgreSQL 15.6
> The subscriber is PostgreSQL 14.5
Can you set the subscription option "streaming" to on [1]? It allows the
publisher to stream in-progress transactions to the subscriber side. I think this
can avoid accumulating many .spill files on the publisher side.
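For example, a minimal sketch to run on the subscriber, assuming the subscription is called my_subscription (a placeholder, please substitute your own name):

```sql
-- Run on the subscriber (PostgreSQL 14 or later).
-- "my_subscription" is a hypothetical name used only for illustration.
ALTER SUBSCRIPTION my_subscription SET (streaming = on);

-- Check the current setting; substream is a boolean column in PostgreSQL 14.
SELECT subname, substream FROM pg_subscription;
```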
Another approach is to tune the logical_decoding_work_mem parameter [2] on the
publisher. It specifies the maximum amount of memory used by logical decoding;
when that limit is exceeded, some of the decoded changes are spilled to disk.
In principle, raising this value should reduce the number of spill files.
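As a rough sketch, something like the following could be run on the publisher. The value '1GB' is only an assumed example and not a recommendation; a suitable value depends on the memory available, since the limit applies to each replication connection separately.

```sql
-- Run on the publisher as a superuser.
-- '1GB' is an illustrative value only (the default is 64MB).
ALTER SYSTEM SET logical_decoding_work_mem = '1GB';
SELECT pg_reload_conf();

-- Confirm the value seen by new sessions.
SHOW logical_decoding_work_mem;

-- Spill activity per replication slot (view available in PostgreSQL 14 and later).
SELECT slot_name, spill_txns, spill_count, spill_bytes
FROM pg_stat_replication_slots;
```

If spill_count keeps growing after the change, the limit is probably still too small for the transaction in question.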
I hope these settings help on your system.
[1]: https://www.postgresql.org/docs/14/sql-createsubscription.html
[2]: https://www.postgresql.org/docs/14/runtime-config-resource.html#GUC-LOGICAL-DECODING-WORK-MEM
Best regards,
Hayato Kuroda
FUJITSU LIMITED