31.03.2026 13:54, Michael Paquier wrote:
> On Tue, Mar 31, 2026 at 10:00:00AM +0300, Alexander Lakhin wrote:
>> So the backend is not completely stuck, but CommitTransactionCommand()
>> may take more than 5 seconds under some circumstances (maybe it's worth
>> investigating which exactly).
> One could blame slow hardware, difficult to say, and I'm puzzled by
> these periodic bumps that don't seem to happen elsewhere.
I managed to get the backtrace of such a sluggish backend:
Using host libthread_db library "/lib/riscv64-linux-gnu/libthread_db.so.1".
0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
Id Target Id Frame
* 1 Thread 0x3fb2a4c620 (LWP 564194) "postgres" 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#0 0x0000003fb1f4cc26 in posix_fadvise64 () from /lib/riscv64-linux-gnu/libc.so.6
#1 0x0000002abef79444 in XLogFileClose () at xlog.c:3672
#2 0x0000002abef7cc66 in XLogWrite (WriteRqst=..., tli=tli@entry=1, flexible=flexible@entry=false) at xlog.c:2356
#3 0x0000002abef7dbfc in XLogFlush (record=33561688) at xlog.c:2892
#4 0x0000002abef77976 in RecordTransactionCommit () at xact.c:1516
#5 CommitTransaction () at xact.c:2379
#6 0x0000002abef78938 in CommitTransactionCommandInternal () at xact.c:3224
#7 0x0000002abef78acc in CommitTransactionCommand () at xact.c:3185
#8 0x0000003fb2a3ed88 in initialize_worker_spi (table=0x2abf8bf358) at worker_spi.c:132
#9 worker_spi_main (main_arg=<optimized out>) at worker_spi.c:181
....
(Three test runs produced the same stack trace.)
I think this explains why CommitTransactionCommand() can be slow and why
that does not happen on every run. As for other animals, I guess they
experience the same bumps, just without exceeding 5 seconds (50 tries).
So, as I understand it, the failure requires both slow storage and
initialize_worker_spi() -> CommitTransactionCommand() reaching
XLogFileClose().
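To check whether a given machine's storage shows similar stalls, a
minimal standalone sketch (Python, not PostgreSQL code) can time
posix_fadvise(POSIX_FADV_DONTNEED) on a freshly written file, which is
the call seen in frame #0 above. The function name
time_fadvise_dontneed, the 16 MB file size and the use of a temporary
file are assumptions made for illustration. The sketch does not try to
reproduce the state a WAL segment is in when XLogFileClose() runs.

```python
import os
import tempfile
import time

# Assumed for illustration: the size of one WAL segment with default settings.
SEGMENT_SIZE = 16 * 1024 * 1024


def time_fadvise_dontneed(size_bytes=SEGMENT_SIZE, directory=None):
    """Write size_bytes to a temp file and time posix_fadvise(DONTNEED).

    Returns the elapsed seconds of the posix_fadvise() call alone.
    Requires a platform where os.posix_fadvise is available (e.g. Linux).
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        chunk = b"\0" * (1024 * 1024)
        remaining = size_bytes
        while remaining > 0:
            # os.write() returns the number of bytes actually written.
            remaining -= os.write(fd, chunk[:remaining])

        start = time.monotonic()
        # offset=0 and len=0 apply the advice to the whole file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    elapsed = time_fadvise_dontneed()
    print(f"posix_fadvise(POSIX_FADV_DONTNEED) took {elapsed:.3f} s")
```

Passing directory= pointing at the filesystem that holds the data
directory, and running it several times, should show whether the call
alone can get anywhere near the 5-second limit mentioned above.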
Best regards,
Alexander