Subject: Re: AIO v2.5
From: Alexander Lakhin
Msg-id: a26cad59-ec3e-42ee-956a-3826f051d1f8@gmail.com
In response to: Re: AIO v2.5 (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hello Andres,

24.04.2025 03:40, Andres Freund wrote:
> Hi,
>
> On 2025-04-20 18:00:00 +0300, Alexander Lakhin wrote:
>> 2025-04-08 01:41:54.670 UTC [4013251][client backend][0/2:0] DEBUG: waiting for self with 0 pending
>
> I'd like to extend this debug message with the number of in-flight IOs. I
> assume nobody will mind me doing that.

I printed that number for debugging and always got 1, because
io_max_concurrency == 1 during that test (because of shared_buffers=1MB).
With shared_buffers increased:

--- a/src/test/recovery/t/032_relfilenode_reuse.pl
+++ b/src/test/recovery/t/032_relfilenode_reuse.pl
@@ -18,7 +18,7 @@ log_connections=on
 # to avoid "repairing" corruption
 full_page_writes=off
 log_min_messages=debug2
-shared_buffers=1MB
+shared_buffers=16MB

I got 100 iterations passed.
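Regarding extending that debug message with the number of in-flight IOs: I'd
expect the call in pgaio_io_wait_for_free() to end up looking roughly like the
snippet below. This is only a sketch; it assumes the backend's in-flight IOs
are tracked in a dclist member pgaio_my_backend->in_flight_ios (counted with
dclist_count()) next to the num_staged_ios counter that the existing message
already prints, so the actual field names may differ.

    /* sketch only: also report the number of in-flight IOs */
    pgaio_debug(DEBUG2,
                "waiting for self with %d pending, %d in-flight",
                pgaio_my_backend->num_staged_ios,
                (int) dclist_count(&pgaio_my_backend->in_flight_ios));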
>> I could reproduce this error with the following TEMP_CONFIG:
>> log_min_messages = DEBUG4
>>
>> log_connections = on
>> log_disconnections = on
>> log_statement = 'all'
>> log_line_prefix = '%m [%d][%p:%l][%b] %q[%a] '
>>
>> fsync = on
>> io_method=io_uring
>> backtrace_functions = 'pgaio_io_wait_for_free'
>
> Hm. Several hundred iterations in without triggering the issue even once, with
> that added config. Given your reproducer used fsync, I assume this was on a
> real filesystem not tmpfs? What FS?

I tried on XFS.

> very well could just be a timing dependent issue, the time for an fsync will
> differ heavily between devices...

Yeah, it definitely depends on timing. Using that TEMP_CONFIG changes the
duration of the test significantly for me:
Files=1, Tests=14, 2 wallclock secs ( 0.01 usr 0.01 sys + 0.13 cusr 0.32 csys = 0.47 CPU)
->
Files=1, Tests=14, 14 wallclock secs ( 0.00 usr 0.00 sys + 0.14 cusr 0.33 csys = 0.47 CPU)

I reproduced the failure on ext4 (desktop-grade NVMe) and on xfs (cheap SSD),
on two different machines. It also reproduces on HDD (with ext4), seemingly
worse, where the duration is increased even more:
I 25
# +++ tap check in src/test/recovery +++
t/032_relfilenode_reuse.pl .. ok
All tests successful.
Files=1, Tests=14, 28 wallclock secs ( 0.00 usr 0.00 sys + 0.15 cusr 0.39 csys = 0.54 CPU)
Result: PASS
I 26
# +++ tap check in src/test/recovery +++
t/032_relfilenode_reuse.pl .. 7/? Bailout called. Further testing stopped: aborting wait: program timed out

>> When running 032_relfilenode_reuse.pl in a loop, I got failures on
>> iterations 10, 18, 6:
>> 2025-04-20 15:56:01.310 CEST [][1517068:122][startup] DEBUG: updated min recovery point to 0/4296E90 on timeline 1
>> 2025-04-20 15:56:01.310 CEST [][1517068:123][startup] CONTEXT: WAL redo at 0/4296E48 for Transaction/COMMIT: 2025-04-20 15:56:01.09363+02; inval msgs: catcache 21; sync
>> 2025-04-20 15:56:01.310 CEST [][1517068:124][startup] DEBUG: waiting for all backends to process ProcSignalBarrier generation 5
>> 2025-04-20 15:56:01.310 CEST [][1517068:125][startup] CONTEXT: WAL redo at 0/4296E90 for Database/DROP: dir 16409/50001
>> 2025-04-20 15:56:01.312 CEST [postgres][1517075:144][client backend] [032_relfilenode_reuse.pl] DEBUG: waiting for self with 0 pending
>> 2025-04-20 15:56:01.312 CEST [postgres][1517075:145][client backend] [032_relfilenode_reuse.pl] BACKTRACE:
>> pgaio_io_wait_for_free at aio.c:703:2
>>  (inlined by) pgaio_io_acquire at aio.c:186:3
>> AsyncReadBuffers at bufmgr.c:1854:9
>> StartReadBuffersImpl at bufmgr.c:1425:18
>>  (inlined by) StartReadBuffers at bufmgr.c:1497:9
>> read_stream_start_pending_read at read_stream.c:333:25
>> read_stream_look_ahead at read_stream.c:475:3
>> read_stream_next_buffer at read_stream.c:840:6
>> pg_prewarm at pg_prewarm.c:214:10
>> ExecInterpExpr at execExprInterp.c:927:7
>> ExecEvalExprNoReturn at executor.h:447:2
>> ...
>> 2025-04-20 15:56:01.312 CEST [][1517068:126][startup] DEBUG: finished waiting for all backends to process ProcSignalBarrier generation 5
>> 2025-04-20 15:56:01.312 CEST [][1517068:127][startup] CONTEXT: WAL redo at 0/4296E90 for Database/DROP: dir 16409/50001
>> 2025-04-20 15:56:01.314 CEST [postgres][1517075:146][client backend] [032_relfilenode_reuse.pl] ERROR: no free IOs despite no in-flight IOs
>
> The earlier report also had a ProcSignalBarrier wait just before. That
> doesn't have to be related, the test stresses those pretty intentionally.
>
> But it does seem like it could be related, perhaps we somehow end up processing
> interrupts while issuing IOs, which then again submits IOs or something? I
> don't immediately see such code, e.g. pgaio_closing_fd() is careful to iterate
> over the list while waiting.
>
> I think pgaio_closing_fd() should get a DEBUG2 (or so) message documenting
> that we are waiting for an IO due to closing an FD. pgaio_shutdown() as well,
> while at it.
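For what it's worth, I'd expect those DEBUG2 messages to look roughly like the
snippet below. This is only a sketch: it assumes the existing pgaio_debug_io()
helper can be used at those call sites, that "ioh" is the handle being waited
for there, and that the in-flight IOs can be counted via
dclist_count(&pgaio_my_backend->in_flight_ios); the actual names may differ.

    /* sketch only, in pgaio_closing_fd(): log why we wait before closing the FD */
    pgaio_debug_io(DEBUG2, ioh,
                   "waiting for IO before FD %d gets closed, %d in-flight IOs",
                   fd, (int) dclist_count(&pgaio_my_backend->in_flight_ios));

    /* sketch only, in pgaio_shutdown(): log the wait for remaining in-flight IOs */
    pgaio_debug_io(DEBUG2, ioh,
                   "waiting for IO to complete during shutdown, %d in-flight IOs",
                   (int) dclist_count(&pgaio_my_backend->in_flight_ios));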
> I somewhat suspect that this is the same issue that you reported earlier, that
> I also couldn't repro. That involved a lot of CREATE DATABASEs, which will
> trigger a lot of procsignal barriers... I'll try to come up with a more
> dedicated stress test and see whether I can trigger a problem that way.

Maybe it does matter that "no free IOs despite no in-flight IOs" follows
"WAL redo at 0/4296E90 for Database/DROP" here (that's seemingly constant)...

I've also tried:
--- a/src/backend/storage/aio/aio.c
+++ b/src/backend/storage/aio/aio.c
@@ -705,6 +705,7 @@ pgaio_io_wait_for_free(void)
 	pgaio_debug(DEBUG2, "waiting for self with %d pending",
 				pgaio_my_backend->num_staged_ios);
+	pg_sleep(3000);
and got the test failing on each run with:
t/032_relfilenode_reuse.pl ..
Dubious, test returned 29 (wstat 7424, 0x1d00)

032_relfilenode_reuse_primary.log:
2025-04-24 05:29:27.103 CEST [3504475] 032_relfilenode_reuse.pl LOG: statement: UPDATE large SET datab = 1;
2025-04-24 05:29:27.104 CEST [3504475] 032_relfilenode_reuse.pl DEBUG: waiting for self with 0 pending
2025-04-24 05:29:27.112 CEST [3504369] DEBUG: releasing pm child slot 3
2025-04-24 05:29:27.112 CEST [3504369] LOG: client backend (PID 3504475) was terminated by signal 11: Segmentation fault
2025-04-24 05:29:27.112 CEST [3504369] DETAIL: Failed process was running: UPDATE large SET datab = 1;

>> I also encountered another failure when running this test:
>> t/032_relfilenode_reuse.pl .. 1/? Bailout called. Further testing stopped: aborting wait: program died
>>
>> Core was generated by `postgres: standby: law postgres [local] SELECT '.
>> Program terminated with signal SIGABRT, Aborted.
>> #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
>>
>> (gdb) bt
>> #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
>> #1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
>> #2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
>> #3 0x0000700a5a24527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
>> #4 0x0000700a5a2288ff in __GI_abort () at ./stdlib/abort.c:79
>> #5 0x0000630ff2c5dcea in errfinish (filename=filename@entry=0x630ff2cdd139 "aio.c", lineno=lineno@entry=559,
>>     funcname=funcname@entry=0x630ff2e85da8 <__func__.9> "pgaio_io_wait") at elog.c:599
>> #6 0x0000630ff2aae79c in pgaio_io_wait (ioh=ioh@entry=0x700a57b59700, ref_generation=3972) at aio.c:559
>> #7 0x0000630ff2aaefab in pgaio_io_wait_for_free () at aio.c:771
>
> Do you still have the core file for this? Or at least the generated log
> message? It'd be useful to know what state the IO was in when the error was
> triggered...

I can reproduce it easily, so here is the message:
2025-04-24 06:10:01.924 EEST [postgres][3036378:150][client backend] [032_relfilenode_reuse.pl] PANIC: waiting for own IO in wrong state: 0
2025-04-24 06:10:01.924 EEST [postgres][3036378:151][client backend] [032_relfilenode_reuse.pl] STATEMENT: SELECT SUM(pg_prewarm(oid)) warmed_buffers FROM pg_class WHERE pg_relation_filenode(oid) != 0;

(Sorry for not including it before; I observed it many times, always with
state 0, so it had lost its value for me.)

Best regards,
Alexander Lakhin
Neon (https://neon.tech)