Re: AIO v2.5 - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: AIO v2.5
Date
Msg-id 554d4da2-9392-4098-9c91-4c34f222abbb@gmail.com
Whole thread Raw
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

31.03.2025 02:46, Andres Freund wrote:
I've pushed most of these after some very light further editing.  Yay.  Thanks
a lot for all the reviews!

So far the buildfarm hasn't been complaining, but it's early days.

I found one complaint against commit 12ce89fd0, expressed as:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-04-08%2001%3A36%3A41

149/334 postgresql:recovery / recovery/032_relfilenode_reuse                            ERROR           206.41s   (exit status 255 or signal 127 SIGinvalid)

[01:41:53.249](0.620s) ok 8 - standby: post move contents as expected
[01:44:55.175](181.927s) Bail out!  aborting wait: program timed out
stream contents: >><<
pattern searched for: (?^m:warmed_buffers)

032_relfilenode_reuse_standby.log contains:
2025-04-08 01:41:54.674 UTC [4013024][startup][33/0:0] DEBUG:  waiting for all backends to process ProcSignalBarrier generation 7
2025-04-08 01:41:54.674 UTC [4013024][startup][33/0:0] CONTEXT:  WAL redo at 0/43E5360 for Database/DROP: dir 1663/50001
2025-04-08 01:41:54.670 UTC [4013251][client backend][0/2:0] DEBUG:  waiting for self with 0 pending
2025-04-08 01:41:54.674 UTC [4013251][client backend][0/2:0] ERROR:  no free IOs despite no in-flight IOs
2025-04-08 01:41:54.674 UTC [4013251][client backend][0/2:0] STATEMENT:  SELECT SUM(pg_prewarm(oid)) warmed_buffers FROM pg_class WHERE pg_relation_filenode(oid) != 0;

I could reproduce this error with the following TEMP_CONFIG:
log_min_messages = DEBUG4

log_connections = on
log_disconnections = on
log_statement = 'all'
log_line_prefix = '%m [%d][%p:%l][%b] %q[%a] '

fsync = on
io_method=io_uring
backtrace_functions = 'pgaio_io_wait_for_free'

When running 032_relfilenode_reuse.pl in a loop, I got failures on
iterations 10, 18, 6:
2025-04-20 15:56:01.310 CEST [][1517068:122][startup] DEBUG:  updated min recovery point to 0/4296E90 on timeline 1
2025-04-20 15:56:01.310 CEST [][1517068:123][startup] CONTEXT:  WAL redo at 0/4296E48 for Transaction/COMMIT: 2025-04-20 15:56:01.09363+02; inval msgs: catcache 21; sync
2025-04-20 15:56:01.310 CEST [][1517068:124][startup] DEBUG:  waiting for all backends to process ProcSignalBarrier generation 5
2025-04-20 15:56:01.310 CEST [][1517068:125][startup] CONTEXT:  WAL redo at 0/4296E90 for Database/DROP: dir 16409/50001
2025-04-20 15:56:01.312 CEST [postgres][1517075:144][client backend] [032_relfilenode_reuse.pl] DEBUG:  waiting for self with 0 pending
2025-04-20 15:56:01.312 CEST [postgres][1517075:145][client backend] [032_relfilenode_reuse.pl] BACKTRACE:
pgaio_io_wait_for_free at aio.c:703:2
 (inlined by) pgaio_io_acquire at aio.c:186:3
AsyncReadBuffers at bufmgr.c:1854:9
StartReadBuffersImpl at bufmgr.c:1425:18
 (inlined by) StartReadBuffers at bufmgr.c:1497:9
read_stream_start_pending_read at read_stream.c:333:25
read_stream_look_ahead at read_stream.c:475:3
read_stream_next_buffer at read_stream.c:840:6
pg_prewarm at pg_prewarm.c:214:10
ExecInterpExpr at execExprInterp.c:927:7
ExecEvalExprNoReturn at executor.h:447:2
...
2025-04-20 15:56:01.312 CEST [][1517068:126][startup] DEBUG:  finished waiting for all backends to process ProcSignalBarrier generation 5
2025-04-20 15:56:01.312 CEST [][1517068:127][startup] CONTEXT:  WAL redo at 0/4296E90 for Database/DROP: dir 16409/50001
2025-04-20 15:56:01.314 CEST [postgres][1517075:146][client backend] [032_relfilenode_reuse.pl] ERROR:  no free IOs despite no in-flight IOs
2025-04-20 15:56:01.314 CEST [postgres][1517075:147][client backend] [032_relfilenode_reuse.pl] BACKTRACE:
pgaio_io_wait_for_free at aio.c:735:3
 (inlined by) pgaio_io_acquire at aio.c:186:3
AsyncReadBuffers at bufmgr.c:1854:9
StartReadBuffersImpl at bufmgr.c:1425:18
 (inlined by) StartReadBuffers at bufmgr.c:1497:9
read_stream_start_pending_read at read_stream.c:333:25
read_stream_look_ahead at read_stream.c:475:3
read_stream_next_buffer at read_stream.c:840:6
pg_prewarm at pg_prewarm.c:214:10
ExecInterpExpr at execExprInterp.c:927:7
ExecEvalExprNoReturn at executor.h:447:2
...

I configured the build with:
CPPFLAGS="-O1" ./configure --enable-injection-points --enable-cassert --enable-debug --enable-tap-tests --with-liburing

I also encountered another failure when running this test:
t/032_relfilenode_reuse.pl .. 1/? Bailout called.  Further testing stopped:  aborting wait: program died

Core was generated by `postgres: standby: law postgres [local] SELECT                   '.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44

(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000700a5a24527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x0000700a5a2288ff in __GI_abort () at ./stdlib/abort.c:79
#5  0x0000630ff2c5dcea in errfinish (filename=filename@entry=0x630ff2cdd139 "aio.c", lineno=lineno@entry=559,
    funcname=funcname@entry=0x630ff2e85da8 <__func__.9> "pgaio_io_wait") at elog.c:599
#6  0x0000630ff2aae79c in pgaio_io_wait (ioh=ioh@entry=0x700a57b59700, ref_generation=3972) at aio.c:559
#7  0x0000630ff2aaefab in pgaio_io_wait_for_free () at aio.c:771
#8  pgaio_io_acquire (resowner=0x63100114dee8, ret=ret@entry=0x6310011efec8) at aio.c:186
#9  0x0000630ff2ab9137 in AsyncReadBuffers (operation=operation@entry=0x6310011efe88, nblocks_progress=nblocks_progress@entry=0x7ffd8d262034) at bufmgr.c:1854
#10 0x0000630ff2abc6cc in StartReadBuffersImpl (allow_forwarding=true, flags=0, nblocks=0x7ffd8d262034, blockNum=0, buffers=0x6310011efe3c,
    operation=0x6310011efe88) at bufmgr.c:1425
#11 StartReadBuffers (operation=0x6310011efe88, buffers=buffers@entry=0x6310011efe3c, blockNum=0, nblocks=nblocks@entry=0x7ffd8d262034, flags=flags@entry=0)
    at bufmgr.c:1497
#12 0x0000630ff2ab2f3f in read_stream_start_pending_read (stream=stream@entry=0x6310011efde0) at read_stream.c:328
#13 0x0000630ff2ab3363 in read_stream_look_ahead (stream=stream@entry=0x6310011efde0) at read_stream.c:475
#14 0x0000630ff2ab37e5 in read_stream_next_buffer (stream=stream@entry=0x6310011efde0, per_buffer_data=per_buffer_data@entry=0x0) at read_stream.c:837
#15 0x0000700a5adae60c in pg_prewarm (fcinfo=<optimized out>) at pg_prewarm.c:214
#16 0x0000630ff29004cc in ExecInterpExpr (state=0x6310012239c8, econtext=0x6310011e1920, isnull=0x0) at execExprInterp.c:926
#17 0x0000630ff291bfb4 in ExecEvalExprNoReturn (econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:445
#18 ExecEvalExprNoReturnSwitchContext (econtext=<optimized out>, state=<optimized out>) at ../../../src/include/executor/executor.h:486
#19 advance_aggregates (aggstate=aggstate@entry=0x6310011e14f8) at nodeAgg.c:820
#20 0x0000630ff291db18 in agg_retrieve_direct (aggstate=0x6310011e14f8) at nodeAgg.c:2540
#21 ExecAgg (pstate=0x6310011e14f8) at nodeAgg.c:2265
#22 0x0000630ff290deef in ExecProcNodeFirst (node=0x6310011e14f8) at execProcnode.c:469
...

Best regards,
Alexander Lakhin
Neon (https://neon.tech)

pgsql-hackers by date:

Previous
From: Noah Misch
Date:
Subject: Re: Unicode full case mapping: PG_UNICODE_FAST, and standard-compliant UCS_BASIC
Next
From: Ivan Kush
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER