Re: AIO v2.5 - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: AIO v2.5
Msg-id 062daca9-dfad-4750-9da8-b13388301ad9@gmail.com
In response to Re: AIO v2.5  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
Hello Andres,

14.04.2025 19:06, Andres Freund wrote:
> Unfortunately I'm several hundred iterations in, without reproducing the
> issue. I'm bad at statistics, but I think that makes it rather unlikely that I
> will, without changing some aspect.
>
> Was this an assert enabled build? What compiler and what optimization settings
> did you use? Do you have huge pages configured (so that the default
> huge_pages=try would end up with huge pages)?

Yes, I used --enable-cassert; no explicit optimization setting and no huge
pages configured. pg_config says:
CONFIGURE =  '--enable-debug' '--enable-cassert' '--enable-tap-tests' '--with-liburing'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wimplicit-fallthrough=3 -Wcast-function-type -Wshadow=compatible-local -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wno-format-truncation -Wno-stringop-truncation -g -O2

Please look at the complete script attached. I've just run it and got:
iteration 56 (jobs: 44)
Tue Apr 15 06:30:52 PM CEST 2025
dropdb: error: database removal failed: ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] LOG:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-15 18:31:00.650 CEST [1612266] CONTEXT:  completing I/O on behalf of process 1612271
2025-04-15 18:31:00.650 CEST [1612266] STATEMENT:  DROP DATABASE db3;
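
As a rough aid for the statistics question quoted above, here is a minimal sketch of how likely it is to see no failure at all in n iterations. It assumes that iterations fail independently with a fixed per-iteration probability p. Both that model and the value p = 1/56 (a guess taken from the single failure at iteration 56 above) are assumptions, not measurements, and the helper name p_no_failure is made up for illustration.

```shell
# Probability of seeing zero failures in n independent iterations,
# given an ASSUMED per-iteration failure probability p.
# Independence and the value of p are assumptions, not measured facts.
p_no_failure() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.6f\n", (1 - p) ^ n }'
}

# Hypothetical example: p = 1/56 (one failure seen at iteration 56), n = 300.
p_no_failure "$(awk 'BEGIN { print 1/56 }')" 300
```

Under these assumptions the result is well below 1%, which would point to an environmental difference rather than bad luck, but a single observed failure is a very weak basis for estimating p.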

I used gcc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0, but now I've also
reproduced the issue with CC=clang (18.1.3 (1ubuntu1)).

Please also take a look at the simple reproducer for the crash inside
pg_get_aios() that I mentioned upthread:
for i in {1..100}; do
   numjobs=12
   echo "iteration $i"
   date
   for ((j=1;j<=numjobs;j++)); do
     ( createdb db$j; for k in {1..300}; do
         echo "CREATE TABLE t (a INT); CREATE INDEX ON t (a); VACUUM t;
               SELECT COUNT(*) >= 0 AS ok FROM pg_aios; " \
         | psql -d db$j >/dev/null 2>&1;
       done; dropdb db$j; ) &
   done
   wait
   psql -c 'SELECT 1' || break;
done

It fails for me as follows:
iteration 20
Tue Apr 15 07:21:29 PM EEST 2025
dropdb: error: connection to server on socket "/tmp/.s.PGSQL.55432" failed: No such file or directory
        Is the server running locally and accepting connections on that socket?
...
2025-04-15 19:21:30.675 EEST [3111699] LOG:  client backend (PID 3320979) was terminated by signal 11: Segmentation fault
2025-04-15 19:21:30.675 EEST [3111699] DETAIL:  Failed process was running: SELECT COUNT(*) >= 0 AS ok FROM pg_aios;
2025-04-15 19:21:30.675 EEST [3111699] LOG:  terminating any other active server processes
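
When the loop runs unattended, crashes like the one above can be picked out of the server log with a small filter. This is only a sketch: the function name crashed_backends is made up for illustration, and the log file path in the usage comment is a placeholder that depends on how the server was started.

```shell
# Print "PID signal" for every backend the postmaster reports as
# terminated by a signal. Reads server log text on stdin.
crashed_backends() {
    sed -n 's/.*(PID \([0-9][0-9]*\)) was terminated by signal \([0-9][0-9]*\).*/\1 \2/p'
}

# Example usage (the log path is a hypothetical placeholder):
# crashed_backends < "$PGDATA/server.log"
```

For the log excerpt above this would report PID 3320979 with signal 11.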

>> I reproduced this error on three different machines (all are running
>> Ubuntu 24.04, two with kernel version 6.8, one with 6.11), with PGDATA
>> located on tmpfs.
> That's another variable to try - so far I've been trying this on 6.15.0-rc1
> [1].  I guess I'll have to set up a ubuntu 24.04 VM and try with that.
>
> Greetings,
>
> Andres Freund
>
>
> [1] I wanted to play with io_uring changes that were recently merged. Namely
> support for readv/writev of "fixed" buffers. That avoids needing to pin/unpin
> buffers while IO is ongoing, which turns out to be a noticeable bottleneck in
> some workloads, particularly when using 1GB huge pages.

Best regards,
Alexander Lakhin
Neon (https://neon.tech)