Re: AIO v2.5 - Mailing list pgsql-hackers

From Alexander Lakhin
Subject Re: AIO v2.5
Msg-id 96abefe8-fa72-41f5-8840-0517125c24e3@gmail.com
In response to Re: AIO v2.5  (Alexander Lakhin <exclusion@gmail.com>)
Responses Re: AIO v2.5
List pgsql-hackers
Hello Andres,

07.04.2025 22:10, Alexander Lakhin wrote:
>> I ran it for a while in a VM, it hasn't triggered yet. Neither on xfs nor on
>> tmpfs.
>
> Before sharing the script I tested it on two of my machines, but I had
> anticipated that the error can be hard to reproduce. Will try to reduce
> the reproducer...

I've managed to reduce it to the following:
ulimit -n 4096

echo "
fsync = off
autovacuum = off

checkpoint_timeout = 30s

io_max_concurrency = 10
io_method = io_uring
" >> $PGDATA/postgresql.conf

pg_ctl -l server.log start

for i in `seq 1000`; do
  numjobs=$((20 + $RANDOM % 60))
  echo "iteration $i (jobs: $numjobs)"
  date
  for ((j=1;j<=numjobs;j++)); do
    (
      createdb db$j;
      for ((n=1;n<=50;n++)); do
        cat << EOF | psql -d db$j -a >>/dev/null 2>&1
DROP TABLE IF EXISTS tenk1;
CREATE TABLE tenk1 (
    unique1     int4,
    unique2     int4,
    two         int4,
    four        int4,
    ten         int4,
    twenty      int4,
    hundred     int4,
    thousand    int4,
    twothousand int4,
    fivethous   int4,
    tenthous    int4,
    odd         int4,
    even        int4,
    stringu1    name,
    stringu2    name,
    string4     name
);
COPY tenk1 FROM '.../src/test/regress/data/tenk.data';
EOF
      done;
    ) &
  done
  wait
 
  for ((j=1;j<=numjobs;j++)); do dropdb db$j & done
  wait
  grep -A3 -E '(ERROR|could not read blocks )' server.log && break;
done

pg_ctl stop
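
For reference, the expression 20 + $RANDOM % 60 in the script above yields
between 20 and 79 concurrent jobs per iteration, since $RANDOM % 60 falls in
0..59. A minimal sketch that isolates this calculation; the function name
pick_numjobs and its two parameters are illustrative and not part of the
reproducer:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a job count in [base, base + spread - 1].
# With the defaults this mirrors numjobs=$((20 + $RANDOM % 60)) from the script.
pick_numjobs() {
  local base=${1:-20} spread=${2:-60}
  echo $(( base + RANDOM % spread ))
}
```

Calling it as numjobs=$(pick_numjobs 20 60) would make it easy to try other
concurrency ranges when the failure is hard to hit.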

It fails for me as below:
iteration 13 (jobs: 25)
Sun Apr 13 05:31:47 AM UTC 2025
iteration 14 (jobs: 67)
Sun Apr 13 05:31:50 AM UTC 2025
dropdb: error: database removal failed: ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153451] LOG:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153451] CONTEXT:  completing I/O on behalf of process 1153456
2025-04-13 05:31:58.930 UTC [1153451] STATEMENT:  DROP DATABASE db5;
2025-04-13 05:31:58.930 UTC [1153456] ERROR:  could not read blocks 0..0 in file "global/1213": Operation canceled
2025-04-13 05:31:58.930 UTC [1153456] STATEMENT:  DROP DATABASE db6;
2025-04-13 05:31:58.931 UTC [1034758] LOG:  checkpoint complete: wrote 3 buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=18 kB, estimate=458931 kB; lsn=16/54589E08, redo lsn=16/54586F88
2025-04-13 05:31:58.931 UTC [1034758] LOG:  checkpoint starting: immediate force wait
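
To see at a glance which backends logged the canceled read, something like the
following could be run against server.log. This is only a sketch: the function
name summarize_canceled_reads is made up for illustration, and it assumes the
log line layout shown in the output above (timestamp, then the PID in square
brackets, then the message level).

```shell
#!/usr/bin/env bash
# Hypothetical helper: count "could not read blocks ... Operation canceled"
# lines per (pid, level) pair in a PostgreSQL server log.
# Assumes lines shaped like: "<date> <time> <tz> [<pid>] <LEVEL>:  <message>".
summarize_canceled_reads() {
  local logfile=$1
  awk '/could not read blocks .*Operation canceled/ {
         # Pull out the "[pid] LEVEL:" part; lines without it are skipped.
         if (match($0, /\[[0-9]+\] [A-Z]+:/)) {
           key = substr($0, RSTART, RLENGTH - 1)   # drop the trailing colon
           count[key]++
         }
       }
       END { for (k in count) print k, count[k] }' "$logfile" | LC_ALL=C sort
}
```

Each output line has the form "[pid] LEVEL count". In the excerpt above, one
PID reports the failure at LOG level while completing I/O on behalf of another
process, and that other PID reports it at ERROR level.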


I reproduced this error on three different machines (all running
Ubuntu 24.04; two with kernel version 6.8, one with 6.11), with PGDATA
located on tmpfs.
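
When comparing against another machine, the two details mentioned above (kernel
version and the filesystem under PGDATA) can be captured with a small helper.
This is a sketch only: the function name describe_repro_env is hypothetical,
and it relies on GNU stat being available.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the kernel release and the filesystem type
# backing a directory (defaults to $PGDATA, else the current directory).
describe_repro_env() {
  local dir=${1:-${PGDATA:-.}}
  echo "kernel: $(uname -r)"
  # GNU stat: -f queries the filesystem, %T prints its type (e.g. tmpfs, xfs).
  echo "fstype: $(stat -f -c %T "$dir")"
}
```

Running describe_repro_env "$PGDATA" before starting the server gives a quick
record of the environment to attach to a report.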

Best regards,
Alexander Lakhin
Neon (https://neon.tech)
