Re: AIO v2.5 - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: AIO v2.5 |
Date | |
Msg-id | 4qk3ehe6w7x7hfrldei2hefjcb7v7nfmj2owl2ir64craqcapz@kbrao22ljxeb |
In response to | Re: AIO v2.5 (Alexander Lakhin <exclusion@gmail.com>) |
Responses | Re: AIO v2.5 |
List | pgsql-hackers |

Hi,

On 2025-04-13 09:00:01 +0300, Alexander Lakhin wrote:
> 07.04.2025 22:10, Alexander Lakhin wrote:
> > > I ran it for a while in a VM, it hasn't triggered yet. Neither on xfs nor on
> > > tmpfs.
> >
> > Before sharing the script I tested it on two my machines, but I had
> > anticipated that the error can be hard to reproduce. Will try to reduce
> > the reproducer...
>
> I've managed to reduce it to the following:

Thanks a lot for working on that!

> [reproducer]
>
> It fails for me as below:
> iteration 13 (jobs: 25)
> Sun Apr 13 05:31:47 AM UTC 2025
> iteration 14 (jobs: 67)
> Sun Apr 13 05:31:50 AM UTC 2025
> dropdb: error: database removal failed: ERROR: could not read blocks 0..0 in file "global/1213": Operation canceled
> 2025-04-13 05:31:58.930 UTC [1153451] LOG: could not read blocks 0..0 in file "global/1213": Operation canceled
> 2025-04-13 05:31:58.930 UTC [1153451] CONTEXT: completing I/O on behalf of process 1153456
> 2025-04-13 05:31:58.930 UTC [1153451] STATEMENT: DROP DATABASE db5;
> 2025-04-13 05:31:58.930 UTC [1153456] ERROR: could not read blocks 0..0 in file "global/1213": Operation canceled
> 2025-04-13 05:31:58.930 UTC [1153456] STATEMENT: DROP DATABASE db6;
> 2025-04-13 05:31:58.931 UTC [1034758] LOG: checkpoint complete: wrote 3
> buffers (0.0%), wrote 0 SLRU buffers; 0 WAL file(s) added, 0 removed, 0
> recycled; write=0.002 s, sync=0.001 s, total=0.002 s; sync files=0,
> longest=0.000 s, average=0.000 s; distance=18 kB, estimate=458931 kB;
> lsn=16/54589E08, redo lsn=16/54586F88
> 2025-04-13 05:31:58.931 UTC [1034758] LOG: checkpoint starting: immediate force wait

Unfortunately I'm several hundred iterations in, without reproducing the
issue. I'm bad at statistics, but I think that makes it rather unlikely that I
will, without changing some aspect.

Was this an assert enabled build? What compiler and what optimization settings
did you use? Do you have huge pages configured (so that the default
huge_pages=try would end up with huge pages)?

So far I've been trying to use a cassert enabled build built with -O0, without
huge pages. After the current test run I'll switch to cassert+-O2.

> I reproduced this error on three different machines (all are running
> Ubuntu 24.04, two with kernel version 6.8, one with 6.11), with PGDATA
> located on tmpfs.

That's another variable to try - so far I've been trying this on 6.15.0-rc1
[1]. I guess I'll have to set up a ubuntu 24.04 VM and try with that.

Greetings,

Andres Freund

[1] I wanted to play with io_uring changes that were recently merged. Namely
support for readv/writev of "fixed" buffers. That avoids needing to pin/unpin
buffers while IO is ongoing, which turns out to be a noticeable bottleneck in
some workloads, particularly when using 1GB huge pages.