Re: AIO v2.5 - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: AIO v2.5
Msg-id: u3otgmy67yiltqw4533wqqhfbpzrf5ds7pqowssvw6w27klb7c@gbjnkuutrsza
In response to: Re: AIO v2.5 (Jakub Wartak <jakub.wartak@enterprisedb.com>)
List: pgsql-hackers
Hi,

On 2025-03-06 12:36:43 +0100, Jakub Wartak wrote:
> On Tue, Mar 4, 2025 at 8:00 PM Andres Freund <andres@anarazel.de> wrote:
> > Questions:
> >
> > - My current thinking is that we'd set io_method = worker initially - so we
> >   actually get some coverage - and then decide whether to switch to
> >   io_method=sync by default for 18 sometime around beta1/2. Does that sound
> >   reasonable?
>
> IMHO, yes, good idea. Anyway final outcomes partially will depend on
> how many other stream-consumers be committed, right?

I think it's more whether we find cases where it performs substantially worse
with the read stream users that exist. The behaviour for non-read-stream IO
shouldn't change.

> > - To allow io_workers to be PGC_SIGHUP, and to eventually allow to
> >   automatically in/decrease active workers, the max number of workers (32) is
> >   always allocated. That means we use more semaphores than before. I think
> >   that's ok, it's not 1995 anymore. Alternatively we can add a
> >   "io_workers_max" GUC and probe for it in initdb.
>
> Wouldn't that matter only on *BSDs?

Yea, NetBSD and OpenBSD only, I think.

> > - pg_stat_aios currently has the IO Handle flags as dedicated columns. Not
> >   sure that's great?
> >
> >   They could be an enum array or such too? That'd perhaps be a bit more
> >   extensible? OTOH, we don't currently use enums in the catalogs and arrays
> >   are somewhat annoying to conjure up from C.
>
> s/pg_stat_aios/pg_aios/ ? :^)

Ooops, yes.

> It looks good to me as it is. Anyway it
> is a debugging view - perhaps mark it as such in the docs - so there
> is no stable API for that and shouldn't be queried by any software
> anyway.

Cool.

> > - Documentation for pg_stat_aios.
>
> pg_aios! :)

> So, I've taken aio-2 branch from Your's github repo for a small ride
> on legacy RHEL 8.7 with dm-flakey to inject I/O errors. This is more a
> question: perhaps IO workers should auto-close fd on errors or should
> we use SIGUSR2 for it?
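[Editor's note: for readers following along, the settings discussed above could be exercised with a configuration along these lines. This is a sketch based on the GUC names used in this thread; the default for io_method and the exact knobs may change before release.]

```
# postgresql.conf sketch -- GUC names as discussed in this thread
io_method = worker      # 'worker' initially for coverage; a switch to
                        # 'sync' by default is being considered for beta1/2
io_workers = 3          # PGC_SIGHUP: adjustable via reload, since the
                        # maximum number of workers (32) is always allocated
```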
> The scenario is like this:

When you say "auto-close", you mean that one IO error should trigger *all*
workers to close their FDs?

> so usual stuff with kernel remounting it RO, but here's the dragon
> with io_method=worker:
>
> # mount -o remount,rw /flakey/
> mount: /flakey: cannot remount /dev/mapper/flakey read-write, is
> write-protected.
> # umount /flakey   # to fsck or just mount rw again
> umount: /flakey: target is busy.
> # lsof /flakey/
> COMMAND    PID     USER FD  TYPE DEVICE SIZE/OFF NODE NAME
> postgres 103483 postgres 14u  REG  253,2 36249600   17 /flakey/tblspace/PG_18_202503031/5/24586
> postgres 103484 postgres  6u  REG  253,2 36249600   17 /flakey/tblspace/PG_18_202503031/5/24586
> postgres 103485 postgres  6u  REG  253,2 36249600   17 /flakey/tblspace/PG_18_202503031/5/24586
>
> Those 10348[345] are IO workers, they have still open fds and there's
> no way to close those without restart -- well without close()
> injection probably via gdb.

The same is already true with bgwriter, checkpointer etc?

> pg_terminate_backend() on those won't work. The only thing that works
> seems to be sending SIGUSR2

Sending SIGINT works.

> , but is that safe [there could be some errors after pwrite() ]?

Could you expand on that?

> With io_worker=sync just quitting the backend of course works. Not sure
> what your thoughts are because any other bgworker could be having open
> fds there. It's a very minor thing. Otherwise that outage of separate
> tablespace (rarely used) would potentially cause inability to fsck
> there and lower the availability of the DB (due to potential restart
> required).

I think a crash-restart is the only valid thing to get out of a scenario like
that, independent of AIO:

- If there had been any writes we need to perform crash recovery anyway, to
  recreate those writes

- If there just were reads, it's good to restart as well, as otherwise there
  might be pages in the buffer pool that don't exist on disk anymore, due to
  the errors.

Greetings,

Andres Freund
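[Editor's note: the "target is busy" behaviour discussed above is ordinary POSIX fd semantics rather than anything AIO-specific: as long as any process holds an open file descriptor, the underlying inode stays live and the filesystem cannot be unmounted. A minimal illustration of the mechanism, in Python (hypothetical demo code, not part of PostgreSQL):]

```python
import os
import tempfile

# Open a file, write to it, then remove its directory entry while
# keeping the fd open -- analogous to an io worker (or bgwriter /
# checkpointer) holding a relation file open on a failed tablespace.
fd, path = tempfile.mkstemp()
os.write(fd, b"data")
os.unlink(path)

# The inode is still alive and fully usable through the open fd; only
# once every holder closes it does the filesystem become unmountable.
still_open = os.fstat(fd).st_size
print(still_open)  # prints 4
os.close(fd)
```

The same effect is why lsof in the quoted scenario still shows the relation file: the unmount can only succeed after the worker processes close (or are made to close) their descriptors.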