Re: Performance implications of 8K pread()s - Mailing list pgsql-performance

From Thomas Munro
Subject Re: Performance implications of 8K pread()s
Date
Msg-id CA+hUKG+Zr8FBi3pojzU3x4k5XyenHhZ_mdWQ7pKwGeFT2+Tq+Q@mail.gmail.com
In response to Performance implications of 8K pread()s  (Dimitrios Apostolou <jimis@gmx.net>)
Responses Re: Performance implications of 8K pread()s  (Thomas Munro <thomas.munro@gmail.com>)
Re: Performance implications of 8K pread()s  (Dimitrios Apostolou <jimis@gmx.net>)
List pgsql-performance
On Wed, Jul 12, 2023 at 1:11 AM Dimitrios Apostolou <jimis@gmx.net> wrote:
> Note that I suspect my setup is related (btrfs compression behaving
> suboptimally), since the raw device can give me up to 1GB/s. It is however
> evident that reading in bigger chunks would mitigate such setup inefficiencies.
> On a system where reads are already optimal and the read rate remains the same,
> a bigger block size would probably reduce the sys time postgresql consumes
> because of the fewer system calls.

I don't know about btrfs but maybe it can be tuned to prefetch
sequential reads better...

> So would it make sense for postgres to perform reads in bigger blocks? Is it
> easy-ish to implement (where would one look for that)? Or must the I/O unit be
> tied to postgres' page size?

It is hard to implement, but people are working on it.  One of the
problems is that the 8KB blocks that we want to read data into aren't
necessarily contiguous in memory, so you can't just do bigger pread()
calls without solving a lot more problems first.  The project at
https://wiki.postgresql.org/wiki/AIO aims to deal with both the
"clustering" you seek and the "gathering" required for non-contiguous
buffers.  It allows multiple block-sized reads to be prepared and
collected on a pending list; once the list reaches some size, the
reads are merged and submitted to the operating system at a sensible
rate, so we can build something like a single large preadv() call.  In
the current prototype, if io_method=worker then that becomes a literal
preadv() call running in a background "io worker" process, but
depending on settings it could also be OS-specific stuff (io_uring,
...) that starts an asynchronous IO.  If you take that branch and run
your test you should see 128KB-sized preadv() calls.
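
To make the "gathering" point concrete, here is a minimal sketch of the
difference between sixteen 8KB pread() calls and one 128KB preadv() call
that fills sixteen separate buffers.  It is not PostgreSQL code: the
helper names (read_blocks_one_by_one, read_blocks_gathered, demo) and the
temporary test file are made up for illustration, and it assumes a
platform where Python exposes os.preadv (Linux, for example).

```python
import os
import tempfile

BLOCK_SIZE = 8192  # PostgreSQL's default page size


def read_blocks_one_by_one(fd, first_block, nblocks):
    """Issue one pread() system call per 8KB block."""
    return [os.pread(fd, BLOCK_SIZE, (first_block + i) * BLOCK_SIZE)
            for i in range(nblocks)]


def read_blocks_gathered(fd, first_block, buffers):
    """Issue a single preadv() call that reads one contiguous file range
    into separate buffers, which need not be adjacent in memory.
    Returns the number of bytes read; a robust caller would also have to
    handle short reads, which this sketch ignores."""
    return os.preadv(fd, buffers, first_block * BLOCK_SIZE)


def demo():
    nblocks = 16  # 16 x 8KB = 128KB
    with tempfile.TemporaryFile() as f:
        # Fill block i with the byte value i so the blocks are distinguishable.
        for i in range(nblocks):
            f.write(bytes([i]) * BLOCK_SIZE)
        f.flush()
        fd = f.fileno()

        singles = read_blocks_one_by_one(fd, 0, nblocks)

        # Independently allocated buffers stand in for non-contiguous
        # shared buffers (an assumption made purely for illustration).
        buffers = [bytearray(BLOCK_SIZE) for _ in range(nblocks)]
        nread = read_blocks_gathered(fd, 0, buffers)
        return nread, singles, buffers
```

Both paths return the same data; the second does it in one system call
instead of sixteen, which is where the reduction in sys time would come
from.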


