Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring - Mailing list pgsql-hackers

From Andres Freund
Subject Re: postmaster uses more CPU in 18 beta1 with io_method=io_uring
Msg-id 7bduf2aqh6ygz7qugmb65ohczozeed36oscviebhjcvussjqt4@5fcoh7427txo
List pgsql-hackers
Hi,

On 2025-06-03 12:24:38 -0700, MARK CALLAGHAN wrote:
> When measuring the time to create a connection, it is ~2.3X longer with
> io_method=io_uring than with io_method=sync (6.9ms vs 3ms), and the
> postmaster process uses ~3.5X more CPU to create connections.

I can reproduce that - the reason for the slowdown is that we create one
io_uring instance for each potential process, and the way we create them
creates one mmap()ed region for each potential process.  That creates extra
overhead, particularly when child processes exit.


> The reproduction case so far is my usage of the Insert Benchmark on a large
> server with 48 cores. I need to fix the benchmark client -- today it
> creates ~1000 connections/s to run a monitoring query in between every 100
> queries and the extra latency from connection create makes results worse
> for one of the benchmark steps.

Heh, yea - 1000 connections/sec will influence performance regardless of this
issue.


> While I can fix the benchmark client to avoid this, I am curious about the
> extra latency in connection create.
> 
> I used "perf record -e cycles -F 333 -g -p $pidof_postmaster -- sleep 30"
> but I have yet to find a big difference from the reports generated with
> that for io_method=io_uring vs =sync. It shows that much time is spent in
> the kernel dealing with the VM (page tables, etc).

I see a lot of additional time spent below
  do_group_exit->do_exit->...->unmap_vmas
which fits the theory that this is due to the number of memory mappings.

There has been a bunch of discussion around this on Mastodon, particularly
below [1], where Jens pointed out that we should use
https://man7.org/linux/man-pages/man3/io_uring_queue_init_mem.3.html to avoid
creating this many memory mappings, and which ended in Jens prototyping that
approach [2].
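
To make the idea concrete, here is a rough sketch (not the prototype in [2]
and not PostgreSQL code) of backing all rings with a single mapping and
carving it into per-process slices. RingArena, ring_arena_create(),
ring_arena_slice() and ring_arena_destroy() are hypothetical names, and
bytes_per_ring is a placeholder the caller has to supply. Each slice would
then be handed to io_uring_queue_init_mem() as the caller-provided buffer;
that call is left out so the sketch builds without liburing.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical: one shared mapping, split into equally sized slices. */
typedef struct RingArena
{
    char   *base;       /* start of the single mapping */
    size_t  slice_size; /* bytes per slice, a multiple of the page size */
    size_t  nslices;    /* one slice per potential process */
} RingArena;

/* Round size up to the next multiple of the system page size. */
static size_t
round_up_to_page(size_t size)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);

    return (size + page - 1) / page * page;
}

/*
 * Create one anonymous shared mapping large enough for nslices rings.
 * Returns 0 on success, -1 if mmap() fails.  Overflow of the total size
 * is not checked here; real code would have to.
 */
static int
ring_arena_create(RingArena *arena, size_t nslices, size_t bytes_per_ring)
{
    size_t slice = round_up_to_page(bytes_per_ring);
    void  *p;

    assert(nslices > 0 && bytes_per_ring > 0);

    p = mmap(NULL, slice * nslices, PROT_READ | PROT_WRITE,
             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    arena->base = p;
    arena->slice_size = slice;
    arena->nslices = nslices;
    return 0;
}

/* Page-aligned start of slice i; this is what a ring would be given. */
static void *
ring_arena_slice(const RingArena *arena, size_t i)
{
    assert(i < arena->nslices);
    return arena->base + i * arena->slice_size;
}

/* Tear down the whole arena with a single munmap(). */
static void
ring_arena_destroy(RingArena *arena)
{
    munmap(arena->base, arena->slice_size * arena->nslices);
    arena->base = NULL;
}
```

With this layout the whole arena is one mapping rather than one per
potential process, so there is correspondingly less for a child to unmap
at exit.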

There are a few complications around that though - only newer kernels (>=6.5)
support the caller providing the memory for the mapping and there isn't yet a
good way to figure out how much memory needs to be provided.
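
As an illustration of the kernel-version constraint only, a minimal sketch of
a check against uname() follows. release_at_least() and
running_kernel_at_least() are hypothetical helpers, not anything that exists
in PostgreSQL or liburing. Distribution kernels backport features, so
attempting the call and falling back on failure is likely a more robust
probe than comparing version numbers; treat this purely as a sketch.

```c
#include <stdio.h>
#include <sys/utsname.h>

/*
 * Return 1 if a release string such as "6.5.0-41-generic" is at least
 * want_major.want_minor, else 0.  Unparseable input is treated as
 * "too old".
 */
static int
release_at_least(const char *release, int want_major, int want_minor)
{
    int major = 0;
    int minor = 0;

    if (release == NULL || sscanf(release, "%d.%d", &major, &minor) != 2)
        return 0;

    if (major != want_major)
        return major > want_major;
    return minor >= want_minor;
}

/* Same check against the kernel this process is running on. */
static int
running_kernel_at_least(int want_major, int want_minor)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return release_at_least(u.release, want_major, want_minor);
}
```

A caller could use running_kernel_at_least(6, 5) to decide between
caller-provided ring memory and the existing per-ring mappings.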


I think this is a big enough pitfall that it's worth fixing in 18, assuming
of course that the patch has a sensible complexity. RMT, anyone, what do you
think?

Greetings,

Andres Freund

[1] https://fosstodon.org/@axboe/114630982449670090
[2] https://pastebin.com/7M3C8aFH


