Re: effective_io_concurrency and NVMe devices - Mailing list pgsql-hackers

From David Rowley
Subject Re: effective_io_concurrency and NVMe devices
Msg-id CAApHDvpOjk-LPXvSbRcNGbuux8iw3JBwtjWDBNDkirwAC3S17g@mail.gmail.com
In response to effective_io_concurrency and NVMe devices  (Bruce Momjian <bruce@momjian.us>)
Responses Re: effective_io_concurrency and NVMe devices
List pgsql-hackers
On Wed, 20 Apr 2022 at 14:56, Bruce Momjian <bruce@momjian.us> wrote:
> NVMe devices have a maximum queue length of 64k:

> Should we increase its maximum to 64k?  Backpatched?  (SATA has a
> maximum queue length of 256.)

I have a machine here with 1 x PCIe 3.0 NVMe SSD and also 1 x PCIe 4.0
NVMe SSD. I ran a few tests to see how different values of
effective_io_concurrency would affect performance. I tried to come up
with a query that did little enough CPU processing to ensure that I/O
was the clear bottleneck.

The test was with a 128GB table on a machine with 64GB of RAM.  I
padded the tuples out so there were 4 per page so that the aggregation
didn't have much work to do.
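
As a rough sketch of what that table geometry implies (this is not taken from the attached setup): assuming PostgreSQL's default 8 kB block size and reading "128GB" as 128 GiB, both of which are assumptions, the page and tuple counts work out as below. The variable names are only illustrative.

```python
# Back-of-the-envelope table geometry for the test described above.
# Assumptions (not stated in the post): 8 kB pages, "128GB" meaning 128 GiB.
BLOCK_SIZE = 8192                 # assumed default PostgreSQL block size, in bytes
TABLE_BYTES = 128 * 1024**3       # assumed interpretation of "128GB"
TUPLES_PER_PAGE = 4               # from the post: tuples padded to 4 per page

pages = TABLE_BYTES // BLOCK_SIZE
tuples = pages * TUPLES_PER_PAGE
# Upper bound on the space each padded tuple occupies, ignoring the page header.
max_tuple_bytes = BLOCK_SIZE // TUPLES_PER_PAGE

print(f"pages:  {pages:,}")
print(f"tuples: {tuples:,}")
print(f"space per tuple: at most {max_tuple_bytes} bytes")
```

With so few tuples per page, almost all of the query's time should go to fetching pages rather than to the aggregate, which is the point of the padding.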

The query I ran was:

explain (analyze, buffers, timing off)
select count(p) from r where a = 1;

Here's what I saw:

NVMe PCIe 3.0 (Samsung 970 Evo 1TB)

  e_i_c | query_time_ms
 -------+---------------
      0 |     88627.221
      1 |    652915.192
      5 |    271536.054
     10 |    141168.986
    100 |     67340.026
   1000 |     70686.596
  10000 |     70027.938
 100000 |     70106.661

Saw a max of 991 MB/sec in iotop

NVMe PCIe 4.0 (Samsung 980 Pro 1TB)

  e_i_c | query_time_ms
 -------+---------------
      0 |     59306.960
      1 |    956170.704
      5 |    237879.121
     10 |    135004.111
    100 |     55662.030
   1000 |     51513.717
  10000 |     59807.824
 100000 |     53443.291

Saw a max of 1126 MB/sec in iotop
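
To make the two sets of timings easier to compare, here is a small sketch that expresses each one relative to the e_i_c = 0 run on the same device. The numbers are copied from the results above; the helper name relative_to_baseline is just for illustration.

```python
# Query times in milliseconds, keyed by effective_io_concurrency,
# copied from the results above.
pcie3 = {
    0: 88627.221, 1: 652915.192, 5: 271536.054, 10: 141168.986,
    100: 67340.026, 1000: 70686.596, 10000: 70027.938, 100000: 70106.661,
}
pcie4 = {
    0: 59306.960, 1: 956170.704, 5: 237879.121, 10: 135004.111,
    100: 55662.030, 1000: 51513.717, 10000: 59807.824, 100000: 53443.291,
}

def relative_to_baseline(times, baseline=0):
    """Return each time divided by the time measured at the baseline setting."""
    base = times[baseline]
    return {eic: t / base for eic, t in times.items()}

for name, times in (("PCIe 3.0", pcie3), ("PCIe 4.0", pcie4)):
    print(name)
    for eic, ratio in relative_to_baseline(times).items():
        print(f"  e_i_c={eic:<6} {ratio:6.2f}x the e_i_c=0 time")
```

On these numbers, e_i_c = 1 takes roughly 7 times (PCIe 3.0) and 16 times (PCIe 4.0) as long as e_i_c = 0, while the fastest settings (100 and 1000 respectively) are only about 24% and 13% quicker than e_i_c = 0.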

I'm not pretending that this is the best query and table size to show
it, but at least this test shows that there's not much to gain by
prefetching further than e_i_c = 100 or so: on both devices the
timings are roughly flat from 100 upwards. I imagine going further
than we need to is likely to have negative consequences due to
populating the kernel page cache with buffers that won't be used for a
while. I also imagine going too far out increases the risk that
buffers we've prefetched are evicted before they're used.

This also highlights that an effective_io_concurrency of 1 (the
default) is pretty terrible in this test. The bitmap contained every
2nd page, and I imagine that would break normal page prefetching
(readahead) by the kernel. If that's true, though, it does not explain
why e_i_c = 0, which does no prefetching at all, was so fast.

I've attached the test setup that I did. I'm open to modifying the
test and running again if someone has an idea that might show benefits
to larger values for effective_io_concurrency.

David
