Performance implications of 8K pread()s - Mailing list pgsql-performance

From: Dimitrios Apostolou
Subject: Performance implications of 8K pread()s
Msg-id: 218fa2e0-bc58-e469-35dd-c5cb35906064@gmx.net
Responses: Re: Performance implications of 8K pread()s (Thomas Munro <thomas.munro@gmail.com>)
           Re: Performance implications of 8K pread()s (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-performance
Hello list,

I have noticed that the read rate during a SELECT COUNT(*) command is
much lower than what the device can provide. Parallel workers improve the
situation, but for simplicity's sake I disable parallelism for my
measurements here by setting max_parallel_workers_per_gather to 0.

Stracing the postgres process shows that all reads happen as offset-based
8KB pread() calls:

   pread64(172, ..., 8192, 437370880) = 8192

The read rate I see on the device is only 10-20 MB/s. My case is special
though, as this is on a zstd-compressed btrfs filesystem on a very fast
(1GB/s) direct-attached storage system. Given that the compression ratio is
around 10x, the above rate corresponds to about 100-200 MB/s of decompressed
data going into the postgres process.

Can the 8K block size cause such a slowdown? Here are my observations:

+ Reading a 1GB postgres file using dd (which uses read() internally) in
    8K and 32K chunks; a pread()-based equivalent is sketched after these
    results:

      # dd if=4156889.4 of=/dev/null bs=8k
      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.18829 s, 174 MB/s

      # dd if=4156889.4 of=/dev/null bs=8k    # 2nd run, data is cached
      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.287623 s, 3.7 GB/s

      # dd if=4156889.8 of=/dev/null bs=32k
      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 1.02688 s, 1.0 GB/s

      # dd if=4156889.8 of=/dev/null bs=32k    # 2nd run, data is cached
      1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.264049 s, 4.1 GB/s

    The rates displayed are after decompression (the fs does it
    transparently) and the results have been verified with multiple runs.

    Notice that the read rate with bs=8k is 174 MB/s (I see ~20 MB/s on the
    device): slow, and similar to what PostgreSQL gave us above. With bs=32k
    the rate increases to 1 GB/s (I see ~80 MB/s on the device, but the run
    is too short for the device rate to register properly).

    The cached reads are fast in both cases.
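
Since dd goes through read() rather than pread(), here is a minimal C
sketch of a harness one could use to repeat the test with the same syscall
pattern postgres shows in strace. It's a hypothetical test program (the name
and arguments are mine, nothing postgres-specific in it):

    /* preadbench.c - minimal sketch: sequentially pread() a file in
     * fixed-size blocks, mimicking the pattern in the strace output above.
     * Build: cc -O2 -o preadbench preadbench.c
     * Usage: ./preadbench <file> <blocksize>
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 3)
        {
            fprintf(stderr, "usage: %s <file> <blocksize>\n", argv[0]);
            return 1;
        }

        size_t  bs  = strtoul(argv[2], NULL, 10);
        char   *buf = malloc(bs);
        int     fd  = open(argv[1], O_RDONLY);

        if (fd < 0 || buf == NULL)
        {
            perror("setup");
            return 1;
        }

        off_t   off = 0;
        ssize_t n;

        /* One pread() per block, like the strace output above. */
        while ((n = pread(fd, buf, bs, off)) > 0)
            off += n;

        if (n < 0)
            perror("pread");

        close(fd);
        free(buf);
        return 0;
    }

Timing it with time(1) against the same files should show whether the
8K-vs-32K gap persists when the syscall is pread() instead of read().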

Note that I suspect my particular setup is a factor here (btrfs compression
behaving suboptimally), since the raw device can give me up to a 1GB/s rate.
It is evident, however, that reading in bigger chunks would mitigate such
setup inefficiencies. Even on a system where reads are already optimal and
the read rate remains the same, a bigger block size would probably reduce
the sys time postgres consumes because of the fewer system calls.
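
To separate the per-syscall overhead from device and filesystem behavior,
one could compare sys time on fully cached data, for example with a rough
getrusage()-based sketch like the following (again a hypothetical test
program, Linux assumed):

    /* systime.c - rough sketch: pread() a cached file at two block sizes
     * and report system CPU time for each pass.  The first scan primes the
     * page cache so that the later passes measure syscall overhead only.
     * Build: cc -O2 -o systime systime.c    Usage: ./systime <file>
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/resource.h>
    #include <unistd.h>

    /* Read the whole file in bs-sized preads; return sys time consumed. */
    static double scan(int fd, size_t bs)
    {
        char   *buf = malloc(bs);
        off_t   off = 0;
        ssize_t n;
        struct rusage before, after;

        if (buf == NULL)
            return 0;

        getrusage(RUSAGE_SELF, &before);
        while ((n = pread(fd, buf, bs, off)) > 0)
            off += n;
        getrusage(RUSAGE_SELF, &after);

        free(buf);
        return (after.ru_stime.tv_sec - before.ru_stime.tv_sec)
             + (after.ru_stime.tv_usec - before.ru_stime.tv_usec) / 1e6;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2)
        {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }

        int fd = open(argv[1], O_RDONLY);

        if (fd < 0)
        {
            perror("open");
            return 1;
        }

        scan(fd, 1 << 20);                      /* prime the page cache */
        printf("8K:  %.3fs sys\n", scan(fd, 8192));
        printf("32K: %.3fs sys\n", scan(fd, 32768));

        close(fd);
        return 0;
    }

If the 8K pass shows noticeably more sys time than the 32K pass even with
everything cached, that part of the cost is purely the syscall count.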

So would it make sense for postgres to perform reads in bigger blocks? Is it
easy-ish to implement (where would one look for that)? Or must the I/O unit be
tied to postgres' page size?

Regards,
Dimitris



