Re: Block / Page Size Optimization - Mailing list pgsql-performance

From Andres Freund
Subject Re: Block / Page Size Optimization
Msg-id 20190408162846.c5zv4hu5akixjncn@alap3.anarazel.de
In response to Block / Page Size Optimization  (Gunther <raj@gusw.net>)
List pgsql-performance
Hi,

On 2019-04-08 11:09:07 -0400, Gunther wrote:
> I can set an XFS file system with 8192 bytes block size, but then it does
> not mount on Linux, because the VM page size is the limit, 4096 again.
> 
> There seems to be no way to change that in (most, common) Linux variants. In
> FreeBSD there appears to be a way to change that.
> 
> But then, there is a hardware limit also, as far as the VM memory page
> allocation is concerned. Apparently most i386 / amd64 architectures the VM
> page sizes are 4k, 2M, and 1G. The latter, I believe, are called "hugepages"
> and I only ever see that discussed in the PostgreSQL manuals for Linux, not
> for FreeBSD.
> 
> People have asked: does it matter? And then there is all that chatter about
> "why don't you run a benchmark and report back to us" -- "OK, will do" --
> and then it's crickets.
> 
> But why is this such a secret?
> 
> On Amazon AWS there is the following very simple situation: IO is capped on
> IO operations per second (IOPS). Let's say, on a smallish volume, I get 300
> IOPS (once my burst balance is used up.)
> 
> Now my simple theoretical reasoning is this: one IO call transfers 1 block
> of 4k size. That means, with a cap of 300 IOPS, I get to send 1.17 MB per
> second. That would be the absolute limit. BUT, if I could double the
> transfer size to 8k, I should be able to move 2.34 MB per second. Shouldn't
> I?

The kernel collapses consecutive write requests, so a single IO is not
limited to one 4k block. You can see the average sizes of IO requests
using iostat -xm 1. When e.g. bulk loading into postgres I see:

Device   r/s     w/s  rMB/s   wMB/s  rrqm/s  wrqm/s  %rrqm  %wrqm  r_await  w_await  aqu-sz  rareq-sz  wareq-sz  svctm  %util
sda     4.00  696.00   0.02  471.05    0.00   80.00   0.00  10.31     8.50     7.13    4.64      4.00    693.03   0.98  68.50

so the average write request size (wareq-sz) was 693.03 kB. Thus I got
~470 MB/sec despite there only being ~700 write IOPS. That's with 4KB
page sizes, 4KB FS blocks, and 8KB postgres block size.
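
As a rough cross-check of those numbers, here is a minimal Python sketch.
The function name throughput_mb_per_s is made up for illustration, and it
assumes that iostat's kB and MB differ by a factor of 1024. It multiplies
request rate by average request size, once for the iostat sample and once
for the "one 4k block per IO at 300 IOPS" assumption from the quoted mail.

```python
# Throughput is request rate times average request size,
# not request rate times the filesystem block size.

def throughput_mb_per_s(iops: float, avg_request_kb: float) -> float:
    """Return MB/s, assuming 1 MB = 1024 kB (assumption, see lead-in)."""
    return iops * avg_request_kb / 1024.0

# w/s and wareq-sz for sda, taken from the iostat sample.
bulk_load = throughput_mb_per_s(696.00, 693.03)

# The quoted assumption: every IO carries exactly one 4 kB block,
# capped at 300 IOPS.
one_block = throughput_mb_per_s(300, 4)

print(f"merged requests:  {bulk_load:.2f} MB/s")
print(f"one block per IO: {one_block:.2f} MB/s")
```

The first figure comes out at roughly 471 MB/s, in line with the wMB/s
column, and the second at about 1.17 MB/s, which is the ceiling only if
requests never get merged.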


There still might be some benefit to different FS block sizes, but it's
not going to be related directly to IOPS.

Greetings,

Andres Freund


