Re: pgcon unconference / impact of block size on performance - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: pgcon unconference / impact of block size on performance
Date
Msg-id 31c3f2cd-5ce9-6130-4c06-2700fad0a970@enterprisedb.com
In response to RE: pgcon unconference / impact of block size on performance  (Jakub Wartak <Jakub.Wartak@tomtom.com>)
Responses RE: pgcon unconference / impact of block size on performance
List pgsql-hackers

On 6/7/22 15:48, Jakub Wartak wrote:
> Hi,
> 
>> The really
>> puzzling thing is why the filesystem is so much slower for smaller pages. I mean,
>> why would writing 1K run at 1/3 the speed of writing 4K?
>> Why would a filesystem have such an effect?
> 
> Ha! I don't care at this point, as 1 or 2kB seem too small to handle many real-world scenarios ;)
> 

I think that's not quite true - a lot of OLTP workloads use fairly narrow
rows, and if they store more data, it's probably in TOAST, so it's again split
into smaller rows. It's true that smaller pages would lower some of the limits
(number of columns, index tuple size, ...), of course, and that might be an issue.

Independently of that, it seems like an interesting behavior and it
might tell us something about how to optimize for larger pages.
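
One thing that might be worth checking (this is just a guess) is whether the
sub-4K writes get inflated somewhere between fio and the device - e.g. compare
the write volume fio reports against what iostat sees on the device while the
1K job runs. Something like this (the device/mount paths are just placeholders
for your setup):

  # ext4 defaults to a 4K block size - confirm what the fs was created with
  tune2fs -l /dev/nvme0n1p1 | grep 'Block size'

  # watch average request sizes at the device while the 1K job runs
  iostat -x 1 &

  fio --name=rmw-check --filename=/mnt/ext4/fio-testfile --size=10G \
      --rw=randwrite --bs=1k --direct=1 --ioengine=libaio --iodepth=128 \
      --runtime=60 --time_based

If the device-level request sizes or write volume come out noticeably larger
than what fio submits, that would point at read-modify-write / metadata
overhead in the filesystem rather than at the device itself.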

>>> b) Another thing that you could also include in testing is that I've spotted a
>>> couple of times that single-threaded fio might be the limiting factor (numjobs=1 by
>>> default), so I've tried with numjobs=2,group_reporting=1 and got the output below
>>> on ext4 defaults, even while dropping caches (echo 3) on each loop iteration -
>>> something that I cannot explain (ext4 direct I/O caching effect? how's that
>>> even possible? reproduced several times, even with numjobs=1) - the point being
>>> 206643 1kB IOPS @ ext4 direct-io > 131783 1kB IOPS @ raw, which smells like some
>>> caching effect, because for randwrite it does not happen. I've triple-checked with
>>> iostat -x... it cannot be any internal device cache, as with direct I/O that doesn't
>>> happen:
>>>
>>> [root@x libaio-ext4]# grep -r -e 'write:' -e 'read :' *
>>> nvme/randread/128/1k/1.txt:  read : io=12108MB, bw=206644KB/s,
>>> iops=206643, runt= 60001msec [b]
>>> nvme/randread/128/2k/1.txt:  read : io=18821MB, bw=321210KB/s,
>>> iops=160604, runt= 60001msec [b]
>>> nvme/randread/128/4k/1.txt:  read : io=36985MB, bw=631208KB/s,
>>> iops=157802, runt= 60001msec [b]
>>> nvme/randread/128/8k/1.txt:  read : io=57364MB, bw=976923KB/s,
>>> iops=122115, runt= 60128msec
>>> nvme/randwrite/128/1k/1.txt:  write: io=1036.2MB, bw=17683KB/s,
>>> iops=17683, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/2k/1.txt:  write: io=2023.2MB, bw=34528KB/s,
>>> iops=17263, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/4k/1.txt:  write: io=16667MB, bw=282977KB/s,
>>> iops=70744, runt= 60311msec [reproduced benefit, as per earlier email]
>>> nvme/randwrite/128/8k/1.txt:  write: io=22997MB, bw=391839KB/s,
>>> iops=48979, runt= 60099msec
>>>
>>
>> No idea what might be causing this. BTW, so you're not using direct I/O to access
>> the raw device? Or am I just misreading this?
> 
> Both scenarios (raw and fs) have had direct=1 set. I just cannot understand how
> having direct I/O enabled (which disables caching) achieves better read IOPS on
> ext4 than on the raw device... isn't that a contradiction?
> 

Thanks for the clarification. Not sure what might be causing this. Did
you use the same parameters (e.g. iodepth) in both cases?
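
FWIW, it might be easiest to rule out parameter differences by driving both
targets from one job file, so that everything except the target is shared.
A rough sketch (the filename/device paths are placeholders, adjust to your setup):

  [global]
  ioengine=libaio
  direct=1
  iodepth=128
  numjobs=2
  group_reporting=1
  rw=randread
  bs=1k
  runtime=60
  time_based

  [ext4]
  filename=/mnt/ext4/fio-testfile
  size=10G

  [raw]
  stonewall
  filename=/dev/nvme0n1

If the ext4 run still beats the raw device with an identical job description,
then at least we know it's not an iodepth/numjobs mismatch.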


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


