Re: pgcon unconference / impact of block size on performance - Mailing list pgsql-hackers
From: Tomas Vondra
Subject: Re: pgcon unconference / impact of block size on performance
Date:
Msg-id: 31c3f2cd-5ce9-6130-4c06-2700fad0a970@enterprisedb.com
In response to: RE: pgcon unconference / impact of block size on performance (Jakub Wartak <Jakub.Wartak@tomtom.com>)
Responses: RE: pgcon unconference / impact of block size on performance
List: pgsql-hackers
On 6/7/22 15:48, Jakub Wartak wrote:
> Hi,
>
>> The really puzzling thing is why the filesystem is so much slower for
>> smaller pages. I mean, why would writing 1K be 1/3 of writing 4K?
>> Why would a filesystem have such an effect?
>
> Ha! I don't care at this point as 1 or 2kB seems too small to handle
> many real world scenarios ;)
>

I think that's not quite true - a lot of OLTP works with fairly narrow
rows, and if they use more data, it's probably in TOAST, so again split
into smaller rows. It's true smaller pages would cut some of the limits
(columns, index tuple size, ...) of course, and that might be an issue.

Independently of that, it seems like an interesting behavior and it
might tell us something about how to optimize for larger pages.

>>> b) Another thing that you could also include in testing is that I've
>>> spotted a couple of times that single-threaded fio might be the
>>> limiting factor (numjobs=1 by default), so I've tried with
>>> numjobs=2, group_reporting=1 and got the output below on ext4
>>> defaults, even while dropping caches (echo 3) each loop iteration --
>>> something that I cannot explain (ext4 direct I/O caching effect?
>>> how's that even possible? reproduced several times even with
>>> numjobs=1) - the point being 206643 1kB IOPS @ ext4 direct-io >
>>> 131783 1kB IOPS @ raw, which smells like some caching effect, because
>>> for randwrite it does not happen. I've triple-checked with iostat -x
>>> ... it cannot be any internal device cache, as with direct I/O that
>>> doesn't happen:
>>>
>>> [root@x libaio-ext4]# grep -r -e 'write:' -e 'read :' *
>>> nvme/randread/128/1k/1.txt: read : io=12108MB, bw=206644KB/s, iops=206643, runt= 60001msec [b]
>>> nvme/randread/128/2k/1.txt: read : io=18821MB, bw=321210KB/s, iops=160604, runt= 60001msec [b]
>>> nvme/randread/128/4k/1.txt: read : io=36985MB, bw=631208KB/s, iops=157802, runt= 60001msec [b]
>>> nvme/randread/128/8k/1.txt: read : io=57364MB, bw=976923KB/s, iops=122115, runt= 60128msec
>>> nvme/randwrite/128/1k/1.txt: write: io=1036.2MB, bw=17683KB/s, iops=17683, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/2k/1.txt: write: io=2023.2MB, bw=34528KB/s, iops=17263, runt= 60001msec [a, as before]
>>> nvme/randwrite/128/4k/1.txt: write: io=16667MB, bw=282977KB/s, iops=70744, runt= 60311msec [reproduced benefit, as per earlier email]
>>> nvme/randwrite/128/8k/1.txt: write: io=22997MB, bw=391839KB/s, iops=48979, runt= 60099msec
>>>
>>
>> No idea what might be causing this. BTW so you're not using direct I/O
>> to access the raw device? Or am I just misreading this?
>
> Both scenarios (raw and fs) have had direct=1 set. I just cannot
> understand how having direct I/O enabled (which disables caching)
> achieves better read IOPS on ext4 than on the raw device... isn't that
> a contradiction?
>

Thanks for the clarification. Not sure what might be causing this. Did
you use the same parameters (e.g. iodepth) in both cases?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
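
For reference, a minimal fio job file along the lines being discussed
(libaio, direct I/O, iodepth=128, numjobs=2 with group_reporting) might
look roughly like the sketch below; the filename, size, and mount point
are placeholders, not the actual setup used in the thread:

    ; hypothetical fio job approximating the parameters discussed above
    [global]
    ioengine=libaio
    direct=1            ; O_DIRECT - bypass the page cache
    iodepth=128
    numjobs=2
    group_reporting=1
    time_based=1
    runtime=60

    [randread-1k]
    rw=randread
    bs=1k
    ; file on ext4; point filename at e.g. /dev/nvme0n1 for the raw-device case
    filename=/mnt/ext4/fio-testfile
    size=10g

Running this (e.g. "fio job.fio") once against the ext4 file and once
against the raw device would reproduce the kind of comparison shown
above; dropping caches between runs is still advisable, even with
direct=1.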