RE: pgcon unconference / impact of block size on performance - Mailing list pgsql-hackers

From Jakub Wartak
Subject RE: pgcon unconference / impact of block size on performance
Date
Msg-id PR3PR07MB82439880210722B647023C3FF6A59@PR3PR07MB8243.eurprd07.prod.outlook.com
In response to Re: pgcon unconference / impact of block size on performance  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Responses Re: pgcon unconference / impact of block size on performance
List pgsql-hackers
Hi,

> The really
> puzzling thing is why is the filesystem so much slower for smaller pages. I mean,
> why would writing 1K be 1/3 of writing 4K?
> Why would a filesystem have such effect?

Ha! I don't care at this point, as 1 or 2 kB seems too small to handle many real-world scenarios ;)

> > b) Another thing that you could also include in testing: I've spotted a
> couple of times that single-threaded fio might be the limiting factor (numjobs=1 by
> default), so I've tried with numjobs=2,group_reporting=1 and got the below
> output on ext4 defaults, even while dropping caches (echo 3) on each loop iteration -
> - something that I cannot explain (ext4 direct I/O caching effect? how's that
> even possible? reproduced several times, even with numjobs=1). The point being:
> 206643 1kB IOPS @ ext4 direct-io > 131783 1kB IOPS @ raw, which smells like some
> caching effect, because for randwrite it does not happen. I've triple-checked with
> iostat -x... it cannot be any internal device cache, as with direct I/O that doesn't
> happen:
> >
> > [root@x libaio-ext4]# grep -r -e 'write:' -e 'read :' *
> > nvme/randread/128/1k/1.txt:  read : io=12108MB, bw=206644KB/s,
> > iops=206643, runt= 60001msec [b]
> > nvme/randread/128/2k/1.txt:  read : io=18821MB, bw=321210KB/s,
> > iops=160604, runt= 60001msec [b]
> > nvme/randread/128/4k/1.txt:  read : io=36985MB, bw=631208KB/s,
> > iops=157802, runt= 60001msec [b]
> > nvme/randread/128/8k/1.txt:  read : io=57364MB, bw=976923KB/s,
> > iops=122115, runt= 60128msec
> > nvme/randwrite/128/1k/1.txt:  write: io=1036.2MB, bw=17683KB/s,
> > iops=17683, runt= 60001msec [a, as before]
> > nvme/randwrite/128/2k/1.txt:  write: io=2023.2MB, bw=34528KB/s,
> > iops=17263, runt= 60001msec [a, as before]
> > nvme/randwrite/128/4k/1.txt:  write: io=16667MB, bw=282977KB/s,
> > iops=70744, runt= 60311msec [reproduced benefit, as per earlier email]
> > nvme/randwrite/128/8k/1.txt:  write: io=22997MB, bw=391839KB/s,
> > iops=48979, runt= 60099msec
> >
>
> No idea what might be causing this. BTW so you're not using direct-io to access
> the raw device? Or am I just misreading this?

Both scenarios (raw and fs) had direct=1 set. I just cannot understand how having direct I/O enabled (which
disables caching) achieves better read IOPS on ext4 than on the raw device... isn't that a contradiction?
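For reference, a job file along these lines should reproduce the randread case (a hypothetical reconstruction: block size, iodepth, and thread count are taken from the results above, while the directory and device paths are placeholders, not the exact ones used in my runs):

```ini
; hypothetical fio job reconstructing the test above;
; filename/directory are placeholders
[global]
ioengine=libaio
direct=1            ; O_DIRECT on both the fs file and the raw device
rw=randread         ; also run with rw=randwrite
bs=1k               ; also run with 2k / 4k / 8k
iodepth=128
numjobs=2
group_reporting=1
runtime=60
time_based=1

[ext4-file]
directory=/mnt/ext4
size=10G

; for the raw-device run, replace the section above with e.g.:
; [raw]
; filename=/dev/nvme0n1
```

Between loop iterations the page cache was dropped (echo 3 > /proc/sys/vm/drop_caches), which should be irrelevant with direct=1 anyway -- that's exactly the puzzle.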

-J.



