Hello Tomas,
> At one of the pgcon unconference sessions a couple days ago, I presented
> a bunch of benchmark results comparing performance with different
> data/WAL block size. Most of the OLTP results showed significant gains
> (up to 50%) with smaller (4k) data pages.
You wrote something about SSD a long time ago, but the link is now dead:
http://www.fuzzy.cz/en/articles/ssd-benchmark-results-read-write-pgbench/
See also:
http://www.cybertec.at/postgresql-block-sizes-getting-started/
http://blog.coelho.net/database/2014/08/08/postgresql-page-size-for-SSD.html
[...]
> The other important factor is the native SSD page, which is similar to
> sectors on HDD. SSDs however don't allow in-place updates, and have to
> reset/rewrite the whole native page. It's actually more complicated,
> because the reset happens at a much larger scale (~8MB block), so it
> does matter how quickly we "dirty" the data. The consequence is that
> using data pages smaller than the native page (depends on the device,
> but seems 4K is the common value) either does not help or actually hurts
> the write performance.
>
> All the SSD results show this behavior - the Optane and Samsung nicely
> show that 4K is much better (in random write IOPS) than 8K, but 1-2K
> pages make it worse.
Yep. ISTM that you should also consider the underlying FS block size. Ext4
uses 4 KiB by default, so if you write 2 KiB it will write 4 KiB anyway.
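To make the rounding concrete, here is a rough Python sketch. It is a simplified model, not a measurement. It assumes every data page write is rounded up to the coarser of the FS block and the SSD native page, both taken as 4 KiB, and it ignores WAL, full-page writes and erase-block effects. The helper names `physical_write_bytes` and `write_amplification` are hypothetical.

```python
def physical_write_bytes(page_size: int, fs_block: int = 4096, native_page: int = 4096) -> int:
    """Bytes physically written for one data page in this simplified model.

    Assumes power-of-two sizes, so rounding up to the larger unit
    also rounds up to the smaller one.
    """
    unit = max(fs_block, native_page)
    # Ceiling division: round page_size up to a whole number of units.
    return -(-page_size // unit) * unit


def write_amplification(page_size: int, **kwargs: int) -> float:
    """Ratio of bytes physically written to bytes PostgreSQL asked to write."""
    return physical_write_bytes(page_size, **kwargs) / page_size


for size in (1024, 2048, 4096, 8192):
    print(f"{size:5d} B page -> {physical_write_bytes(size):5d} B written, "
          f"amplification {write_amplification(size):.1f}x")
```

Under these assumptions 1 KiB and 2 KiB pages cost 4x and 2x the bytes they carry, while 4 KiB and 8 KiB pages cost 1x. That is consistent with smaller pages not helping on such a stack, though the model says nothing about why 4K beats 8K.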
There is not much doubt that with SSD we should reduce the default page
size. There are some negative impacts (e.g. more space is lost because of
headers and the number of tuples that can fit), but I guess there
should be an overall benefit. It would help a lot if it were possible
to initdb with a different block size, without recompiling.
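The header overhead can be estimated the same way. This is a rough sketch assuming a heap page with PostgreSQL's 24-byte page header, a 4-byte line pointer per tuple and a 23-byte tuple header, with 8-byte MAXALIGN, no null bitmap and no fillfactor. `tuples_per_page` is a hypothetical helper. With a zero-byte payload it reproduces the 291-tuple limit for 8 KiB pages.

```python
PAGE_HEADER = 24   # page header size in bytes (assumed)
LINE_POINTER = 4   # one line pointer per tuple (assumed)
TUPLE_HEADER = 23  # heap tuple header before alignment (assumed)
MAXALIGN = 8       # 8-byte alignment (assumed, typical on 64-bit)


def maxalign(n: int) -> int:
    """Round n up to a multiple of MAXALIGN."""
    return (n + MAXALIGN - 1) // MAXALIGN * MAXALIGN


def tuples_per_page(block_size: int, data_bytes: int) -> int:
    """How many tuples with data_bytes of payload fit in one heap page."""
    # Header padded to MAXALIGN, then payload; whole tuple MAXALIGN'd.
    tuple_size = maxalign(maxalign(TUPLE_HEADER) + data_bytes)
    return (block_size - PAGE_HEADER) // (tuple_size + LINE_POINTER)


for size in (1024, 2048, 4096, 8192):
    per_page = tuples_per_page(size, 100)
    per_8k = per_page * (8192 // size)
    print(f"{size:5d} B pages: {per_page:3d} tuples/page, {per_8k} per 8 KiB")
```

For 100-byte rows this gives 61 tuples per 8 KiB with 8 KiB pages, 60 with 4 KiB or 2 KiB pages, and 56 with 1 KiB pages, so the space cost of going to 4K looks small in this model.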
--
Fabien.