Re: pgcon unconference / impact of block size on performance - Mailing list pgsql-hackers

From Fabien COELHO
Subject Re: pgcon unconference / impact of block size on performance
Date
Msg-id alpine.DEB.2.22.394.2206062226300.1240647@pseudo
In response to pgcon unconference / impact of block size on performance  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
List pgsql-hackers
Hello Tomas,

> At one of the pgcon unconference sessions a couple days ago, I presented
> a bunch of benchmark results comparing performance with different
> data/WAL block size. Most of the OLTP results showed significant gains
> (up to 50%) with smaller (4k) data pages.

You wrote something about SSD a long time ago, but the link is now dead:

http://www.fuzzy.cz/en/articles/ssd-benchmark-results-read-write-pgbench/

See also:

http://www.cybertec.at/postgresql-block-sizes-getting-started/
http://blog.coelho.net/database/2014/08/08/postgresql-page-size-for-SSD.html

[...]

> The other important factor is the native SSD page, which is similar to
> sectors on HDD. SSDs however don't allow in-place updates, and have to
> reset/rewrite the whole native page. It's actually more complicated,
> because the reset happens at a much larger scale (~8MB block), so it
> does matter how quickly we "dirty" the data. The consequence is that
> using data pages smaller than the native page (depends on the device,
> but seems 4K is the common value) either does not help or actually hurts
> the write performance.
>
> All the SSD results show this behavior - the Optane and Samsung nicely
> show that 4K is much better (in random write IOPS) than 8K, but 1-2K
> pages make it worse.

Yep. ISTM that you should also consider the underlying FS block size. Ext4
uses 4 KiB by default, so if you write 2 KiB it will write 4 KiB anyway.
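
To make the rounding effect concrete, here is a minimal sketch, assuming a
simplified model in which every page write is aligned and the filesystem
always writes whole blocks. The function names are hypothetical and nothing
here measures a real device or filesystem.

```python
def bytes_written(page_size: int, fs_block_size: int = 4096) -> int:
    """Bytes the filesystem writes for one aligned page write.

    Simplified model (an assumption): the write is rounded up to a
    whole number of filesystem blocks.
    """
    blocks = -(-page_size // fs_block_size)  # ceiling division
    return blocks * fs_block_size


def write_amplification(page_size: int, fs_block_size: int = 4096) -> float:
    """Ratio of bytes written by the filesystem to bytes requested."""
    return bytes_written(page_size, fs_block_size) / page_size


if __name__ == "__main__":
    for size in (1024, 2048, 4096, 8192):
        print(size, bytes_written(size), write_amplification(size))
```

Under this model, 1 KiB and 2 KiB pages each cost a full 4 KiB write, while
4 KiB and 8 KiB pages are written without padding.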

There is not much doubt that with SSD we should reduce the default page
size. There are some negative impacts (e.g. more space is lost because of
headers, and fewer tuples fit in each page), but I guess there should be
an overall benefit. It would help a lot if it were possible to initdb
with a different block size, without recompiling.
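
A rough way to estimate the header overhead is sketched below. It assumes a
plain heap page with a 24-byte page header, a 4-byte line pointer per tuple,
and a 23-byte tuple header padded to 24 bytes, and it ignores null bitmaps,
fillfactor and special space. The 100-byte payload and the function names are
illustrative only.

```python
PAGE_HEADER = 24    # bytes of page header on each heap page
LINE_POINTER = 4    # bytes per item pointer
TUPLE_HEADER = 24   # 23-byte heap tuple header padded to 8-byte alignment
ALIGN = 8           # assumed maximum alignment


def tuples_per_page(page_size: int, payload: int) -> int:
    """Tuples of `payload` data bytes that fit in one page (simplified)."""
    padded = (payload + ALIGN - 1) // ALIGN * ALIGN
    per_tuple = LINE_POINTER + TUPLE_HEADER + padded
    return (page_size - PAGE_HEADER) // per_tuple


def tuples_per_8k(page_size: int, payload: int) -> int:
    """Tuples stored in 8 KiB of disk space when split into smaller pages."""
    return tuples_per_page(page_size, payload) * (8192 // page_size)


if __name__ == "__main__":
    for size in (1024, 2048, 4096, 8192):
        print(size, tuples_per_page(size, 100), tuples_per_8k(size, 100))
```

With these assumptions the loss at 4 KiB is small (60 tuples per 8 KiB of
storage instead of 61), and it grows as pages shrink to 1 KiB (56).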

-- 
Fabien.


