On Wed, 29 May 2019 15:10:06 -0300, Alexandre hadjinlian guerra
<alexhguerra@gmail.com> wrote:
> Hi
>
> Given that postgres uses 8Kb pages, im wondering why i couldnt see any
> tests at all which would format ext4 partition to 8Kb pages. Im about
> to do
> some tests, but any knowledge about such lack of tests on the internet
> makes me wonder if im looking poorly or just lack of testing. besides,
> i do
> ask if the following link remain true given XFS and EXT4 evolution since
> 2015
> https://blog.pgaddict.com/posts/postgresql-performance-on-ext4-and-xfs
>
> Thanks

One thing no one has yet mentioned is that I/O performance could suffer
greatly if the disk page size exceeds the memory page size, because a
single disk page split over multiple VMM page frames may not be
contiguous in physical memory.
That is a problem for bus-mastering disk controllers because their DMA
can operate only on *physical* addresses - not the logical addresses
used by programs. Pages touched by external DMA need to be pinned
(locked in place) for the duration of the transfer.
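
To make the contiguity point concrete, here is a toy model (not real
driver code) of how a buffer's physical frames collapse into DMA
transfers. The function name and the frame numbers are made up for
illustration, and a 4KB memory page is assumed.

```python
def dma_segments(frames, page_size=4096):
    """Coalesce physical frame numbers into (address, length) runs.

    Each run is one region a DMA engine could transfer in a single
    operation without scatter/gather support.
    """
    segments = []
    for frame in frames:
        addr = frame * page_size
        if segments and segments[-1][0] + segments[-1][1] == addr:
            # Physically adjacent to the previous run: extend it.
            start, length = segments[-1]
            segments[-1] = (start, length + page_size)
        else:
            # Not adjacent: a new, separate transfer is needed.
            segments.append((addr, page_size))
    return segments


# An 8KB disk page backed by two adjacent frames is one transfer.
contiguous = dma_segments([100, 101])
# The same 8KB page backed by scattered frames needs two.
scattered = dma_segments([100, 257])
print(len(contiguous), len(scattered))
```

In this model an 8KB disk page is a single transfer only if the VMM
happens to hand out adjacent frames, which it does not guarantee.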

There is also DMA built-in on the system board. Typically built-in DMA
can work through the MMU with logical addresses and so (usually) does
not need to pin memory pages to access them.

But it's up to the device driver which DMA (if any) is used. Most
bus-mastering devices prefer to use their own DMA hardware, and their
drivers either have to pin memory pages or work through a small(ish)
buffer in a fixed location (which entails extraneous copying of data).
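
The cost of the fixed-buffer approach can be sketched as below. This is
only a simulation in user space, assuming a hypothetical 4KB bounce
buffer; the names are mine, not from any real driver.

```python
def copy_via_bounce_buffer(data, bounce_size=4096):
    """Move data to its destination through a fixed-size buffer.

    Returns (destination_bytes, number_of_extra_copies).
    """
    bounce = bytearray(bounce_size)  # stands in for the fixed DMA buffer
    dest = bytearray()
    copies = 0
    for offset in range(0, len(data), bounce_size):
        chunk = data[offset:offset + bounce_size]
        bounce[:len(chunk)] = chunk   # "DMA" lands in the bounce buffer
        dest += bounce[:len(chunk)]   # extra CPU copy to the real target
        copies += 1
    return bytes(dest), copies


page = bytes(range(256)) * 32  # one 8KB disk page of sample data
out, copies = copy_via_bounce_buffer(page)
print(len(out), copies)
```

Every chunk is copied once more than it would be if the device could
write straight into the (pinned) destination pages.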

PostgreSQL's 8KB logical disk pages take up two 4KB memory pages - which
may not be adjacent - but since the filesystem blocks and memory pages
are the same size (4KB), DMA (built-in or external) can access the pages
in any order, without employing (or even needing) scatter/gather ability
to coalesce or distribute partial pages to/from non-contiguous locations.
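
If you want to check the sizes on your own machine, something like the
following should do on Linux or another Unix. The 8192 figure is
PostgreSQL's default block size; the helper name is just for
illustration, and note that statvfs reports the filesystem's preferred
block size for the path you give it ("/" here is only an example).

```python
import os

PG_BLOCK_SIZE = 8192  # PostgreSQL default page size (assumed, not queried)


def pages_per_block(block_size, page_size):
    """Number of memory pages needed to hold one disk block, rounded up."""
    return -(-block_size // page_size)


mem_page = os.sysconf("SC_PAGE_SIZE")  # memory page size in bytes
fs_block = os.statvfs("/").f_bsize     # block size reported for "/"

print("memory page size:", mem_page)
print("filesystem block size:", fs_block)
print("memory pages per PostgreSQL page:",
      pages_per_block(PG_BLOCK_SIZE, mem_page))
```

With 4KB memory pages, each PostgreSQL page spans two of them.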

Another consideration for disk page size is that program code typically
is paged in directly from the executable file. If the disk and memory
pages aren't the same size, the OS page fault handler needs to be aware
of the difference and able to deal with it. Obviously this could be
addressed simply by segregating "large" pages to separate data-only
volumes.

AFAIK, x86 has no option for 8KB memory pages; of the architectures that
do, the Itanium offers it as an option, and SPARC and Alpha use 8KB as
their base page size. The "large" / "huge" memory pages available in
most CPUs today are too big to be used effectively by a filesystem.
https://en.wikipedia.org/wiki/Page_(computer_memory)#Multiple_page_sizes

Rewriting filesystem drivers and the memory manager so that 8KB or
larger disk pages could be treated as a sort of "huge" memory page -
overlaid on adjacent 4KB physical memory pages - would be a massive
job. Since few programs other than DBMSs would really benefit from it,
it isn't likely to happen.

YMMV,
George