Re: Large block sizes support in Linux - Mailing list pgsql-hackers

From Pankaj Raghav
Subject Re: Large block sizes support in Linux
Date
Msg-id 97e72ccb-193f-43e0-97ac-17359c1c874b@pankajraghav.com
In response to Re: Large block sizes support in Linux  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 23/03/2024 03:41, Bruce Momjian wrote:
> On Fri, Mar 22, 2024 at 10:31:11PM +0100, Tomas Vondra wrote:
>> Right, but things change over time - current storage devices support
>> much larger sectors (LBA format), usually 4K. And if you do I/O with
>> this size, it's usually atomic.
>>
>> AFAIK if you built Postgres with 4K pages, on a device with 4K LBA
>> format, that would not need full-page writes - we always do I/O in 4k
>> pages, and block layer does I/O (during writeback from page cache) with
>> minimum guaranteed size = logical block size. 4K are great for OLTP
>> systems in general, it'd be even better if we didn't need to worry about
>> torn pages (but the tricky part is to be confident it's safe to disable
>> them on a particular system).
> 
> Yes, even if the file system is 8k, and the storage is 8k, we only know
> that torn pages are impossible if the file system never overwrites
> existing 8k pages, but writes new ones and then makes them active.  I
> think ZFS does that to handle snapshots.
> 

I think we can also avoid torn writes if:
- the filesystem's data path always writes in aligned multiples of 8k, and
- the device supports 8k atomic writes.

Then we might be able to push the responsibility to the device without the overhead
of a CoW FS or FPW=on. Of course, the performance here depends on the vendor-specific
implementation of atomic writes.

We are trying to enable the former by adding LBS support to XFS in Linux.
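
As a rough illustration of the two conditions above, here is a minimal sketch.
It is not taken from PostgreSQL or the kernel: the function names and
parameters are hypothetical, and the check encodes only the two conditions as
stated, ignoring device-specific details such as atomic write boundaries.

```python
PAGE_SIZE = 8192  # PostgreSQL's default block size (8k)


def write_is_page_aligned(offset: int, length: int,
                          page_size: int = PAGE_SIZE) -> bool:
    """Return True if a write starts on a page boundary and covers whole pages.

    Hypothetical helper: models "writes in multiples of 8k (with alignment)".
    """
    return length > 0 and offset % page_size == 0 and length % page_size == 0


def torn_pages_avoidable(fs_write_granularity: int,
                         device_atomic_write_bytes: int,
                         page_size: int = PAGE_SIZE) -> bool:
    """Return True if both conditions from the message hold.

    fs_write_granularity: assumed unit (bytes) in which the filesystem's
        data path issues aligned writes.
    device_atomic_write_bytes: assumed largest write (bytes) the device
        guarantees to be atomic.
    """
    # Condition 1: the filesystem always writes aligned multiples of the page.
    fs_ok = fs_write_granularity > 0 and fs_write_granularity % page_size == 0
    # Condition 2: the device can write at least one full page atomically.
    device_ok = device_atomic_write_bytes >= page_size
    return fs_ok and device_ok
```

With these assumptions, an 8k filesystem block size on a device with 8k
atomic writes passes the check, while a 4k filesystem block size or a device
limited to 4k atomic writes does not.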

--
Pankaj


