Re: 8192 BLCKSZ ? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 8192 BLCKSZ ?
Msg-id 13659.975389917@sss.pgh.pa.us
In response to RE: 8192 BLCKSZ ?  ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
Responses Re: 8192 BLCKSZ ?  (Bruce Guenter <bruceg@em.ca>)
Re: 8192 BLCKSZ ?  (Nathan Myers <ncm@zembu.com>)
List pgsql-hackers
"Christopher Kings-Lynne" <chriskl@familyhealth.com.au> writes:
> I don't believe it's a performance issue, I believe it's that writes to
> blocks greater than 8k cannot be guaranteed 'atomic' by the operating
> system.  Hence, 32k blocks would break the transactions system.

As Nathan remarks nearby, it's hard to tell how big a write can be
assumed atomic, unless you have considerable knowledge of your OS and
hardware.  However, on traditional Unix filesystems (BSD-derived) it's
a pretty certain bet that writes larger than 8K will *not* be atomic,
since 8K is the filesystem block size.  You don't even need any crash
scenario to see why not: just consider running your disk down to zero
free space.  If there's one block left when you try to add a
multi-block page to your table, you are left with a corrupted page,
not an unwritten page.
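
To make the disk-full case concrete, here is a minimal sketch in Python. It is a toy model, not PostgreSQL or kernel code: the ToyDisk class and the append_page helper are hypothetical names invented for illustration, and the 8K filesystem block is the BSD-style figure assumed above. The filesystem hands out one block at a time, so a page that spans several blocks can run out of space partway through.

```python
FS_BLOCK = 8192  # filesystem allocation unit assumed in the post (BSD-style)


class ToyDisk:
    """Hypothetical allocator: hands out whole blocks until none remain."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks
        self.data = bytearray()

    def write(self, payload):
        """Store payload block by block; return the number of bytes stored."""
        written = 0
        for offset in range(0, len(payload), FS_BLOCK):
            if self.free_blocks == 0:
                break  # out of space: blocks already stored stay on disk
            chunk = payload[offset:offset + FS_BLOCK]
            self.data += chunk
            self.free_blocks -= 1
            written += len(chunk)
        return written


def append_page(disk, blcksz):
    """Try to add one page of blcksz bytes and classify the outcome."""
    page = bytes([0xAB]) * blcksz
    written = disk.write(page)
    if written == blcksz:
        return "complete"
    if written == 0:
        return "unwritten"
    return "torn"  # part of the page is on disk, the rest is not


if __name__ == "__main__":
    print(append_page(ToyDisk(free_blocks=1), 8192))   # complete
    print(append_page(ToyDisk(free_blocks=1), 32768))  # torn
    print(append_page(ToyDisk(free_blocks=0), 8192))   # unwritten
```

With BLCKSZ equal to the filesystem block size, the page either fits or is not written at all. With a 32K page and one free block, the first 8K lands on disk and the rest does not.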

Not sure about the wild-and-wooly world of Linux filesystems...
anybody know what the allocation unit is on the popular Linux FSes?
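
One way to check this on a given machine is to ask the filesystem directly. The sketch below uses Python's os.statvfs, which is available on Unix-like systems. The allocation_unit helper is a hypothetical name, and treating f_frsize as "the allocation unit" is an assumption: it is the fundamental block size the filesystem reports, which says nothing about whether the OS or the disk writes that many bytes atomically.

```python
import os


def allocation_unit(path="."):
    """Return the block size reported by the filesystem holding path."""
    st = os.statvfs(path)
    # f_frsize is the fundamental block size; fall back to f_bsize
    # (the preferred I/O size) if the filesystem reports zero.
    return st.f_frsize or st.f_bsize


if __name__ == "__main__":
    size = allocation_unit(".")
    print(f"filesystem block size: {size} bytes")
    print(f"an 8192-byte page spans {-(-8192 // size)} block(s)")
```

Pointing it at the directory that holds the database files gives the figure relevant to BLCKSZ on that system.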

My feeling is that 8K is an entirely reasonable size now that we have
TOAST, and so there's no longer much interest in changing the default
value of BLCKSZ.

In theory, I think, WAL should reduce the importance of page writes
being atomic --- but it still seems like a good idea to ensure that
they are as atomic as we can make them.
        regards, tom lane

