Re: [PING] fallocate() causes btrfs to never compress postgresql files - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: [PING] fallocate() causes btrfs to never compress postgresql files
Date	December 15 09:00:16
Msg-id	CA+hUKGJbEp5fBfkS+J5OcrEzPTwugEtiusndiCx4gnN7HKfdpg@mail.gmail.com
In response to	Re: [PING] fallocate() causes btrfs to never compress postgresql files (Bruce Momjian <bruce@momjian.us>)
Responses	Re: [PING] fallocate() causes btrfs to never compress postgresql files
List	pgsql-hackers

Here's a new version with some cleanup and documentation.  I tried to
pare it down to the minimum change for the back-branches, keeping the
changes that aren't strictly necessary for master only.  In the
process, I also thought a bit about how to de-confuse matters on
Windows, where the function we call as ftruncate() behaves differently
in a crucial respect.  See attached.

I'm proposing to back-patch 0001.  0002 and 0003 are proposals for master only.

See below for replies to separate messages from Jakub and Bruce.

On Thu, Oct 30, 2025 at 11:14 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> +1 to this GUCs as this would also help the nearby thread with XFS
> mysteries which are not fully solved [1]. Since the latest message in
> that discussion, I'm aware of at least one additional report of XFS
> failing at fallocate() with free space too, but without any details
> from the OS support vendor why that happened, so this $patch could be
> also used to workaround that problem too.

Yeah, that seems quite important, and the new report in pgsql-bugs
#19348 sounds like another case.

> Just nitpicking:
>
> > and back-patch it into 17 for the upcoming release.
> > It is working as expected on my ZFS system in light testing.  Rebasing
> > and figuring out where to add the missing documentation for last
> > chance review...
>
> Why just 17? (wasn't fallocate() introduced in 16? 4d330a61bb19 and
> 31966b151e6ab are from Apr 2023, while 16 was released on Sep 2023)

Right, fixed.

> From other things, I was wondering about this:
>
> > PGC_USERSET
>
> QQ: Do we really want to have those two GUCs to be alterable like that
> by anyone? The alternative would be like let's say PGC_SIGHUP? (on one
> end it's flexible, but are there any downsides to this as it stands
> out in 0001?). I've checked others and io_workers is PGC_SIGHUP
> (understandable), but we also have io_combine_limit &&
> effective_io_concurrency with PGC_USERSET. I'm just wondering if it
> would be sane to have one backend doing I/O with fallocate() and other
> just writing using pwrite(). One could argue you could be writing to
> two different filesystems with two different users...

Yeah.  Let's go with PGC_SIGHUP.  Let's worry about multiple
filesystems when we've figured out how to do per-tablespace settings.

This is vapourware for later, but I've been wondering if we could
invent a sysctl-style hierarchy as a scoping mechanism, something
like:

tablespace.foo.random_page_cost=1
tablespace.foo.file_extend_method=ftruncate
tablespace.foo.io_combine_limit=1MB

Obviously there are some name resolution problems with that.  I also
thought about allowing a new kind of configuration file inside
tablespace directories, but that doesn't work for PGC_USERSET stuff
like random_page_cost.  If the hierarchy idea goes somewhere, it might
also allow a reorganisation like [tablespace.foo.]io.combine_limit,
with legacy long names like io_combine_limit still supported, but
that's getting quite far off topic...
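
To make the scoping idea a little more concrete, here is a minimal
sketch of how a name like tablespace.foo.random_page_cost might be
split into a scope and a setting.  Everything in it is an assumption
for illustration: split_scoped_name() is a hypothetical helper, not
something in PostgreSQL or in the attached patches, and splitting at
the last dot is just one arbitrary rule.  A tablespace name that itself
contains a dot would need quoting or a different rule, which is one
example of the kind of name resolution question involved.

```c
#include <string.h>

/*
 * Hypothetical helper: split "tablespace.<name>.<setting>" into its parts.
 * The scope name is taken to be everything between the "tablespace."
 * prefix and the last dot.  Returns 1 on success, or 0 if the input does
 * not have that shape or a part does not fit in the caller's buffer.
 */
static int
split_scoped_name(const char *full,
                  char *scope, size_t scope_len,
                  char *setting, size_t setting_len)
{
    static const char prefix[] = "tablespace.";
    const char *rest;
    const char *last_dot;
    size_t      n;

    if (strncmp(full, prefix, sizeof(prefix) - 1) != 0)
        return 0;               /* unscoped name, e.g. io_combine_limit */

    rest = full + sizeof(prefix) - 1;
    last_dot = strrchr(rest, '.');
    if (last_dot == NULL || last_dot == rest || last_dot[1] == '\0')
        return 0;               /* missing scope name or setting name */

    n = (size_t) (last_dot - rest);
    if (n >= scope_len || strlen(last_dot + 1) >= setting_len)
        return 0;               /* would overflow the output buffers */

    memcpy(scope, rest, n);
    scope[n] = '\0';
    strcpy(setting, last_dot + 1);
    return 1;
}
```

With this rule, legacy unscoped names such as io_combine_limit are
simply rejected by the helper and would be looked up the existing way.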

On Fri, Oct 31, 2025 at 5:59 AM Bruce Momjian <bruce@momjian.us> wrote:
> Uh, the problem with backpatching new GUCs is that the GUC variable will
> _not_ appear in any postgresql.conf file until a new initdb is run.
> This can be quite confusing for people.  The minor release notes have to
> explain this.

Yeah.  Fortunately the vast majority of users won't ever need to know
about this.  Those who run into a problem should hopefully find their
way to the docs, release notes, settings view, these threads, or write
to us?  Any other way of controlling this that we invent to avoid
back-patching a GUC would surely only be harder to find than a new
GUC, I think?  And I don't think we're anywhere near the level of
needing to revert the posix_fallocate() feature: both reported
problems are rare.  (Though there is a lesson here in terms of
off-switch planning.)

Here's my attempt at a release note:

"The new setting file_extend_method can be set to write_zeros to
disable the use of the posix_fallocate() system call when extending
relation files.  This is a workaround for users of BTRFS compression,
reported to be disabled by posix_fallocate(), and some versions of
XFS, reported to fail with spurious ENOSPC errors under some
workloads."
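
For readers who want to see the difference between the two behaviours
the release note describes, here is a minimal standalone sketch.  It is
not the code from the attached patches: extend_method, extend_file() and
the 8192-byte chunk size are hypothetical names and values chosen for
illustration, and it assumes a POSIX system that provides
posix_fallocate() and pwrite().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical names for illustration; not PostgreSQL's identifiers. */
typedef enum
{
    EXTEND_POSIX_FALLOCATE,     /* ask the filesystem to reserve space */
    EXTEND_WRITE_ZEROS          /* write explicit zero bytes instead */
} extend_method;

/*
 * Extend the file open on fd by len bytes starting at offset.
 * Returns 0 on success, or an errno value on failure.
 */
static int
extend_file(int fd, off_t offset, off_t len, extend_method method)
{
    if (method == EXTEND_POSIX_FALLOCATE)
    {
        int         rc;

        /* posix_fallocate() returns the error number; it does not set errno. */
        do
            rc = posix_fallocate(fd, offset, len);
        while (rc == EINTR);
        return rc;
    }
    else
    {
        char        zeros[8192];

        memset(zeros, 0, sizeof(zeros));
        while (len > 0)
        {
            size_t      chunk = len < (off_t) sizeof(zeros) ? (size_t) len : sizeof(zeros);
            ssize_t     written = pwrite(fd, zeros, chunk, offset);

            if (written < 0)
            {
                if (errno == EINTR)
                    continue;   /* interrupted before writing; retry */
                return errno;
            }
            if (written == 0)
                return ENOSPC;  /* no progress; this sketch treats it as out of space */
            offset += written;
            len -= written;
        }
        return 0;
    }
}
```

The write-zeros path costs more I/O, but it avoids calling
posix_fallocate() at all, which is the point of the workaround.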
