Re: [PING] fallocate() causes btrfs to never compress postgresql files - Mailing list pgsql-hackers
| From | Thomas Munro |
|---|---|
| Subject | Re: [PING] fallocate() causes btrfs to never compress postgresql files |
| Date | |
| Msg-id | CA+hUKGJbEp5fBfkS+J5OcrEzPTwugEtiusndiCx4gnN7HKfdpg@mail.gmail.com |
| In response to | Re: [PING] fallocate() causes btrfs to never compress postgresql files (Bruce Momjian <bruce@momjian.us>) |
| Responses | Re: [PING] fallocate() causes btrfs to never compress postgresql files |
| List | pgsql-hackers |
Here's a new version with some cleanup and documentation. I tried to pare it down to the minimum change for the back-branches, keeping non-essential changes for master. In the process, I also thought a bit about how to de-confuse matters on Windows, where the function we call ftruncate() behaves differently in a crucial respect.

See attached. I'm proposing to back-patch 0001. 0002 and 0003 are proposals for master only. See below for replies to separate messages from Jakub and Bruce.

On Thu, Oct 30, 2025 at 11:14 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
> +1 to this GUCs as this would also help the nearby thread with XFS
> mysteries which are not fully solved [1]. Since the latest message in
> that discussion, I'm aware of at least one additional report of XFS
> failing at fallocate() with free space too, but without any details
> from the OS support vendor why that happened, so this $patch could be
> also used to workaround that problem too.

Yeah, that seems quite important, and the new report in pgsql-bugs #19348 sounds like another case.

> Just nitpicking:
>
> > and back-patch it into 17 for the upcoming release.
> > It is working as expected on my ZFS system in light testing. Rebasing
> > and figuring out where to add the missing documentation for last
> > chance review...
>
> Why just 17? (wasn't fallocate() introduced in 16? 4d330a61bb19 and
> 31966b151e6ab are from Apr 2023, while 16 was released on Sep 2023)

Right, fixed.

> From other things, I was wondering about this:
>
> > PGC_USERSET
>
> QQ: Do we really want to have those two GUCs to be alterable like that
> by anyone? The alternative would be like let's say PGC_SIGHUP? (on one
> end it's flexible, but are there any downsides to this as it stands
> out in 0001?). I've checked others and io_workers is PGC_SIGHUP
> (understandable), but we also have io_combine_limit &&
> effective_io_concurrency with PGC_USERSET. I'm just wondering if it
> would be sane to have one backend doing I/O with fallocate() and other
> just writing using pwrite(). One could argue you could be writing to
> two different filesystems with two different users...

Yeah. Let's go with PGC_SIGHUP. Let's worry about multiple filesystems when we've figured out how to do per-tablespace settings.

This is vapourware for later, but I've been wondering if we could invent a sysctl-style hierarchy as a scoping mechanism, something like:

  tablespace.foo.random_page_cost=1
  tablespace.foo.file_extend_method=ftruncate
  tablespace.foo.io_combine_limit=1MB

Obviously there are some name resolution problems with that. I also thought about allowing a new kind of configuration file inside tablespace directories, but that doesn't work for PGC_USERSET stuff like random_page_cost. If the hierarchy idea goes somewhere, it might also allow a reorganisation like [tablespace.foo.]io.combine_limit, with legacy long names like io_combine_limit still supported, but that's getting quite far off topic...

On Fri, Oct 31, 2025 at 5:59 AM Bruce Momjian <bruce@momjian.us> wrote:
> Uh, the problem with backpatching new GUCs is that the GUC variable will
> _not_ appear in any postgresql.conf file until a new initdb is run.
> This can be quite confusing for people. The minor release notes have to
> explain this.

Yeah. Fortunately the vast majority of users won't ever need to know about this. Those who run into a problem should hopefully find their way to the docs, release notes, settings view, these threads, or write to us? Any other way of controlling this that we invent to avoid back-patching a GUC would surely only be harder to find than a new GUC, I think? And I don't think we're anywhere near the level of needing to revert the posix_fallocate() feature: both reported problems are rare. (Though there is a lesson here in terms of off-switch planning.)

Here's my attempt at a release note: "The new setting file_extend_method can be set to write_zeros to disable the use of the posix_fallocate() system call when extending relation files. This is a workaround for users of BTRFS compression, reported to be disabled by posix_fallocate(), and some versions of XFS, reported to fail with spurious ENOSPC errors under some workloads."
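
For readers who want to see the difference that release note is describing, here is a minimal sketch in C of the two ways of extending a file: reserving space with posix_fallocate(), and writing zero-filled blocks with pwrite(). It is not code from the attached patches. The names extend_with_fallocate, extend_with_zeros and EXTEND_BLOCK_SIZE are hypothetical and chosen only for illustration, and the 8192-byte chunk size is an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical chunk size for this sketch. */
#define EXTEND_BLOCK_SIZE 8192

/*
 * Extend by asking the filesystem to reserve space.
 * posix_fallocate() returns 0 on success or an error number directly
 * (it does not set errno).
 */
static int
extend_with_fallocate(int fd, off_t offset, off_t nbytes)
{
    return posix_fallocate(fd, offset, nbytes);
}

/*
 * Extend by writing zero-filled chunks, so the filesystem sees ordinary
 * data writes. Returns 0 on success or an errno value.
 */
static int
extend_with_zeros(int fd, off_t offset, off_t nbytes)
{
    static const char zeros[EXTEND_BLOCK_SIZE];    /* zero-initialised */

    while (nbytes > 0)
    {
        size_t  chunk = nbytes < EXTEND_BLOCK_SIZE
            ? (size_t) nbytes : (size_t) EXTEND_BLOCK_SIZE;
        ssize_t written = pwrite(fd, zeros, chunk, offset);

        if (written < 0)
        {
            if (errno == EINTR)
                continue;       /* interrupted, retry the same chunk */
            return errno;
        }
        if (written == 0)
            return ENOSPC;      /* no progress: treat as out of space */

        offset += written;
        nbytes -= written;
    }
    return 0;
}
```

Both functions leave the file at least offset + nbytes long; the difference is only in how the filesystem is asked to get there, which is what the two reports in this thread (BTRFS compression and XFS ENOSPC) are sensitive to. Going by the release note text above, the second behaviour is the one selected with file_extend_method set to write_zeros.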