Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: FileFallocate misbehaving on XFS
Date
Msg-id CAKZiRmzWbo_Xcv00_LC-T0xFYwJ3UFJdra7N3G1K3bqCac0qSw@mail.gmail.com
In response to Re: FileFallocate misbehaving on XFS  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers

On Thu, Dec 19, 2024 at 7:49 AM Michael Harris <harmic@gmail.com> wrote:
> Hello,
>
> I finally managed to get the patched version installed in a production
> database where the error is occurring very regularly.
>
> Here is a sample of the output:
>
> 2024-12-19 01:08:50 CET [2533222]:  LOG:  mdzeroextend FileFallocate
> failing with ENOSPC: free space for filesystem containing
> "pg_tblspc/107724/PG_16_202307071/465960/2591590762.15" f_blocks:
> 2683831808, f_bfree: 205006167, f_bavail: 205006167 f_files:
> 1073741376, f_ffree: 1069933796
>
> [..]
>
> I have attached a file containing all the errors I collected. The
> error is happening pretty regularly - over 400 times in a ~6 hour
> period. The number of blocks being extended varies from ~9 to ~15, and
> the statfs result shows plenty of available space & inodes at the
> time. The errors do seem to come in bursts.

I couldn't resist: you seem to have entered the quantum realm of free disk space, AKA Schrödinger's free space: you both have the space and don't have it... ;)

No one else has responded, so I'll try. My take is that we have received only a very limited number of reports (2-3) of this happening, and it always seems to be on filesystems that are more than 90% full (e.g. in the sample above, 205006167 free of 2683831808 blocks is ~7.6% free, so ~92% used). Adoption of PG16 is still rising, so we may or may not see more errors of this kind, but on the other hand the frequency is so low that it's really surprising we don't see more reports like this one. Lots of OS upgrades in the wild are performed by building new standbys (which probably lowers filesystem fragmentation) rather than upgrading the OS in place. To me it sounds like a new, rare bug in XFS. You could probably live with #undef HAVE_POSIX_FALLOCATE as a way to survive; another option would be to run xfs_fsr to defragment the filesystem.

Longer term: other than collecting the eBPF data to start digging into where this is really triggered, I don't see a way forward. It would be suboptimal to just abandon the fallocate() optimization from commit 31966b151e6ab7a6284deab6e8fe5faddaf2ae4c because of a very unusual combination of factors (an XFS bug).

Well, we could add some kludge along the lines of the pseudo-code if (posix_fallocate() == ENOSPC && statfs().free_space_pct >= 1) fallback_to_pwrites(), but it is ugly -- a rough sketch of the idea is below. Another option would be a GUC (or even two: one for how much to extend, one for whether to use posix_fallocate() at all), but people do not like more GUCs...
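To make that idea a bit more concrete, here is a minimal standalone sketch (not the actual mdzeroextend() code path; the function name zero_extend_with_fallback() and the 1% threshold are made up, and it assumes Linux's fstatfs()/<sys/statfs.h>): on ENOSPC from posix_fallocate() it re-checks statfs and, if the filesystem still claims free space, falls back to zero-filling with pwrite().

/*
 * Sketch only: hypothetical fallback for the "ENOSPC but statfs shows
 * free space" case.  Not the real mdzeroextend() implementation.
 */
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/statfs.h>

#define FALLBACK_FREE_PCT 1.0   /* assumed threshold: >= 1% free per statfs */

static int
zero_extend_with_fallback(int fd, off_t offset, off_t len)
{
    int rc = posix_fallocate(fd, offset, len);

    if (rc == 0)
        return 0;

    if (rc == ENOSPC)
    {
        struct statfs sfs;

        /* Ask the filesystem how much space it claims to have. */
        if (fstatfs(fd, &sfs) == 0 && sfs.f_blocks > 0)
        {
            double free_pct = 100.0 * (double) sfs.f_bavail / (double) sfs.f_blocks;

            /*
             * The "quantum" case: fallocate said ENOSPC, yet statfs still
             * reports plenty of space, so fall back to writing zeros.
             */
            if (free_pct >= FALLBACK_FREE_PCT)
            {
                char  zeros[8192];
                off_t pos = offset;

                memset(zeros, 0, sizeof(zeros));
                while (pos < offset + len)
                {
                    size_t  chunk = sizeof(zeros);
                    ssize_t written;

                    if ((off_t) chunk > offset + len - pos)
                        chunk = offset + len - pos;
                    written = pwrite(fd, zeros, chunk, pos);
                    if (written < 0)
                        return errno;   /* genuine failure, e.g. real ENOSPC */
                    pos += written;
                }
                return 0;
            }
        }
    }
    return rc;                  /* propagate the original posix_fallocate() error */
}

In the real code something like this would have to live near the mdzeroextend() path, and picking the threshold is exactly the kind of policy question that would otherwise turn into yet another GUC.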

>  I have so far not installed the bpftrace that Jakub suggested before -
> as I say this is a production machine and I am wary of triggering a
> kernel panic or worse (even though it seems like the risk for that
> would be low?). While a kernel stack trace would no doubt be helpful
> to the XFS developers, from a postgres point of view, would that be
> likely to help us decide what to do about this?[..]

Well, you could try to reproduce this outside of production, or even clone the storage -- not via backup/restore, but by literally cloning the XFS LUNs on the storage array itself and attaching them to a separate VM to get a safe testbed (or even dd(1) some smaller XFS filesystem exhibiting this behaviour to some other place).

As for eBPF/bpftrace: it is safe (it's sandboxed anyway) and lots of customers are using it, but as always YMMV.

There's also xfs_fsr, which might help overcome the fragmentation.

You can also experiment with whether the -o allocsize mount option helps, or even try -o allocsize=0 (though that would probably have some negative effects on performance).

-J.
