Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers

From:           Andres Freund
Subject:        Re: FileFallocate misbehaving on XFS
Msg-id:         6m3j6rsbngcma45ckox3msfgbn2jjspkqau5bma2pq4l5nolni@2umtkdghgavf
In response to: Re: FileFallocate misbehaving on XFS (Michael Harris <harmic@gmail.com>)
List:           pgsql-hackers
Hi,

On 2024-12-10 10:00:43 +1100, Michael Harris wrote:
> On Mon, 9 Dec 2024 at 21:06, Tomas Vondra <tomas@vondra.me> wrote:
> > Sounds more like an XFS bug/behavior, so it's not clear to me what we
> > could do about it. I mean, if the filesystem reports bogus out-of-space,
> > is there even something we can do?
>
> I don't disagree that it's most likely an XFS issue. However, XFS is
> pretty widely used - it's the default FS for RHEL & the default in
> SUSE for non-root partitions - so maybe some action should be taken.
>
> Some things we could consider:
>
> - Providing a way to configure PG not to use posix_fallocate at runtime
>
> - Detecting the use of XFS (probably nasty and complex to do in a
>   platform-independent way) and disabling posix_fallocate
>
> - In the case of posix_fallocate failing with ENOSPC, falling back to
>   FileZero (worst case, that will fail as well, in which case we will
>   know that we really are out of space)
>
> - Documenting that XFS might not be a good choice, at least for some
>   kernel versions

Pretty unexcited about all of these - XFS is fairly widely used for PG,
but this problem doesn't seem very common. It seems to me that we're
missing something that causes this to only happen in a small subset of
cases. I think the source of this needs to be debugged further before we
try to apply workarounds in postgres.

Are you using any filesystem quotas?

It'd be useful to get the xfs_info output that Jakub asked for. Perhaps
also

  xfs_spaceman -c 'freesp -s' /mountpoint
  xfs_spaceman -c 'health' /mountpoint

and df.

What kind of storage is this on? Was the filesystem ever grown from a
smaller size?

Have you checked the filesystem's internal consistency? I.e. something
like xfs_repair -n /dev/nvme2n1. It does require the filesystem to be
read-only or unmounted though. But corrupted filesystem data structures
certainly could cause spurious ENOSPC.

> > What is not clear to me is why this would affect pg_upgrade at all. We
> > have the data files split into 1GB segments, and the copy/clone/... goes
> > one by one. So there shouldn't be more than 1GB "extra" space needed.
> > Surely you have more free space on the system?
>
> Yes, that also confused me. It actually fails during the schema
> restore phase - where pg_upgrade calls pg_restore to restore a
> schema-only dump that it takes earlier in the process. At this stage
> it is only trying to restore the schema, not any actual table data.
> Note that we use the --link option to pg_upgrade, so it should not be
> using much space even when the table data is being upgraded.

Are you using pg_upgrade -j? I'm asking because, looking at linux's git
tree, I found this interesting recent commit:
https://git.kernel.org/linus/94a0333b9212 - but IIUC it'd actually cause
file creation, not fallocate, to fail.

> The filesystems have >1TB free space when this has occurred.
>
> It does continue to give this error after the upgrade, at apparently
> random intervals, when data is being loaded into the DB using COPY
> commands, so it might be best not to focus too much on the fact that
> we first encounter it during the upgrade.

I assume the file that actually errors out changes over time? Is it
always fallocate() that fails?

Can you tell us anything about the workload / data? Lots of tiny
tables, lots of big tables, write-heavy, ...?

Greetings,

Andres Freund