Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: FileFallocate misbehaving on XFS
Date
Msg-id 8aa1d1d7-645f-404b-a8f8-7c49be9acd27@vondra.me
In response to FileFallocate misbehaving on XFS  (Michael Harris <harmic@gmail.com>)
Responses Re: FileFallocate misbehaving on XFS
List pgsql-hackers

On 12/9/24 08:34, Michael Harris wrote:
> Hello PG Hackers
> 
> Our application has recently migrated to PG16, and we have experienced
> some failed upgrades. The upgrades are performed using pg_upgrade and
> have failed during the phase where the schema is restored into the new
> cluster, with the following error:
> 
> pg_restore: error: could not execute query: ERROR:  could not extend
> file "pg_tblspc/16401/PG_16_202307071/17643/1249.1" with
> FileFallocate(): No space left on device
> HINT:  Check free disk space.
> 
> This has happened multiple times on different servers, and in each
> case there was plenty of free space available.
> 
> We found this thread describing similar issues:
> 
> https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> 
> As is the case in that thread, all of the affected databases are using XFS.
> 
> One of my colleagues built postgres from source with
> HAVE_POSIX_FALLOCATE not defined, and using that build he was able to
> complete the pg_upgrade, and then switched to a stock postgres build
> after the upgrade. However, as you might expect, after the upgrade we
> have experienced similar errors during regular operation. We make
> heavy use of COPY, which is mentioned in the above discussion as
> pre-allocating files.
> 
> We have seen this on both Rocky Linux 8 (kernel 4.18.0) and Rocky
> Linux 9 (Kernel 5.14.0).
> 
> I am wondering if this bug might be related:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1791323
> 
>> When given an offset of 0 and a length, fallocate (man 2 fallocate) reports
>> ENOSPC if the size of the file + the length to be allocated is greater than
>> the available space.
> 
> There is a reproduction procedure at the bottom of the above ubuntu
> thread, and using that procedure I get the same results on both kernel
> 4.18.0 and 5.14.0.
> When calling fallocate with offset zero on an existing file, I get
> ENOSPC even if I am only requesting the same amount of space as the
> file already has.
> If I repeat the experiment with ext4 I don't get that behaviour.
> 
> On a surface examination of the code paths leading to the
> FileFallocate call, it does not look like it should be trying to
> allocate already allocated space, but I might have missed something
> there.
> 
> Is this already being looked into?
> 

Sounds more like an XFS bug/behavior, so it's not clear to me what we
could do about it. I mean, if the filesystem reports bogus out-of-space,
is there even something we can do?
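
For anyone who wants to check the filesystem behaviour outside of
Postgres, a minimal probe could look like the sketch below. It assumes
that FileFallocate() ends up calling posix_fallocate() (which is what the
HAVE_POSIX_FALLOCATE symbol suggests). probe_fallocate() is a made-up
helper, not anything in the tree. It fills a file with zeros and then asks
for the same range again starting at offset 0, which is the case reported
above as returning ENOSPC on XFS.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): write `len` zero bytes to
 * `path`, then ask the filesystem to allocate the range [0, len) that the
 * file already occupies. Returns 0 on success or an errno value such as
 * ENOSPC. Note that posix_fallocate() returns the error number directly
 * instead of setting errno.
 */
int
probe_fallocate(const char *path, off_t len)
{
    char    buf[8192];
    off_t   written = 0;
    int     rc;
    int     fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return errno;

    memset(buf, 0, sizeof(buf));
    while (written < len)
    {
        size_t  chunk = sizeof(buf);
        ssize_t n;

        if ((off_t) chunk > len - written)
            chunk = (size_t) (len - written);

        n = write(fd, buf, chunk);
        if (n <= 0)
        {
            /* treat a zero-byte write as an I/O error to avoid looping */
            rc = (n < 0) ? errno : EIO;
            close(fd);
            return rc;
        }
        written += n;
    }

    /* Request allocation of space the file should already own. */
    rc = posix_fallocate(fd, 0, len);
    close(fd);
    return rc;
}
```

Running this against a path on the affected XFS filesystem and on an ext4
filesystem with similar free space should show whether the second
allocation request is what fails. A result of 0 means the call succeeded.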

What is not clear to me is why this would affect pg_upgrade at all. We
have the data files split into 1GB segments, and the copy/clone/... goes
one by one. So there shouldn't be more than 1GB "extra" space needed.
Surely you have more free space on the system?
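
To make the segment arithmetic concrete, here is a small sketch assuming
a default build (8kB blocks, 1GB segments). The ".1" suffix in the
reported path "1249.1" denotes the second segment of that relation. The
constants mirror the usual BLCKSZ and RELSEG_SIZE defaults. SegPos and
locate_block() are illustrative names only, not functions from the tree.

```c
#include <stdint.h>

#define BLCKSZ      8192        /* assumed default block size in bytes */
#define RELSEG_SIZE 131072      /* assumed blocks per segment: 1GB / 8kB */

/* Illustrative type: which segment file a block lives in, and where. */
typedef struct
{
    uint32_t    segno;          /* 0 = "1249", 1 = "1249.1", ... */
    int64_t     offset;         /* byte offset within that segment file */
} SegPos;

/* Map a relation block number to its segment number and byte offset. */
SegPos
locate_block(uint32_t blkno)
{
    SegPos      pos;

    pos.segno = blkno / RELSEG_SIZE;
    pos.offset = (int64_t) (blkno % RELSEG_SIZE) * BLCKSZ;
    return pos;
}
```

For example, locate_block(131072) gives segment 1 at offset 0, i.e. the
first block of a file such as "1249.1", so extending a relation never
grows a single file past 1GB with these defaults.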


regards

-- 
Tomas Vondra



