Re: FileFallocate misbehaving on XFS - Mailing list pgsql-hackers

From Andres Freund
Subject Re: FileFallocate misbehaving on XFS
Msg-id qhy5z65zhfui5b7vmwkqclbu7aksdvdkohxnb3bgzflvrnhugv@vy3pyzwpm3uv
In response to Re: FileFallocate misbehaving on XFS  (Tomas Vondra <tomas@vondra.me>)
List pgsql-hackers
Hi,

On 2024-12-09 15:47:55 +0100, Tomas Vondra wrote:
> On 12/9/24 11:27, Jakub Wartak wrote:
> > On Mon, Dec 9, 2024 at 10:19 AM Michael Harris <harmic@gmail.com> wrote:
> > 
> > Hi Michael,
> > 
> >     We found this thread describing similar issues:
> > 
> >     https://www.postgresql.org/message-id/flat/AS1PR05MB91059AC8B525910A5FCD6E699F9A2%40AS1PR05MB9105.eurprd05.prod.outlook.com
> > 
> > 
> > We've had a case in the past here at EDB where an OS vendor
> > blamed XFS AG fragmentation (too many AGs, and if one AG doesn't have
> > enough space -> error). Could you perhaps show us the output of the
> > following on that LUN:
> > 1. xfs_info
> > 2. run that script from https://www.suse.com/support/kb/doc/?id=000018219
> > for Your AG range
> > 
> 
> But this can be reproduced on a brand new filesystem - I just tried
> creating a 1GB image, creating XFS on it, mounting it, and fallocating a
> 600MB file twice. That fails, and there can't be any real fragmentation.
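A minimal Python sketch of the reproduction's last step, assuming "twice" means two posix_fallocate() calls over the same range of the same file. The helper name fallocate_twice is hypothetical, and creating and loop-mounting the 1GB XFS image (e.g. with mkfs.xfs) is left out.

```python
import os


def fallocate_twice(path: str, length: int) -> list:
    """Call posix_fallocate() on the same [0, length) range of one file twice.

    Returns one entry per call: None on success, or the errno value on failure.
    (Hypothetical helper for illustrating the reproduction, not from the thread.)
    """
    results = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        for _ in range(2):
            try:
                # Mode-0 preallocation; extends the file size if needed.
                os.posix_fallocate(fd, 0, length)
                results.append(None)
            except OSError as exc:
                results.append(exc.errno)
    finally:
        os.close(fd)
    return results
```

On a freshly made ~1GB XFS mount, `fallocate_twice("/mnt/xfstest/f", 600 * 1024 * 1024)` (path hypothetical) would, per the report above, presumably return a failure errno such as ENOSPC for the second call only; on a filesystem with ample space both entries are None.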

If I understand XFS correctly, before it even looks at the file's current
layout, it checks whether there's enough free space for the fallocate() to
succeed.  Here's an explanation of why:
https://www.spinics.net/lists/linux-xfs/msg55429.html

  The real problem with preallocation failing part way through due to
  overcommit of space is that we can't go back and undo the
  allocation(s) made by fallocate because when we get ENOSPC we have
  lost all the state of the previous allocations made. If fallocate is
  filling holes between unwritten extents already in the file, then we
  have no way of knowing where the holes we filled were and hence
  cannot reliably free the space we've allocated before ENOSPC was
  hit.

I.e. reserving space as you go would leave you open to ending up with some,
but not all, of those allocations having been made. Pre-reserving the
worst-case space needed, ahead of time, instead ensures that you have enough
space to get through it all.
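To make the trade-off concrete, here is a toy block-counting model contrasting the two strategies. It is not XFS code: ToyFS and its methods are hypothetical, space is counted in abstract blocks, and the real allocator's accounting is far more involved. With 1000 free blocks, reserving the worst case lets a first 600-block fallocate succeed but rejects an identical second call up front, even though every block in the range is already allocated, which mirrors the reproduction above. Allocating as you go accepts the repeat but can fail part way through.

```python
class ToyFS:
    """Toy model (not XFS code): a free-block counter plus one file's block map."""

    def __init__(self, total_blocks: int):
        self.free = total_blocks
        self.allocated = set()  # block offsets already backing the toy file

    def fallocate_reserve_first(self, start: int, length: int) -> bool:
        """Reserve the worst case (every block in the range) before looking at layout."""
        if self.free < length:
            return False  # "ENOSPC" up front; nothing has been changed yet
        self.free -= length  # take the worst-case reservation
        newly = [b for b in range(start, start + length) if b not in self.allocated]
        self.allocated.update(newly)
        # Return the part of the reservation that already-allocated blocks didn't need.
        self.free += length - len(newly)
        return True

    def fallocate_as_you_go(self, start: int, length: int) -> bool:
        """Allocate hole by hole; may fail part way through."""
        for b in range(start, start + length):
            if b in self.allocated:
                continue
            if self.free == 0:
                # Partial allocation is left behind; the caller only sees False
                # and can't tell which holes this call filled.
                return False
            self.free -= 1
            self.allocated.add(b)
        return True
```

For example, `ToyFS(1000).fallocate_as_you_go(0, 1200)` returns False after having allocated 1000 blocks, which is exactly the half-done state the quoted explanation wants to avoid.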

You can't just go through the file [range], compute how much free space you
will need to allocate, and then do a second pass through the file, because the
file layout might have changed concurrently...


This issue seems independent of the issue Michael is having though. Postgres,
afaik, won't fallocate huge ranges with already allocated space.
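As Andres understands it, Postgres extends a relation file by preallocating new space past its current end rather than re-requesting ranges that are already allocated. A minimal sketch of that pattern, assuming 8 KiB blocks (PostgreSQL's default BLCKSZ); extend_by_blocks is a hypothetical helper, not Postgres's actual code path.

```python
import os

BLCKSZ = 8192  # assumption: PostgreSQL's default block size


def extend_by_blocks(fd: int, nblocks: int) -> int:
    """Preallocate nblocks new blocks starting at the current end of file.

    Only the tail [eof, eof + nblocks * BLCKSZ) is passed to posix_fallocate(),
    so already-allocated space is never requested again. Returns the new file
    size in bytes; raises OSError (e.g. ENOSPC) on failure.
    """
    eof = os.fstat(fd).st_size
    length = nblocks * BLCKSZ
    os.posix_fallocate(fd, eof, length)
    return eof + length
```

Because each call only covers fresh space, a worst-case up-front check like the one XFS performs should ask for no more than what actually has to be allocated.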

Greetings,

Andres Freund
