Thread: Re: Error:could not extend file " with FileFallocate(): No space left on device

On 2024-Sep-11, Pecsök Ján wrote:

> In our case:
> Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le
> (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0
> 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
> File system type: xfs

Can you please share the output of xfs_info for the filesystem(s) used?

Apparently, it's possible for allocation groups to be suboptimally laid
out in a way that leads to ENOSPC with space still available.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)



Output of xfs_info:
# xfs_info /data/aisgamp1/pgdata_system
meta-data=/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys isize=512    agcount=118, agsize=134217720 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1    bigtime=0 inobtcount=0
data     =                       bsize=8192   blocks=15703474176, imaxpct=1
         =                       sunit=8      swidth=32 blks
naming   =version 2              bsize=8192   ascii-ci=0, ftype=1
log      =internal log           bsize=8192   blocks=260864, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=8192   blocks=0, rtextents=0
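
As a rough sanity check on the figures above, the arithmetic can be redone in a few lines. This is only a sketch: the constants are copied from the xfs_info output, the variable names are made up for illustration, and nothing here says anything about how free space is distributed across allocation groups.

```python
# Figures copied from the xfs_info output above.
BSIZE = 8192                # data bsize: bytes per filesystem block
BLOCKS = 15_703_474_176     # data blocks
AGCOUNT = 118               # allocation groups
AGSIZE = 134_217_720        # blocks per allocation group

TIB = 1024 ** 4

total_tib = BLOCKS * BSIZE / TIB
ag_tib = AGSIZE * BSIZE / TIB
# How many allocation groups are completely filled by BLOCKS,
# and how many blocks are left over for the last one.
full_ags, tail_blocks = divmod(BLOCKS, AGSIZE)

print(f"filesystem size: {total_tib:.1f} TiB")
print(f"allocation group size: {ag_tib:.6f} TiB")
print(f"full AGs: {full_ags}, blocks in the last AG: {tail_blocks}")
```

The total matches the 117T reported by df, and each allocation group is just under 1 TiB.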


It is also interesting that there are over 1 million files in
/data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820/

# ll /data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820/ | wc -l
1129340

# df -h /data/aisgamp1/pgdata_system/aisgamp1/PG_16_202307071/17820 \
    /data/aisgamp1/pgdata_system/temp/PG_16_202307071/17820
Filesystem                                Size  Used Avail Use% Mounted on
/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys  117T   91T   27T  78% /data/aisgamp1/pgdata_system
/dev/mapper/dataamp1vg-lv_aisgamp1_pgsys  117T   91T   27T  78% /data/aisgamp1/pgdata_system

-----Original Message-----
From: Alvaro Herrera <alvherre@alvh.no-ip.org> 
Sent: Wednesday, September 11, 2024 2:39 PM
To: Pecsök Ján <jan.pecsok@profinit.eu>
Cc: Thomas Munro <thomas.munro@gmail.com>; pgsql-general@lists.postgresql.org; Andres Freund <andres@anarazel.de>
Subject: Re: Error:could not extend file " with FileFallocate(): No space left on device

On 2024-Sep-11, Pecsök Ján wrote:

> In our case:
> Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le 
> (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0 
> 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024 
> File system type: xfs

Can you please share the output of xfs_info for the filesystem(s) used?

Apparently, it's possible for allocation groups to be suboptimally laid
out in a way that leads to ENOSPC with space still available.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)

On Thu, Sep 12, 2024 at 12:39 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-Sep-11, Pecsök Ján wrote:
> > In our case:
> > Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le
> > (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0
> > 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
> > File system type: xfs
>
> Can you please share the output of xfs_info for the filesystem(s) used?
>
> Apparently, it's possible for allocation groups to be suboptimally laid
> out in a way that leads to ENOSPC with space still available.

Hmm, I have no clue about that, though I do remember reports of
spurious ENOSPC errors from xfs many years ago, on some other database
I was around, maybe in the era of that kernel or a bit older.

Actually I was already wondering if we need to add a tunable to
control the heuristic that redirects to posix_fallocate():

https://www.postgresql.org/message-id/flat/CAMazQQfp%2B3f8tD_Q23rCR%3DO%2BJj4jouSRVigbD8OmrTOfHV%2B8gA%40mail.gmail.com

There's no confirmation that writing zeros would be a useful
workaround here, though.  Two things changed in 16: the fallocate()
path was invented, but also we started extending by more than one
block at a time, which might take the pwritev() path or the
fallocate() path, for bulk insertion via COPY.  That btrfs user would
prefer pwritev() always IIRC, but if some version of xfs is allergic to
this pattern I don't know if it's the size or the system call that's
triggering it...
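
To make the two extension paths concrete, here is a minimal sketch of what each one does at the system-call level. It is not PostgreSQL's code: the helper names are hypothetical, BLCKSZ is assumed to be the default 8192, and the sketch only runs where Python exposes os.pwritev and os.posix_fallocate (Linux, for example).

```python
import os
import tempfile

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def extend_with_zero_writes(fd, offset, nblocks):
    """Extend the file by writing zero-filled blocks with one pwritev() call."""
    zero_block = bytes(BLCKSZ)
    written = os.pwritev(fd, [zero_block] * nblocks, offset)
    if written != nblocks * BLCKSZ:
        raise OSError(f"short write: {written} bytes")


def extend_with_fallocate(fd, offset, nblocks):
    """Extend the file by asking the filesystem to reserve space, writing no data."""
    os.posix_fallocate(fd, offset, nblocks * BLCKSZ)


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    extend_with_zero_writes(fd, 0, 4)            # small extension: write zeros
    extend_with_fallocate(fd, 4 * BLCKSZ, 16)    # larger extension: fallocate
    size_in_blocks = os.fstat(fd).st_size // BLCKSZ
    print(size_in_blocks)
```

Both calls can fail with ENOSPC, which Python surfaces as an OSError; the open question in this thread is which of them, if either, the filesystem objects to.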

Is COPY used here?

And just for curiosity (I don't see any particular connection or what
to do about it either way in the short term), are we talking about
really big tables with lots of 1GB files named N.1, N.2, N.3, ...,
or millions of smaller tables?  I kinda wonder if xfs (and any
file system really) would really prefer us to use large files instead
(patches exist for this), and when many-terabyte clusters start
working with huge numbers of segments, we reach fun new kinds of
internal resource exhaustion, or something like that....
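
One way to answer the "big tables or many small tables" question from a directory listing is to group file names by relation. The sketch below assumes the usual naming in a database directory (a numeric name for the first segment, a numeric suffix such as ".1" for later segments, and "_fsm"/"_vm" suffixes for other forks); the function name and sample names are invented for illustration.

```python
import re
from collections import Counter

# Main-fork files look like "16384" or "16384.3"; names such as
# "16384_fsm" or "pg_filenode.map" are deliberately not matched.
SEGMENT_RE = re.compile(r"^(\d+)(?:\.(\d+))?$")


def summarize_segments(filenames):
    """Return (number of relations, {relation: segment count} for multi-segment ones)."""
    segments_per_relation = Counter()
    for name in filenames:
        m = SEGMENT_RE.match(name)
        if m:
            segments_per_relation[m.group(1)] += 1
    multi = {rel: n for rel, n in segments_per_relation.items() if n > 1}
    return len(segments_per_relation), multi


# Made-up sample names; in practice pass os.listdir() of the database directory.
names = ["16384", "16384.1", "16384.2", "16384_fsm",
         "16390", "16390_vm", "pg_filenode.map"]
relations, multi_segment = summarize_segments(names)
print(relations, multi_segment)
```

A large relation count with few multi-segment entries would point at many small tables; a small relation count with large segment counts would point at a few very big ones.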

. o O { I particularly dislike our habit of synthesising fake ENOSPC
errors in a few code paths... grumble }



I don't understand what ENOSPC has to do with the file descriptor
limits, but this person reported:

# touch test
touch: cannot touch ‘test’: No space left on device

https://serverfault.com/questions/746032/rsync-and-scp-failing-with-no-space-left-on-xfs-device

... with plenty of free space, and it went away with ulimit -Hn and
-Sn changes.  Huh?  Could this have failed in FileAccess() when trying
to re-open a vfd?
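
For reference, the soft and hard open-file limits that `ulimit -Sn` and `ulimit -Hn` report can also be read programmatically. This is a small sketch using Python's standard resource module; it shows the limits of the process running the script, which are not necessarily those of the PostgreSQL server processes, and the describe() helper is only illustrative.

```python
import resource


def describe(limit):
    """Render an rlimit value, treating RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


# Equivalent of `ulimit -Sn` and `ulimit -Hn` for the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```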



In the link you provided there is a mention that data is not being
compressed on a PostgreSQL 16 server. Does that mean that PostgreSQL 16
uses much more space while computing queries? If that is the case, it
could be our problem, because our queries sometimes use several TB of
disk space for computation, and if disk usage during queries has
increased considerably, 27TB may sometimes not be enough.

I have 2 questions:

Is there any workaround so that Postgres won't use FileFallocate? Maybe
something can be set in Linux to prevent Postgres from using it?
The change was introduced in Postgres 16; does that mean that Postgres
15.8 should have the old behaviour?

We don't use COPY in our queries.




-----Original Message-----
From: Thomas Munro <thomas.munro@gmail.com> 
Sent: Wednesday, September 11, 2024 11:37 PM
To: Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc: Pecsök Ján <jan.pecsok@profinit.eu>; pgsql-general@lists.postgresql.org; Andres Freund <andres@anarazel.de>
Subject: Re: Error:could not extend file " with FileFallocate(): No space left on device

On Thu, Sep 12, 2024 at 12:39 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> On 2024-Sep-11, Pecsök Ján wrote:
> > In our case:
> > Kernel: Linux version 4.18.0-513.18.1.el8_9.ppc64le
> > (mockbuild@ppc-hv-13.build.eng.rdu2.redhat.com) (gcc version 8.5.0
> > 20210514 (Red Hat 8.5.0-20) (GCC)) #1 SMP Thu Feb 1 02:52:53 EST 2024
> > File system type: xfs
>
> Can you please share the output of xfs_info for the filesystem(s) used?
>
> Apparently, it's possible for allocation groups to be suboptimally 
> laid out in a way that leads to ENOSPC with space still available.

Hmm, I have no clue about that, though I do remember reports of
spurious ENOSPC errors from xfs many years ago, on some other database
I was around, maybe in the era of that kernel or a bit older.

Actually I was already wondering if we need to add a tunable to
control the heuristic that redirects to posix_fallocate():

https://www.postgresql.org/message-id/flat/CAMazQQfp%2B3f8tD_Q23rCR%3DO%2BJj4jouSRVigbD8OmrTOfHV%2B8gA%40mail.gmail.com

There's no confirmation that writing zeros would be a useful
workaround here, though.  Two things changed in 16: the fallocate()
path was invented, but also we started extending by more than one
block at a time, which might take the pwritev() path or the
fallocate() path, for bulk insertion via COPY.  That btrfs user would
prefer pwritev() always IIRC, but if some version of xfs is allergic to
this pattern I don't know if it's the size or the system call that's
triggering it...

Is COPY used here?

And just for curiosity (I don't see any particular connection or what
to do about it either way in the short term), are we talking about
really big tables with lots of 1GB files named N.1, N.2, N.3, ...,
or millions of smaller tables?  I kinda wonder if xfs (and any
file system really) would really prefer us to use large files instead
(patches exist for this), and when many-terabyte clusters start
working with huge numbers of segments, we reach fun new kinds of
internal resource exhaustion, or something like that....

. o O { I particularly dislike our habit of synthesising fake ENOSPC
errors in a few code paths... grumble }

On Thu, Sep 12, 2024 at 8:54 PM Pecsök Ján <jan.pecsok@profinit.eu> wrote:
> In the link you provided there is a mention that data is not being
> compressed on a PostgreSQL 16 server. Does that mean that PostgreSQL 16
> uses much more space while computing queries? If that is the case, it
> could be our problem, because our queries sometimes use several TB of
> disk space for computation, and if disk usage during queries has
> increased considerably, 27TB may sometimes not be enough.

The kind of compression discussed there is a btrfs feature.  Xfs
doesn't have compression.

> I have 2 questions,
>
> Is there any workaround so that Postgres won't use FileFallocate? Maybe
> something can be set in Linux to prevent Postgres from using it?

Not currently.  I was thinking of proposing to introduce a setting and
back-patching it into 16, because it's a sort of regression for btrfs
users (and a hard one to foresee).  It is not at all clear what
exactly is happening on this xfs system, but something else...

> The change was introduced in Postgres 16; does that mean that Postgres
> 15.8 should have the old behaviour?

Yes.

> We don't use COPY in our queries.

OK so it's presumably due to having lots of concurrent DML operations
(most likely INSERT, could also be UPDATE) that need to extend the
relation.  I'm not sure of the exact behaviour of the heuristics
off the top of my head (but basically it's driven by waitcount[1])...
perhaps if you had only 7 concurrent DML operations and not 8+, it
would be less likely to take the fallocate path, something like
that...  That "8" is the threshold I was thinking of turning into a
GUC, perhaps in the November minor release, but it's not exactly clear
that posix_fallocate() is really the problem.  (I see that there have
been bugs in xfs's posix_fallocate() space accounting, but the one
that I found was about redundant posix_fallocate() over a region that
is already allocated, which PostgreSQL doesn't do.)
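
The threshold idea described above can be pictured with a deliberately simplified sketch. Everything in it is an assumption made for illustration: the function and constant names are hypothetical, the way the number of waiting backends translates into a block count is not modeled, and whether the real comparison is strict or inclusive should be checked against the commit referenced as [1].

```python
# The "8" discussed above; the constant name is hypothetical.
FALLOCATE_THRESHOLD_BLOCKS = 8


def choose_extension_path(nblocks, threshold=FALLOCATE_THRESHOLD_BLOCKS):
    """Pick how to extend a relation by nblocks (assumed rule: large goes to fallocate)."""
    # Assumption: a strict comparison; the boundary case may differ in practice.
    return "posix_fallocate" if nblocks > threshold else "pwritev"


for n in (1, 4, 64):
    print(n, choose_extension_path(n))
```

Turning the threshold into a setting would amount to making the `threshold` argument configurable, so that a very high value keeps every extension on the zero-writing path.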

However it is far from clear what is actually going wrong here.
Although it seems to imply a pretty weird/bogus use of ENOSPC by the
kernel, that link I posted seems to be hinting that something a bit
different is going on.  It may be clutching at straws, but you might
try increasing those ulimits.  I'm not sure how to try to reproduce it
in lab conditions since it's apparently pretty hard to hit, based on
your 1-2 week MTBF on what sounds like a massive and busy system.
Hmm...

[1] https://github.com/postgres/postgres/commit/00d1e02be24987180115e371abaeb84738257ae2