Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call - Mailing list pgsql-hackers

From Melanie Plageman
Subject Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
Msg-id 20230424155855.roatu3odubmue4i2@liskov
In response to could not extend file "base/5/3501" with FileFallocate(): Interrupted system call  (Christoph Berg <myon@debian.org>)
List pgsql-hackers
On Mon, Apr 24, 2023 at 10:53:35AM +0200, Christoph Berg wrote:
> Re: Andres Freund
> > Add smgrzeroextend(), FileZero(), FileFallocate()
> 
> Hi,
> 
> I'm often seeing PG16 builds erroring out in the pgbench tests:
> 
> 00:33:12 make[2]: Entering directory '/<<PKGBUILDDIR>>/build/src/bin/pgbench'
> 00:33:12 echo "# +++ tap check in src/bin/pgbench +++" && rm -rf '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && /bin/mkdir -p '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && cd /<<PKGBUILDDIR>>/build/../src/bin/pgbench && TESTLOGDIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/log' TESTDATADIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check' PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/16/bin:/<<PKGBUILDDIR>>/build/src/bin/pgbench:$PATH" LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/aarch64-linux-gnu" PGPORT='65432' top_builddir='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../..' PG_REGRESS='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../../src/test/regress/pg_regress' /usr/bin/prove -I /<<PKGBUILDDIR>>/build/../src/test/perl/ -I /<<PKGBUILDDIR>>/build/../src/bin/pgbench --verbose t/*.pl
> 00:33:12 # +++ tap check in src/bin/pgbench +++
> 00:33:14 #   Failed test 'concurrent OID generation status (got 2 vs expected 0)'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #   Failed test 'concurrent OID generation stdout /(?^:processed: 125/125)/'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #                   'pgbench (16devel (Debian 16~~devel-1.pgdg100+~20230423.1656.g8bbd0cc))
> 00:33:14 # transaction type: /<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_concurrent_insert
> 00:33:14 # scaling factor: 1
> 00:33:14 # query mode: prepared
> 00:33:14 # number of clients: 5
> 00:33:14 # number of threads: 1
> 00:33:14 # maximum number of tries: 1
> 00:33:14 # number of transactions per client: 25
> 00:33:14 # number of transactions actually processed: 118/125
> 00:33:14 # number of failed transactions: 0 (0.000%)
> 00:33:14 # latency average = 26.470 ms
> 00:33:14 # initial connection time = 66.583 ms
> 00:33:14 # tps = 188.889760 (without initial connection time)
> 00:33:14 # '
> 00:33:14 #     doesn't match '(?^:processed: 125/125)'
> 00:33:14 #   Failed test 'concurrent OID generation stderr /(?^:^$)/'
> 00:33:14 #   at t/001_pgbench_with_server.pl line 31.
> 00:33:14 #                   'pgbench: error: client 2 script 0 aborted in command 0 query 0: ERROR:  could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
> 00:33:14 # HINT:  Check free disk space.
> 00:33:14 # pgbench: error: Run was aborted; the above results are incomplete.
> 00:33:14 # '
> 00:33:14 #     doesn't match '(?^:^$)'
> 00:33:26 # Looks like you failed 3 tests of 428.
> 00:33:26 t/001_pgbench_with_server.pl ..
> 00:33:26 not ok 1 - concurrent OID generation status (got 2 vs expected 0)
> 
> I don't think the disk is full since it's always hitting that same
> spot, on some of the builds:
> 
> https://pgdgbuild.dus.dg-i.net/job/postgresql-16-binaries-snapshot/833/
> 
> This is overlayfs with tmpfs (upper)/ext4 (lower). Manually running
> that test works though, and the FS seems to support posix_fallocate:
> 
> #include <fcntl.h>
> #include <stdio.h>
> 
> int main ()
> {
>         int f;
>         int err;
> 
>         if (!(f = open("moo", O_CREAT | O_RDWR, 0666)))
>                 perror("open");
> 
>         err = posix_fallocate(f, 0, 10);
>         perror("posix_fallocate");
> 
>         return 0;
> }
> 
> $ ./a.out
> posix_fallocate: Success
> 
> The problem has been there for some weeks - I didn't report it earlier
> as I was on vacation, in the free time trying to bootstrap s390x
> support for apt.pg.o, and there was this other direct IO problem
> making all the builds fail for some time.

I noticed that dsm_impl_posix_resize() calls posix_fallocate() in a
do/while loop that retries while rc == EINTR, and FileFallocate()
doesn't. From what the comment in dsm_impl_posix_resize() says and some
cursory googling, posix_fallocate() doesn't restart automatically on
most systems, so a do/while (rc == EINTR) retry loop is often used. Is
there a reason it isn't used in FileFallocate()?
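
For illustration, here is a minimal standalone sketch of the retry
pattern I mean. It is not the PostgreSQL code: fallocate_retry() and
extend_file_demo() are hypothetical names made up for this example, and
a real backend would presumably also have to decide how pending
cancel/terminate requests interact with the loop, which is not shown.
Note that posix_fallocate() returns the error number directly instead of
setting errno, so the loop tests the return value.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not PostgreSQL code): call posix_fallocate() and
 * retry for as long as it reports EINTR. Returns 0 on success or an
 * error number on failure; errno is not consulted because
 * posix_fallocate() does not set it.
 */
static int
fallocate_retry(int fd, off_t offset, off_t len)
{
	int			rc;

	do
	{
		rc = posix_fallocate(fd, offset, len);
	} while (rc == EINTR);

	return rc;
}

/*
 * Hypothetical usage example: create (or open) "path" and make sure its
 * first "len" bytes are allocated. Returns 0 or an error number.
 */
static int
extend_file_demo(const char *path, off_t len)
{
	int			fd;
	int			rc;

	fd = open(path, O_CREAT | O_RDWR, 0600);
	if (fd < 0)
	{
		rc = errno;				/* save before any other call can change it */
		fprintf(stderr, "open: %s\n", strerror(rc));
		return rc;
	}

	rc = fallocate_retry(fd, 0, len);
	if (rc != 0)
		fprintf(stderr, "posix_fallocate: %s\n", strerror(rc));

	close(fd);
	return rc;
}
```

Unlike the test program quoted above, this checks open() for a negative
return value and reports the posix_fallocate() result from its return
code rather than through perror().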

- Melanie


