Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call |
Date | |
Msg-id | 20230424155855.roatu3odubmue4i2@liskov |
In response to | could not extend file "base/5/3501" with FileFallocate(): Interrupted system call (Christoph Berg <myon@debian.org>) |
List | pgsql-hackers |
On Mon, Apr 24, 2023 at 10:53:35AM +0200, Christoph Berg wrote:
> Re: Andres Freund
> > Add smgrzeroextend(), FileZero(), FileFallocate()
>
> Hi,
>
> I'm often seeing PG16 builds erroring out in the pgbench tests:
>
> 00:33:12 make[2]: Entering directory '/<<PKGBUILDDIR>>/build/src/bin/pgbench'
> 00:33:12 echo "# +++ tap check in src/bin/pgbench +++" && rm -rf '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && /bin/mkdir -p '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && cd /<<PKGBUILDDIR>>/build/../src/bin/pgbench && TESTLOGDIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/log' TESTDATADIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check' PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/16/bin:/<<PKGBUILDDIR>>/build/src/bin/pgbench:$PATH" LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/aarch64-linux-gnu" PGPORT='65432' top_builddir='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../..' PG_REGRESS='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../../src/test/regress/pg_regress' /usr/bin/prove -I /<<PKGBUILDDIR>>/build/../src/test/perl/ -I /<<PKGBUILDDIR>>/build/../src/bin/pgbench --verbose t/*.pl
> 00:33:12 # +++ tap check in src/bin/pgbench +++
> 00:33:14 # Failed test 'concurrent OID generation status (got 2 vs expected 0)'
> 00:33:14 # at t/001_pgbench_with_server.pl line 31.
> 00:33:14 # Failed test 'concurrent OID generation stdout /(?^:processed: 125/125)/'
> 00:33:14 # at t/001_pgbench_with_server.pl line 31.
> 00:33:14 # 'pgbench (16devel (Debian 16~~devel-1.pgdg100+~20230423.1656.g8bbd0cc))
> 00:33:14 # transaction type: /<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_concurrent_insert
> 00:33:14 # scaling factor: 1
> 00:33:14 # query mode: prepared
> 00:33:14 # number of clients: 5
> 00:33:14 # number of threads: 1
> 00:33:14 # maximum number of tries: 1
> 00:33:14 # number of transactions per client: 25
> 00:33:14 # number of transactions actually processed: 118/125
> 00:33:14 # number of failed transactions: 0 (0.000%)
> 00:33:14 # latency average = 26.470 ms
> 00:33:14 # initial connection time = 66.583 ms
> 00:33:14 # tps = 188.889760 (without initial connection time)
> 00:33:14 # '
> 00:33:14 # doesn't match '(?^:processed: 125/125)'
> 00:33:14 # Failed test 'concurrent OID generation stderr /(?^:^$)/'
> 00:33:14 # at t/001_pgbench_with_server.pl line 31.
> 00:33:14 # 'pgbench: error: client 2 script 0 aborted in command 0 query 0: ERROR: could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
> 00:33:14 # HINT: Check free disk space.
> 00:33:14 # pgbench: error: Run was aborted; the above results are incomplete.
> 00:33:14 # '
> 00:33:14 # doesn't match '(?^:^$)'
> 00:33:26 # Looks like you failed 3 tests of 428.
> 00:33:26 t/001_pgbench_with_server.pl ..
> 00:33:26 not ok 1 - concurrent OID generation status (got 2 vs expected 0)
>
> I don't think the disk is full since it's always hitting that same
> spot, on some of the builds:
>
> https://pgdgbuild.dus.dg-i.net/job/postgresql-16-binaries-snapshot/833/
>
> This is overlayfs with tmpfs (upper)/ext4 (lower). Manually running
> that test works though, and the FS seems to support posix_fallocate:
>
> #include <fcntl.h>
> #include <stdio.h>
>
> int main ()
> {
>     int f;
>     int err;
>
>     if (!(f = open("moo", O_CREAT | O_RDWR, 0666)))
>         perror("open");
>
>     err = posix_fallocate(f, 0, 10);
>     perror("posix_fallocate");
>
>     return 0;
> }
>
> $ ./a.out
> posix_fallocate: Success
>
> The problem has been there for some weeks - I didn't report it earlier
> as I was on vacation, in the free time trying to bootstrap s390x
> support for apt.pg.o, and there was this other direct IO problem
> making all the builds fail for some time.

I noticed that dsm_impl_posix_resize() does a do while rc==EINTR and
FileFallocate() doesn't. From what the comment says in
dsm_impl_posix_resize() and some cursory googling, posix_fallocate()
doesn't restart automatically on most systems, so a do while() rc==EINTR
is often used.

Is there a reason it isn't used in FileFallocate() I wonder?

- Melanie