could not extend file "base/5/3501" with FileFallocate(): Interrupted system call - Mailing list pgsql-hackers

From Christoph Berg
Subject could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
Msg-id ZEZDj1H61ryrmY9o@msg.df7cb.de
List pgsql-hackers
Re: Andres Freund
> Add smgrzeroextend(), FileZero(), FileFallocate()

Hi,

I'm often seeing PG16 builds erroring out in the pgbench tests:

00:33:12 make[2]: Entering directory '/<<PKGBUILDDIR>>/build/src/bin/pgbench'
00:33:12 echo "# +++ tap check in src/bin/pgbench +++" && rm -rf '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check &&
/bin/mkdir -p '/<<PKGBUILDDIR>>/build/src/bin/pgbench'/tmp_check && cd /<<PKGBUILDDIR>>/build/../src/bin/pgbench &&
TESTLOGDIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/log'
TESTDATADIR='/<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check'
PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/postgresql/16/bin:/<<PKGBUILDDIR>>/build/src/bin/pgbench:$PATH"
LD_LIBRARY_PATH="/<<PKGBUILDDIR>>/build/tmp_install/usr/lib/aarch64-linux-gnu" PGPORT='65432'
top_builddir='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../..'
PG_REGRESS='/<<PKGBUILDDIR>>/build/src/bin/pgbench/../../../src/test/regress/pg_regress' /usr/bin/prove -I
/<<PKGBUILDDIR>>/build/../src/test/perl/ -I /<<PKGBUILDDIR>>/build/../src/bin/pgbench --verbose t/*.pl
 
00:33:12 # +++ tap check in src/bin/pgbench +++
00:33:14 #   Failed test 'concurrent OID generation status (got 2 vs expected 0)'
00:33:14 #   at t/001_pgbench_with_server.pl line 31.
00:33:14 #   Failed test 'concurrent OID generation stdout /(?^:processed: 125/125)/'
00:33:14 #   at t/001_pgbench_with_server.pl line 31.
00:33:14 #                   'pgbench (16devel (Debian 16~~devel-1.pgdg100+~20230423.1656.g8bbd0cc))
00:33:14 # transaction type: /<<PKGBUILDDIR>>/build/src/bin/pgbench/tmp_check/t_001_pgbench_with_server_main_data/001_pgbench_concurrent_insert
00:33:14 # scaling factor: 1
00:33:14 # query mode: prepared
00:33:14 # number of clients: 5
00:33:14 # number of threads: 1
00:33:14 # maximum number of tries: 1
00:33:14 # number of transactions per client: 25
00:33:14 # number of transactions actually processed: 118/125
00:33:14 # number of failed transactions: 0 (0.000%)
00:33:14 # latency average = 26.470 ms
00:33:14 # initial connection time = 66.583 ms
00:33:14 # tps = 188.889760 (without initial connection time)
00:33:14 # '
00:33:14 #     doesn't match '(?^:processed: 125/125)'
00:33:14 #   Failed test 'concurrent OID generation stderr /(?^:^$)/'
00:33:14 #   at t/001_pgbench_with_server.pl line 31.
00:33:14 #                   'pgbench: error: client 2 script 0 aborted in command 0 query 0: ERROR:  could not extend file "base/5/3501" with FileFallocate(): Interrupted system call
00:33:14 # HINT:  Check free disk space.
00:33:14 # pgbench: error: Run was aborted; the above results are incomplete.
00:33:14 # '
00:33:14 #     doesn't match '(?^:^$)'
00:33:26 # Looks like you failed 3 tests of 428.
00:33:26 t/001_pgbench_with_server.pl ..
00:33:26 not ok 1 - concurrent OID generation status (got 2 vs expected 0)

I don't think the disk is full, since the builds that fail always fail at
that same spot, for example:

https://pgdgbuild.dus.dg-i.net/job/postgresql-16-binaries-snapshot/833/

This is overlayfs with tmpfs (upper) and ext4 (lower). Running that test
manually works, though, and the filesystem seems to support posix_fallocate:

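#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
        int f;
        int err;

        /* open() returns -1 on failure, not 0 */
        if ((f = open("moo", O_CREAT | O_RDWR, 0666)) < 0)
        {
                perror("open");
                return 1;
        }

        /* posix_fallocate() returns an error number; it does not set errno */
        err = posix_fallocate(f, 0, 10);
        fprintf(stderr, "posix_fallocate: %s\n", strerror(err));

        close(f);
        return err != 0;
}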

$ ./a.out
posix_fallocate: Success
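The error text is the strerror() message for EINTR ("Interrupted system call" on glibc), i.e. posix_fallocate() was interrupted by a signal before it completed; a standalone test program that never receives signals would presumably not run into it. One common way to cope with EINTR is to simply retry the call. The following is a minimal sketch of that approach, assuming an unconditional retry-on-EINTR policy is acceptable; fallocate_retry_eintr() is a hypothetical helper for illustration, not PostgreSQL's actual FileFallocate() code.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (NOT PostgreSQL's FileFallocate()): reserve len bytes
 * starting at offset, retrying for as long as posix_fallocate() reports
 * EINTR (interrupted by a signal).
 *
 * posix_fallocate() returns an error number instead of setting errno, so
 * the return value is what gets checked. Returns 0 on success, otherwise
 * the error number from the last attempt.
 */
static int
fallocate_retry_eintr(int fd, off_t offset, off_t len)
{
	int rc;

	/* posix_fallocate() fails with EINVAL for len <= 0; catch that early */
	assert(len > 0);

	do
		rc = posix_fallocate(fd, offset, len);
	while (rc == EINTR);

	return rc;
}
```

Called on a freshly created file, fallocate_retry_eintr(fd, 0, 8192) should return 0 and leave the file 8192 bytes long. An unbounded loop is just the simplest policy; a real server might also want to check for pending interrupts between attempts.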

The problem has been there for some weeks. I didn't report it earlier
because I was on vacation, spent my free time trying to bootstrap s390x
support for apt.pg.o, and another direct I/O problem was making all the
builds fail for a while.

Christoph


