Checkpoint not retrying failed fsync? - Mailing list pgsql-hackers

From Andrew Gierth
Subject Checkpoint not retrying failed fsync?
Msg-id 87y3i1ia4w.fsf@news-spur.riddles.org.uk
Responses Re: Checkpoint not retrying failed fsync?  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
This is only a preliminary report; I'm still trying to analyze what's
going on, but:

In doing testing on FreeBSD with a filesystem set up to induce errors
controllably (using gconcat+gnop), I can get this to happen (on 11devel):

(note that "mytable" is on a tablespace on the erroring filesystem,
while "x" is on a clean filesystem)

postgres=# insert into mytable values (-1);
INSERT 0 1
postgres=# checkpoint;
ERROR:  checkpoint request failed
HINT:  Consult recent messages in the server log for details.
postgres=# insert into x values (3);
INSERT 0 1
postgres=# checkpoint;
CHECKPOINT

(the message in the server log is the expected one about fsync failing)

Checking the WAL shows that there is indeed a checkpoint record for the
second checkpoint and pg_control points to it, so a crash restart at
this point would not try to replay the "mytable" write.

Furthermore, checking the trace output from the checkpointer process, it
is not even attempting an fsync of the failing file. This isn't like the
Linux fsync issue: I've confirmed that fsync will repeatedly fail on the
file until the underlying errors stop.
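
For anyone wanting to check the "fsync keeps failing" behaviour on their own
setup, a minimal probe could look like the sketch below. It is not the tool
used for the report above; the function name probe_fsync and its parameters
are made up for illustration. It just calls fsync several times on one open
descriptor and records what each call returned.

```python
import errno
import os


def probe_fsync(path, attempts=3):
    """Call fsync on `path` several times and report each outcome.

    Returns a list with one entry per attempt: None if the call succeeded,
    or the symbolic errno name (for example "EIO") if it raised OSError.
    Hypothetical helper, not part of PostgreSQL.
    """
    results = []
    fd = os.open(path, os.O_RDWR)
    try:
        for _ in range(attempts):
            try:
                os.fsync(fd)
                results.append(None)
            except OSError as exc:
                # Fall back to the raw number if the errno has no known name.
                results.append(errno.errorcode.get(exc.errno, str(exc.errno)))
    finally:
        os.close(fd)
    return results
```

On a healthy filesystem every entry should be None. On a filesystem behaving
as described above, one would expect an error name in every position for as
long as the underlying errors persist.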

As far as I can tell from reading the code, if a checkpoint fails, the
checkpointer is supposed to keep all the outstanding fsync requests for
next time. Am I wrong, or is there some failure in the logic to do this?
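
To make the expected behaviour concrete, here is a toy model of "keep the
outstanding fsync requests for next time". It is only a sketch of the
intended semantics as described above, not PostgreSQL's actual code; the
class PendingSyncs, its methods, and the injected sync_fn callable are all
hypothetical names.

```python
class PendingSyncs:
    """Toy model of a table of outstanding fsync requests (hypothetical)."""

    def __init__(self, sync_fn):
        # sync_fn(path) stands in for fsync; it must raise OSError on failure.
        self._sync_fn = sync_fn
        self._pending = set()

    def register(self, path):
        """Remember that `path` has writes that still need to be synced."""
        self._pending.add(path)

    def checkpoint(self):
        """Try to sync every pending file.

        Returns True only if all of them succeeded. Files whose sync failed
        stay pending, so the next checkpoint retries them.
        """
        failed = set()
        for path in sorted(self._pending):
            try:
                self._sync_fn(path)
            except OSError:
                failed.add(path)
        self._pending = failed
        return not failed

    def pending(self):
        """Return the paths still waiting for a successful sync."""
        return sorted(self._pending)
```

Under this model, the second checkpoint in the session above would fail
again, because the request for "mytable" would still be pending and its sync
would be retried, rather than completing as if nothing were outstanding.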

-- 
Andrew (irc:RhodiumToad)

