Re: Seek failure at end of FSM file during WAL replay (in 11) - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Seek failure at end of FSM file during WAL replay (in 11)
Date	July 24, 2019 17:30:42
Msg-id	31570.1563989442@sss.pgh.pa.us
In response to	Seek failure at end of FSM file during WAL replay (in 11) (Michael Paquier <michael@paquier.xyz>)
Responses	Re: Seek failure at end of FSM file during WAL replay (in 11)
List	pgsql-hackers

Michael Paquier <michael@paquier.xyz> writes:
> Recently, one of the test beds we use has blown up once when doing
> streaming replication like that:
> FATAL:  could not seek to end of file "base/16386/19817_fsm": No such
>    file or directory 
> CONTEXT:  WAL redo at 60/8DA22448 for Heap2/CLEAN: remxid 65751197
> LOG:  startup process (PID 44886) exited with exit code 1

> All the WAL records have been wiped out since, so I don't know exactly
> what happened, but I could track down that this FSM file got removed
> a couple of hours before as I got my hands on some FS-level logs which
> showed a deletion.

Hm.  AFAICS the immediate issuer of the error must have been
_mdnblocks(); there are other matches to that error string but
they are in places where we can tell which file the seek must
have been applied to, and it wasn't an FSM file.

> Before blaming a lower level of
> the application stack, I am wondering if we have some issues with
> mdfd_vfd meaning that the file has been removed but that it is still
> tracked as opened.

lseek() per se presumably would never return ENOENT.  A more likely
theory is that the file wasn't actually open but only had a leftover
VFD entry, and when FileSize() -> FileAccess() tried to open it,
the open failed with ENOENT --- but _mdnblocks() would still call it
a seek failure.
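
To illustrate that theory, here is a minimal standalone sketch, not the
actual fd.c/md.c code.  ToyVfd, toy_file_access(), toy_file_size() and
toy_nblocks_report() are hypothetical stand-ins for a VFD entry,
FileAccess(), FileSize() and the _mdnblocks() error report.  It assumes
a POSIX system and only shows how an ENOENT from a lazy reopen could end
up labeled as a seek failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a VFD entry: the path is always remembered,
 * but a kernel descriptor is held only while the file is physically open. */
typedef struct ToyVfd
{
    const char *path;
    int         fd;         /* -1 when the kernel descriptor is closed */
} ToyVfd;

/* Rough analogue of FileAccess(): reopen the file lazily if needed.
 * Returns 0 on success, -1 with errno set by open() on failure. */
static int
toy_file_access(ToyVfd *vfd)
{
    assert(vfd != NULL);
    if (vfd->fd >= 0)
        return 0;
    vfd->fd = open(vfd->path, O_RDONLY);
    return (vfd->fd >= 0) ? 0 : -1;
}

/* Rough analogue of FileSize(): make the file accessible, then seek to
 * its end.  A failed reopen and a failed lseek() both come back as -1. */
static off_t
toy_file_size(ToyVfd *vfd)
{
    if (toy_file_access(vfd) < 0)
        return (off_t) -1;
    return lseek(vfd->fd, 0, SEEK_END);
}

/* Rough analogue of the caller: it cannot tell the two failure causes
 * apart, so every failure is reported as a seek failure.
 * Returns 0 on success, or the saved errno on failure. */
static int
toy_nblocks_report(ToyVfd *vfd, char *buf, size_t buflen)
{
    off_t len = toy_file_size(vfd);

    if (len < 0)
    {
        int saved_errno = errno;

        snprintf(buf, buflen, "could not seek to end of file \"%s\": %s",
                 vfd->path, strerror(saved_errno));
        return saved_errno;
    }
    snprintf(buf, buflen, "size of \"%s\" is %lld bytes",
             vfd->path, (long long) len);
    return 0;
}
```

In this sketch, an entry whose descriptor is still open keeps working
after the file is unlinked, since lseek() acts on the descriptor and not
on the name.  An entry with fd = -1 fails in open() with ENOENT, and the
message still says "could not seek".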

So I'd opine that this is a pretty high-level failure --- what are
we doing trying to replay WAL against a table that's been dropped?
Or if it wasn't dropped, why was the FSM removed?

            regards, tom lane
