Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" - Mailing list pgsql-bugs

From: Tom Lane
Subject: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Msg-id: 1715251.1624371066@sss.pgh.pa.us
In response to: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Thomas Munro <thomas.munro@gmail.com>)
Responses: Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Michael Paquier <michael@paquier.xyz>)
           Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Thomas Munro <thomas.munro@gmail.com>)
List: pgsql-bugs

Thomas Munro <thomas.munro@gmail.com> writes:
> Your analysis seems right to me.  We have to worry about both things:
> atomicity of writes on power failure (assumed to be sector-level,
> hence our 512 byte struct -- all good), and atomicity of concurrent
> reads and writes (we can't assume anything at all, so r/w locking is
> the simplest way to get a consistent read).  Shouldn't relmap_redo()
> also acquire the lock exclusively?

Shouldn't we instead file a kernel bug report?  I seem to recall that
POSIX guarantees atomicity of these things up to some operation size.
Or is that just for pipe I/O?
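
For reference, the pipe-specific guarantee is the PIPE_BUF limit: POSIX
requires a write of at most PIPE_BUF bytes to a pipe or FIFO not to be
interleaved with other writers' data, and the minimum allowed value is 512.
A minimal sketch for checking the limit on a given POSIX system follows; the
helper name pipe_atomic_limit is made up for illustration, and the result
says nothing about regular files such as pg_filenode.map.

```python
import os


def pipe_atomic_limit() -> int:
    """Return PIPE_BUF for a freshly created pipe (hypothetical helper)."""
    read_fd, write_fd = os.pipe()
    try:
        # PC_PIPE_BUF is the fpathconf() name for the pipe atomicity limit.
        return os.fpathconf(write_fd, "PC_PIPE_BUF")
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print("PIPE_BUF on this system:", pipe_atomic_limit())
```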

If we can't assume atomicity of relmapper file I/O, I wonder about
pg_control as well.  But on the whole, what I'm smelling is a moderately
recently introduced kernel bug.  We've been doing it this way for
years and heard no previous reports.

            regards, tom lane
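
The reader/writer locking suggested in the quoted message can be sketched
outside PostgreSQL as follows. This is only an analogy under stated
assumptions: it uses advisory flock() locks between processes where the
server would use its own internal locks, the record layout (508 payload
bytes followed by a 4-byte checksum) is hypothetical, and zlib's plain CRC-32
stands in for whatever checksum the real file carries. The names
write_record and read_record are invented for this example.

```python
import fcntl
import os
import struct
import zlib

RECORD_SIZE = 512                # one sector, the size mentioned in the thread
PAYLOAD_SIZE = RECORD_SIZE - 4   # hypothetical layout: payload + 4-byte CRC


def write_record(path: str, payload: bytes) -> None:
    """Overwrite the record in place while holding an exclusive lock."""
    if len(payload) > PAYLOAD_SIZE:
        raise ValueError("payload too large")
    body = payload.ljust(PAYLOAD_SIZE, b"\0")
    record = body + struct.pack("<I", zlib.crc32(body))
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)       # blocks until no reader or writer holds it
        if os.pwrite(fd, record, 0) != RECORD_SIZE:
            raise OSError("short write")
        os.fsync(fd)
    finally:
        os.close(fd)                         # closing the descriptor drops the lock


def read_record(path: str) -> bytes:
    """Read the record under a shared lock and verify its checksum."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_SH)       # many readers may hold this at once
        record = os.pread(fd, RECORD_SIZE, 0)
    finally:
        os.close(fd)
    if len(record) != RECORD_SIZE:
        raise OSError("short read")
    body = record[:PAYLOAD_SIZE]
    (stored_crc,) = struct.unpack("<I", record[PAYLOAD_SIZE:])
    if zlib.crc32(body) != stored_crc:
        raise ValueError("record contains incorrect checksum")
    return body
```

With the shared lock in place, a reader cannot observe a half-updated record
regardless of whether the kernel makes concurrent read() and write() calls
atomic with respect to each other. Without it, the checksum check is the only
thing that notices a torn read, which matches the symptom in the bug report.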


