Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum" - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"
Msg-id CA+hUKGKKgJbFycOfr6JTBLp_A-zKY1Ow8zndB6qjwSJpw7Yzgg@mail.gmail.com
In response to Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Thomas Munro <thomas.munro@gmail.com>)
Re: BUG #17064: Parallel VACUUM operations cause the error "global/pg_filenode.map contains incorrect checksum"  (Heikki Linnakangas <hlinnaka@iki.fi>)
List pgsql-bugs
On Wed, Jun 23, 2021 at 2:11 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Your analysis seems right to me.  We have to worry about both things:
> > atomicity of writes on power failure (assumed to be sector-level,
> > hence our 512 byte struct -- all good), and atomicity of concurrent
> > reads and writes (we can't assume anything at all, so r/w locking is
> > the simplest way to get a consistent read).  Shouldn't relmap_redo()
> > also acquire the lock exclusively?
>
> Shouldn't we instead file a kernel bug report?  I seem to recall that
> POSIX guarantees atomicity of these things up to some operation size.
> Or is that just for pipe I/O?

The spec doesn't cover us according to some opinions, at least:

https://utcc.utoronto.ca/~cks/space/blog/unix/WriteNotVeryAtomic

But at the same time, the behaviour seems quite surprising given the
parameters involved and how I, at least, thought this stuff worked in
practice: the rules about the visibility of writes that precede reads
imply an unspoken locking rule in any obvious, reasonable
implementation, and the inode-level read/write locking is plainly
visible in the source.  It's possible that it's not working as
designed in some weird edge case.  I guess the next thing to do is
write a minimal repro and find an expert to ask about what it's
supposed to do.
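
A minimal sketch of the kind of repro meant here, under assumptions that
are not from the thread: Python rather than C, two threads with separate
file descriptors standing in for two backends, a hypothetical
count_torn_reads() helper, and a 512-byte block filled with a single byte
value so that a mixed read is easy to spot.  It relies on os.pread() and
os.pwrite(), so it is Unix-only.

```python
import os
import tempfile
import threading

BLOCK = 512  # assumed to match the 512-byte struct discussed in the thread


def count_torn_reads(path, iterations=10000):
    """Rewrite one block in a loop while the calling thread reads it.

    Returns how many reads were short or saw bytes from two different writes.
    """
    patterns = [bytes([b]) * BLOCK for b in (0x41, 0x42)]
    wfd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    rfd = os.open(path, os.O_RDONLY)
    os.pwrite(wfd, patterns[0], 0)  # start from a fully written block
    stop = threading.Event()

    def writer():
        # Alternate between the two fill patterns at offset 0.
        i = 0
        while not stop.is_set():
            os.pwrite(wfd, patterns[i & 1], 0)
            i += 1

    thread = threading.Thread(target=writer)
    thread.start()
    torn = 0
    try:
        for _ in range(iterations):
            buf = os.pread(rfd, BLOCK, 0)
            # A consistent read is a full block holding one byte value only.
            if len(buf) != BLOCK or buf.count(buf[:1]) != BLOCK:
                torn += 1
    finally:
        stop.set()
        thread.join()
        os.close(rfd)
        os.close(wfd)
    return torn


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        torn = count_torn_reads(os.path.join(tmpdir, "block.map"))
        print("torn reads observed:", torn)
```

A count of zero would not prove atomicity; it only means no torn read was
observed in that run on that kernel and filesystem.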
