Antonin Houska <ah@cybertec.at> wrote:
> Michael Paquier <michael@paquier.xyz> wrote:
>
> > On Mon, Nov 11, 2019 at 10:03:14AM +0100, Antonin Houska wrote:
> > > This looks good to me.
> >
> > Actually, no, this is not good. I have been studying the patch more,
> > and after stressing this code path more with a cluster having
> > checksums enabled and shared_buffers at 1MB, I have been able to make
> > a couple of pages' LSNs go backwards with pgbench -s 100. The cause
> > was simply that the page got flushed with a newer LSN than what was
> > returned by XLogSaveBufferForHint() before taking the buffer header
> > lock, so updating only the LSN for a non-dirty page was simply
> > guarding against that.
>
> Interesting. Now that I know about the problem, I could have reproduced it
> using gdb: MarkBufferDirtyHint() was called by 2 backends concurrently in such
> a way that the first backend generates the LSN, but before it manages to
> assign it to the page, another backend generates another LSN and sets it.
>
> Can't we just apply the attached diff on top of your patch?

I wanted to register the patch for the next CF so that it's not forgotten, but I see
it's already there. Why have you set the status to "withdrawn"?
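To make the interleaving described above concrete, here is a toy, single-threaded C sketch that replays it deterministically. It is not PostgreSQL code. ToyPage, toy_set_page_lsn() and simulate_interleaving() are hypothetical names and the LSN values are made up. The "guarded" variant, which only ever moves the page LSN forward, is one possible protection against the race and not necessarily what the attached diff does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for PostgreSQL's XLogRecPtr (a WAL position). */
typedef uint64_t Lsn;

/* Hypothetical toy page: only the LSN in the header matters here. */
typedef struct
{
    Lsn lsn;
} ToyPage;

/*
 * Store an LSN on the page. Unguarded, it blindly overwrites; guarded
 * (an assumed fix, for illustration only), it never moves the LSN backwards.
 */
static void
toy_set_page_lsn(ToyPage *page, Lsn lsn, bool guarded)
{
    if (!guarded || lsn > page->lsn)
        page->lsn = lsn;
}

/*
 * Replays one fixed interleaving of two backends hinting the same page.
 * Each simulated WAL insert hands out a strictly increasing LSN, the way
 * XLogSaveBufferForHint() would. Returns the page LSN at the end.
 */
Lsn
simulate_interleaving(bool guarded)
{
    ToyPage page = {.lsn = 100};
    Lsn     next_insert = 100;

    Lsn lsn_a = (next_insert += 10);    /* backend A logs its FPI: 110 */
    Lsn lsn_b = (next_insert += 10);    /* backend B logs its FPI: 120 */

    toy_set_page_lsn(&page, lsn_b, guarded);    /* B stores 120 first */
    toy_set_page_lsn(&page, lsn_a, guarded);    /* A then stores its older 110 */

    return page.lsn;
}
```

With the unguarded store, simulate_interleaving(false) ends at 110, so the page LSN went backwards from 120; with the guard, simulate_interleaving(true) keeps 120.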
--
Antonin Houska
Web: https://www.cybertec-postgresql.com