Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Date
Msg-id 20190201151411.2bnn5mdrgairfdyf@alap3.anarazel.de
Whole thread Raw
In response to Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?  (Andres Freund <andres@anarazel.de>)
Responses Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
List pgsql-hackers
Hi,

On 2019-01-29 12:23:51 -0800, Andres Freund wrote:
> On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> > On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > > runs. Looking at the buildfarm it looks like the failures were,
> > > > excluding handfish which failed without recognizable symptoms before and
> > > > after, on BSD derived platforms (netbsd, freebsd, OX), which certainly
> > > > is interesting.
> > > 
> > > Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> > > got four or five different BSD-ish platforms that reported failures,
> > > and it's hard to believe they've all got identical schedulers.
> > > 
> > > That second handfish failure does match the symptoms elsewhere:
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > > 
> > > ---
/home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr
2018-10-3020:11:45.551967381 -0300
 
> > > +++
/home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr
2019-01-2822:38:20.614211568 -0200
 
> > > @@ -0,0 +1,20 @@
> > > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> > > 
> > > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > > way higher than elsewhere.  Puzzling.
> > 
> > Interesting.
> > 
> > While chatting with Robert about this issue I came across the following
> > section of code:
> > 
> >         /*
> >          * If the FSM knows nothing of the rel, try the last page before we
> >          * give up and extend.  This avoids one-tuple-per-page syndrome during
> >          * bootstrapping or in a recently-started system.
> >          */
> >         if (targetBlock == InvalidBlockNumber)
> >         {
> >             BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> > 
> >             if (nblocks > 0)
> >                 targetBlock = nblocks - 1;
> >         }
> > 
> > 
> > I think that explains the issue (albeit not why it is much more frequent
> > on BSDs).  Because we're not going through the FSM, it's perfectly
> > possible to find a page that is uninitialized, *and* is not yet in the
> > FSM. The only reason this wasn't previously actively broken, I think, is
> > that while we previously *also* looked that page (before the extending
> > backend acquired a lock!), when looking at the page
> > PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> > space because it just interprets the zeroes in pd_upper - pd_lower as no
> > free space.
> > 
> > Hm, thinking about what a good solution here could be.
> 
> I wonder if we should just expand the logic we have for
> RBM_ZERO_AND_LOCK so it can be and use it in hio.c (we probably could
> just use it without any changes, but the name seems a bit confusing) -
> because that'd prevent the current weirdness that it's possible that the
> buffer can be locked by somebody between the ReadBufferBI(P_NEW) and and
> the LockBuffer(BUFFER_LOCK_EXCLUSIVE).  I think that'd allow us to
> alltogether drop the cleanup lock logic we currently have, and also
> protect us against the FSM issue I'd outlined upthread?

Here's a version of the patch implementing this approach.  I assume this
solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
machine.

I did not rename RBM_ZERO_AND_LOCK. New buffers are zeroed too, so that
still seems apt enough.

Greetings,

Andres Freund

Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: pg_dump multi VALUES INSERT
Next
From: "REIX, Tony"
Date:
Subject: RE: Shared Memory: How to use SYSV rather than MMAP ?