Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()? - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Msg-id: 20190201151411.2bnn5mdrgairfdyf@alap3.anarazel.de
In response to: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()? (Andres Freund <andres@anarazel.de>)
Responses: Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
List: pgsql-hackers
Hi,

On 2019-01-29 12:23:51 -0800, Andres Freund wrote:
> On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> > On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > > runs. Looking at the buildfarm it looks like the failures were,
> > > > excluding handfish which failed without recognizable symptoms before and
> > > > after, on BSD derived platforms (netbsd, freebsd, OSX), which certainly
> > > > is interesting.
> > >
> > > Isn't it now. Something about the BSD scheduler perhaps? But we've
> > > got four or five different BSD-ish platforms that reported failures,
> > > and it's hard to believe they've all got identical schedulers.
> > >
> > > That second handfish failure does match the symptoms elsewhere:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > >
> > > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr 2018-10-30 20:11:45.551967381 -0300
> > > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr 2019-01-28 22:38:20.614211568 -0200
> > > @@ -0,0 +1,20 @@
> > > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> > >
> > > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > > way higher than elsewhere. Puzzling.
> >
> > Interesting.
> >
> > While chatting with Robert about this issue I came across the following
> > section of code:
> >
> > 	/*
> > 	 * If the FSM knows nothing of the rel, try the last page before we
> > 	 * give up and extend. This avoids one-tuple-per-page syndrome during
> > 	 * bootstrapping or in a recently-started system.
> > 	 */
> > 	if (targetBlock == InvalidBlockNumber)
> > 	{
> > 		BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> >
> > 		if (nblocks > 0)
> > 			targetBlock = nblocks - 1;
> > 	}
> >
> > I think that explains the issue (albeit not why it is much more frequent
> > on BSDs). Because we're not going through the FSM, it's perfectly
> > possible to find a page that is uninitialized *and* is not yet in the
> > FSM. The only reason this wasn't previously actively broken, I think, is
> > that while we previously *also* looked at that page (before the extending
> > backend acquired a lock!), when looking at the page
> > PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> > space because it just interprets the zeroes in pd_upper - pd_lower as no
> > free space.
> >
> > Hm, thinking about what a good solution here could be.
>
> I wonder if we should just expand the logic we have for
> RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could
> just use it without any changes, but the name seems a bit confusing) -
> because that'd prevent the current weirdness that it's possible that the
> buffer can be locked by somebody between the ReadBufferBI(P_NEW) and
> the LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to
> altogether drop the cleanup lock logic we currently have, and also
> protect us against the FSM issue I'd outlined upthread?

Here's a version of the patch implementing this approach. I assume this
solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
machine.

I did not rename RBM_ZERO_AND_LOCK. New buffers are zeroed too, so that
still seems apt enough.

Greetings,

Andres Freund