Thread: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?
Hi,

The zheap patchset, even after being based on pluggable storage, currently
has the following condition in RelationAddExtraBlocks():

    if (RelationStorageIsZHeap(relation))
    {
        Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
        ZheapInitPage(page, BufferGetPageSize(buffer));
        freespace = PageGetZHeapFreeSpace(page);
    }
    else
    {
        PageInit(page, BufferGetPageSize(buffer), 0);
        freespace = PageGetHeapFreeSpace(page);
    }

I.e. it initializes the page differently when zheap is used versus heap.

Thinking about whether it's worth making that function extensible made me
wonder: Is it actually a good idea to initialize the page at that point,
including marking it dirty?

As far as I can tell that has several downsides:

- Dirtying the buffer for initialization will cause potentially superfluous
  IO, with no interesting data in the write except for a newly initialized
  page.
- As there's no sort of interlock, it's entirely possible that, after a
  crash, the blocks will come up empty, but with the FSM returning them as
  empty, so that path would be good to support anyway.
- It adds heap specific code to a routine that otherwise could be generic
  for different table access methods.

It seems to me this could be optimized by *not* initializing the page, and
instead having a PageIsNew() check at the places that access the page,
initializing it there in that case. Right now we'll just ignore such pages
when we encounter them (e.g. after a crash) until vacuum initializes them.
But given that we've accepted that we'll potentially create empty pages
anyway, I don't see what advantages that provides?

The warnings emitted by vacuum are pretty scary, and can happen in a lot of
legitimate cases, so removing them seems like a good idea anyway.

I'm pretty tired, so I might be missing something large and obvious here.

Greetings,

Andres Freund
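[For illustration, a minimal sketch of the lazy-initialization idea being
proposed above. This is not the actual patch; the function name and its
placement are assumptions, and in real code the dirtying/WAL-logging would
happen together with the insertion that follows rather than separately.]

    #include "postgres.h"

    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /*
     * Hypothetical fragment: instead of PageInit()ing pages in
     * RelationAddExtraBlocks(), initialize a still-new page on first use,
     * e.g. in the insertion path in hio.c. Assumes the caller already
     * holds an exclusive lock on the buffer.
     */
    static void
    hypothetical_prepare_target_page(Buffer buffer)
    {
        Page        page = BufferGetPage(buffer);

        if (PageIsNew(page))
        {
            /* extended (or left over after a crash) but never used */
            PageInit(page, BufferGetPageSize(buffer), 0);
            MarkBufferDirty(buffer);
        }
    }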
On Wed, Dec 19, 2018 at 2:09 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> The zheap patchset, even after being based on pluggable storage,
> currently has the following condition in RelationAddExtraBlocks():
>
>     if (RelationStorageIsZHeap(relation))
>     {
>         Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
>         ZheapInitPage(page, BufferGetPageSize(buffer));
>         freespace = PageGetZHeapFreeSpace(page);
>     }
>     else
>     {
>         PageInit(page, BufferGetPageSize(buffer), 0);
>         freespace = PageGetHeapFreeSpace(page);
>     }
>
> I.e. it initializes the page differently when zheap is used versus
> heap.
>
> Thinking about whether it's worth making that function extensible made
> me wonder: Is it actually a good idea to initialize the page at that
> point, including marking it dirty?
>
> As far as I can tell that has several downsides:
> - Dirtying the buffer for initialization will cause potentially
>   superfluous IO, with no interesting data in the write except for a
>   newly initialized page.
> - As there's no sort of interlock, it's entirely possible that, after a
>   crash, the blocks will come up empty, but with the FSM returning them
>   as empty, so that path would be good to support anyway.
> - It adds heap specific code to a routine that otherwise could be
>   generic for different table access methods.
>

IIUC, your proposal is to remove page initialization and MarkBufferDirty
from RelationAddExtraBlocks(), but record the pages in the FSM. Is my
understanding correct? If so, I don't see any problem with that, and as you
have mentioned, it will be generally advantageous as well.

> It seems to me this could be optimized by *not* initializing the page,
> and instead having a PageIsNew() check at the places that access the
> page, initializing it there in that case.
>

makes sense to me.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> Thinking about whether it's worth making that function extensible made
> me wonder: Is it actually a good idea to initialize the page at that
> point, including marking it dirty?

As far as I can recall, my motivation was to avoid increasing the number of
warnings produced by VACUUM. If those warnings are going away, then I don't
know that there's any reason to keep that code as it is. But I am not sure
whether such a move would provoke any opposition.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2018-12-19 17:28:17 -0500, Robert Haas wrote:
> On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > Thinking about whether it's worth making that function extensible made
> > me wonder: Is it actually a good idea to initialize the page at that
> > point, including marking it dirty?
>
> As far as I can recall, my motivation was to avoid increasing the
> number of warnings produced by VACUUM. If those warnings are going
> away, then I don't know that there's any reason to keep that code as
> it is. But I am not sure whether such a move would provoke any
> opposition.

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it. If the logic
were that we'd never reuse empty pages because they can hide corruption
that normally would be discovered by checksums, then we shouldn't
reinitialize them at all and instead error out hard - but we can't do that,
because it's normal that they occur. Right now we have empty pages on-disk
whenever a busy server is restarted in immediate mode, after all.

Greetings,

Andres Freund
On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> What's gained by the logic of emitting that warning in VACUUM after a
> crash? I don't really see any robustness advantages in it. If the logic
> were that we'd never reuse empty pages because they can hide corruption
> that normally would be discovered by checksums, then we shouldn't
> reinitialize them at all and instead error out hard - but we can't do
> that, because it's normal that they occur. Right now we have empty
> pages on-disk whenever a busy server is restarted in immediate mode,
> after all.

I don't know. I am just normally reluctant to change things precipitously
that are of long tenure.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
>> What's gained by the logic of emitting that warning in VACUUM after a
>> crash? I don't really see any robustness advantages in it.

> I don't know. I am just normally reluctant to change things
> precipitously that are of long tenure.

Me too, but I think Andres has a point here. Those warnings in VACUUM are
ancient, probably predating the introduction of WAL :-(. At the time there
was good reason to be suspicious of zeroed pages in tables. Now, though, we
have (what we think is) a bulletproof crash recovery procedure in which
possibly-zeroed pages are to be expected; so we're just causing users
unnecessary alarm by warning about them. I think it's reasonable to, if not
remove the messages entirely, at least downgrade them to a low DEBUG level.

            regards, tom lane
Hi,

On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> >> What's gained by the logic of emitting that warning in VACUUM after a
> >> crash? I don't really see any robustness advantages in it.
>
> > I don't know. I am just normally reluctant to change things
> > precipitously that are of long tenure.
>
> Me too, but I think Andres has a point here. Those warnings in VACUUM
> are ancient, probably predating the introduction of WAL :-(. At the
> time there was good reason to be suspicious of zeroed pages in tables.
> Now, though, we have (what we think is) a bulletproof crash recovery
> procedure in which possibly-zeroed pages are to be expected; so we're
> just causing users unnecessary alarm by warning about them. I think
> it's reasonable to, if not remove the messages entirely, at least
> downgrade them to a low DEBUG level.

Yea, I think that'd be reasonable.

I think we ought to add them, as new/zero pages, to the FSM. If we did so
both during vacuum and RelationAddExtraBlocks() we'd avoid the redundant
writes, and avoid depending on heap layout in RelationAddExtraBlocks().

I wonder if it'd make sense to only log a DEBUG if there's no corresponding
FSM entry. That'd obviously not be bulletproof, but it'd reduce the
spamminess considerably.

Greetings,

Andres Freund
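[A rough sketch of what the extension side of that could look like: extend
without PageInit() or MarkBufferDirty() and only advertise the page in the
FSM. This is an illustration, not the actual patch; the function name and
the exact free-space value used are assumptions, and FSM upper-level
propagation is elided.]

    #include "postgres.h"

    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"
    #include "storage/freespace.h"
    #include "utils/rel.h"

    /* Sketch only: extend by extraBlocks pages without initializing them. */
    static void
    sketch_add_extra_blocks(Relation relation, int extraBlocks)
    {
        while (extraBlocks-- > 0)
        {
            Buffer      buffer = ReadBuffer(relation, P_NEW);
            BlockNumber blockNum = BufferGetBlockNumber(buffer);

            /*
             * No PageInit() and no MarkBufferDirty(): the block stays
             * all-zero on disk until some backend actually inserts into it
             * and initializes it then (the PageIsNew() path).
             */
            ReleaseBuffer(buffer);

            /*
             * Just advertise the page in the FSM; the value here (roughly
             * "the whole page minus its header") is an assumption.
             */
            RecordPageWithFreeSpace(relation, blockNum,
                                    BLCKSZ - SizeOfPageHeaderData);
        }
    }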
Hi,

On 2018-12-19 16:56:36 -0800, Andres Freund wrote:
> On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> > >> What's gained by the logic of emitting that warning in VACUUM after a
> > >> crash? I don't really see any robustness advantages in it.
> >
> > > I don't know. I am just normally reluctant to change things
> > > precipitously that are of long tenure.
> >
> > Me too, but I think Andres has a point here. Those warnings in VACUUM
> > are ancient, probably predating the introduction of WAL :-(. At the
> > time there was good reason to be suspicious of zeroed pages in tables.
> > Now, though, we have (what we think is) a bulletproof crash recovery
> > procedure in which possibly-zeroed pages are to be expected; so we're
> > just causing users unnecessary alarm by warning about them. I think
> > it's reasonable to, if not remove the messages entirely, at least
> > downgrade them to a low DEBUG level.
>
> Yea, I think that'd be reasonable.
>
> I think we ought to add them, as new/zero pages, to the FSM. If we did
> so both during vacuum and RelationAddExtraBlocks() we'd avoid the
> redundant writes, and avoid depending on heap layout in
> RelationAddExtraBlocks().
>
> I wonder if it'd make sense to only log a DEBUG if there's no
> corresponding FSM entry. That'd obviously not be bulletproof, but it'd
> reduce the spamminess considerably.

Here's a prototype patch implementing this change. I'm not sure the new
DEBUG message is really worth it?

Looking at the surrounding code made me wonder about the wisdom of entering
empty pages as all-visible and all-frozen into the VM. That'll mean we'll
never re-discover them on a primary, after promotion. There's no mechanism
to add such pages to the FSM on a standby (in contrast to e.g. pages where
tuples are modified), as vacuum will never visit that page again. Now
obviously it has the advantage of avoiding re-processing the page in the
next vacuum, but is that really an important goal? If there are frequent
vacuums, there's got to be a fair amount of modifications, which ought to
lead to re-use of such pages at a not too far away point?

Greetings,

Andres Freund
Hi,

On 2018-12-20 15:04:11 -0800, Andres Freund wrote:
> On 2018-12-19 16:56:36 -0800, Andres Freund wrote:
> > On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> > > >> What's gained by the logic of emitting that warning in VACUUM after a
> > > >> crash? I don't really see any robustness advantages in it.
> > >
> > > > I don't know. I am just normally reluctant to change things
> > > > precipitously that are of long tenure.
> > >
> > > Me too, but I think Andres has a point here. Those warnings in VACUUM
> > > are ancient, probably predating the introduction of WAL :-(. At the
> > > time there was good reason to be suspicious of zeroed pages in tables.
> > > Now, though, we have (what we think is) a bulletproof crash recovery
> > > procedure in which possibly-zeroed pages are to be expected; so we're
> > > just causing users unnecessary alarm by warning about them. I think
> > > it's reasonable to, if not remove the messages entirely, at least
> > > downgrade them to a low DEBUG level.
> >
> > Yea, I think that'd be reasonable.
> >
> > I think we ought to add them, as new/zero pages, to the FSM. If we did
> > so both during vacuum and RelationAddExtraBlocks() we'd avoid the
> > redundant writes, and avoid depending on heap layout in
> > RelationAddExtraBlocks().
> >
> > I wonder if it'd make sense to only log a DEBUG if there's no
> > corresponding FSM entry. That'd obviously not be bulletproof, but it'd
> > reduce the spamminess considerably.
>
> Here's a prototype patch implementing this change. I'm not sure the new
> DEBUG message is really worth it?
>
> Looking at the surrounding code made me wonder about the wisdom of
> entering empty pages as all-visible and all-frozen into the VM. That'll
> mean we'll never re-discover them on a primary, after promotion. There's
> no mechanism to add such pages to the FSM on a standby (in contrast to
> e.g. pages where tuples are modified), as vacuum will never visit that
> page again. Now obviously it has the advantage of avoiding
> re-processing the page in the next vacuum, but is that really an
> important goal? If there are frequent vacuums, there's got to be a fair
> amount of modifications, which ought to lead to re-use of such pages at
> a not too far away point?

Any comments on the approach in this patch?

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
>> Looking at the surrounding code made me wonder about the wisdom of
>> entering empty pages as all-visible and all-frozen into the VM. That'll
>> mean we'll never re-discover them on a primary, after promotion. There's
>> no mechanism to add such pages to the FSM on a standby (in contrast to
>> e.g. pages where tuples are modified), as vacuum will never visit that
>> page again. Now obviously it has the advantage of avoiding
>> re-processing the page in the next vacuum, but is that really an
>> important goal? If there are frequent vacuums, there's got to be a fair
>> amount of modifications, which ought to lead to re-use of such pages at
>> a not too far away point?

> Any comments on the approach in this patch?

I agree with the concept of postponing page init till we're actually going
to do something with the page. However, the patch seems to need some
polish:

* There's a comment in RelationAddExtraBlocks, just above where you
changed, that

     * Extend by one page. This should generally match the main-line
     * extension code in RelationGetBufferForTuple, except that we hold
     * the relation extension lock throughout.

This seems to be falsified by this patch, in that one of the two paths does
PageInit and the other doesn't.

* s/unitialized pages/uninitialized pages/

* This bit in vacuumlazy seems unnecessarily confusing:

+            Size        freespace = 0;
...
+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+
+            if (freespace > 0)
+            {
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I'd write that as just

+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+            {
+                Size        freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I tend to agree that the DEBUG message isn't very necessary, or at least
could be lower than DEBUG1.

            regards, tom lane
Hi,

On 2019-01-15 17:35:21 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> Looking at the surrounding code made me wonder about the wisdom of
> >> entering empty pages as all-visible and all-frozen into the VM. That'll
> >> mean we'll never re-discover them on a primary, after promotion. There's
> >> no mechanism to add such pages to the FSM on a standby (in contrast to
> >> e.g. pages where tuples are modified), as vacuum will never visit that
> >> page again. Now obviously it has the advantage of avoiding
> >> re-processing the page in the next vacuum, but is that really an
> >> important goal? If there are frequent vacuums, there's got to be a fair
> >> amount of modifications, which ought to lead to re-use of such pages at
> >> a not too far away point?
>
> > Any comments on the approach in this patch?
>
> I agree with the concept of postponing page init till we're actually
> going to do something with the page. However, the patch seems to need
> some polish:

Yea, as I'd written in an earlier message, it was really meant as a
prototype to see whether there's buy-in to the change.

> * There's a comment in RelationAddExtraBlocks, just above where you
> changed, that
>
>      * Extend by one page. This should generally match the main-line
>      * extension code in RelationGetBufferForTuple, except that we hold
>      * the relation extension lock throughout.
>
> This seems to be falsified by this patch, in that one of the two paths
> does PageInit and the other doesn't.
>
> * s/unitialized pages/uninitialized pages/
>
> * This bit in vacuumlazy seems unnecessarily confusing:
>
> +            Size        freespace = 0;
> ...
> +            if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +
> +            if (freespace > 0)
> +            {
> +                RecordPageWithFreeSpace(onerel, blkno, freespace);
>
> I'd write that as just
>
> +            if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +            {
> +                Size        freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +                RecordPageWithFreeSpace(onerel, blkno, freespace);

Fixed, thanks for looking!

> I tend to agree that the DEBUG message isn't very necessary, or at least
> could be lower than DEBUG1.

After a bit of back and forth waffling, I've removed it now.

Besides a fair bit of comment changes, the latest version has just one
functional change: lazy_check_needs_freeze() doesn't indicate requiring
freezing for new pages anymore; if we can't get a cleanup lock on such a
page, it's about to be written to, and we'd not do more than enter it into
the FSM anyway.

Greetings,

Andres Freund
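[The lazy_check_needs_freeze() change described above, shown as a
hypothetical fragment rather than the actual patch text; the function name
prefix and surrounding structure are assumptions.]

    /*
     * Sketch: a still-new (or empty) page contains no tuples, so nothing
     * on it can require freezing; the only thing vacuum would do with it
     * is record it in the FSM.
     */
    static bool
    sketch_lazy_check_needs_freeze(Buffer buf)
    {
        Page        page = BufferGetPage(buf);

        if (PageIsNew(page) || PageIsEmpty(page))
            return false;

        /* ... otherwise inspect the tuples on the page, as before ... */
        return true;
    }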
Hi,

So, I'd pushed the latest version. And longfin came back with an
interesting error:

ERROR:  page 135 of relation "pg_class" should be empty but is not

The only way I can currently imagine this happening is that there's a
concurrent vacuum that discovers the page is empty, enters it into the FSM
(which now isn't happening under an extension lock anymore), and then a
separate backend starts to actually use that buffer. That seems tight but
possible. Seems we need to re-add the
LockRelationForExtension/UnlockRelationForExtension() logic :(

Greetings,

Andres Freund
Hi,

On 2019-01-28 14:10:55 -0800, Andres Freund wrote:
> So, I'd pushed the latest version. And longfin came back with an
> interesting error:
>
> ERROR:  page 135 of relation "pg_class" should be empty but is not
>
> The only way I can currently imagine this happening is that there's a
> concurrent vacuum that discovers the page is empty, enters it into the
> FSM (which now isn't happening under an extension lock anymore), and
> then a separate backend starts to actually use that buffer. That seems
> tight but possible. Seems we need to re-add the
> LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this change?
The FSM isn't WAL logged, and given its tree-ish structure it's possible to
"rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM that after a
crash the FSM might point to free space that doesn't actually exist, and is
rediscovered after independent changes. Not sure if that's actually a
realistic issue.

I'm inclined to put back the

    LockBuffer(buf, BUFFER_LOCK_UNLOCK);
    LockRelationForExtension(onerel, ExclusiveLock);
    UnlockRelationForExtension(onerel, ExclusiveLock);
    LockBufferForCleanup(buf);
    if (PageIsNew(page))

dance regardless, just to get the buildfarm to green? But I do wonder if we
should just make hio.c cope with this instead.

Greetings,

Andres Freund
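[For readers unfamiliar with that "dance": the following is an annotated
version of the snippet above, paraphrasing the long-standing comments in
vacuumlazy.c; it is an interpretation, not text from this thread, and may
not capture every nuance.]

    /*
     * Release the buffer lock first: a concurrent extender acquires the
     * relation extension lock before locking its new buffer, so waiting
     * for the extension lock while still holding the buffer lock could
     * deadlock.
     */
    LockBuffer(buf, BUFFER_LOCK_UNLOCK);

    /*
     * A backend extending the relation holds the extension lock until it
     * has its new buffer exclusively locked. Acquiring and immediately
     * releasing the lock therefore waits out any extension currently in
     * progress for this page.
     */
    LockRelationForExtension(onerel, ExclusiveLock);
    UnlockRelationForExtension(onerel, ExclusiveLock);

    /*
     * Re-lock the buffer; by the time the cleanup lock is granted, a
     * concurrent extender will have initialized the page if it was going
     * to (in the pre-patch code, PageInit happens under the buffer lock).
     */
    LockBufferForCleanup(buf);

    /*
     * If the page is still new at this point, nobody is about to fill it,
     * and vacuum can safely treat it as empty.
     */
    if (PageIsNew(page))
    {
        /* ... record free space / initialize as appropriate ... */
    }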
Andres Freund <andres@anarazel.de> writes:
> I'm inclined to put back the
>     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
>     LockRelationForExtension(onerel, ExclusiveLock);
>     UnlockRelationForExtension(onerel, ExclusiveLock);
>     LockBufferForCleanup(buf);
>     if (PageIsNew(page))
> dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk, so
you'd better do something.

> But I do wonder if we should just make hio.c cope with this instead.

Probably should not try to go that way under time pressure.

            regards, tom lane
Hi,

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm inclined to put back the
> >     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> >     LockRelationForExtension(onerel, ExclusiveLock);
> >     UnlockRelationForExtension(onerel, ExclusiveLock);
> >     LockBufferForCleanup(buf);
> >     if (PageIsNew(page))
> > dance regardless, just to get the buildfarm to green?
>
> The buildfarm's got half a dozen reports now of a failure of this ilk,
> so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation of
how this can be realistically hit on the BF, but failed. I've pushed
something now, let's see whether that fixes it.

> > But I do wonder if we should just make hio.c cope with this instead.
>
> Probably should not try to go that way under time pressure.

Definitely.

Greetings,

Andres Freund
Hi,

On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I'm inclined to put back the
> > >     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > >     LockRelationForExtension(onerel, ExclusiveLock);
> > >     UnlockRelationForExtension(onerel, ExclusiveLock);
> > >     LockBufferForCleanup(buf);
> > >     if (PageIsNew(page))
> > > dance regardless, just to get the buildfarm to green?
> >
> > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > so you'd better do something.
>
> Yea, I was working on a patch. Was trying to come up with an explanation
> of how this can be realistically hit on the BF, but failed. I've pushed
> something now, let's see whether that fixes it.

It has not. Given that I don't understand what's happening here I'm going
to revert both commits unless I figure it out in the next ~30min.

Greetings,

Andres Freund
Hi,

On 2019-01-28 16:40:36 -0800, Andres Freund wrote:
> On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> > On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I'm inclined to put back the
> > > >     LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > > >     LockRelationForExtension(onerel, ExclusiveLock);
> > > >     UnlockRelationForExtension(onerel, ExclusiveLock);
> > > >     LockBufferForCleanup(buf);
> > > >     if (PageIsNew(page))
> > > > dance regardless, just to get the buildfarm to green?
> > >
> > > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > > so you'd better do something.
> >
> > Yea, I was working on a patch. Was trying to come up with an explanation
> > of how this can be realistically hit on the BF, but failed. I've pushed
> > something now, let's see whether that fixes it.
>
> It has not. Given that I don't understand what's happening here I'm
> going to revert both commits unless I figure it out in the next ~30min.

I did that now. I couldn't reproduce it locally, despite a lot of runs.
Looking at the buildfarm it looks like the failures were, excluding
handfish which failed without recognizable symptoms before and after, on
BSD derived platforms (netbsd, freebsd, OS X), which certainly is
interesting.

I asked Thomas Munro whether he could run on freebsd, and he gave me a
login, where I just now reproduced the issue (2 of 5 make check runs
failing).

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes:
> I did that now. I couldn't reproduce it locally, despite a lot of
> runs. Looking at the buildfarm it looks like the failures were,
> excluding handfish which failed without recognizable symptoms before and
> after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> is interesting.

Isn't it now. Something about the BSD scheduler perhaps? But we've got four
or five different BSD-ish platforms that reported failures, and it's hard
to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr    2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr    2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is way
higher than elsewhere. Puzzling.

            regards, tom lane
On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I did that now. I couldn't reproduce it locally, despite a lot of
> > runs. Looking at the buildfarm it looks like the failures were,
> > excluding handfish which failed without recognizable symptoms before and
> > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > is interesting.
>
> Isn't it now. Something about the BSD scheduler perhaps? But we've
> got four or five different BSD-ish platforms that reported failures,
> and it's hard to believe they've all got identical schedulers.
>
> That second handfish failure does match the symptoms elsewhere:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
>
> --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr    2018-10-30 20:11:45.551967381 -0300
> +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr    2019-01-28 22:38:20.614211568 -0200
> @@ -0,0 +1,20 @@
> +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
>
> so it's not quite 100% BSD, but certainly the failure rate on BSD is
> way higher than elsewhere. Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

    /*
     * If the FSM knows nothing of the rel, try the last page before we
     * give up and extend. This avoids one-tuple-per-page syndrome during
     * bootstrapping or in a recently-started system.
     */
    if (targetBlock == InvalidBlockNumber)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

        if (nblocks > 0)
            targetBlock = nblocks - 1;
    }

I think that explains the issue (albeit not why it is much more frequent on
BSDs). Because we're not going through the FSM, it's perfectly possible to
find a page that is uninitialized, *and* is not yet in the FSM. The only
reason this wasn't previously actively broken, I think, is that while we
previously *also* looked at that page (before the extending backend
acquired a lock!), when looking at the page PageGetHeapFreeSpace(), via
PageGetFreeSpace(), decides there's no free space because it just
interprets the zeroes in pd_upper - pd_lower as no free space.

Hm, thinking about what a good solution here could be.

Greetings,

Andres Freund
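[For context, why an all-zero page reads as "no free space": PageGetFreeSpace()
computes the gap between pd_upper and pd_lower, and on a never-initialized
page both are zero. The following is a paraphrase of the bufpage.c routine
for illustration, not a verbatim copy.]

    #include "postgres.h"

    #include "storage/bufpage.h"

    /* Paraphrase of PageGetFreeSpace() (src/backend/storage/page/bufpage.c). */
    Size
    paraphrased_PageGetFreeSpace(Page page)
    {
        int     space;

        /* On an uninitialized page, pd_upper == pd_lower == 0, so space is 0. */
        space = (int) ((PageHeader) page)->pd_upper -
                (int) ((PageHeader) page)->pd_lower;

        if (space < (int) sizeof(ItemIdData))
            return 0;           /* all-zero pages land here */

        return (Size) (space - sizeof(ItemIdData));
    }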
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > runs. Looking at the buildfarm it looks like the failures were,
> > > excluding handfish which failed without recognizable symptoms before and
> > > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > > is interesting.
> >
> > Isn't it now. Something about the BSD scheduler perhaps? But we've
> > got four or five different BSD-ish platforms that reported failures,
> > and it's hard to believe they've all got identical schedulers.
> >
> > That second handfish failure does match the symptoms elsewhere:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> >
> > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr    2018-10-30 20:11:45.551967381 -0300
> > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr    2019-01-28 22:38:20.614211568 -0200
> > @@ -0,0 +1,20 @@
> > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> >
> > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > way higher than elsewhere. Puzzling.
>
> Interesting.
>
> While chatting with Robert about this issue I came across the following
> section of code:
>
>     /*
>      * If the FSM knows nothing of the rel, try the last page before we
>      * give up and extend. This avoids one-tuple-per-page syndrome during
>      * bootstrapping or in a recently-started system.
>      */
>     if (targetBlock == InvalidBlockNumber)
>     {
>         BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
>         if (nblocks > 0)
>             targetBlock = nblocks - 1;
>     }
>
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs). Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.
>
> Hm, thinking about what a good solution here could be.

I wonder if we should just expand the logic we have for RBM_ZERO_AND_LOCK
so it can be used in hio.c (we probably could just use it without any
changes, but the name seems a bit confusing) - because that'd prevent the
current weirdness that it's possible that the buffer can be locked by
somebody between the ReadBufferBI(P_NEW) and the
LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to altogether
drop the cleanup lock logic we currently have, and also protect us against
the FSM issue I'd outlined upthread?

Greetings,

Andres Freund
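[A rough sketch of what that could look like on the hio.c side: extend with
RBM_ZERO_AND_LOCK so the new buffer comes back already exclusively locked,
leaving no window for another backend to lock and use it first. This is an
illustration of the idea only; the function name and exact plumbing are
assumptions, not the committed patch.]

    #include "postgres.h"

    #include "storage/bufmgr.h"

    /* Sketch: allocate a new block and return it pinned and locked. */
    static Buffer
    sketch_extend_relation(Relation relation, BufferAccessStrategy strategy)
    {
        Buffer      buffer;

        /*
         * P_NEW + RBM_ZERO_AND_LOCK: the buffer is returned zeroed and
         * exclusively locked, so no separate
         * LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE) step is needed.
         */
        buffer = ReadBufferExtended(relation, MAIN_FORKNUM, P_NEW,
                                    RBM_ZERO_AND_LOCK, strategy);

        /* The caller can now PageInit() and insert into the page. */
        return buffer;
    }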
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> While chatting with Robert about this issue I came across the following
> section of code:
>
>     /*
>      * If the FSM knows nothing of the rel, try the last page before we
>      * give up and extend. This avoids one-tuple-per-page syndrome during
>      * bootstrapping or in a recently-started system.
>      */
>     if (targetBlock == InvalidBlockNumber)
>     {
>         BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
>
>         if (nblocks > 0)
>             targetBlock = nblocks - 1;
>     }
>
> I think that explains the issue (albeit not why it is much more frequent
> on BSDs). Because we're not going through the FSM, it's perfectly
> possible to find a page that is uninitialized, *and* is not yet in the
> FSM. The only reason this wasn't previously actively broken, I think, is
> that while we previously *also* looked at that page (before the extending
> backend acquired a lock!), when looking at the page
> PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> space because it just interprets the zeroes in pd_upper - pd_lower as no
> free space.

FWIW, after commenting out that block and adapting a few regression tests
to changed plans, I could not reproduce the issue on a FreeBSD machine in
31 runs, where it previously triggered in roughly 1/3 of cases.

Still don't quite understand why it is so much more likely on BSD...

Greetings,

Andres Freund
Hi,

On 2019-01-29 12:23:51 -0800, Andres Freund wrote:
> On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> > On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > > runs. Looking at the buildfarm it looks like the failures were,
> > > > excluding handfish which failed without recognizable symptoms before and
> > > > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > > > is interesting.
> > >
> > > Isn't it now. Something about the BSD scheduler perhaps? But we've
> > > got four or five different BSD-ish platforms that reported failures,
> > > and it's hard to believe they've all got identical schedulers.
> > >
> > > That second handfish failure does match the symptoms elsewhere:
> > >
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > >
> > > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr    2018-10-30 20:11:45.551967381 -0300
> > > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr    2019-01-28 22:38:20.614211568 -0200
> > > @@ -0,0 +1,20 @@
> > > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> > >
> > > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > > way higher than elsewhere. Puzzling.
> >
> > Interesting.
> >
> > While chatting with Robert about this issue I came across the following
> > section of code:
> >
> >     /*
> >      * If the FSM knows nothing of the rel, try the last page before we
> >      * give up and extend. This avoids one-tuple-per-page syndrome during
> >      * bootstrapping or in a recently-started system.
> >      */
> >     if (targetBlock == InvalidBlockNumber)
> >     {
> >         BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> >
> >         if (nblocks > 0)
> >             targetBlock = nblocks - 1;
> >     }
> >
> > I think that explains the issue (albeit not why it is much more frequent
> > on BSDs). Because we're not going through the FSM, it's perfectly
> > possible to find a page that is uninitialized, *and* is not yet in the
> > FSM. The only reason this wasn't previously actively broken, I think, is
> > that while we previously *also* looked at that page (before the extending
> > backend acquired a lock!), when looking at the page
> > PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
> > space because it just interprets the zeroes in pd_upper - pd_lower as no
> > free space.
> >
> > Hm, thinking about what a good solution here could be.
>
> I wonder if we should just expand the logic we have for
> RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could just
> use it without any changes, but the name seems a bit confusing) -
> because that'd prevent the current weirdness that it's possible that the
> buffer can be locked by somebody between the ReadBufferBI(P_NEW) and
> the LockBuffer(BUFFER_LOCK_EXCLUSIVE). I think that'd allow us to
> altogether drop the cleanup lock logic we currently have, and also
> protect us against the FSM issue I'd outlined upthread?

Here's a version of the patch implementing this approach. I assume this
solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
machine.

I did not rename RBM_ZERO_AND_LOCK. New buffers are zeroed too, so that
still seems apt enough.

Greetings,

Andres Freund
Hi,

On 2019-02-01 07:14:11 -0800, Andres Freund wrote:
> Here's a version of the patch implementing this approach. I assume this
> solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
> machine.

I pushed this, and the buildfarm so far is showing more love.

There's a failure on jacana, but it looks like it's a machine issue:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-02-03%2013%3A17%3A57

Feb 03 08:36:16 rm: cannot remove `/home/pgrunner/bf/root/HEAD/pgsql.build/tmp_install/home/pgrunner/bf/root/HEAD/inst/bin/postgres.exe': Permission denied

And one on eelpout, which appears to be unrelated as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13

#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 385.
#                   'psql: server closed the connection unexpectedly
#         This probably means the server terminated abnormally
#         before or while processing the request.
# could not send startup packet: Connection reset by peer
# '
#     doesn't match '(?^:SSL error)'
# Looks like you failed 1 test of 71.
[12:15:36] t/001_ssltests.pl ..
Dubious, test returned 1 (wstat 256, 0x100)

Greetings,

Andres Freund
On Sun, Feb 3, 2019 at 8:48 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-02-01 07:14:11 -0800, Andres Freund wrote:
> > Here's a version of the patch implementing this approach. I assume this
> > solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
> > machine.
>
> I pushed this, and the buildfarm so far is showing more love.
>
> There's a failure on jacana, but it looks like it's a machine issue:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-02-03%2013%3A17%3A57
>
> Feb 03 08:36:16 rm: cannot remove `/home/pgrunner/bf/root/HEAD/pgsql.build/tmp_install/home/pgrunner/bf/root/HEAD/inst/bin/postgres.exe': Permission denied
>
> And one on eelpout, which appears to be unrelated as well:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13
>
> #   Failed test 'intermediate client certificate is missing: matches'
> #   at t/001_ssltests.pl line 385.
> #                   'psql: server closed the connection unexpectedly
> #         This probably means the server terminated abnormally
> #         before or while processing the request.
> # could not send startup packet: Connection reset by peer
> # '
> #     doesn't match '(?^:SSL error)'
> # Looks like you failed 1 test of 71.
> [12:15:36] t/001_ssltests.pl ..
> Dubious, test returned 1 (wstat 256, 0x100)
>

None of these failures seem to be related to your commit.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
On Mon, Feb 4, 2019 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Feb 3, 2019 at 8:48 PM Andres Freund <andres@anarazel.de> wrote:
> > And one on eelpout, which appears to be unrelated as well:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13
>
> None of these failures seem to be related to your commit.

Yeah, that seems to be a problem with recent OpenSSL. I have some analysis
over here, but I don't know what exactly is going on yet:

https://www.postgresql.org/message-id/flat/CAEepm%3D2n6Nv%2B5tFfe8YnkUm1fXgvxR0Mm1FoD%2BQKG-vLNGLyKg%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com