Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

The zheap patchset, even after being based on pluggable storage,
currently has the following condition in RelationAddExtraBlocks():
        if (RelationStorageIsZHeap(relation))
        {
            Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
            ZheapInitPage(page, BufferGetPageSize(buffer));
            freespace = PageGetZHeapFreeSpace(page);
        }
        else
        {
            PageInit(page, BufferGetPageSize(buffer), 0);
            freespace = PageGetHeapFreeSpace(page);
        }

I.e. it initializes the page differently when zheap is used versus
heap.

Thinking about whether it's worthwhile to allow extending that function in an
extensible manner made me wonder:  Is it actually a good idea to
initialize the page at that point, including marking it dirty?

As far as I can tell that has several downsides:
- Dirtying the buffer for initialization will cause potentially
  superfluous IO, with no interesting data in the write except for a
  newly initialized page.
- As there's no sort of interlock, it's entirely possible that, after a
  crash, the blocks will come up empty, yet with the FSM still returning
  them as free; so that path would be good to support anyway.
- It adds heap-specific code to a routine that otherwise could be
  generic for different table access methods.

It seems to me this could be optimized by *not* initializing the page,
and instead having a PageIsNew() check at the places that use the
page, initializing it in that case.
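A minimal, self-contained sketch of that deferred-initialization idea (the struct and helper names below are simplified stand-ins, not the actual PostgreSQL sources): an all-zeroes page from a crashed extension is recognizable by its zero pd_upper, so callers can initialize it lazily.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BLCKSZ 8192
typedef uint16_t LocationIndex;

/* Simplified stand-in for PostgreSQL's PageHeaderData. */
typedef struct PageHeaderData
{
    uint32_t      pd_lsn_lo;    /* stand-in for the real PageXLogRecPtr */
    uint32_t      pd_lsn_hi;
    uint16_t      pd_checksum;
    uint16_t      pd_flags;
    LocationIndex pd_lower;     /* offset to start of free space */
    LocationIndex pd_upper;     /* offset to end of free space */
} PageHeaderData;

#define SizeOfPageHeaderData (sizeof(PageHeaderData))

/*
 * A page is "new" iff pd_upper is zero -- which is exactly what an
 * all-zeroes page left behind by a crashed extension looks like.
 */
static int
page_is_new(const char *page)
{
    return ((const PageHeaderData *) page)->pd_upper == 0;
}

/* Simplified PageInit(): set up the free-space pointers. */
static void
page_init(char *page, size_t pagesize)
{
    PageHeaderData *p = (PageHeaderData *) page;

    memset(page, 0, pagesize);
    p->pd_lower = (LocationIndex) SizeOfPageHeaderData;
    p->pd_upper = (LocationIndex) pagesize;
}

/*
 * The proposal in a nutshell: instead of PageInit() at extension time,
 * callers perform this check when they actually use the page.
 */
static void
ensure_initialized(char *page)
{
    if (page_is_new(page))
        page_init(page, BLCKSZ);
}
```

The same check handles both the normal case and the crash case, which is why no interlock against crash-created empty pages is needed.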

Right now we'll just ignore such pages when we encounter them
(e.g. after a crash) until vacuum initializes them. But given that we've
accepted that we'll potentially create empty pages anyway, I don't see
what advantage that provides.  The warnings emitted by vacuum are
pretty scary, and can happen in a lot of legitimate cases, so removing
them seems like a good idea anyway.

I'm pretty tired, so I might be missing something large and obvious
here.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Amit Kapila
Date:
On Wed, Dec 19, 2018 at 2:09 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> The zheap patchset, even after being based on pluggable storage,
> currently has the following condition in RelationAddExtraBlocks():
>                 if (RelationStorageIsZHeap(relation))
>                 {
>                         Assert(BufferGetBlockNumber(buffer) != ZHEAP_METAPAGE);
>                         ZheapInitPage(page, BufferGetPageSize(buffer));
>                         freespace = PageGetZHeapFreeSpace(page);
>                 }
>                 else
>                 {
>                         PageInit(page, BufferGetPageSize(buffer), 0);
>                         freespace = PageGetHeapFreeSpace(page);
>                 }
>
> I.e. it initializes the page differently when zheap is used versus
> heap.
>
> Thinking about whether it's worthwhile to allow extending that function in an
> extensible manner made me wonder:  Is it actually a good idea to
> initialize the page at that point, including marking it dirty?
>
> As far as I can tell that has several downsides:
> - Dirtying the buffer for initialization will cause potentially
>   superfluous IO, with no interesting data in the write except for a
>   newly initialized page.
> - As there's no sort of interlock, it's entirely possible that, after a
>   crash, the blocks will come up empty, yet with the FSM still returning
>   them as free; so that path would be good to support anyway.
> - It adds heap-specific code to a routine that otherwise could be
>   generic for different table access methods.
>

IIUC, your proposal is to remove the page initialization and
MarkBufferDirty() from RelationAddExtraBlocks(), but still record the
pages in the FSM.  Is my understanding correct?  If so, I don't see any
problem with that, and as you have mentioned, it will be generally
advantageous as well.

> It seems to me this could be optimized by *not* initializing the page,
> and instead having a PageIsNew() check at the places that use the
> page, initializing it in that case.
>

makes sense to me.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Robert Haas
Date:
On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> Thinking about whether it's worthwhile to allow extending that function in an
> extensible manner made me wonder:  Is it actually a good idea to
> initialize the page at that point, including marking it dirty?

As far as I can recall, my motivation was to avoid increasing the
number of warnings produced by VACUUM.  If those warnings are going
away, then I don't know that there's any reason to keep that code as
it is.  But I am not sure whether such a move would provoke any
opposition.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2018-12-19 17:28:17 -0500, Robert Haas wrote:
> On Wed, Dec 19, 2018 at 3:39 AM Andres Freund <andres@anarazel.de> wrote:
> > Thinking about whether it's worthwhile to allow extending that function in an
> > extensible manner made me wonder:  Is it actually a good idea to
> > initialize the page at that point, including marking it dirty?
> 
> As far as I can recall, my motivation was to avoid increasing the
> number of warnings produced by VACUUM.  If those warnings are going
> away, then I don't know that there's any reason to keep that code as
> it is.  But I am not sure whether such a move would provoke any
> opposition.

What's gained by the logic of emitting that warning in VACUUM after a
crash? I don't really see any robustness advantages in it.  If the logic
were that we'd never reuse empty pages because they can hide corruption
that normally would be discovered by checksums, then we shouldn't
reinitialize them at all and instead error out hard - but we can't do
that, because it's normal that they occur.  Right now we have empty
pages on-disk whenever a busy server is restarted in immediate mode,
after all.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Robert Haas
Date:
On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> What's gained by the logic of emitting that warning in VACUUM after a
> crash? I don't really see any robustness advantages in it.  If the logic
> were that we'd never reuse empty pages because they can hide corruption
> that normally would be discovered by checksums, then we shouldn't
> reinitialize them at all and instead error out hard - but we can't do
> that, because it's normal that they occur.  Right now we have empty
> pages on-disk whenever a busy server is restarted in immediate mode,
> after all.

I don't know.  I am just normally reluctant to change things
precipitously that are of long tenure.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
>> What's gained by the logic of emitting that warning in VACUUM after a
>> crash? I don't really see any robustness advantages in it.

> I don't know.  I am just normally reluctant to change things
> precipitously that are of long tenure.

Me too, but I think Andres has a point here.  Those warnings in VACUUM
are ancient, probably predating the introduction of WAL :-(.  At the
time there was good reason to be suspicious of zeroed pages in tables.
Now, though, we have (what we think is) a bulletproof crash recovery
procedure in which possibly-zeroed pages are to be expected; so we're
just causing users unnecessary alarm by warning about them.  I think
it's reasonable to, if not remove the messages entirely, at least
downgrade them to a low DEBUG level.

            regards, tom lane


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> >> What's gained by the logic of emitting that warning in VACUUM after a
> >> crash? I don't really see any robustness advantages in it.
> 
> > I don't know.  I am just normally reluctant to change things
> > precipitously that are of long tenure.
> 
> Me too, but I think Andres has a point here.  Those warnings in VACUUM
> are ancient, probably predating the introduction of WAL :-(.  At the
> time there was good reason to be suspicious of zeroed pages in tables.
> Now, though, we have (what we think is) a bulletproof crash recovery
> procedure in which possibly-zeroed pages are to be expected; so we're
> just causing users unnecessary alarm by warning about them.  I think
> it's reasonable to, if not remove the messages entirely, at least
> downgrade them to a low DEBUG level.

Yea, I think that'd be reasonable.

I think we ought to add them, as new/zero pages, to the FSM. If we did
so both during vacuum and RelationAddExtraBlocks() we'd avoid the
redundant writes, and avoid depending on heap layout in
RelationAddExtraBlocks().

I wonder if it'd make sense to only log a DEBUG if there's no
corresponding FSM entry. That'd obviously not be bulletproof, but it'd
reduce the spamminess considerably.
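The gating idea can be sketched with a toy model (hypothetical names; a plain array stands in for the real GetRecordedFreeSpace()/RecordPageWithFreeSpace() FSM calls): the message fires only when the FSM previously knew nothing about the block, and the free space gets recorded either way.

```c
#include <stdint.h>

#define TOY_NBLOCKS     16
#define TOY_PAGE_USABLE 8168    /* roughly BLCKSZ minus the page header */

/* Toy FSM: zero-initialized, i.e. no free space recorded for any block. */
static uint16_t toy_fsm[TOY_NBLOCKS];

/*
 * Vacuum found an empty page.  Record its free space; return 1 if a
 * DEBUG message would be emitted (the FSM had no prior entry), else 0.
 */
static int
vacuum_note_empty_page(unsigned blkno)
{
    int would_log = (toy_fsm[blkno] == 0);  /* GetRecordedFreeSpace() stand-in */

    toy_fsm[blkno] = TOY_PAGE_USABLE;       /* RecordPageWithFreeSpace() stand-in */
    return would_log;
}
```

Re-vacuuming the same empty page stays silent, so only genuinely newly discovered empty pages would produce a message.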

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2018-12-19 16:56:36 -0800, Andres Freund wrote:
> On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> > > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> > >> What's gained by the logic of emitting that warning in VACUUM after a
> > >> crash? I don't really see any robustness advantages in it.
> > 
> > > I don't know.  I am just normally reluctant to change things
> > > precipitously that are of long tenure.
> > 
> > Me too, but I think Andres has a point here.  Those warnings in VACUUM
> > are ancient, probably predating the introduction of WAL :-(.  At the
> > time there was good reason to be suspicious of zeroed pages in tables.
> > Now, though, we have (what we think is) a bulletproof crash recovery
> > procedure in which possibly-zeroed pages are to be expected; so we're
> > just causing users unnecessary alarm by warning about them.  I think
> > it's reasonable to, if not remove the messages entirely, at least
> > downgrade them to a low DEBUG level.
> 
> Yea, I think that'd be reasonable.
> 
> I think we ought to add them, as new/zero pages, to the FSM. If we did
> so both during vacuum and RelationAddExtraBlocks() we'd avoid the
> redundant writes, and avoid depending on heap layout in
> RelationAddExtraBlocks().
> 
> I wonder if it'd make sense to only log a DEBUG if there's no
> corresponding FSM entry. That'd obviously not be bulletproof, but it'd
> reduce the spamminess considerably.

Here's a prototype patch implementing this change.  I'm not sure the new
DEBUG message is really worth it?


Looking at the surrounding code made me wonder about the wisdom of
entering empty pages as all-visible and all-frozen into the VM. That'll
mean we'll never re-discover them on a primary, after promotion. There's
no mechanism to add such pages to the FSM on a standby (in contrast to
e.g. pages where tuples are modified), as vacuum will never visit that
page again.  Now obviously it has the advantage of avoiding
re-processing the page in the next vacuum, but is that really an
important goal? If there's frequent vacuums, there got to be a fair
amount of modifications, which ought to lead to re-use of such pages at
a not too far away point?

Greetings,

Andres Freund

Attachment

Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2018-12-20 15:04:11 -0800, Andres Freund wrote:
> On 2018-12-19 16:56:36 -0800, Andres Freund wrote:
> > On 2018-12-19 19:39:33 -0500, Tom Lane wrote:
> > > Robert Haas <robertmhaas@gmail.com> writes:
> > > > On Wed, Dec 19, 2018 at 5:37 PM Andres Freund <andres@anarazel.de> wrote:
> > > >> What's gained by the logic of emitting that warning in VACUUM after a
> > > >> crash? I don't really see any robustness advantages in it.
> > > 
> > > > I don't know.  I am just normally reluctant to change things
> > > > precipitously that are of long tenure.
> > > 
> > > Me too, but I think Andres has a point here.  Those warnings in VACUUM
> > > are ancient, probably predating the introduction of WAL :-(.  At the
> > > time there was good reason to be suspicious of zeroed pages in tables.
> > > Now, though, we have (what we think is) a bulletproof crash recovery
> > > procedure in which possibly-zeroed pages are to be expected; so we're
> > > just causing users unnecessary alarm by warning about them.  I think
> > > it's reasonable to, if not remove the messages entirely, at least
> > > downgrade them to a low DEBUG level.
> > 
> > Yea, I think that'd be reasonable.
> > 
> > I think we ought to add them, as new/zero pages, to the FSM. If we did
> > so both during vacuum and RelationAddExtraBlocks() we'd avoid the
> > redundant writes, and avoid depending on heap layout in
> > RelationAddExtraBlocks().
> > 
> > I wonder if it'd make sense to only log a DEBUG if there's no
> > corresponding FSM entry. That'd obviously not be bulletproof, but it'd
> > reduce the spamminess considerably.
> 
> Here's a prototype patch implementing this change.  I'm not sure the new
> DEBUG message is really worth it?
> 
> 
> Looking at the surrounding code made me wonder about the wisdom of
> entering empty pages as all-visible and all-frozen into the VM. That'll
> mean we'll never re-discover them on a primary, after promotion. There's
> no mechanism to add such pages to the FSM on a standby (in contrast to
> e.g. pages where tuples are modified), as vacuum will never visit that
> page again.  Now obviously it has the advantage of avoiding
> re-processing the page in the next vacuum, but is that really an
> important goal? If there's frequent vacuums, there's got to be a fair
> amount of modifications, which ought to lead to re-use of such pages at
> a not too far away point?

Any comments on the approach in this patch?

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
>> Looking at the surrounding code made me wonder about the wisdom of
>> entering empty pages as all-visible and all-frozen into the VM. That'll
>> mean we'll never re-discover them on a primary, after promotion. There's
>> no mechanism to add such pages to the FSM on a standby (in contrast to
>> e.g. pages where tuples are modified), as vacuum will never visit that
>> page again.  Now obviously it has the advantage of avoiding
>> re-processing the page in the next vacuum, but is that really an
>> important goal? If there's frequent vacuums, there's got to be a fair
>> amount of modifications, which ought to lead to re-use of such pages at
>> a not too far away point?

> Any comments on the approach in this patch?

I agree with the concept of postponing page init till we're actually
going to do something with the page.  However, the patch seems to need
some polish:

* There's a comment in RelationAddExtraBlocks, just above where you
changed, that

         * Extend by one page.  This should generally match the main-line
         * extension code in RelationGetBufferForTuple, except that we hold
         * the relation extension lock throughout.

This seems to be falsified by this patch, in that one of the two paths
does PageInit and the other doesn't.

* s/unitialized pages/uninitialized pages/

* This bit in vacuumlazy seems unnecessarily confusing:

+            Size        freespace = 0;
...
+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+
+            if (freespace > 0)
+            {
+                RecordPageWithFreeSpace(onerel, blkno, freespace);

I'd write that as just

+            if (GetRecordedFreeSpace(onerel, blkno) == 0)
+            {
+                Size    freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
+                RecordPageWithFreeSpace(onerel, blkno, freespace);


I tend to agree that the DEBUG message isn't very necessary, or at least
could be lower than DEBUG1.

            regards, tom lane


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-15 17:35:21 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> >> Looking at the surrounding code made me wonder about the wisdom of
> >> entering empty pages as all-visible and all-frozen into the VM. That'll
> >> mean we'll never re-discover them on a primary, after promotion. There's
> >> no mechanism to add such pages to the FSM on a standby (in contrast to
> >> e.g. pages where tuples are modified), as vacuum will never visit that
> >> page again.  Now obviously it has the advantage of avoiding
> >> re-processing the page in the next vacuum, but is that really an
> >> important goal? If there's frequent vacuums, there's got to be a fair
> >> amount of modifications, which ought to lead to re-use of such pages at
> >> a not too far away point?
> 
> > Any comments on the approach in this patch?
> 
> I agree with the concept of postponing page init till we're actually
> going to do something with the page.  However, the patch seems to need
> some polish:

Yea, as I'd written in an earlier message, it was really meant as a
prototype to see whether there's buy-in to the change.


> * There's a comment in RelationAddExtraBlocks, just above where you
> changed, that
> 
>          * Extend by one page.  This should generally match the main-line
>          * extension code in RelationGetBufferForTuple, except that we hold
>          * the relation extension lock throughout.
> 
> This seems to be falsified by this patch, in that one of the two paths
> does PageInit and the other doesn't.
> 
> * s/unitialized pages/uninitialized pages/
> 
> * This bit in vacuumlazy seems unnecessarily confusing:
> 
> +            Size        freespace = 0;
> ...
> +            if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +                freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +
> +            if (freespace > 0)
> +            {
> +                RecordPageWithFreeSpace(onerel, blkno, freespace);
> 
> I'd write that as just
> 
> +            if (GetRecordedFreeSpace(onerel, blkno) == 0)
> +            {
> +                Size    freespace = BufferGetPageSize(buf) - SizeOfPageHeaderData;
> +                RecordPageWithFreeSpace(onerel, blkno, freespace);

Fixed, thanks for looking!


> I tend to agree that the DEBUG message isn't very necessary, or at least
> could be lower than DEBUG1.

After a bit of back and forth waffling, I've removed it now.

Besides a fair bit of comment changes, the latest version has just one
functional change: lazy_check_needs_freeze() doesn't indicate requiring
freezing for new pages anymore - if we can't get a cleanup lock on such a
page, it's about to be written to, and we'd not do more than enter it
into the FSM.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

So, I'd pushed the latest version. And longfin came back with an
interesting error:

ERROR:  page 135 of relation "pg_class" should be empty but is not

The only way I can currently imagine this happening is that there's a
concurrent vacuum that discovers the page is empty, enters it into the
FSM (which now isn't happening under an extension lock anymore), and
then a separate backend starts to actually use that buffer.  That seems
tight but possible.  Seems we need to re-add the
LockRelationForExtension/UnlockRelationForExtension() logic :(

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-28 14:10:55 -0800, Andres Freund wrote:
> So, I'd pushed the latest version. And longfin came back with an
> interesting error:
> 
> ERROR:  page 135 of relation "pg_class" should be empty but is not
> 
> The only way I can currently imagine this happening is that there's a
> concurrent vacuum that discovers the page is empty, enters it into the
> FSM (which now isn't happening under an extension lock anymore), and
> then a separate backend starts to actually use that buffer.  That seems
> tight but possible.  Seems we need to re-add the
> LockRelationForExtension/UnlockRelationForExtension() logic :(

Hm, but thinking about this, isn't this a risk independent of this
change? The FSM isn't WAL-logged, and given its tree-ish structure it's
possible to "rediscover" free space (e.g. FreeSpaceMapVacuum()). ISTM
that after a crash the FSM might point to free space that doesn't
actually exist, and is rediscovered after independent changes.  Not sure
if that's actually a realistic issue.

I'm inclined to put back the
           LockBuffer(buf, BUFFER_LOCK_UNLOCK);
           LockRelationForExtension(onerel, ExclusiveLock);
           UnlockRelationForExtension(onerel, ExclusiveLock);
           LockBufferForCleanup(buf);
           if (PageIsNew(page))
dance regardless, just to get the buildfarm to green?

But I do wonder if we should just make hio.c cope with this instead.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I'm inclined to put back the
>            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
>            LockRelationForExtension(onerel, ExclusiveLock);
>            UnlockRelationForExtension(onerel, ExclusiveLock);
>            LockBufferForCleanup(buf);
>            if (PageIsNew(page))
> dance regardless, just to get the buildfarm to green?

The buildfarm's got half a dozen reports now of a failure of this ilk,
so you'd better do something.

> But I do wonder if we should just make hio.c cope with this instead.

Probably should not try to go that way under time pressure.

            regards, tom lane


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I'm inclined to put back the
> >            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> >            LockRelationForExtension(onerel, ExclusiveLock);
> >            UnlockRelationForExtension(onerel, ExclusiveLock);
> >            LockBufferForCleanup(buf);
> >            if (PageIsNew(page))
> > dance regardless, just to get the buildfarm to green?
> 
> The buildfarm's got half a dozen reports now of a failure of this ilk,
> so you'd better do something.

Yea, I was working on a patch. Was trying to come up with an explanation
of how this can be realistically hit on the BF, but failed.  I've pushed
something now, let's see whether that fixes it.


> > But I do wonder if we should just make hio.c cope with this instead.
> 
> > Probably should not try to go that way under time pressure.

Definitely.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I'm inclined to put back the
> > >            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > >            LockRelationForExtension(onerel, ExclusiveLock);
> > >            UnlockRelationForExtension(onerel, ExclusiveLock);
> > >            LockBufferForCleanup(buf);
> > >            if (PageIsNew(page))
> > > dance regardless, just to get the buildfarm to green?
> > 
> > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > so you'd better do something.
> 
> Yea, I was working on a patch. Was trying to come up with an explanation
> of how this can be realistically hit on the BF, but failed.  I've pushed
> something now, let's see whether that fixes it.

It has not. Given that I don't understand what's happening here I'm
going to revert both commits unless I figure it out in the next ~30min.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-28 16:40:36 -0800, Andres Freund wrote:
> On 2019-01-28 15:49:33 -0800, Andres Freund wrote:
> > On 2019-01-28 18:08:59 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I'm inclined to put back the
> > > >            LockBuffer(buf, BUFFER_LOCK_UNLOCK);
> > > >            LockRelationForExtension(onerel, ExclusiveLock);
> > > >            UnlockRelationForExtension(onerel, ExclusiveLock);
> > > >            LockBufferForCleanup(buf);
> > > >            if (PageIsNew(page))
> > > > dance regardless, just to get the buildfarm to green?
> > > 
> > > The buildfarm's got half a dozen reports now of a failure of this ilk,
> > > so you'd better do something.
> > 
> > Yea, I was working on a patch. Was trying to come up with an explanation
> > of how this can be realistically hit on the BF, but failed.  I've pushed
> > something now, let's see whether that fixes it.
> 
> It has not. Given that I don't understand what's happening here I'm
> going to revert both commits unless I figure it out in the next ~30min.

I did that now. I couldn't reproduce it locally, despite a lot of
runs. Looking at the buildfarm it looks like the failures were,
excluding handfish which failed without recognizable symptoms before and
after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
is interesting. I asked Thomas Munro whether he could run on freebsd,
and he gave me a login, where I just now reproduced the issue (2 of 5
make check runs failing).

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> I did that now. I couldn't reproduce it locally, despite a lot of
> runs. Looking at the buildfarm it looks like the failures were,
> excluding handfish which failed without recognizable symptoms before and
> after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> is interesting.

Isn't it now.  Something about the BSD scheduler perhaps?  But we've
got four or five different BSD-ish platforms that reported failures,
and it's hard to believe they've all got identical schedulers.

That second handfish failure does match the symptoms elsewhere:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22

--- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
+++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
@@ -0,0 +1,20 @@
+SQL error: page 0 of relation "test_thread" should be empty but is not on line 125

so it's not quite 100% BSD, but certainly the failure rate on BSD is
way higher than elsewhere.  Puzzling.

            regards, tom lane


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I did that now. I couldn't reproduce it locally, despite a lot of
> > runs. Looking at the buildfarm it looks like the failures were,
> > excluding handfish which failed without recognizable symptoms before and
> > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > is interesting.
> 
> Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> got four or five different BSD-ish platforms that reported failures,
> and it's hard to believe they've all got identical schedulers.
> 
> That second handfish failure does match the symptoms elsewhere:
> 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> 
> --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
> +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
> @@ -0,0 +1,20 @@
> +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> 
> so it's not quite 100% BSD, but certainly the failure rate on BSD is
> way higher than elsewhere.  Puzzling.

Interesting.

While chatting with Robert about this issue I came across the following
section of code:

        /*
         * If the FSM knows nothing of the rel, try the last page before we
         * give up and extend.  This avoids one-tuple-per-page syndrome during
         * bootstrapping or in a recently-started system.
         */
        if (targetBlock == InvalidBlockNumber)
        {
            BlockNumber nblocks = RelationGetNumberOfBlocks(relation);

            if (nblocks > 0)
                targetBlock = nblocks - 1;
        }


I think that explains the issue (albeit not why it is much more frequent
on BSDs).  Because we're not going through the FSM, it's perfectly
possible to find a page that is uninitialized, *and* is not yet in the
FSM. The only reason this wasn't previously actively broken, I think, is
that while we previously *also* looked at that page (before the extending
backend acquired a lock!), when looking at the page
PageGetHeapFreeSpace(), via PageGetFreeSpace(), decides there's no free
space because it just interprets the zeroes in pd_upper - pd_lower as no
free space.
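That pd_upper - pd_lower behavior is easy to demonstrate in a toy model (a simplified stand-in, not the real PageGetFreeSpace()): on an all-zeroes page both pointers are zero, the computed space falls below the size of a line pointer, and the page therefore reads as having no free space - i.e. it looks full rather than empty.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the start of PostgreSQL's page header. */
typedef struct ToyPageHeader
{
    uint64_t pd_lsn;
    uint16_t pd_checksum;
    uint16_t pd_flags;
    uint16_t pd_lower;   /* both zero on an uninitialized page ... */
    uint16_t pd_upper;   /* ... so upper - lower is zero */
} ToyPageHeader;

#define TOY_ITEMID_SIZE 4   /* sizeof(ItemIdData) in PostgreSQL */

/*
 * Mimics the PageGetFreeSpace() computation: free space is
 * pd_upper - pd_lower, minus room for one line pointer, and anything
 * smaller than a line pointer counts as no space at all.
 */
static size_t
toy_page_get_free_space(const char *page)
{
    const ToyPageHeader *p = (const ToyPageHeader *) page;
    int space = (int) p->pd_upper - (int) p->pd_lower;

    if (space < TOY_ITEMID_SIZE)
        return 0;               /* an all-zeroes page lands here */
    return (size_t) (space - TOY_ITEMID_SIZE);
}
```

So the old last-page probe silently skipped uninitialized pages, whereas the "should be empty but is not" check trips over exactly the same situation.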

Hm, thinking about what a good solution here could be.

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > Andres Freund <andres@anarazel.de> writes:
> > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > runs. Looking at the buildfarm it looks like the failures were,
> > > excluding handfish which failed without recognizable symptoms before and
> > > after, on BSD derived platforms (netbsd, freebsd, OS X), which certainly
> > > is interesting.
> > 
> > Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> > got four or five different BSD-ish platforms that reported failures,
> > and it's hard to believe they've all got identical schedulers.
> > 
> > That second handfish failure does match the symptoms elsewhere:
> > 
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > 
> > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
> > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
> > @@ -0,0 +1,20 @@
> > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> > 
> > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > way higher than elsewhere.  Puzzling.
> 
> Interesting.
> 
> While chatting with Robert about this issue I came across the following
> section of code:
> 
>         /*
>          * If the FSM knows nothing of the rel, try the last page before we
>          * give up and extend.  This avoids one-tuple-per-page syndrome during
>          * bootstrapping or in a recently-started system.
>          */
>         if (targetBlock == InvalidBlockNumber)
>         {
>             BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> 
>             if (nblocks > 0)
>                 targetBlock = nblocks - 1;
>         }
> 
> 
> I think that explains the issue (albeit not why it is so much more
> frequent on BSDs).  Because we're not going through the FSM, it's
> perfectly possible to find a page that is uninitialized *and* not yet
> in the FSM.  The only reason this wasn't previously actively broken, I
> think, is that while we previously *also* looked at that page (before
> the extending backend acquired a lock!), PageGetHeapFreeSpace(), via
> PageGetFreeSpace(), decided there was no free space, because it just
> interprets the zeroes in pd_upper and pd_lower as no free space.
> 
> Hm, thinking about what a good solution here could be.

I wonder if we should just expand the logic we have for
RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could just use
it without any changes, but the name seems a bit confusing), because
that'd prevent the current weirdness that the buffer can be locked by
somebody between the ReadBufferBI(P_NEW) and the
LockBuffer(BUFFER_LOCK_EXCLUSIVE).  I think that'd allow us to drop the
cleanup lock logic we currently have altogether, and also protect us
against the FSM issue I'd outlined upthread.
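Concretely, the idea might look roughly like the sketch below. This is an illustration of the proposal against the existing bufmgr API, not the actual patch; the exact call site and surrounding code in hio.c are assumptions:

```c
/*
 * Sketch only: extend the relation with the new buffer returned already
 * zero-filled and exclusively locked, instead of ReadBufferBI(P_NEW)
 * followed by a separate LockBuffer() call.  With RBM_ZERO_AND_LOCK
 * there is no window in which another backend can lock the new page
 * before we do.
 */
buffer = ReadBufferExtended(relation, MAIN_FORKNUM, P_NEW,
                            RBM_ZERO_AND_LOCK,
                            bistate ? bistate->strategy : NULL);
/* Caller now holds the content lock; PageInit() the page and insert. */
```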

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> While chatting with Robert about this issue I came across the following
> section of code:
> 
>         /*
>          * If the FSM knows nothing of the rel, try the last page before we
>          * give up and extend.  This avoids one-tuple-per-page syndrome during
>          * bootstrapping or in a recently-started system.
>          */
>         if (targetBlock == InvalidBlockNumber)
>         {
>             BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> 
>             if (nblocks > 0)
>                 targetBlock = nblocks - 1;
>         }
> 
> 
> I think that explains the issue (albeit not why it is so much more
> frequent on BSDs).  Because we're not going through the FSM, it's
> perfectly possible to find a page that is uninitialized *and* not yet
> in the FSM.  The only reason this wasn't previously actively broken, I
> think, is that while we previously *also* looked at that page (before
> the extending backend acquired a lock!), PageGetHeapFreeSpace(), via
> PageGetFreeSpace(), decided there was no free space, because it just
> interprets the zeroes in pd_upper and pd_lower as no free space.

FWIW, after commenting out that block and adapting a few regression
tests to changed plans, I could not reproduce the issue on a FreeBSD
machine in 31 runs, where it previously triggered in roughly 1/3 cases.

Still don't quite understand why it's so much more likely on BSD...

Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-01-29 12:23:51 -0800, Andres Freund wrote:
> On 2019-01-29 11:25:41 -0800, Andres Freund wrote:
> > On 2019-01-28 22:37:53 -0500, Tom Lane wrote:
> > > Andres Freund <andres@anarazel.de> writes:
> > > > I did that now. I couldn't reproduce it locally, despite a lot of
> > > > runs. Looking at the buildfarm it looks like the failures were,
> > > > excluding handfish which failed without recognizable symptoms before and
> > > > after, on BSD-derived platforms (NetBSD, FreeBSD, OS X), which certainly
> > > > is interesting.
> > > 
> > > Isn't it now.  Something about the BSD scheduler perhaps?  But we've
> > > got four or five different BSD-ish platforms that reported failures,
> > > and it's hard to believe they've all got identical schedulers.
> > > 
> > > That second handfish failure does match the symptoms elsewhere:
> > > 
> > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=handfish&dt=2019-01-29%2000%3A20%3A22
> > > 
> > > --- /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/expected/thread-thread.stderr	2018-10-30 20:11:45.551967381 -0300
> > > +++ /home/filiperosset/dev/client-code-REL_8/HEAD/pgsql.build/src/interfaces/ecpg/test/results/thread-thread.stderr	2019-01-28 22:38:20.614211568 -0200
> > > @@ -0,0 +1,20 @@
> > > +SQL error: page 0 of relation "test_thread" should be empty but is not on line 125
> > > 
> > > so it's not quite 100% BSD, but certainly the failure rate on BSD is
> > > way higher than elsewhere.  Puzzling.
> > 
> > Interesting.
> > 
> > While chatting with Robert about this issue I came across the following
> > section of code:
> > 
> >         /*
> >          * If the FSM knows nothing of the rel, try the last page before we
> >          * give up and extend.  This avoids one-tuple-per-page syndrome during
> >          * bootstrapping or in a recently-started system.
> >          */
> >         if (targetBlock == InvalidBlockNumber)
> >         {
> >             BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
> > 
> >             if (nblocks > 0)
> >                 targetBlock = nblocks - 1;
> >         }
> > 
> > 
> > I think that explains the issue (albeit not why it is so much more
> > frequent on BSDs).  Because we're not going through the FSM, it's
> > perfectly possible to find a page that is uninitialized *and* not yet
> > in the FSM.  The only reason this wasn't previously actively broken, I
> > think, is that while we previously *also* looked at that page (before
> > the extending backend acquired a lock!), PageGetHeapFreeSpace(), via
> > PageGetFreeSpace(), decided there was no free space, because it just
> > interprets the zeroes in pd_upper and pd_lower as no free space.
> > 
> > Hm, thinking about what a good solution here could be.
> 
> I wonder if we should just expand the logic we have for
> RBM_ZERO_AND_LOCK so it can be used in hio.c (we probably could just
> use it without any changes, but the name seems a bit confusing),
> because that'd prevent the current weirdness that the buffer can be
> locked by somebody between the ReadBufferBI(P_NEW) and the
> LockBuffer(BUFFER_LOCK_EXCLUSIVE).  I think that'd allow us to drop
> the cleanup lock logic we currently have altogether, and also protect
> us against the FSM issue I'd outlined upthread.

Here's a version of the patch implementing this approach.  I assume this
solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
machine.

I did not rename RBM_ZERO_AND_LOCK. New buffers are zeroed too, so that
still seems apt enough.

Greetings,

Andres Freund

Attachment

Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Andres Freund
Date:
Hi,

On 2019-02-01 07:14:11 -0800, Andres Freund wrote:
> Here's a version of the patch implementing this approach.  I assume this
> solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
> machine.

I pushed this, and the buildfarm so far is showing more love.

There's a failure on jacana, but it looks like it's a machine issue:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-02-03%2013%3A17%3A57
Feb 03 08:36:16 rm: cannot remove
`/home/pgrunner/bf/root/HEAD/pgsql.build/tmp_install/home/pgrunner/bf/root/HEAD/inst/bin/postgres.exe': Permission denied

And one on eelpout, which appears to be unrelated as well:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13

#   Failed test 'intermediate client certificate is missing: matches'
#   at t/001_ssltests.pl line 385.
#                   'psql: server closed the connection unexpectedly
#     This probably means the server terminated abnormally
#     before or while processing the request.
# could not send startup packet: Connection reset by peer
# '
#     doesn't match '(?^:SSL error)'
# Looks like you failed 1 test of 71.
[12:15:36] t/001_ssltests.pl .. 
Dubious, test returned 1 (wstat 256, 0x100)


Greetings,

Andres Freund


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Amit Kapila
Date:
On Sun, Feb 3, 2019 at 8:48 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-02-01 07:14:11 -0800, Andres Freund wrote:
> > Here's a version of the patch implementing this approach.  I assume this
> > solves the FreeBSD issue, but I'm running tests in a loop on Thomas'
> > machine.
>
> I pushed this, and the buildfarm so far is showing more love.
>
> There's a failure on jacana, but it looks like it's a machine issue:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana&dt=2019-02-03%2013%3A17%3A57
> Feb 03 08:36:16 rm: cannot remove
> `/home/pgrunner/bf/root/HEAD/pgsql.build/tmp_install/home/pgrunner/bf/root/HEAD/inst/bin/postgres.exe': Permission denied
>
> And one on eelpout, which appears to be unrelated as well:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13
>
> #   Failed test 'intermediate client certificate is missing: matches'
> #   at t/001_ssltests.pl line 385.
> #                   'psql: server closed the connection unexpectedly
> #       This probably means the server terminated abnormally
> #       before or while processing the request.
> # could not send startup packet: Connection reset by peer
> # '
> #     doesn't match '(?^:SSL error)'
> # Looks like you failed 1 test of 71.
> [12:15:36] t/001_ssltests.pl ..
> Dubious, test returned 1 (wstat 256, 0x100)
>

None of these failures seem to be related to your commit.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: Why are we PageInit'ing buffers in RelationAddExtraBlocks()?

From
Thomas Munro
Date:
On Mon, Feb 4, 2019 at 1:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> On Sun, Feb 3, 2019 at 8:48 PM Andres Freund <andres@anarazel.de> wrote:
> > And one on eelpout, which appears to be unrelated as well:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=eelpout&dt=2019-02-03%2011%3A54%3A13

> None of these failures seem to be related to your commit.

Yeah, that seems to be a problem with recent OpenSSL.  I have some
analysis over here, but I don't know what exactly is going on yet:

https://www.postgresql.org/message-id/flat/CAEepm%3D2n6Nv%2B5tFfe8YnkUm1fXgvxR0Mm1FoD%2BQKG-vLNGLyKg%40mail.gmail.com

-- 
Thomas Munro
http://www.enterprisedb.com