Thread: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Masahiko Sawada
Date:
Hi all,

I faced suspicious behaviour on hot standby server related to visibility map.
The scenario is,

1. Create table and check internal of visibility map on master server.
postgres(1)=# create table hoge (col int);
CREATE TABLE
postgres(1)=# insert into hoge select generate_series(1,10);
INSERT 0 10
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

2. Check internal of visibility map on standby server.
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

3. Do VACUUM on master server.
postgres(1)=# vacuum hoge;
VACUUM
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | t           | f          | t
(1 row)

4. Check internal of visibility map on standby server again.
** Note that the we use the connection we established at #2 again **
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | t
(1 row)

Even on standby server, the result should be (t,f,t), but returned (f,f,t)
(XLOG_HEAP2_VISIBLE record actually has been reached to standby, and
has been surely  applied)

As a result of looked into code around the recvoery, ISTM that the
cause is related to relation cache clear.
In heap_xlog_visible, if the standby server receives WAL record then
relation cache is eventually cleared in vm_extend,  but If standby
server receives FPI then relation cache would not be cleared.
For example, after I applied attached patch to HEAD, (it might not be
right way but) this problem seems to be resolved.

Is this a bug? or not?

Regards,

--
Masahiko Sawada

Attachment

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Noah Misch
Date:
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> As a result of looked into code around the recvoery, ISTM that the
> cause is related to relation cache clear.
> In heap_xlog_visible, if the standby server receives WAL record then
> relation cache is eventually cleared in vm_extend,  but If standby
> server receives FPI then relation cache would not be cleared.
> For example, after I applied attached patch to HEAD, (it might not be
> right way but) this problem seems to be resolved.
> 
> Is this a bug? or not?

It's a bug.  I don't expect it causes queries to return wrong answers, because
visibilitymap.c says "it's always safe to clear a bit in the map from
correctness point of view."  (The bug makes a visibility map bit temporarily
appear to have been cleared.)  I still call it a bug, because recovery
behavior becomes too difficult to verify when xlog replay produces conditions
that don't happen outside of recovery.  Even if there's no way to get a wrong
query answer today, this would be too easy to break later.  I wonder if we
make the same omission in other xlog replay functions.  Similar omissions may
cause wrong query answers, even if this particular one does not.

Would you like to bisect for the commit, or at least the major release, at
which the bug first appeared?

I wonder if your discovery has any relationship to this recently-reported case
of insufficient smgr invalidation:
http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com

Thanks,
nm



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Masahiko Sawada
Date:
On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> As a result of looked into code around the recvoery, ISTM that the
>> cause is related to relation cache clear.
>> In heap_xlog_visible, if the standby server receives WAL record then
>> relation cache is eventually cleared in vm_extend,  but If standby
>> server receives FPI then relation cache would not be cleared.
>> For example, after I applied attached patch to HEAD, (it might not be
>> right way but) this problem seems to be resolved.
>>
>> Is this a bug? or not?
>
> It's a bug.  I don't expect it causes queries to return wrong answers, because
> visibilitymap.c says "it's always safe to clear a bit in the map from
> correctness point of view."  (The bug makes a visibility map bit temporarily
> appear to have been cleared.)  I still call it a bug, because recovery
> behavior becomes too difficult to verify when xlog replay produces conditions
> that don't happen outside of recovery.  Even if there's no way to get a wrong
> query answer today, this would be too easy to break later.  I wonder if we
> make the same omission in other xlog replay functions.  Similar omissions may
> cause wrong query answers, even if this particular one does not.
>
> Would you like to bisect for the commit, or at least the major release, at
> which the bug first appeared?
>
> I wonder if your discovery has any relationship to this recently-reported case
> of insufficient smgr invalidation:
> http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
>

I'm not sure this bug has relationship to another issue you mentioned
but after further investigation, this bug seems to be reproduced even
on more older version.
At least I reproduced it at 9.0.0.

Regards,

--
Masahiko Sawada



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Noah Misch
Date:
On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> >> As a result of looked into code around the recvoery, ISTM that the
> >> cause is related to relation cache clear.
> >> In heap_xlog_visible, if the standby server receives WAL record then
> >> relation cache is eventually cleared in vm_extend,  but If standby
> >> server receives FPI then relation cache would not be cleared.
> >> For example, after I applied attached patch to HEAD, (it might not be
> >> right way but) this problem seems to be resolved.
> >>
> >> Is this a bug? or not?
> >
> > It's a bug.  I don't expect it causes queries to return wrong answers, because
> > visibilitymap.c says "it's always safe to clear a bit in the map from
> > correctness point of view."  (The bug makes a visibility map bit temporarily
> > appear to have been cleared.)  I still call it a bug, because recovery
> > behavior becomes too difficult to verify when xlog replay produces conditions
> > that don't happen outside of recovery.  Even if there's no way to get a wrong
> > query answer today, this would be too easy to break later.  I wonder if we
> > make the same omission in other xlog replay functions.  Similar omissions may
> > cause wrong query answers, even if this particular one does not.
> >
> > Would you like to bisect for the commit, or at least the major release, at
> > which the bug first appeared?
> >
> > I wonder if your discovery has any relationship to this recently-reported case
> > of insufficient smgr invalidation:
> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
> >
> 
> I'm not sure this bug has relationship to another issue you mentioned
> but after further investigation, this bug seems to be reproduced even
> on more older version.
> At least I reproduced it at 9.0.0.

Would you try PostgreSQL 9.2.16?  The visibility map was not crash safe and
had no correctness implications until 9.2.  If 9.2 behaves this way, it's
almost certainly not a recent regression.



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Masahiko Sawada
Date:
On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
>> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
>> > On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> >> As a result of looked into code around the recvoery, ISTM that the
>> >> cause is related to relation cache clear.
>> >> In heap_xlog_visible, if the standby server receives WAL record then
>> >> relation cache is eventually cleared in vm_extend,  but If standby
>> >> server receives FPI then relation cache would not be cleared.
>> >> For example, after I applied attached patch to HEAD, (it might not be
>> >> right way but) this problem seems to be resolved.
>> >>
>> >> Is this a bug? or not?
>> >
>> > It's a bug.  I don't expect it causes queries to return wrong answers, because
>> > visibilitymap.c says "it's always safe to clear a bit in the map from
>> > correctness point of view."  (The bug makes a visibility map bit temporarily
>> > appear to have been cleared.)  I still call it a bug, because recovery
>> > behavior becomes too difficult to verify when xlog replay produces conditions
>> > that don't happen outside of recovery.  Even if there's no way to get a wrong
>> > query answer today, this would be too easy to break later.  I wonder if we
>> > make the same omission in other xlog replay functions.  Similar omissions may
>> > cause wrong query answers, even if this particular one does not.
>> >
>> > Would you like to bisect for the commit, or at least the major release, at
>> > which the bug first appeared?
>> >
>> > I wonder if your discovery has any relationship to this recently-reported case
>> > of insufficient smgr invalidation:
>> > http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
>> >
>>
>> I'm not sure this bug has relationship to another issue you mentioned
>> but after further investigation, this bug seems to be reproduced even
>> on more older version.
>> At least I reproduced it at 9.0.0.
>
> Would you try PostgreSQL 9.2.16?  The visibility map was not crash safe and
> had no correctness implications until 9.2.  If 9.2 behaves this way, it's
> almost certainly not a recent regression.

Yeah, I reproduced it on 9.2.0 and 9.2.16, it's not recent regression.
The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a which
introduces crash-safe visibility map.

Regards,

--
Masahiko Sawada



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-03-31 01:02:06 -0400, Noah Misch wrote:
> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> > As a result of looked into code around the recvoery, ISTM that the
> > cause is related to relation cache clear.
> > In heap_xlog_visible, if the standby server receives WAL record then
> > relation cache is eventually cleared in vm_extend,  but If standby
> > server receives FPI then relation cache would not be cleared.
> > For example, after I applied attached patch to HEAD, (it might not be
> > right way but) this problem seems to be resolved.
> > 
> > Is this a bug? or not?
> 
> It's a bug.

I agree it's a bug. If I understand correctly it's not really
visibilitymap related though:

Debugging shows that vm_extend() (on the master) correctly issues a
CacheInvalidateSmgr(), which ought to clear the smgr state on the
standby. But replay doesn't do anything like that.  That kind of sounded
familiar.  The issue is that vacuum doesn't assign an xid and thus
RecordTransactionCommit() doesn't emit a commit record, which in turn
means invalidation messages aren't sent to the standby.

Ugh.

We've recently discussed a very similar issue around
http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de

Unfortunately Simon over in that thread disagreed there about fixing
this by always emitting a commit record when nmsgs > 0 in
RecordTransactionCommit().  I think this thread is a pretty strong hint
that we actually should do so.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
> We've recently discussed a very similar issue around
> http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>
> Unfortunately Simon over in that thread disagreed there about fixing
> this by always emitting a commit record when nmsgs > 0 in
> RecordTransactionCommit().  I think this thread is a pretty strong hint
> that we actually should do so.

Yes.  I'm pretty confident that you had the right idea there, and that
Simon's objection was off-base.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
> > We've recently discussed a very similar issue around
> > http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
> >
> > Unfortunately Simon over in that thread disagreed there about fixing
> > this by always emitting a commit record when nmsgs > 0 in
> > RecordTransactionCommit().  I think this thread is a pretty strong hint
> > that we actually should do so.
> 
> Yes.  I'm pretty confident that you had the right idea there, and that
> Simon's objection was off-base.

The easiest way to achieve that seems to be to just assign an xid if
that's the case; while it's not necessarily safe/efficient to do so at
the point the invalidation message was queued, I think it should be safe
to do so at commit time. Seems less invasive to backpatch than to either
support commit records without xids, or a separate record just
transporting invalidation messages.

Although a separate record with invalidations is going to be needed to
support logical decoding of running xacts anyway...

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
>> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
>> > We've recently discussed a very similar issue around
>> > http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>> >
>> > Unfortunately Simon over in that thread disagreed there about fixing
>> > this by always emitting a commit record when nmsgs > 0 in
>> > RecordTransactionCommit().  I think this thread is a pretty strong hint
>> > that we actually should do so.
>>
>> Yes.  I'm pretty confident that you had the right idea there, and that
>> Simon's objection was off-base.
>
> The easiest way to achieve that seems to be to just assign an xid if
> that's the case; while it's not necessarily safe/efficient to do so at
> the point the invalidation message was queued, I think it should be safe
> to do so at commit time. Seems less invasive to backpatch than to either
> support commit records without xids, or a separate record just
> transporting invalidation messages.

I agree that's better for back-patching.  I hope it won't suck
performance-wise.  In master, we might think of inventing something
new.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 13:07:19 -0400, Robert Haas wrote:
> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> > On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
> >> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
> >> > We've recently discussed a very similar issue around
> >> > http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
> >> >
> >> > Unfortunately Simon over in that thread disagreed there about fixing
> >> > this by always emitting a commit record when nmsgs > 0 in
> >> > RecordTransactionCommit().  I think this thread is a pretty strong hint
> >> > that we actually should do so.
> >>
> >> Yes.  I'm pretty confident that you had the right idea there, and that
> >> Simon's objection was off-base.
> >
> > The easiest way to achieve that seems to be to just assign an xid if
> > that's the case; while it's not necessarily safe/efficient to do so at
> > the point the invalidation message was queued, I think it should be safe
> > to do so at commit time. Seems less invasive to backpatch than to either
> > support commit records without xids, or a separate record just
> > transporting invalidation messages.
> 
> I agree that's better for back-patching.

It's a bit ugly though, since we're at that stage pretty heavily
assuming there's no xid assigned (on the caller level)... I've toyed
with the idea of emitting a commit record that doesn't have an assigned
xid, but that'd require changes on the apply side, which would be uglier
than a new record type :(

> I hope it won't suck performance-wise.

I can't see that be the case, there's not many places where we send
invalidation messages without an assigned xid. Running an instrumented
build with an appropriate wal level reveals about a handfull places.

Greetings,

Andres Freund



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>> The easiest way to achieve that seems to be to just assign an xid if
>> that's the case; while it's not necessarily safe/efficient to do so at
>> the point the invalidation message was queued, I think it should be safe
>> to do so at commit time. Seems less invasive to backpatch than to either
>> support commit records without xids, or a separate record just
>> transporting invalidation messages.

> I agree that's better for back-patching.  I hope it won't suck
> performance-wise.  In master, we might think of inventing something
> new.

I'm a little worried about whether this will break assumptions that
vacuum doesn't have an XID.  I don't immediately see how it would,
but it seems a bit shaky.

I find it hard to believe that the act of assigning an XID would add
measurably to the cost of a vacuum, so Robert's performance concern
doesn't sound very exciting.  If this works, I think it's fine to
adopt as a permanent solution.
        regards, tom lane



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>>> The easiest way to achieve that seems to be to just assign an xid if
>>> that's the case; while it's not necessarily safe/efficient to do so at
>>> the point the invalidation message was queued, I think it should be safe
>>> to do so at commit time. Seems less invasive to backpatch than to either
>>> support commit records without xids, or a separate record just
>>> transporting invalidation messages.
>
>> I agree that's better for back-patching.  I hope it won't suck
>> performance-wise.  In master, we might think of inventing something
>> new.
>
> I'm a little worried about whether this will break assumptions that
> vacuum doesn't have an XID.  I don't immediately see how it would,
> but it seems a bit shaky.

Actually, I think there's a bigger problem.  If you manage to almost
wrap around the XID space, and the cluster shuts down, you are
guaranteed to be able to vacuum the whole cluster without actually
running out of XIDs.  Forcing an XID to be assigned here would make
that uncertain - it would depend on how many tables you have versus
how many XIDs you have left.  That seems unacceptable to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 13:44:27 -0400, Robert Haas wrote:
> On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> >>> The easiest way to achieve that seems to be to just assign an xid if
> >>> that's the case; while it's not necessarily safe/efficient to do so at
> >>> the point the invalidation message was queued, I think it should be safe
> >>> to do so at commit time. Seems less invasive to backpatch than to either
> >>> support commit records without xids, or a separate record just
> >>> transporting invalidation messages.
> >
> >> I agree that's better for back-patching.  I hope it won't suck
> >> performance-wise.  In master, we might think of inventing something
> >> new.
> >
> > I'm a little worried about whether this will break assumptions that
> > vacuum doesn't have an XID.  I don't immediately see how it would,
> > but it seems a bit shaky.
> 
> Actually, I think there's a bigger problem.  If you manage to almost
> wrap around the XID space, and the cluster shuts down, you are
> guaranteed to be able to vacuum the whole cluster without actually
> running out of XIDs.

Currently you're unfortunately not, c.f.
http://www.postgresql.org/message-id/CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com


> Forcing an XID to be assigned here would make
> that uncertain - it would depend on how many tables you have versus
> how many XIDs you have left.  That seems unacceptable to me.

But I agree that that's a concern.  I'm kinda leaning towards
introducing an invalidation WAL record for that case atm. It's quite
possible to force an xid to be assigned in xact.c, but it's certainly
not pretty (the least ugly way is to duplicate the
xactGetCommittedInvalidationMessages() call, and force an xid to be
assigned early in CommitTransaction().

Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Simon Riggs
Date:
On 15 April 2016 at 18:44, Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>>> The easiest way to achieve that seems to be to just assign an xid if
>>> that's the case; while it's not necessarily safe/efficient to do so at
>>> the point the invalidation message was queued, I think it should be safe
>>> to do so at commit time. Seems less invasive to backpatch than to either
>>> support commit records without xids, or a separate record just
>>> transporting invalidation messages.
>
>> I agree that's better for back-patching.  I hope it won't suck
>> performance-wise.  In master, we might think of inventing something
>> new.
>
> I'm a little worried about whether this will break assumptions that
> vacuum doesn't have an XID.  I don't immediately see how it would,
> but it seems a bit shaky.

Actually, I think there's a bigger problem.  If you manage to almost
wrap around the XID space, and the cluster shuts down, you are
guaranteed to be able to vacuum the whole cluster without actually
running out of XIDs.  Forcing an XID to be assigned here would make
that uncertain - it would depend on how many tables you have versus
how many XIDs you have left.  That seems unacceptable to me.

I agree this is a bug, similar to the last one.

For me, the issue is that we need to do something to catch bugs. The existing code does not have any test at all to check whether we are doing the wrong thing - it just lets the wrong thing happen.

Fixing it by forcing a new behaviour might be the right thing to do going forwards, but I don't much like the idea of forcing new behaviour in back branches. It might fix this bug, but can easily cause others.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
> For me, the issue is that we need to do something to catch bugs. The
> existing code does not have any test at all to check whether we are doing
> the wrong thing - it just lets the wrong thing happen.

But sending the message, without assigning an xid, *IS* the right thing
to do here? We shouldn't assign an xid, and we need to send the message
out to the standbys.


> Fixing it by forcing a new behaviour might be the right thing to do going
> forwards, but I don't much like the idea of forcing new behaviour in back
> branches. It might fix this bug, but can easily cause others.

What's your alternative? Assigning an xid in the middle of vacuum isn't
ok, breaking vacuum isn't either?

Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Simon Riggs
Date:
On 15 April 2016 at 20:01, Andres Freund <andres@anarazel.de> wrote:
On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
> For me, the issue is that we need to do something to catch bugs. The
> existing code does not have any test at all to check whether we are doing
> the wrong thing - it just lets the wrong thing happen.

But sending the message, without assigning an xid, *IS* the right thing
to do here? We shouldn't assign an xid, and we need to send the message
out to the standbys.


> Fixing it by forcing a new behaviour might be the right thing to do going
> forwards, but I don't much like the idea of forcing new behaviour in back
> branches. It might fix this bug, but can easily cause others.

What's your alternative? Assigning an xid in the middle of vacuum isn't
ok, breaking vacuum isn't either?

In my understanding we have two choices for this bug

1) assign an xid so it forces sending a message (message plus xid)
2) send a message without assigning an xid (message only)

(1) seems like it is worse for backpatching, IMHO, though I am willing to hear other thoughts or options

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes:
> In my understanding we have two choices for this bug

> 1) assign an xid so it forces sending a message (message plus xid)
> 2) send a message without assigning an xid (message only)

> (1) seems like it is worse for backpatching, IMHO, though I am willing to
> hear other thoughts or options

The problem with (1) is that it creates side-effects that could be bad;
Robert's already pointed out one close-to-show-stopper consequence,
and I have little confidence that there are not others.  In general,
if we got here without assigning an xid, there's a reason.

I think the bottom line is that we misdesigned the WAL representation
by assuming that this sort of info could always be piggybacked on a
transaction commit record.  It's time to fix that.
        regards, tom lane



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> I think the bottom line is that we misdesigned the WAL representation
> by assuming that this sort of info could always be piggybacked on a
> transaction commit record.  It's time to fix that.

I think we got to piggyback it onto a commit record, as long as there's
one. Otherwise it's going to be more complex (queuing messages when
reading an inval record) and slower (more wal records).  I can easily
develop a patch for that, the question is what we do on the back
branches...

Greetings,

Andres Freund



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Alvaro Herrera
Date:
Andres Freund wrote:
> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> > I think the bottom line is that we misdesigned the WAL representation
> > by assuming that this sort of info could always be piggybacked on a
> > transaction commit record.  It's time to fix that.
> 
> I think we got to piggyback it onto a commit record, as long as there's
> one. Otherwise it's going to be more complex (queuing messages when
> reading an inval record) and slower (more wal records).  I can easily
> develop a patch for that, the question is what we do on the back
> branches...

We have introduced new wal records in back branches previously --
nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
just needs to make sure to upgrade the standbys first.  If they don't,
they would die upon replay of the first such record, which they can take
as an indication that they need to be upgraded; the standby is down for
some time, but there is no data loss or corruption.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Tom Lane
Date:
Andres Freund <andres@anarazel.de> writes:
> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> I think the bottom line is that we misdesigned the WAL representation
>> by assuming that this sort of info could always be piggybacked on a
>> transaction commit record.  It's time to fix that.

> I think we got to piggyback it onto a commit record, as long as there's
> one.

No objection to that part.  What I'm saying is that when there isn't one,
the answer is a new record type, not forcing xid assignment.  It might
look almost like a commit record, but it shouldn't imply that there
was a transaction.
        regards, tom lane



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> > I think the bottom line is that we misdesigned the WAL representation
>> > by assuming that this sort of info could always be piggybacked on a
>> > transaction commit record.  It's time to fix that.
>>
>> I think we got to piggyback it onto a commit record, as long as there's
>> one. Otherwise it's going to be more complex (queuing messages when
>> reading an inval record) and slower (more wal records).  I can easily
>> develop a patch for that, the question is what we do on the back
>> branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
> just needs to make sure to upgrade the standbys first.  If they don't,
> they would die upon replay of the first such record, which they can take
> as an indication that they need to be upgraded; the standby is down for
> some time, but there is no data loss or corruption.

Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Simon Riggs
Date:
On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> > I think the bottom line is that we misdesigned the WAL representation
>> > by assuming that this sort of info could always be piggybacked on a
>> > transaction commit record.  It's time to fix that.
>>
>> I think we got to piggyback it onto a commit record, as long as there's
>> one. Otherwise it's going to be more complex (queuing messages when
>> reading an inval record) and slower (more wal records).  I can easily
>> develop a patch for that, the question is what we do on the back
>> branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
> just needs to make sure to upgrade the standbys first.  If they don't,
> they would die upon replay of the first such record, which they can take
> as an indication that they need to be upgraded; the standby is down for
> some time, but there is no data loss or corruption.

Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.

(non-reply just because of travel)

OK, I'll write up a patch today to fix, with a view to backpatching.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 17:37:03 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> > > I think the bottom line is that we misdesigned the WAL representation
> > > by assuming that this sort of info could always be piggybacked on a
> > > transaction commit record.  It's time to fix that.
> > 
> > I think we got to piggyback it onto a commit record, as long as there's
> > one. Otherwise it's going to be more complex (queuing messages when
> > reading an inval record) and slower (more wal records).  I can easily
> > develop a patch for that, the question is what we do on the back
> > branches...
> 
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).

Yea, I remember ;). We made that decision because we couldn't find
another way, and because the consequences were pretty grave.


> The user just needs to make sure to upgrade the standbys first.  If
> they don't, they would die upon replay of the first such record, which
> they can take as an indication that they need to be upgraded; the
> standby is down for some time, but there is no data loss or
> corruption.

There could, if they're using wal_keep_segments, and the standby cannot
be caught up anymore.


I think it's still worth to go for the new record type, but it's a
pretty close call. We could also just decide to document the issues :/ -
but I'm not sure we're eing all of them yet.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-18 20:43:30 +0900, Michael Paquier wrote:
> Yeah, introducing a new WAL record to address this issue in
> back-branches would not be an issue, and that's what we should do. For
> HEAD, let's add that in the commit record.

I'm not sure why/how you'd do it that way in HEAD? I mean the only
reason not to use a separate record is the standby incompatibility.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-18 13:15:19 +0100, Simon Riggs wrote:
> OK, I'll write up a patch today to fix, with a view to backpatching.

Cool. I've spent a couple minutes on this yesterday, and have the start
of a patch - do you want that?

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-15 16:53:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> >> I think the bottom line is that we misdesigned the WAL representation
> >> by assuming that this sort of info could always be piggybacked on a
> >> transaction commit record.  It's time to fix that.
>
> > I think we got to piggyback it onto a commit record, as long as there's
> > one.
>
> No objection to that part.  What I'm saying is that when there isn't one,
> the answer is a new record type, not forcing xid assignment.  It might
> look almost like a commit record, but it shouldn't imply that there
> was a transaction.

Here's a patch doing so.  Note that, after putting the record into RM_XACT_ID
first, I've decided to make it a RM_STANDBY_ID type record. I think it's
likely that this is going to be needed beyond transaction commits, and
it generally seems to fit better into standby.c; even if it's a bit
awkward that commit records contain similar data. To avoid duplicating
the *desc.c code, I've exposed standby_desc_invalidations() to also be
used by xactdesc.c.

It fixes the problem at hand, but essentially it's just luck that the
patch is sufficient. The first layer of the issue is that queued
invalidation messages aren't sent; but for vm_extend() there's another
side to it. Namely vm_extend() does

    /*
     * Send a shared-inval message to force other backends to close any smgr
     * references they may have for this rel, which we are about to change.
     * This is a useful optimization because it means that backends don't have
     * to keep checking for creation or extension of the file, which happens
     * infrequently.
     */
    CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

but CacheInvalidateSmgr is non-transactional as it's comment explains:
 *
 * Note: because these messages are nontransactional, they won't be captured
 * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
 * should happen in low-level smgr.c routines, which are executed while
 * replaying WAL as well as when creating it.
 *

as far as I can see vm_extend() is the only current caller forgetting
that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
smgr.c.

The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
doing a transactional CacheInvalidateSmgr().  But that seems more than
fragile.

ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation.  Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; but
might be a good long-term idea to keep things simpler.

Comments?

- Andres

Attachment

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Simon Riggs
Date:
On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:
On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> > I think the bottom line is that we misdesigned the WAL representation
>> > by assuming that this sort of info could always be piggybacked on a
>> > transaction commit record.  It's time to fix that.
>>
>> I think we got to piggyback it onto a commit record, as long as there's
>> one. Otherwise it's going to be more complex (queuing messages when
>> reading an inval record) and slower (more wal records).  I can easily
>> develop a patch for that, the question is what we do on the back
>> branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).  The user
> just needs to make sure to upgrade the standbys first.  If they don't,
> they would die upon replay of the first such record, which they can take
> as an indication that they need to be upgraded; the standby is down for
> some time, but there is no data loss or corruption.

Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.

(non-reply just because of travel)

OK, I'll write up a patch today to fix, with a view to backpatching.

Patch from Tuesday.  On various planes.

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment

Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-24 22:10:19 +0100, Simon Riggs wrote:
> On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:
> > (non-reply just because of travel)
> >
> > OK, I'll write up a patch today to fix, with a view to backpatching.
> >
> 
> Patch from Tuesday.  On various planes.

I posted a version of this at
http://archives.postgresql.org/message-id/20160424025117.2cmf6ku4ldfcoo44%40alap3.anarazel.de
(because it was blocking me from fixing a newer bug)

I think my approach of a separate record is going to be easier to
backpatch - the new commit record format you took advantage of is
new. If we go your way, this also needs pg_xlogdump/xact_desc() &
logical decoding integration.

Greetings,

Andres Freund



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
Dean, Robert,

Afaics the problem described below was introduced in b4e07417, do you
have a different/better proposal than
s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
quite right either, because it only makes the file extension visible at
end-of-xact - which is mostly harmless, but still.

On 2016-04-23 19:51:17 -0700, Andres Freund wrote:
> It fixes the problem at hand, but essentially it's just luck that the
> patch is sufficient. The first layer of the issue is that queued
> invalidation messages aren't sent; but for vm_extend() there's another
> side to it. Namely vm_extend() does
> 
>     /*
>      * Send a shared-inval message to force other backends to close any smgr
>      * references they may have for this rel, which we are about to change.
>      * This is a useful optimization because it means that backends don't have
>      * to keep checking for creation or extension of the file, which happens
>      * infrequently.
>      */
>     CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
> 
> but CacheInvalidateSmgr is non-transactional as it's comment explains:
>  *
>  * Note: because these messages are nontransactional, they won't be captured
>  * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
>  * should happen in low-level smgr.c routines, which are executed while
>  * replaying WAL as well as when creating it.
>  *
> 
> as far as I can see vm_extend() is the only current caller forgetting
> that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
> smgr.c.
> 
> The reason this all ends up working nonetheless is that the
> heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
> which queues a relcache invalidation, which in turn does a
> RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
> doing a transactional CacheInvalidateSmgr().  But that seems more than
> fragile.
> 
> ISTM we should additionally replace the CacheInvalidateSmgr() with a
> CacheInvalidateRelcache() and document that that implies an smgr
> invalidation.  Alternatively we could log smgr (and relmapper)
> invalidations as well, but that's not quite non-invasive either; but
> might be a good long-term idea to keep things simpler.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 1:13 AM, Andres Freund <andres@anarazel.de> wrote:
> I think my approach of a separate record is going to be easier to
> backpatch - the new commit record format you took advantage of is
> new.

Sorry for the late reply.

After reading both patches yesterday, I found your approach with
standby.c to be neater.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> Here's a patch doing so.  Note that, after putting the record into RM_XACT_ID
> first, I've decided to make it a RM_STANDBY_ID type record. I think it's
> likely that this is going to be needed beyond transaction commits, and
> it generally seems to fit better into standby.c; even if it's a bit
> awkward that commit records contain similar data. To avoid duplicating
> the *desc.c code, I've exposed standby_desc_invalidations() to also be
> used by xactdesc.c.

I have been eyeballing this patch for some time since yesterday and
did some tests, and that's really neat. Having the invalidation stuff
in standby.c makes quite some sense as well.

+        * wanting to emit more WAL recoreds).
s/recoreds/records/

> The reason this all ends up working nonetheless is that the
> heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
> which queues a relcache invalidation, which in turn does a
> RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
> doing a transactional CacheInvalidateSmgr().  But that seems more than
> fragile.

You have spent quite some time on that. As things stand that's indeed
a bit fragile..

> ISTM we should additionally replace the CacheInvalidateSmgr() with a
> CacheInvalidateRelcache() and document that that implies an smgr
> invalidation.  Alternatively we could log smgr (and relmapper)
> invalidations as well, but that's not quite non-invasive either; but
> might be a good long-term idea to keep things simpler.
>
> Comments?

Yeah, this looks like a good idea at the end. As the invalidation
patch is aimed at being backpatched, this may be something to do as
well in back-branches.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
Hi,

Thanks for looking into this.

On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
> > CacheInvalidateRelcache() and document that that implies an smgr
> > invalidation.  Alternatively we could log smgr (and relmapper)
> > invalidations as well, but that's not quite non-invasive either; but
> > might be a good long-term idea to keep things simpler.
> >
> > Comments?
> 
> Yeah, this looks like a good idea at the end.

You mean the bit about making smgr invalidations logged?

> As the invalidation patch is aimed at being backpatched, this may be
> something to do as well in back-branches.

I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/

Greetings,

Andres Freund



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
> Thanks for looking into this.
>
> On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
>> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
>> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
>> > CacheInvalidateRelcache() and document that that implies an smgr
>> > invalidation.  Alternatively we could log smgr (and relmapper)
>> > invalidations as well, but that's not quite non-invasive either; but
>> > might be a good long-term idea to keep things simpler.
>> >
>> > Comments?
>>
>> Yeah, this looks like a good idea at the end.
>
> You mean the bit about making smgr invalidations logged?

Sorry for the lack of precision in my words. I was referring to your
idea of replacing CacheInvalidateSmgr() by CacheInvalidateRelcache()
sounds like a good option as this would make the invalidation to be
logged at commit. Thinking about the logging of smgr invalidations,
this is quite interesting. But what would we actually gain in doing
that? Do you foresee any advantages in doing so? The only case where
this would be useful now is for vm_extend by looking at the code.

>> As the invalidation patch is aimed at being backpatched, this may be
>> something to do as well in back-branches.
>
> I'm a bit split here. I think forcing processing of invalidations at
> moments they've previously never been processed is a bit risky for the
> back branches. But on the other hand relcache invalidations are only
> processed at end-of-xact, which isn't really correct for the code at
> hand :/

Oh, OK. So you mean that this patch is not aimed for back-branches
with this new record type, but that's only for HEAD.. To be honest, I
thought that we had better address this issue on back-branches with a
patch close to what you are proposing (which is more or less what
Simon has actually sent), and keep the change of CacheInvalidateSmgr()
in visibilitymap.c to be HEAD-only.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Tue, Apr 26, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
>> Thanks for looking into this.
>>
>> On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
>>> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
>>> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
>>> > CacheInvalidateRelcache() and document that that implies an smgr
>>> > invalidation.  Alternatively we could log smgr (and relmapper)
>>> > invalidations as well, but that's not quite non-invasive either; but
>>> > might be a good long-term idea to keep things simpler.
>>> >
>>> > Comments?
>>>
>>> Yeah, this looks like a good idea at the end.
>>
>> You mean the bit about making smgr invalidations logged?
>
> Sorry for the lack of precision in my words. I was referring to your
> idea of replacing CacheInvalidateSmgr() by CacheInvalidateRelcache()
> sounds like a good option as this would make the invalidation to be
> logged at commit. Thinking about the logging of smgr invalidations,
> this is quite interesting. But what would we actually gain in doing
> that? Do you foresee any advantages in doing so? The only case where
> this would be useful now is for vm_extend by looking at the code.
>
>>> As the invalidation patch is aimed at being backpatched, this may be
>>> something to do as well in back-branches.
>>
>> I'm a bit split here. I think forcing processing of invalidations at
>> moments they've previously never been processed is a bit risky for the
>> back branches. But on the other hand relcache invalidations are only
>> processed at end-of-xact, which isn't really correct for the code at
>> hand :/
>
> Oh, OK. So you mean that this patch is not aimed for back-branches
> with this new record type, but that's only for HEAD.. To be honest, I
> thought that we had better address this issue on back-branches with a
> patch close to what you are proposing (which is more or less what
> Simon has actually sent), and keep the change of CacheInvalidateSmgr()
> in visibilitymap.c to be HEAD-only.

Ah, and actually as I'm on it from your previous patch there is this bit:
+       else if (msg->id == SHAREDINVALRELMAP_ID)
+           appendStringInfoString(buf, " relmap");
+       else if (msg->id == SHAREDINVALRELMAP_ID)
+           appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
You surely want to check for !OidIsValid(msg->rm.dbId) in the first one.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-26 12:39:37 +0900, Michael Paquier wrote:
> Thinking about the logging of smgr invalidations, this is quite
> interesting. But what would we actually gain in doing that? Do you
> foresee any advantages in doing so? The only case where this would be
> useful now is for vm_extend by looking at the code.

Well, it'd make vm_extend actually correct, which replacing the
invalidation with a relcache one would not. Relcache invalidations are
transactional, whereas smgr ones are not (intentionally so!). I don't
think it's currently a big problem, but it does make me rather wary.

> >> As the invalidation patch is aimed at being backpatched, this may be
> >> something to do as well in back-branches.
> >
> > I'm a bit split here. I think forcing processing of invalidations at
> > moments they've previously never been processed is a bit risky for the
> > back branches. But on the other hand relcache invalidations are only
> > processed at end-of-xact, which isn't really correct for the code at
> > hand :/
> 
> Oh, OK. So you mean that this patch is not aimed for back-branches
> with this new record type, but that's only for HEAD.

No, I think we got to do this in all branches. I was just wondering
about how to fix vm_extend(). Which I do think we got to fix, even in
the back-branches.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Dean Rasheed
Date:
On 26 April 2016 at 04:48, Andres Freund <andres@anarazel.de> wrote:
> No, I think we got to do this in all branches. I was just wondering
> about how to fix vm_extend(). Which I do think we got to fix, even in
> the back-branches.
>

I think replacing CacheInvalidateSmgr() with CacheInvalidateRelcache()
in vm_extend() is probably the safer thing to do, and ought to be
relatively harmless.

It means that an index-only scan won't be notified of VM extension
until the end of the other transaction, which might lead to extra heap
fetches, but I think that's unlikely to have any performance impact
because it ought to be a fairly rare event, and if it was another
transaction adding tuples, they wouldn't be all visible before it was
committed anyway, so the extra heap fetches would be required in any
case.

Regards,
Dean



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Mon, Apr 25, 2016 at 2:10 PM, Andres Freund <andres@anarazel.de> wrote:
> Afaics the problem described below was introduced in b4e07417, do you
> have a different/better proposal than
> s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
> quite right either, because it only makes the file extension visible at
> end-of-xact - which is mostly harmless, but still.

Maybe I'm all wet here, but it seems to me like this is a bug in
heap_xlog_visible().  If the relation has been extended on the master
but no bits are set, then it doesn't really matter whether any
invalidation happens on the standby.  On the other hand, if the
relation has been extended on the master and a bit has actually been
set on the new page, then there should be an XLOG_HEAP2_VISIBLE record
that gets replayed on the standby.  And replay of that record should
call visibilitymap_pin(), which should call vm_readbuf(), which should
call vm_extend(), which should issue the same smgr invalidation on the
standby that got issued on the master.  So no problem!  However, in
the FPI case, heap_xlog_visible() optimizes all of that away, so the
invalidation doesn't get issued.  Oops.  But we could cure that just
by having heap_xlog_visible() still call CacheInvalidateSmgr() even in
that case, and then we'd be actually be following the rule that
non-transactional invalidations should be issued on the standby even
while in replay.  That seems cleaner to me than switching to a
transactional invalidation, which isn't really the right thing anyway.

In other words, I think Masahiko Sawada's patch in the original post
is pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away.  If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Michael Paquier
Date:
On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> In other words, I think Masahiko Sawada's patch in the original post
> is pretty much right on target, except that we don't need to do that
> always, but rather only in the FPI case when the call to
> visibilitymap_pin() is being optimized away.  If we solve the problem
> that way, I don't think we even need a new WAL record for this case,
> which is a non-trivial fringe benefit.

The visibility map is not the only thing that need to be addressed,
no? For example take this report from Dmitry Vasilyev of a couple of
months back where index relations are not visible on a standby:
http://www.postgresql.org/message-id/CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com
This is really leading to a solution where we need to take a more
general approach to this problem instead of trying to patch multiple
WAL replay code paths. And Andres' stuff does so.
-- 
Michael



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
> On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > In other words, I think Masahiko Sawada's patch in the original post
> > is pretty much right on target, except that we don't need to do that
> > always, but rather only in the FPI case when the call to
> > visibilitymap_pin() is being optimized away.  If we solve the problem
> > that way, I don't think we even need a new WAL record for this case,
> > which is a non-trivial fringe benefit.
> 
> The visibility map is not the only thing that need to be addressed,
> no?

If I understand Robert correctly his point is about fixing the smgr
inval alone - without the invalidation fix that'd not be enough because
the relcache info with outdated information (particularly relallvisible
et al), would continue to be valid. Relcache invalidations imply an smgr
one, but not the other way round.

The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
sufficient is because the smgr invalidation isn't transaction bound,
i.e. sent out immediately. So, to have the same behaviour on master/HS,
we need a way to send out the invalidiation properly in lockstep with
replay.

- Andres



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Wed, Apr 27, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
>> On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > In other words, I think Masahiko Sawada's patch in the original post
>> > is pretty much right on target, except that we don't need to do that
>> > always, but rather only in the FPI case when the call to
>> > visibilitymap_pin() is being optimized away.  If we solve the problem
>> > that way, I don't think we even need a new WAL record for this case,
>> > which is a non-trivial fringe benefit.
>>
>> The visibility map is not the only thing that need to be addressed,
>> no?
>
> If I understand Robert correctly his point is about fixing the smgr
> inval alone - without the invalidation fix that'd not be enough because
> the relcache info with outdated information (particularly relallvisible
> et al), would continue to be valid. Relcache invalidations imply an smgr
> one, but not the other way round.
>
> The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
> sufficient is because the smgr invalidation isn't transaction bound,
> i.e. sent out immediately. So, to have the same behaviour on master/HS,
> we need a way to send out the invalidiation properly in lockstep with
> replay.

What I'm confused about here is:

Masahiko Sawada posted a patch that fixes the problem for him, which
does not involve any new WAL record type.  It also seems to be fixing
the problem in a way that is clean and consistent with what we've done
elsewhere.

The patch actually under discussion here manages to introduce a new
WAL record type without fixing that problem.

Therefore I include that the committed patch fixes some *other*
problem, not the one that this thread is ostensibly about.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Andres Freund
Date:
On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
> Masahiko Sawada posted a patch that fixes the problem for him, which
> does not involve any new WAL record type.  It also seems to be fixing
> the problem in a way that is clean and consistent with what we've done
> elsewhere.

It only fixes one symptom, the relcache entry is still wrong
afterwards. Which is pretty relevant for planning.


> The patch actually under discussion here manages to introduce a new
> WAL record type without fixing that problem.

It does fix the problem, just not in a super robust way. Which is why I
think we should add something like Masahiko's fix additionally.

Greetings,

Andres Freund



Re: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.

From
Robert Haas
Date:
On Wed, Apr 27, 2016 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
>> Masahiko Sawada posted a patch that fixes the problem for him, which
>> does not involve any new WAL record type.  It also seems to be fixing
>> the problem in a way that is clean and consistent with what we've done
>> elsewhere.
>
> It only fixes one symptom, the relcache entry is still wrong
> afterwards. Which is pretty relevant for planning.
>
>
>> The patch actually under discussion here manages to introduce a new
>> WAL record type without fixing that problem.
>
> It does fix the problem, just not in a super robust way. Which is why I
> think we should add something like Masahiko's fix additionally.

OK, that works for me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company