Thread: Suspicious behaviour on applying XLOG_HEAP2_VISIBLE.
Hi all,

I faced suspicious behaviour on a hot standby server related to the visibility map. The scenario is:

1. Create a table and check the visibility map internals on the master server.

postgres(1)=# create table hoge (col int);
CREATE TABLE
postgres(1)=# insert into hoge select generate_series(1,10);
INSERT 0 10
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

2. Check the visibility map internals on the standby server.

postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | f
(1 row)

3. Do VACUUM on the master server.

postgres(1)=# vacuum hoge;
VACUUM
postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | t           | f          | t
(1 row)

4. Check the visibility map internals on the standby server again.
** Note that we use the connection we established at #2 again **

postgres(1)=# select * from pg_visibility('hoge');
 blkno | all_visible | all_frozen | pd_all_visible
-------+-------------+------------+----------------
     0 | f           | f          | t
(1 row)

Even on the standby server the result should be (t,f,t), but (f,f,t) was returned. (The XLOG_HEAP2_VISIBLE record has actually reached the standby, and has surely been applied.)

As a result of looking into the code around recovery, ISTM that the cause is related to relation cache clearing. In heap_xlog_visible, if the standby server receives the WAL record then the relation cache is eventually cleared in vm_extend, but if the standby server receives an FPI then the relation cache is not cleared. For example, after I applied the attached patch to HEAD (it might not be the right way), this problem seems to be resolved.

Is this a bug? or not?

Regards,

--
Masahiko Sawada
Attachment
On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
> As a result of looking into the code around recovery, ISTM that the
> cause is related to relation cache clearing.
> In heap_xlog_visible, if the standby server receives the WAL record
> then the relation cache is eventually cleared in vm_extend, but if the
> standby server receives an FPI then the relation cache is not cleared.
> For example, after I applied the attached patch to HEAD (it might not
> be the right way), this problem seems to be resolved.
>
> Is this a bug? or not?

It's a bug. I don't expect it causes queries to return wrong answers, because visibilitymap.c says "it's always safe to clear a bit in the map from correctness point of view." (The bug makes a visibility map bit temporarily appear to have been cleared.) I still call it a bug, because recovery behavior becomes too difficult to verify when xlog replay produces conditions that don't happen outside of recovery. Even if there's no way to get a wrong query answer today, this would be too easy to break later. I wonder if we make the same omission in other xlog replay functions. Similar omissions may cause wrong query answers, even if this particular one does not.

Would you like to bisect for the commit, or at least the major release, at which the bug first appeared?

I wonder if your discovery has any relationship to this recently-reported case of insufficient smgr invalidation:
http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com

Thanks,
nm
On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> As a result of looking into the code around recovery, ISTM that the
>> cause is related to relation cache clearing.
>> In heap_xlog_visible, if the standby server receives the WAL record
>> then the relation cache is eventually cleared in vm_extend, but if
>> the standby server receives an FPI then the relation cache is not
>> cleared.
>> For example, after I applied the attached patch to HEAD (it might not
>> be the right way), this problem seems to be resolved.
>>
>> Is this a bug? or not?
>
> It's a bug. I don't expect it causes queries to return wrong answers,
> because visibilitymap.c says "it's always safe to clear a bit in the
> map from correctness point of view." (The bug makes a visibility map
> bit temporarily appear to have been cleared.) I still call it a bug,
> because recovery behavior becomes too difficult to verify when xlog
> replay produces conditions that don't happen outside of recovery.
> Even if there's no way to get a wrong query answer today, this would
> be too easy to break later. I wonder if we make the same omission in
> other xlog replay functions. Similar omissions may cause wrong query
> answers, even if this particular one does not.
>
> Would you like to bisect for the commit, or at least the major
> release, at which the bug first appeared?
>
> I wonder if your discovery has any relationship to this
> recently-reported case of insufficient smgr invalidation:
> http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com

I'm not sure this bug has a relationship to the other issue you mentioned, but after further investigation this bug seems to be reproducible even on much older versions. At least I reproduced it at 9.0.0.

Regards,

--
Masahiko Sawada
On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
>> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>>> As a result of looking into the code around recovery, ISTM that the
>>> cause is related to relation cache clearing.
>>> In heap_xlog_visible, if the standby server receives the WAL record
>>> then the relation cache is eventually cleared in vm_extend, but if
>>> the standby server receives an FPI then the relation cache is not
>>> cleared.
>>> For example, after I applied the attached patch to HEAD (it might
>>> not be the right way), this problem seems to be resolved.
>>>
>>> Is this a bug? or not?
>>
>> It's a bug. I don't expect it causes queries to return wrong answers,
>> because visibilitymap.c says "it's always safe to clear a bit in the
>> map from correctness point of view." (The bug makes a visibility map
>> bit temporarily appear to have been cleared.) I still call it a bug,
>> because recovery behavior becomes too difficult to verify when xlog
>> replay produces conditions that don't happen outside of recovery.
>> Even if there's no way to get a wrong query answer today, this would
>> be too easy to break later. I wonder if we make the same omission in
>> other xlog replay functions. Similar omissions may cause wrong query
>> answers, even if this particular one does not.
>>
>> Would you like to bisect for the commit, or at least the major
>> release, at which the bug first appeared?
>>
>> I wonder if your discovery has any relationship to this
>> recently-reported case of insufficient smgr invalidation:
>> http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
>
> I'm not sure this bug has a relationship to the other issue you
> mentioned, but after further investigation this bug seems to be
> reproducible even on much older versions.
> At least I reproduced it at 9.0.0.
Would you try PostgreSQL 9.2.16? The visibility map was not crash safe and had no correctness implications until 9.2. If 9.2 behaves this way, it's almost certainly not a recent regression.
On Fri, Apr 1, 2016 at 9:10 AM, Noah Misch <noah@leadboat.com> wrote:
> On Thu, Mar 31, 2016 at 04:48:26PM +0900, Masahiko Sawada wrote:
>> On Thu, Mar 31, 2016 at 2:02 PM, Noah Misch <noah@leadboat.com> wrote:
>>> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>>>> As a result of looking into the code around recovery, ISTM that the
>>>> cause is related to relation cache clearing.
>>>> In heap_xlog_visible, if the standby server receives the WAL record
>>>> then the relation cache is eventually cleared in vm_extend, but if
>>>> the standby server receives an FPI then the relation cache is not
>>>> cleared.
>>>> For example, after I applied the attached patch to HEAD (it might
>>>> not be the right way), this problem seems to be resolved.
>>>>
>>>> Is this a bug? or not?
>>>
>>> It's a bug. I don't expect it causes queries to return wrong
>>> answers, because visibilitymap.c says "it's always safe to clear a
>>> bit in the map from correctness point of view." (The bug makes a
>>> visibility map bit temporarily appear to have been cleared.) I still
>>> call it a bug, because recovery behavior becomes too difficult to
>>> verify when xlog replay produces conditions that don't happen
>>> outside of recovery. Even if there's no way to get a wrong query
>>> answer today, this would be too easy to break later. I wonder if we
>>> make the same omission in other xlog replay functions. Similar
>>> omissions may cause wrong query answers, even if this particular one
>>> does not.
>>>
>>> Would you like to bisect for the commit, or at least the major
>>> release, at which the bug first appeared?
>>>
>>> I wonder if your discovery has any relationship to this
>>> recently-reported case of insufficient smgr invalidation:
>>> http://www.postgresql.org/message-id/flat/CAB7nPqSBFmh5cQjpRbFBp9Rkv1nF=Nh2o1FxKkJ6yvOBtvYDBA@mail.gmail.com
>>
>> I'm not sure this bug has a relationship to the other issue you
>> mentioned, but after further investigation this bug seems to be
>> reproducible even on much older versions.
>> At least I reproduced it at 9.0.0.
>
> Would you try PostgreSQL 9.2.16? The visibility map was not crash safe
> and had no correctness implications until 9.2. If 9.2 behaves this
> way, it's almost certainly not a recent regression.

Yeah, I reproduced it on 9.2.0 and 9.2.16, so it's not a recent regression. The commit is 503c7305a1e379f95649eef1a694d0c1dbdc674a, which introduced the crash-safe visibility map.

Regards,

--
Masahiko Sawada
On 2016-03-31 01:02:06 -0400, Noah Misch wrote:
> On Thu, Mar 10, 2016 at 01:04:11AM +0900, Masahiko Sawada wrote:
>> As a result of looking into the code around recovery, ISTM that the
>> cause is related to relation cache clearing.
>> In heap_xlog_visible, if the standby server receives the WAL record
>> then the relation cache is eventually cleared in vm_extend, but if
>> the standby server receives an FPI then the relation cache is not
>> cleared.
>> For example, after I applied the attached patch to HEAD (it might not
>> be the right way), this problem seems to be resolved.
>>
>> Is this a bug? or not?
>
> It's a bug.

I agree it's a bug. If I understand correctly it's not really visibilitymap related though: debugging shows that vm_extend() (on the master) correctly issues a CacheInvalidateSmgr(), which ought to clear the smgr state on the standby. But replay doesn't do anything like that.

That kind of sounded familiar. The issue is that vacuum doesn't assign an xid and thus RecordTransactionCommit() doesn't emit a commit record, which in turn means invalidation messages aren't sent to the standby. Ugh.

We've recently discussed a very similar issue around
http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de

Unfortunately Simon over in that thread disagreed about fixing this by always emitting a commit record when nmsgs > 0 in RecordTransactionCommit(). I think this thread is a pretty strong hint that we actually should do so.

- Andres
On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
> We've recently discussed a very similar issue around
> http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>
> Unfortunately Simon over in that thread disagreed about fixing this by
> always emitting a commit record when nmsgs > 0 in
> RecordTransactionCommit(). I think this thread is a pretty strong hint
> that we actually should do so.

Yes. I'm pretty confident that you had the right idea there, and that Simon's objection was off-base.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
>> We've recently discussed a very similar issue around
>> http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>>
>> Unfortunately Simon over in that thread disagreed about fixing this
>> by always emitting a commit record when nmsgs > 0 in
>> RecordTransactionCommit(). I think this thread is a pretty strong
>> hint that we actually should do so.
>
> Yes. I'm pretty confident that you had the right idea there, and that
> Simon's objection was off-base.

The easiest way to achieve that seems to be to just assign an xid if that's the case; while it's not necessarily safe/efficient to do so at the point the invalidation message was queued, I think it should be safe to do so at commit time. That seems less invasive to backpatch than either supporting commit records without xids, or a separate record just transporting invalidation messages. Although a separate record with invalidations is going to be needed to support logical decoding of running xacts anyway...

- Andres
On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
>> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
>>> We've recently discussed a very similar issue around
>>> http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>>>
>>> Unfortunately Simon over in that thread disagreed about fixing this
>>> by always emitting a commit record when nmsgs > 0 in
>>> RecordTransactionCommit(). I think this thread is a pretty strong
>>> hint that we actually should do so.
>>
>> Yes. I'm pretty confident that you had the right idea there, and that
>> Simon's objection was off-base.
>
> The easiest way to achieve that seems to be to just assign an xid if
> that's the case; while it's not necessarily safe/efficient to do so at
> the point the invalidation message was queued, I think it should be
> safe to do so at commit time. That seems less invasive to backpatch
> than either supporting commit records without xids, or a separate
> record just transporting invalidation messages.

I agree that's better for back-patching. I hope it won't suck performance-wise. In master, we might think of inventing something new.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-15 13:07:19 -0400, Robert Haas wrote:
> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>> On 2016-04-14 11:50:58 -0400, Robert Haas wrote:
>>> On Wed, Apr 13, 2016 at 9:58 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> We've recently discussed a very similar issue around
>>>> http://www.postgresql.org/message-id/20160227002958.peftvmcx4dxwe244@alap3.anarazel.de
>>>>
>>>> Unfortunately Simon over in that thread disagreed about fixing this
>>>> by always emitting a commit record when nmsgs > 0 in
>>>> RecordTransactionCommit(). I think this thread is a pretty strong
>>>> hint that we actually should do so.
>>>
>>> Yes. I'm pretty confident that you had the right idea there, and
>>> that Simon's objection was off-base.
>>
>> The easiest way to achieve that seems to be to just assign an xid if
>> that's the case; while it's not necessarily safe/efficient to do so
>> at the point the invalidation message was queued, I think it should
>> be safe to do so at commit time. That seems less invasive to
>> backpatch than either supporting commit records without xids, or a
>> separate record just transporting invalidation messages.
>
> I agree that's better for back-patching.

It's a bit ugly though, since at that stage we're pretty heavily assuming there's no xid assigned (at the caller level)... I've toyed with the idea of emitting a commit record that doesn't have an assigned xid, but that'd require changes on the apply side, which would be uglier than a new record type :(

> I hope it won't suck performance-wise.

I can't see that being the case; there aren't many places where we send invalidation messages without an assigned xid. Running an instrumented build with an appropriate wal_level reveals about a handful of places.

Greetings,

Andres Freund
Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>> The easiest way to achieve that seems to be to just assign an xid if
>> that's the case; while it's not necessarily safe/efficient to do so
>> at the point the invalidation message was queued, I think it should
>> be safe to do so at commit time. Seems less invasive to backpatch
>> than to either support commit records without xids, or a separate
>> record just transporting invalidation messages.
>
> I agree that's better for back-patching. I hope it won't suck
> performance-wise. In master, we might think of inventing something
> new.

I'm a little worried about whether this will break assumptions that vacuum doesn't have an XID. I don't immediately see how it would, but it seems a bit shaky.

I find it hard to believe that the act of assigning an XID would add measurably to the cost of a vacuum, so Robert's performance concern doesn't sound very exciting. If this works, I think it's fine to adopt as a permanent solution.

			regards, tom lane
On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>>> The easiest way to achieve that seems to be to just assign an xid if
>>> that's the case; while it's not necessarily safe/efficient to do so
>>> at the point the invalidation message was queued, I think it should
>>> be safe to do so at commit time. Seems less invasive to backpatch
>>> than to either support commit records without xids, or a separate
>>> record just transporting invalidation messages.
>
>> I agree that's better for back-patching. I hope it won't suck
>> performance-wise. In master, we might think of inventing something
>> new.
>
> I'm a little worried about whether this will break assumptions that
> vacuum doesn't have an XID. I don't immediately see how it would,
> but it seems a bit shaky.

Actually, I think there's a bigger problem. If you manage to almost wrap around the XID space, and the cluster shuts down, you are guaranteed to be able to vacuum the whole cluster without actually running out of XIDs. Forcing an XID to be assigned here would make that uncertain: it would depend on how many tables you have versus how many XIDs you have left. That seems unacceptable to me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-15 13:44:27 -0400, Robert Haas wrote:
> On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
>>>> The easiest way to achieve that seems to be to just assign an xid
>>>> if that's the case; while it's not necessarily safe/efficient to do
>>>> so at the point the invalidation message was queued, I think it
>>>> should be safe to do so at commit time. Seems less invasive to
>>>> backpatch than to either support commit records without xids, or a
>>>> separate record just transporting invalidation messages.
>>
>>> I agree that's better for back-patching. I hope it won't suck
>>> performance-wise. In master, we might think of inventing something
>>> new.
>>
>> I'm a little worried about whether this will break assumptions that
>> vacuum doesn't have an XID. I don't immediately see how it would,
>> but it seems a bit shaky.
>
> Actually, I think there's a bigger problem. If you manage to almost
> wrap around the XID space, and the cluster shuts down, you are
> guaranteed to be able to vacuum the whole cluster without actually
> running out of XIDs.

Currently you're unfortunately not, c.f.
http://www.postgresql.org/message-id/CAPpHfdspOkmiQsxh-UZw2chM6dRMwXAJGEmmbmqYR=yvM7-s6A@mail.gmail.com

> Forcing an XID to be assigned here would make that uncertain - it
> would depend on how many tables you have versus how many XIDs you have
> left. That seems unacceptable to me.

But I agree that that's a concern. I'm kinda leaning towards introducing an invalidation WAL record for that case atm. It's quite possible to force an xid to be assigned in xact.c, but it's certainly not pretty (the least ugly way is to duplicate the xactGetCommittedInvalidationMessages() call, and force an xid to be assigned early in CommitTransaction()).

Andres
On 15 April 2016 at 18:44, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Apr 15, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Thu, Apr 14, 2016 at 12:11 PM, Andres Freund <andres@anarazel.de> wrote:
> >>> The easiest way to achieve that seems to be to just assign an xid if
> >>> that's the case; while it's not necessarily safe/efficient to do so at
> >>> the point the invalidation message was queued, I think it should be safe
> >>> to do so at commit time. Seems less invasive to backpatch than to either
> >>> support commit records without xids, or a separate record just
> >>> transporting invalidation messages.
> >
> >> I agree that's better for back-patching. I hope it won't suck
> >> performance-wise. In master, we might think of inventing something
> >> new.
> >
> > I'm a little worried about whether this will break assumptions that
> > vacuum doesn't have an XID. I don't immediately see how it would,
> > but it seems a bit shaky.
>
> Actually, I think there's a bigger problem. If you manage to almost
> wrap around the XID space, and the cluster shuts down, you are
> guaranteed to be able to vacuum the whole cluster without actually
> running out of XIDs. Forcing an XID to be assigned here would make
> that uncertain - it would depend on how many tables you have versus
> how many XIDs you have left. That seems unacceptable to me.
I agree this is a bug, similar to the last one.
For me, the issue is that we need to do something to catch bugs. The existing code does not have any test at all to check whether we are doing the wrong thing - it just lets the wrong thing happen.
Fixing it by forcing a new behaviour might be the right thing to do going forwards, but I don't much like the idea of forcing new behaviour in back branches. It might fix this bug, but can easily cause others.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
> For me, the issue is that we need to do something to catch bugs. The
> existing code does not have any test at all to check whether we are
> doing the wrong thing - it just lets the wrong thing happen.

But sending the message, without assigning an xid, *IS* the right thing to do here. We shouldn't assign an xid, and we need to send the message out to the standbys.

> Fixing it by forcing a new behaviour might be the right thing to do
> going forwards, but I don't much like the idea of forcing new
> behaviour in back branches. It might fix this bug, but can easily
> cause others.

What's your alternative? Assigning an xid in the middle of vacuum isn't OK, but breaking vacuum isn't either.

Andres
On 15 April 2016 at 20:01, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-15 19:59:06 +0100, Simon Riggs wrote:
> > For me, the issue is that we need to do something to catch bugs. The
> > existing code does not have any test at all to check whether we are doing
> > the wrong thing - it just lets the wrong thing happen.
>
> But sending the message, without assigning an xid, *IS* the right thing
> to do here. We shouldn't assign an xid, and we need to send the message
> out to the standbys.
>
> > Fixing it by forcing a new behaviour might be the right thing to do going
> > forwards, but I don't much like the idea of forcing new behaviour in back
> > branches. It might fix this bug, but can easily cause others.
>
> What's your alternative? Assigning an xid in the middle of vacuum isn't
> OK, but breaking vacuum isn't either.
In my understanding we have two choices for this bug
1) assign an xid so it forces sending a message (message plus xid)
2) send a message without assigning an xid (message only)
(1) seems like it is worse for backpatching, IMHO, though I am willing to hear other thoughts or options
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> In my understanding we have two choices for this bug
> 1) assign an xid so it forces sending a message (message plus xid)
> 2) send a message without assigning an xid (message only)
> (1) seems like it is worse for backpatching, IMHO, though I am willing
> to hear other thoughts or options

The problem with (1) is that it creates side-effects that could be bad; Robert's already pointed out one close-to-show-stopper consequence, and I have little confidence that there are not others. In general, if we got here without assigning an xid, there's a reason.

I think the bottom line is that we misdesigned the WAL representation by assuming that this sort of info could always be piggybacked on a transaction commit record. It's time to fix that.

			regards, tom lane
On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> I think the bottom line is that we misdesigned the WAL representation
> by assuming that this sort of info could always be piggybacked on a
> transaction commit record. It's time to fix that.

I think we've got to piggyback it onto a commit record as long as there's one. Otherwise it's going to be more complex (queuing messages when reading an inval record) and slower (more wal records). I can easily develop a patch for that; the question is what we do on the back branches...

Greetings,

Andres Freund
Andres Freund wrote:
> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> I think the bottom line is that we misdesigned the WAL representation
>> by assuming that this sort of info could always be piggybacked on a
>> transaction commit record. It's time to fix that.
>
> I think we've got to piggyback it onto a commit record as long as
> there's one. Otherwise it's going to be more complex (queuing messages
> when reading an inval record) and slower (more wal records). I can
> easily develop a patch for that; the question is what we do on the
> back branches...

We have introduced new wal records in back branches previously -- nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user just needs to make sure to upgrade the standbys first. If they don't, they would die upon replay of the first such record, which they can take as an indication that they need to be upgraded; the standby is down for some time, but there is no data loss or corruption.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Andres Freund <andres@anarazel.de> writes:
> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> I think the bottom line is that we misdesigned the WAL representation
>> by assuming that this sort of info could always be piggybacked on a
>> transaction commit record. It's time to fix that.
>
> I think we've got to piggyback it onto a commit record, as long as
> there's one.

No objection to that part. What I'm saying is that when there isn't one, the answer is a new record type, not forcing xid assignment. It might look almost like a commit record, but it shouldn't imply that there was a transaction.

			regards, tom lane
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>>> I think the bottom line is that we misdesigned the WAL
>>> representation by assuming that this sort of info could always be
>>> piggybacked on a transaction commit record. It's time to fix that.
>>
>> I think we've got to piggyback it onto a commit record as long as
>> there's one. Otherwise it's going to be more complex (queuing
>> messages when reading an inval record) and slower (more wal records).
>> I can easily develop a patch for that; the question is what we do on
>> the back branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
> just needs to make sure to upgrade the standbys first. If they don't,
> they would die upon replay of the first such record, which they can
> take as an indication that they need to be upgraded; the standby is
> down for some time, but there is no data loss or corruption.

Yeah, introducing a new WAL record to address this issue in the back branches would not be an issue, and that's what we should do. For HEAD, let's add that in the commit record.

--
Michael
On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
> <alvherre@2ndquadrant.com> wrote:
> > Andres Freund wrote:
> >> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> >> > I think the bottom line is that we misdesigned the WAL representation
> >> > by assuming that this sort of info could always be piggybacked on a
> >> > transaction commit record. It's time to fix that.
> >>
> >> I think we got to piggyback it onto a commit record, as long as there's
> >> one. Otherwise it's going to be more complex (queuing messages when
> >> reading an inval record) and slower (more wal records). I can easily
> >> develop a patch for that, the question is what we do on the back
> >> branches...
> >
> > We have introduced new wal records in back branches previously --
> > nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
> > just needs to make sure to upgrade the standbys first. If they don't,
> > they would die upon replay of the first such record, which they can take
> > as an indication that they need to be upgraded; the standby is down for
> > some time, but there is no data loss or corruption.
>
> Yeah, introducing a new WAL record to address this issue in
> back-branches would not be an issue, and that's what we should do. For
> HEAD, let's add that in the commit record.
(non-reply just because of travel)
OK, I'll write up a patch today to fix, with a view to backpatching.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-04-15 17:37:03 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>>> I think the bottom line is that we misdesigned the WAL
>>> representation by assuming that this sort of info could always be
>>> piggybacked on a transaction commit record. It's time to fix that.
>>
>> I think we've got to piggyback it onto a commit record as long as
>> there's one. Otherwise it's going to be more complex (queuing
>> messages when reading an inval record) and slower (more wal records).
>> I can easily develop a patch for that; the question is what we do on
>> the back branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e).

Yea, I remember ;). We made that decision because we couldn't find another way, and because the consequences were pretty grave.

> The user just needs to make sure to upgrade the standbys first. If
> they don't, they would die upon replay of the first such record, which
> they can take as an indication that they need to be upgraded; the
> standby is down for some time, but there is no data loss or
> corruption.

There could be, if they're using wal_keep_segments and the standby cannot be caught up anymore. I think it's still worth going for the new record type, but it's a pretty close call. We could also just decide to document the issues :/ - but I'm not sure we're seeing all of them yet.

- Andres
On 2016-04-18 20:43:30 +0900, Michael Paquier wrote:
> Yeah, introducing a new WAL record to address this issue in
> back-branches would not be an issue, and that's what we should do. For
> HEAD, let's add that in the commit record.

I'm not sure why/how you'd do it that way in HEAD? I mean, the only
reason not to use a separate record is the standby incompatibility.

- Andres
On 2016-04-18 13:15:19 +0100, Simon Riggs wrote:
> OK, I'll write up a patch today to fix, with a view to backpatching.

Cool. I've spent a couple minutes on this yesterday, and have the start
of a patch - do you want that?

- Andres
On 2016-04-15 16:53:37 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
> >> I think the bottom line is that we misdesigned the WAL representation
> >> by assuming that this sort of info could always be piggybacked on a
> >> transaction commit record. It's time to fix that.
>
> > I think we got to piggyback it onto a commit record, as long as there's
> > one.
>
> No objection to that part. What I'm saying is that when there isn't one,
> the answer is a new record type, not forcing xid assignment. It might
> look almost like a commit record, but it shouldn't imply that there
> was a transaction.

Here's a patch doing so. Note that, after putting the record into
RM_XACT_ID first, I've decided to make it a RM_STANDBY_ID type record. I
think it's likely that this is going to be needed beyond transaction
commits, and it generally seems to fit better into standby.c; even if
it's a bit awkward that commit records contain similar data. To avoid
duplicating the *desc.c code, I've exposed standby_desc_invalidations()
to also be used by xactdesc.c.

It fixes the problem at hand, but essentially it's just luck that the
patch is sufficient. The first layer of the issue is that queued
invalidation messages aren't sent; but for vm_extend() there's another
side to it. Namely vm_extend() does

	/*
	 * Send a shared-inval message to force other backends to close any smgr
	 * references they may have for this rel, which we are about to change.
	 * This is a useful optimization because it means that backends don't have
	 * to keep checking for creation or extension of the file, which happens
	 * infrequently.
	 */
	CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

but CacheInvalidateSmgr() is non-transactional, as its comment explains:

	 * Note: because these messages are nontransactional, they won't be captured
	 * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
	 * should happen in low-level smgr.c routines, which are executed while
	 * replaying WAL as well as when creating it.

As far as I can see, vm_extend() is the only current caller forgetting
that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
smgr.c.

The reason this all ends up working nonetheless is that the
heap_inplace_update()s in vacuum trigger a CacheInvalidateHeapTuple(),
which queues a relcache invalidation, which in turn does a
RelationCloseSmgr() in RelationClearRelation() - thereby effectively
doing a transactional CacheInvalidateSmgr(). But that seems more than
fragile.

ISTM we should additionally replace the CacheInvalidateSmgr() with a
CacheInvalidateRelcache() and document that that implies an smgr
invalidation. Alternatively we could log smgr (and relmapper)
invalidations as well, but that's not quite non-invasive either; it
might be a good long-term idea to keep things simpler, though.

Comments?

- Andres
Attachment
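The suggested alternative - swapping the non-transactional smgr
invalidation in vm_extend() for a transactional relcache one - would
look roughly like the sketch below. This is an illustrative sketch
against PostgreSQL's visibilitymap.c, not the committed change, and it
is not compilable outside the backend:

```
/* In vm_extend(), after physically extending the VM fork (sketch): */

/*
 * Before: a non-transactional message, sent out immediately, but never
 * captured in commit/abort WAL entries - so a standby replaying a
 * full-page image of the VM page never sees it.
 */
CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);

/*
 * After (sketch): a transactional relcache invalidation.  It is queued,
 * emitted with the commit record, and replayed on standbys; processing
 * it ends up in RelationClearRelation(), which calls RelationCloseSmgr(),
 * so it implies the smgr invalidation.  The trade-off Dean later points
 * out: other backends only learn about the VM extension at end-of-xact.
 */
CacheInvalidateRelcache(rel);
```

The comments summarize the transactional/non-transactional distinction
discussed in the mail above; the exact placement inside vm_extend() is an
assumption.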
On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:

On 18 April 2016 at 12:43, Michael Paquier <michael.paquier@gmail.com> wrote:
On Sat, Apr 16, 2016 at 5:37 AM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Andres Freund wrote:
>> On 2016-04-15 15:26:17 -0400, Tom Lane wrote:
>> > I think the bottom line is that we misdesigned the WAL representation
>> > by assuming that this sort of info could always be piggybacked on a
>> > transaction commit record. It's time to fix that.
>>
>> I think we got to piggyback it onto a commit record, as long as there's
>> one. Otherwise it's going to be more complex (queuing messages when
>> reading an inval record) and slower (more wal records). I can easily
>> develop a patch for that, the question is what we do on the back
>> branches...
>
> We have introduced new wal records in back branches previously --
> nothing new (c.f. 8e9a16ab8f7f0e5813644975cc3f336e5b064b6e). The user
> just needs to make sure to upgrade the standbys first. If they don't,
> they would die upon replay of the first such record, which they can take
> as an indication that they need to be upgraded; the standby is down for
> some time, but there is no data loss or corruption.
Yeah, introducing a new WAL record to address this issue in
back-branches would not be an issue, and that's what we should do. For
HEAD, let's add that in the commit record.(non-reply just because of travel)OK, I'll write up a patch today to fix, with a view to backpatching.
Patch from Tuesday. On various planes.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
On 2016-04-24 22:10:19 +0100, Simon Riggs wrote:
> On 18 April 2016 at 13:15, Simon Riggs <simon@2ndquadrant.com> wrote:
> > (non-reply just because of travel)
> >
> > OK, I'll write up a patch today to fix, with a view to backpatching.
>
> Patch from Tuesday. On various planes.

I posted a version of this at
http://archives.postgresql.org/message-id/20160424025117.2cmf6ku4ldfcoo44%40alap3.anarazel.de
(because it was blocking me from fixing a newer bug)

I think my approach of a separate record is going to be easier to
backpatch - the commit record format you took advantage of is new. If we
go your way, this also needs pg_xlogdump/xact_desc() & logical decoding
integration.

Greetings,

Andres Freund
Dean, Robert,

Afaics the problem described below was introduced in b4e07417. Do you
have a different/better proposal than
s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
quite right either, because it only makes the file extension visible at
end-of-xact - which is mostly harmless, but still.

On 2016-04-23 19:51:17 -0700, Andres Freund wrote:
> It fixes the problem at hand, but essentially it's just luck that the
> patch is sufficient. The first layer of the issue is that queued
> invalidation messages aren't sent; but for vm_extend() there's another
> side to it. Namely vm_extend() does
>
> 	/*
> 	 * Send a shared-inval message to force other backends to close any smgr
> 	 * references they may have for this rel, which we are about to change.
> 	 * This is a useful optimization because it means that backends don't have
> 	 * to keep checking for creation or extension of the file, which happens
> 	 * infrequently.
> 	 */
> 	CacheInvalidateSmgr(rel->rd_smgr->smgr_rnode);
>
> but CacheInvalidateSmgr is non-transactional as its comment explains:
>
> * Note: because these messages are nontransactional, they won't be captured
> * in commit/abort WAL entries.  Instead, calls to CacheInvalidateSmgr()
> * should happen in low-level smgr.c routines, which are executed while
> * replaying WAL as well as when creating it.
>
> as far as I can see vm_extend() is the only current caller forgetting
> that rule. I don't think it's safe to use CacheInvalidateSmgr() outside
> smgr.c.
>
> The reason this all ends up working nonetheless is that the
> heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
> which queues a relcache invalidation, which in turn does a
> RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
> doing a transactional CacheInvalidateSmgr(). But that seems more than
> fragile.
>
> ISTM we should additionally replace the CacheInvalidateSmgr() with a
> CacheInvalidateRelcache() and document that that implies an smgr
> invalidation. Alternatively we could log smgr (and relmapper)
> invalidations as well, but that's not quite non-invasive either; but
> might be a good long-term idea to keep things simpler.

- Andres
On Tue, Apr 26, 2016 at 1:13 AM, Andres Freund <andres@anarazel.de> wrote:
> I think my approach of a separate record is going to be easier to
> backpatch - the commit record format you took advantage of is new.

Sorry for the late reply. After reading both patches yesterday, I found
your approach with standby.c to be neater.
--
Michael
On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> Here's a patch doing so. Note that, after putting the record into RM_XACT_ID
> first, I've decided to make it a RM_STANDBY_ID type record. I think it's
> likely that this is going to be needed beyond transaction commits, and
> it generally seems to fit better into standby.c; even if it's a bit
> awkward that commit records contain similar data. To avoid duplicating
> the *desc.c code, I've exposed standby_desc_invalidations() to also be
> used by xactdesc.c.

I have been eyeballing this patch for some time since yesterday and did
some tests, and that's really neat. Having the invalidation stuff in
standby.c makes quite some sense as well.

+ * wanting to emit more WAL recoreds).

s/recoreds/records/

> The reason this all ends up working nonetheless is that the
> heap_inplace_update()s in vacuum triggers a CacheInvalidateHeapTuple()
> which queues a relcache invalidation, which in turn does a
> RelationCloseSmgr() in RelationClearRelation(). Thereby effectively
> doing a transactional CacheInvalidateSmgr(). But that seems more than
> fragile.

You have spent quite some time on that. As things stand that's indeed a
bit fragile..

> ISTM we should additionally replace the CacheInvalidateSmgr() with a
> CacheInvalidateRelcache() and document that that implies an smgr
> invalidation. Alternatively we could log smgr (and relmapper)
> invalidations as well, but that's not quite non-invasive either; but
> might be a good long-term idea to keep things simpler.
>
> Comments?

Yeah, this looks like a good idea at the end. As the invalidation patch
is aimed at being backpatched, this may be something to do as well in
back-branches.
--
Michael
Hi,

Thanks for looking into this.

On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
> > CacheInvalidateRelcache() and document that that implies an smgr
> > invalidation. Alternatively we could log smgr (and relmapper)
> > invalidations as well, but that's not quite non-invasive either; but
> > might be a good long-term idea to keep things simpler.
> >
> > Comments?
>
> Yeah, this looks like a good idea at the end.

You mean the bit about making smgr invalidations logged?

> As the invalidation patch is aimed at being backpatched, this may be
> something to do as well in back-branches.

I'm a bit split here. I think forcing processing of invalidations at
moments they've previously never been processed is a bit risky for the
back branches. But on the other hand relcache invalidations are only
processed at end-of-xact, which isn't really correct for the code at
hand :/

Greetings,

Andres Freund
On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
> Thanks for looking into this.
>
> On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
>> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
>> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
>> > CacheInvalidateRelcache() and document that that implies an smgr
>> > invalidation. Alternatively we could log smgr (and relmapper)
>> > invalidations as well, but that's not quite non-invasive either; but
>> > might be a good long-term idea to keep things simpler.
>> >
>> > Comments?
>>
>> Yeah, this looks like a good idea at the end.
>
> You mean the bit about making smgr invalidations logged?

Sorry for the lack of precision in my words. I was referring to your
idea of replacing CacheInvalidateSmgr() with CacheInvalidateRelcache(),
which sounds like a good option as this would make the invalidation be
logged at commit. Thinking about the logging of smgr invalidations, this
is quite interesting. But what would we actually gain in doing that? Do
you foresee any advantages in doing so? Looking at the code, the only
case where this would be useful now is vm_extend().

>> As the invalidation patch is aimed at being backpatched, this may be
>> something to do as well in back-branches.
>
> I'm a bit split here. I think forcing processing of invalidations at
> moments they've previously never been processed is a bit risky for the
> back branches. But on the other hand relcache invalidations are only
> processed at end-of-xact, which isn't really correct for the code at
> hand :/

Oh, OK. So you mean that this patch is not aimed at back-branches with
this new record type, but only at HEAD.. To be honest, I thought that we
had better address this issue on back-branches with a patch close to
what you are proposing (which is more or less what Simon has actually
sent), and keep the change of CacheInvalidateSmgr() in visibilitymap.c
HEAD-only.
--
Michael
On Tue, Apr 26, 2016 at 12:39 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Tue, Apr 26, 2016 at 11:47 AM, Andres Freund <andres@anarazel.de> wrote:
>> Thanks for looking into this.
>>
>> On 2016-04-26 11:43:06 +0900, Michael Paquier wrote:
>>> On Sun, Apr 24, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
>>> > ISTM we should additionally replace the CacheInvalidateSmgr() with a
>>> > CacheInvalidateRelcache() and document that that implies an smgr
>>> > invalidation. Alternatively we could log smgr (and relmapper)
>>> > invalidations as well, but that's not quite non-invasive either; but
>>> > might be a good long-term idea to keep things simpler.
>>> >
>>> > Comments?
>>>
>>> Yeah, this looks like a good idea at the end.
>>
>> You mean the bit about making smgr invalidations logged?
>
> Sorry for the lack of precision in my words. I was referring to your
> idea of replacing CacheInvalidateSmgr() by CacheInvalidateRelcache(),
> which sounds like a good option as this would make the invalidation be
> logged at commit. Thinking about the logging of smgr invalidations,
> this is quite interesting. But what would we actually gain in doing
> that? Do you foresee any advantages in doing so? The only case where
> this would be useful now is for vm_extend by looking at the code.
>
>>> As the invalidation patch is aimed at being backpatched, this may be
>>> something to do as well in back-branches.
>>
>> I'm a bit split here. I think forcing processing of invalidations at
>> moments they've previously never been processed is a bit risky for the
>> back branches. But on the other hand relcache invalidations are only
>> processed at end-of-xact, which isn't really correct for the code at
>> hand :/
>
> Oh, OK. So you mean that this patch is not aimed for back-branches
> with this new record type, but that's only for HEAD.. To be honest, I
> thought that we had better address this issue on back-branches with a
> patch close to what you are proposing (which is more or less what
> Simon has actually sent), and keep the change of CacheInvalidateSmgr()
> in visibilitymap.c to be HEAD-only.

Ah, and actually, as I'm on it: from your previous patch there is this bit:

+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfoString(buf, " relmap");
+ else if (msg->id == SHAREDINVALRELMAP_ID)
+ appendStringInfo(buf, " relmap db %u", msg->rm.dbId);

You surely want to check for !OidIsValid(msg->rm.dbId) in the first one.
--
Michael
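Following Michael's suggestion, a corrected version of that *desc.c hunk
might look like the sketch below. This is an illustrative sketch, not
the committed hunk; the " relmap shared" message text in particular is
an assumption:

```
else if (msg->id == SHAREDINVALRELMAP_ID)
{
    /*
     * An invalid database OID means the invalidation is for the shared
     * relation map file; otherwise it targets one database's relmap.
     */
    if (!OidIsValid(msg->rm.dbId))
        appendStringInfoString(buf, " relmap shared");
    else
        appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
}
```

Without the OID check, the second branch of the quoted patch is dead
code, since both conditions test the same message id.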
On 2016-04-26 12:39:37 +0900, Michael Paquier wrote:
> Thinking about the logging of smgr invalidations, this is quite
> interesting. But what would we actually gain in doing that? Do you
> foresee any advantages in doing so? The only case where this would be
> useful now is for vm_extend by looking at the code.

Well, it'd make vm_extend actually correct, which replacing the
invalidation with a relcache one would not. Relcache invalidations are
transactional, whereas smgr ones are not (intentionally so!). I don't
think it's currently a big problem, but it does make me rather wary.

> >> As the invalidation patch is aimed at being backpatched, this may be
> >> something to do as well in back-branches.
> >
> > I'm a bit split here. I think forcing processing of invalidations at
> > moments they've previously never been processed is a bit risky for the
> > back branches. But on the other hand relcache invalidations are only
> > processed at end-of-xact, which isn't really correct for the code at
> > hand :/
>
> Oh, OK. So you mean that this patch is not aimed for back-branches
> with this new record type, but that's only for HEAD.

No, I think we got to do this in all branches. I was just wondering
about how to fix vm_extend(). Which I do think we got to fix, even in
the back-branches.

- Andres
On 26 April 2016 at 04:48, Andres Freund <andres@anarazel.de> wrote:
> No, I think we got to do this in all branches. I was just wondering
> about how to fix vm_extend(). Which I do think we got to fix, even in
> the back-branches.

I think replacing CacheInvalidateSmgr() with CacheInvalidateRelcache()
in vm_extend() is probably the safer thing to do, and it ought to be
relatively harmless. It means that an index-only scan won't be notified
of VM extension until the end of the other transaction, which might lead
to extra heap fetches, but I think that's unlikely to have any
performance impact because it ought to be a fairly rare event, and if it
was another transaction adding tuples, they wouldn't be all-visible
before it was committed anyway, so the extra heap fetches would be
required in any case.

Regards,
Dean
On Mon, Apr 25, 2016 at 2:10 PM, Andres Freund <andres@anarazel.de> wrote:
> Afaics the problem described below was introduced in b4e07417, do you
> have a different/better proposal than
> s/CacheInvalidateSmgr/CacheInvalidateRelcache/? Doing that doesn't feel
> quite right either, because it only makes the file extension visible at
> end-of-xact - which is mostly harmless, but still.

Maybe I'm all wet here, but it seems to me like this is a bug in
heap_xlog_visible(). If the relation has been extended on the master but
no bits are set, then it doesn't really matter whether any invalidation
happens on the standby. On the other hand, if the relation has been
extended on the master and a bit has actually been set on the new page,
then there should be an XLOG_HEAP2_VISIBLE record that gets replayed on
the standby. And replay of that record should call visibilitymap_pin(),
which should call vm_readbuf(), which should call vm_extend(), which
should issue the same smgr invalidation on the standby that got issued
on the master. So no problem!

However, in the FPI case, heap_xlog_visible() optimizes all of that
away, so the invalidation doesn't get issued. Oops. But we could cure
that just by having heap_xlog_visible() still call CacheInvalidateSmgr()
even in that case, and then we'd actually be following the rule that
non-transactional invalidations should be issued on the standby even
while in replay. That seems cleaner to me than switching to a
transactional invalidation, which isn't really the right thing anyway.

In other words, I think Masahiko Sawada's patch in the original post is
pretty much right on target, except that we don't need to do that
always, but rather only in the FPI case when the call to
visibilitymap_pin() is being optimized away. If we solve the problem
that way, I don't think we even need a new WAL record for this case,
which is a non-trivial fringe benefit.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
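Robert's proposed fix - re-issuing the smgr invalidation in the replay
path when the FPI restore bypasses vm_extend() - would sit in
heap_xlog_visible() roughly as sketched below. This is an illustrative
sketch of the idea against the 9.5+ redo interface, not Sawada's actual
patch or the committed fix, and it is not compilable outside the
backend:

```
/* Inside heap_xlog_visible() (sketch): */
if (XLogReadBufferForRedoExtended(record, 1, RBM_ZERO_ON_ERROR, false,
                                  &vmbuffer) == BLK_RESTORED)
{
    /*
     * The VM page arrived as a full-page image, so replay restored the
     * block directly and never went through visibilitymap_pin() ->
     * vm_readbuf() -> vm_extend().  The non-transactional smgr
     * invalidation that vm_extend() issued on the primary was therefore
     * skipped here; re-issue it, as non-transactional invals must be
     * sent during replay too.
     */
    RelFileNodeBackend rnode_backend;

    rnode_backend.node = rnode;              /* from XLogRecGetBlockTag() */
    rnode_backend.backend = InvalidBackendId;
    CacheInvalidateSmgr(rnode_backend);
}
```

The exact placement and the way `rnode` is obtained are assumptions;
the point is that the invalidation fires immediately during replay,
matching vm_extend()'s behaviour on the primary, rather than being
deferred transactionally.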
On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> In other words, I think Masahiko Sawada's patch in the original post
> is pretty much right on target, except that we don't need to do that
> always, but rather only in the FPI case when the call to
> visibilitymap_pin() is being optimized away. If we solve the problem
> that way, I don't think we even need a new WAL record for this case,
> which is a non-trivial fringe benefit.

The visibility map is not the only thing that needs to be addressed,
no? For example, take this report from Dmitry Vasilyev of a couple of
months back, where index relations are not visible on a standby:
http://www.postgresql.org/message-id/CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com

This is really leading to a solution where we need to take a more
general approach to this problem instead of trying to patch multiple
WAL replay code paths. And Andres' stuff does so.
--
Michael
On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
> On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> > In other words, I think Masahiko Sawada's patch in the original post
> > is pretty much right on target, except that we don't need to do that
> > always, but rather only in the FPI case when the call to
> > visibilitymap_pin() is being optimized away. If we solve the problem
> > that way, I don't think we even need a new WAL record for this case,
> > which is a non-trivial fringe benefit.
>
> The visibility map is not the only thing that needs to be addressed,
> no?

If I understand Robert correctly, his point is about fixing the smgr
inval alone - without the invalidation fix that'd not be enough, because
the relcache info with outdated information (particularly relallvisible
et al) would continue to be valid. Relcache invalidations imply an smgr
one, but not the other way round.

The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
sufficient is because the smgr invalidation isn't transaction bound,
i.e. it is sent out immediately. So, to have the same behaviour on
master/HS, we need a way to send out the invalidation properly in
lockstep with replay.

- Andres
On Wed, Apr 27, 2016 at 11:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-27 14:27:33 +0900, Michael Paquier wrote:
>> On Wed, Apr 27, 2016 at 12:25 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> > In other words, I think Masahiko Sawada's patch in the original post
>> > is pretty much right on target, except that we don't need to do that
>> > always, but rather only in the FPI case when the call to
>> > visibilitymap_pin() is being optimized away. If we solve the problem
>> > that way, I don't think we even need a new WAL record for this case,
>> > which is a non-trivial fringe benefit.
>>
>> The visibility map is not the only thing that need to be addressed,
>> no?
>
> If I understand Robert correctly his point is about fixing the smgr
> inval alone - without the invalidation fix that'd not be enough because
> the relcache info with outdated information (particularly relallvisible
> et al), would continue to be valid. Relcache invalidations imply an smgr
> one, but not the other way round.
>
> The reason the fix for nmsg > 0 && !markXidCommitted isn't entirely
> sufficient is because the smgr invalidation isn't transaction bound,
> i.e. sent out immediately. So, to have the same behaviour on master/HS,
> we need a way to send out the invalidation properly in lockstep with
> replay.

What I'm confused about here is: Masahiko Sawada posted a patch that
fixes the problem for him, which does not involve any new WAL record
type. It also seems to be fixing the problem in a way that is clean and
consistent with what we've done elsewhere. The patch actually under
discussion here manages to introduce a new WAL record type without
fixing that problem. Therefore I conclude that the committed patch fixes
some *other* problem, not the one that this thread is ostensibly about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
> Masahiko Sawada posted a patch that fixes the problem for him, which
> does not involve any new WAL record type. It also seems to be fixing
> the problem in a way that is clean and consistent with what we've done
> elsewhere.

It only fixes one symptom; the relcache entry is still wrong afterwards,
which is pretty relevant for planning.

> The patch actually under discussion here manages to introduce a new
> WAL record type without fixing that problem.

It does fix the problem, just not in a super robust way. Which is why I
think we should add something like Masahiko's fix additionally.

Greetings,

Andres Freund
On Wed, Apr 27, 2016 at 12:03 PM, Andres Freund <andres@anarazel.de> wrote:
> On 2016-04-27 11:59:39 -0400, Robert Haas wrote:
>> Masahiko Sawada posted a patch that fixes the problem for him, which
>> does not involve any new WAL record type. It also seems to be fixing
>> the problem in a way that is clean and consistent with what we've done
>> elsewhere.
>
> It only fixes one symptom, the relcache entry is still wrong
> afterwards. Which is pretty relevant for planning.
>
>> The patch actually under discussion here manages to introduce a new
>> WAL record type without fixing that problem.
>
> It does fix the problem, just not in a super robust way. Which is why I
> think we should add something like Masahiko's fix additionally.

OK, that works for me.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company