Thread: Wrong defeinition of pq_putmessage_noblock since 9.5
Hello, While testing replication for 9.5, we found that repl-master can ignore wal_sender_timeout and seemingly waits for TCP retransmission timeout for the case of sudden power-off of a standby. My investigation told me that the immediate cause could be that secure_write() is called with *blocking mode* (that is, port->noblock = false) under *pq_putmessage_noblock* macro called from XLogSendPhysical(). libpq.h of 9.5 and newer defines it as the following, > #define pq_putmessage(msgtype, s, len) \ > (PqCommMethods->putmessage(msgtype, s, len)) > #define pq_putmessage_noblock(msgtype, s, len) \ > (PqCommMethods->putmessage(msgtype, s, len)) which is apparently should be the following. > #define pq_putmessage_noblock(msgtype, s, len) \ > (PqCommMethods->putmessage_noblock(msgtype, s, len)) The attached patch fixes it. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Thu, Jul 28, 2016 at 6:52 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > Hello, > > While testing replication for 9.5, we found that repl-master can > ignore wal_sender_timeout and seemingly waits for TCP > retransmission timeout for the case of sudden power-off of a > standby. > > My investigation told me that the immediate cause could be that > secure_write() is called with *blocking mode* (that is, > port->noblock = false) under *pq_putmessage_noblock* macro called > from XLogSendPhysical(). > > libpq.h of 9.5 and newer defines it as the following, > >> #define pq_putmessage(msgtype, s, len) \ >> (PqCommMethods->putmessage(msgtype, s, len)) >> #define pq_putmessage_noblock(msgtype, s, len) \ >> (PqCommMethods->putmessage(msgtype, s, len)) > > which is apparently should be the following. > >> #define pq_putmessage_noblock(msgtype, s, len) \ >> (PqCommMethods->putmessage_noblock(msgtype, s, len)) > > The attached patch fixes it. Good catch! Barring objection, I will push this to both master and 9.5. Regards, -- Fujii Masao
On Thu, Jul 28, 2016 at 9:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote: > On Thu, Jul 28, 2016 at 6:52 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: >> Hello, >> >> While testing replication for 9.5, we found that repl-master can >> ignore wal_sender_timeout and seemingly waits for TCP >> retransmission timeout for the case of sudden power-off of a >> standby. >> >> My investigation told me that the immediate cause could be that >> secure_write() is called with *blocking mode* (that is, >> port->noblock = false) under *pq_putmessage_noblock* macro called >> from XLogSendPhysical(). >> >> libpq.h of 9.5 and newer defines it as the following, >> >>> #define pq_putmessage(msgtype, s, len) \ >>> (PqCommMethods->putmessage(msgtype, s, len)) >>> #define pq_putmessage_noblock(msgtype, s, len) \ >>> (PqCommMethods->putmessage(msgtype, s, len)) >> >> which is apparently should be the following. >> >>> #define pq_putmessage_noblock(msgtype, s, len) \ >>> (PqCommMethods->putmessage_noblock(msgtype, s, len)) >> >> The attached patch fixes it. > > Good catch! Barring objection, I will push this to both master and 9.5. Regarding this patch, while reading pqcomm.c, I found the following things. 1. socket_comm_reset() calls pq_endcopyout(). I think that socket_endcopyout() should be called, instead. 2. socket_putmessage_noblock() calls pq_putmessage(). I think that socket_putmessage() should be called, instead. 3. Several source comments in pqcomm.c have not been updated. Some comments still use the old function name like pq_putmessage(). Attached patch fixes the above issues. Regards, -- Fujii Masao
Attachment
Fujii Masao <masao.fujii@gmail.com> writes: > 3. Several source comments in pqcomm.c have not been updated. > Some comments still use the old function name like pq_putmessage(). > Attached patch fixes the above issues. I dunno, this seems like it's doubling down on some extremely poor decisions. Why is it that you now have to flip a coin to guess whether the prefix is pq_ or socket_ for functions in this module? I would rather see that renaming reverted. regards, tom lane
At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us> > Fujii Masao <masao.fujii@gmail.com> writes: > > 3. Several source comments in pqcomm.c have not been updated. > > Some comments still use the old function name like pq_putmessage(). > > > Attached patch fixes the above issues. > > I dunno, this seems like it's doubling down on some extremely poor > decisions. Why is it that you now have to flip a coin to guess whether > the prefix is pq_ or socket_ for functions in this module? I would > rather see that renaming reverted. The set of functions in PQcommMethods doesn't seem clean. They are choosed arbitrary just so that other pq_* functions used in parallel workers will work as expected. I suppose that it needs some refactoring. By the way, pq_start/endcopyout() are used only in FE protocols below 3.0, which had already bacome obsolete as of PG7.4. While the next dev cycle is for PG10, if there is no particular reason to support such acient protocols, removing them would make things easier and cleaner. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Jul 29, 2016 at 12:18 PM, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us> >> Fujii Masao <masao.fujii@gmail.com> writes: >> > 3. Several source comments in pqcomm.c have not been updated. >> > Some comments still use the old function name like pq_putmessage(). >> >> > Attached patch fixes the above issues. >> >> I dunno, this seems like it's doubling down on some extremely poor >> decisions. Why is it that you now have to flip a coin to guess whether >> the prefix is pq_ or socket_ for functions in this module? I would >> rather see that renaming reverted. Yes, I agree with that. I cannot understand the intention behind 2bd9e41 to rename those routines as they are now, so getting them back with pg_ as prefix looks like a good idea to me. > The set of functions in PQcommMethods doesn't seem clean. They > are chosen arbitrarily just so that other pq_* functions used in > parallel workers will work as expected. I suppose that it needs > some refactoring. Any work in this area is likely 10.0 material at this point. > By the way, pq_start/endcopyout() are used only in FE protocols > below 3.0, which had already bacome obsolete as of PG7.4. While > the next dev cycle is for PG10, if there is no particular reason > to support such ancient protocols, removing them would make things > easier and cleaner. Remove support for protocol 2 has been in the air for some time, but that's a separate discussion. If you want to discuss this issue particularly, raising a new thread would be a good idea. -- Michael
At Fri, 29 Jul 2016 12:47:53 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqSSNTroRi=zGMDxYa7PzX_VSck6hbHY6eTnBBsfYaah6A@mail.gmail.com> > On Fri, Jul 29, 2016 at 12:18 PM, Kyotaro HORIGUCHI > <horiguchi.kyotaro@lab.ntt.co.jp> wrote: > > At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us> > >> Fujii Masao <masao.fujii@gmail.com> writes: > >> > 3. Several source comments in pqcomm.c have not been updated. > >> > Some comments still use the old function name like pq_putmessage(). > >> > >> > Attached patch fixes the above issues. > >> > >> I dunno, this seems like it's doubling down on some extremely poor > >> decisions. Why is it that you now have to flip a coin to guess whether > >> the prefix is pq_ or socket_ for functions in this module? I would > >> rather see that renaming reverted. > > Yes, I agree with that. I cannot understand the intention behind > 2bd9e41 to rename those routines as they are now, so getting them back > with pg_ as prefix looks like a good idea to me. The many of the socket_* functions are required to be replaced with mq_* functions for backgroud workers. So reverting the names of socket_* functions should be accompanied by the need to, for example, changing their callers to use them explicitly via PqCommMethods, not hiding with macros, or renaming current pq_* macros to, say, pqi_. (it stands for PQ-Indirect as a tentative) I'm not confident on the new prefix since I'm not sure that what the 'pq' stands for. (Postgres Query?) Attached patch is a rush work to revert the names of socket_ functions and replace the prefix of the macros with "pqi". pq_ names no longer points to mq_ functions. Is this roughly on the right way? Even though the prefix is not appropriate. > > The set of functions in PQcommMethods doesn't seem clean. They > > are chosen arbitrarily just so that other pq_* functions used in > > parallel workers will work as expected. I suppose that it needs > > some refactoring. > > Any work in this area is likely 10.0 material at this point. > > > By the way, pq_start/endcopyout() are used only in FE protocols > > below 3.0, which had already bacome obsolete as of PG7.4. While > > the next dev cycle is for PG10, if there is no particular reason > > to support such ancient protocols, removing them would make things > > easier and cleaner. > > Remove support for protocol 2 has been in the air for some time, but > that's a separate discussion. If you want to discuss this issue > particularly, raising a new thread would be a good idea. Thanks. I saw in somewhere that the protocol 2 no longer works. I'll raise a new thread later. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > At Fri, 29 Jul 2016 12:47:53 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqSSNTroRi=zGMDxYa7PzX_VSck6hbHY6eTnBBsfYaah6A@mail.gmail.com> >>> At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us> >>> I dunno, this seems like it's doubling down on some extremely poor >>> decisions. Why is it that you now have to flip a coin to guess whether >>> the prefix is pq_ or socket_ for functions in this module? I would >>> rather see that renaming reverted. >> Yes, I agree with that. I cannot understand the intention behind >> 2bd9e41 to rename those routines as they are now, so getting them back >> with pg_ as prefix looks like a good idea to me. > The many of the socket_* functions are required to be replaced > with mq_* functions for backgroud workers. So reverting the names > of socket_* functions should be accompanied by the need to, for > example, changing their callers to use them explicitly via > PqCommMethods, not hiding with macros, or renaming current pq_* > macros to, say, pqi_. (it stands for PQ-Indirect as a tentative) > I'm not confident on the new prefix since I'm not sure that what > the 'pq' stands for. (Postgres Query?) Hmm. Of course the problem with this approach is that we end up touching a whole bunch of call sites, which sort of negates the point of having installed a compatibility macro layer. [ spends awhile longer looking around at this code ] Maybe the real problem here is that the abstraction layer is so incomplete and apparently haphazard, so that it's not very obvious where you should use a pq_xxx name and where you should refer to socket_xxx. For some of the functions in pqcomm.c, such as internal_flush, it's impossible to tell which side of the abstraction boundary they're supposed to be on. (I suspect that that particular function has simply been ignored on the undocumented assumption that a bgworker could never call it, but that's the kind of half-baked work that I don't find acceptable.) I think the core work that needs to be done here is to clearly identify each function in pqcomm.c as either above or below the PqCommMethods abstraction boundary. Maybe we should split one set or the other out to a new source file. In any case, the naming convention ought to reflect that hierarchy. I withdraw the idea that we should try to revert all these functions back to their old names, since that would obscure the layering distinction between socket-specific and general-usage functions. I now think that the problem is that that distinction hasn't been drawn sharply enough. > The set of functions in PQcommMethods doesn't seem clean. They > are chosen arbitrarily just so that other pq_* functions used in > parallel workers will work as expected. I suppose that it needs > some refactoring. Yeah, exactly. > Any work in this area is likely 10.0 material at this point. I'm not really happy with that, since refactoring it again will create back-patch hazards. But I see that a lot of the mess here was created in 9.5, which means we're probably stuck with back-patch hazards anyway. regards, tom lane PS: "PQ" comes from PostQUEL, I believe.
I wrote: > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: >> Any work in this area is likely 10.0 material at this point. > I'm not really happy with that, since refactoring it again will create > back-patch hazards. But I see that a lot of the mess here was created > in 9.5, which means we're probably stuck with back-patch hazards anyway. I've pushed Kyotaro-san's original patch, which is clearly a bug fix. I think the additional changes discussed later in this thread are cosmetic, and probably should wait for a more general review of the layering decisions in pqcomm.c. regards, tom lane
At Fri, 29 Jul 2016 13:00:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <29430.1469811650@sss.pgh.pa.us> > I wrote: > > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > >> Any work in this area is likely 10.0 material at this point. > > > I'm not really happy with that, since refactoring it again will create > > back-patch hazards. But I see that a lot of the mess here was created > > in 9.5, which means we're probably stuck with back-patch hazards anyway. > > I've pushed Kyotaro-san's original patch, which is clearly a bug fix. > I think the additional changes discussed later in this thread are > cosmetic, and probably should wait for a more general review of the > layering decisions in pqcomm.c. Agreed. It's not an abstraction or API, but it actually works with no problem and changing it is an issue obviously for later discussion. Anyway thank you for committing this. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
At Fri, 29 Jul 2016 11:58:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <14846.1469807884@sss.pgh.pa.us> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes: > > The many of the socket_* functions are required to be replaced > > with mq_* functions for backgroud workers. So reverting the names > > of socket_* functions should be accompanied by the need to, for > > example, changing their callers to use them explicitly via > > PqCommMethods, not hiding with macros, or renaming current pq_* > > macros to, say, pqi_. (it stands for PQ-Indirect as a tentative) > > I'm not confident on the new prefix since I'm not sure that what > > the 'pq' stands for. (Postgres Query?) > > Hmm. Of course the problem with this approach is that we end up touching > a whole bunch of call sites, which sort of negates the point of having > installed a compatibility macro layer. > > [ spends awhile longer looking around at this code ] > > Maybe the real problem here is that the abstraction layer is so incomplete > and apparently haphazard, so that it's not very obvious where you should > use a pq_xxx name and where you should refer to socket_xxx. Yes, 'haphazard' is the word I was seeking for. > For some of > the functions in pqcomm.c, such as internal_flush, it's impossible to tell > which side of the abstraction boundary they're supposed to be on. > (I suspect that that particular function has simply been ignored on the > undocumented assumption that a bgworker could never call it, but that's > the kind of half-baked work that I don't find acceptable.) > > I think the core work that needs to be done here is to clearly identify > each function in pqcomm.c as either above or below the PqCommMethods > abstraction boundary. Maybe we should split one set or the other out > to a new source file. In any case, the naming convention ought to > reflect that hierarchy. Agreed. I'm not sure if there's a clean boundary, though. > I withdraw the idea that we should try to revert all these functions back > to their old names, since that would obscure the layering distinction > between socket-specific and general-usage functions. I now think that > the problem is that that distinction hasn't been drawn sharply enough. > > > The set of functions in PQcommMethods doesn't seem clean. They > > are chosen arbitrarily just so that other pq_* functions used in > > parallel workers will work as expected. I suppose that it needs > > some refactoring. > > Yeah, exactly. > > > Any work in this area is likely 10.0 material at this point. > > I'm not really happy with that, since refactoring it again will create > back-patch hazards. But I see that a lot of the mess here was created > in 9.5, which means we're probably stuck with back-patch hazards anyway. I think we have an alternative not to backpatch that refactoring. 9.5 works with it happily and won't get further improovement in this area. > regards, tom lane > > PS: "PQ" comes from PostQUEL, I believe. Thanks, I've learned something new:) https://en.wikipedia.org/wiki/QUEL_query_languages > range of E is EMPLOYEE > retrieve into W > (COMP = E.Salary / (E.Age - 18)) > where E.Name = "Jones" It looks more methematical or programming-language-like to me than SQL. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Jul 29, 2016 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Maybe the real problem here is that the abstraction layer is so incomplete > and apparently haphazard, so that it's not very obvious where you should > use a pq_xxx name and where you should refer to socket_xxx. For some of > the functions in pqcomm.c, such as internal_flush, it's impossible to tell > which side of the abstraction boundary they're supposed to be on. > (I suspect that that particular function has simply been ignored on the > undocumented assumption that a bgworker could never call it, but that's > the kind of half-baked work that I don't find acceptable.) > > I think the core work that needs to be done here is to clearly identify > each function in pqcomm.c as either above or below the PqCommMethods > abstraction boundary. Maybe we should split one set or the other out > to a new source file. In any case, the naming convention ought to > reflect that hierarchy. > > I withdraw the idea that we should try to revert all these functions back > to their old names, since that would obscure the layering distinction > between socket-specific and general-usage functions. I now think that > the problem is that that distinction hasn't been drawn sharply enough. If memory serves, there was some discussion of this at the time the patch that changed this was originally submitted. The original patch, IIRC, just installed one or two hooks and it was argued that I should instead create some kind of abstraction layer. The present coding is the result of my attempt to do just that. I have to admit that I wasn't very eager to churn the contents of this file more than necessary. It seems like old, crufty code to me. I don't object if you want to refactor it, but I'm not sure what problem it solves. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company