Thread: Wrong defeinition of pq_putmessage_noblock since 9.5

Wrong defeinition of pq_putmessage_noblock since 9.5

From
Kyotaro HORIGUCHI
Date:
Hello,

While testing replication for 9.5, we found that repl-master can
ignore wal_sender_timeout and seemingly waits for TCP
retransmission timeout for the case of sudden power-off of a
standby.

My investigation told me that the immediate cause could be that
secure_write() is called with *blocking mode* (that is,
port->noblock = false) under *pq_putmessage_noblock* macro called
from XLogSendPhysical().

libpq.h of 9.5 and newer defines it as the following,

> #define pq_putmessage(msgtype, s, len) \
>     (PqCommMethods->putmessage(msgtype, s, len))
> #define pq_putmessage_noblock(msgtype, s, len) \
>     (PqCommMethods->putmessage(msgtype, s, len))

which is apparently should be the following.

> #define pq_putmessage_noblock(msgtype, s, len) \
>     (PqCommMethods->putmessage_noblock(msgtype, s, len))

The attached patch fixes it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Fujii Masao
Date:
On Thu, Jul 28, 2016 at 6:52 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> Hello,
>
> While testing replication for 9.5, we found that repl-master can
> ignore wal_sender_timeout and seemingly waits for TCP
> retransmission timeout for the case of sudden power-off of a
> standby.
>
> My investigation told me that the immediate cause could be that
> secure_write() is called with *blocking mode* (that is,
> port->noblock = false) under *pq_putmessage_noblock* macro called
> from XLogSendPhysical().
>
> libpq.h of 9.5 and newer defines it as the following,
>
>> #define pq_putmessage(msgtype, s, len) \
>>       (PqCommMethods->putmessage(msgtype, s, len))
>> #define pq_putmessage_noblock(msgtype, s, len) \
>>       (PqCommMethods->putmessage(msgtype, s, len))
>
> which is apparently should be the following.
>
>> #define pq_putmessage_noblock(msgtype, s, len) \
>>       (PqCommMethods->putmessage_noblock(msgtype, s, len))
>
> The attached patch fixes it.

Good catch! Barring objection, I will push this to both master and 9.5.

Regards,

-- 
Fujii Masao



Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Fujii Masao
Date:
On Thu, Jul 28, 2016 at 9:08 PM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Jul 28, 2016 at 6:52 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Hello,
>>
>> While testing replication for 9.5, we found that repl-master can
>> ignore wal_sender_timeout and seemingly waits for TCP
>> retransmission timeout for the case of sudden power-off of a
>> standby.
>>
>> My investigation told me that the immediate cause could be that
>> secure_write() is called with *blocking mode* (that is,
>> port->noblock = false) under *pq_putmessage_noblock* macro called
>> from XLogSendPhysical().
>>
>> libpq.h of 9.5 and newer defines it as the following,
>>
>>> #define pq_putmessage(msgtype, s, len) \
>>>       (PqCommMethods->putmessage(msgtype, s, len))
>>> #define pq_putmessage_noblock(msgtype, s, len) \
>>>       (PqCommMethods->putmessage(msgtype, s, len))
>>
>> which is apparently should be the following.
>>
>>> #define pq_putmessage_noblock(msgtype, s, len) \
>>>       (PqCommMethods->putmessage_noblock(msgtype, s, len))
>>
>> The attached patch fixes it.
>
> Good catch! Barring objection, I will push this to both master and 9.5.

Regarding this patch, while reading pqcomm.c, I found the following things.

1. socket_comm_reset() calls pq_endcopyout().
    I think that socket_endcopyout() should be called, instead.

2. socket_putmessage_noblock() calls pq_putmessage().
    I think that socket_putmessage() should be called, instead.

3. Several source comments in pqcomm.c have not been updated.
    Some comments still use the old function name like pq_putmessage().

Attached patch fixes the above issues.

Regards,

--
Fujii Masao

Attachment

Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Tom Lane
Date:
Fujii Masao <masao.fujii@gmail.com> writes:
> 3. Several source comments in pqcomm.c have not been updated.
>     Some comments still use the old function name like pq_putmessage().

> Attached patch fixes the above issues.

I dunno, this seems like it's doubling down on some extremely poor
decisions.  Why is it that you now have to flip a coin to guess whether
the prefix is pq_ or socket_ for functions in this module?  I would
rather see that renaming reverted.
        regards, tom lane



Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Kyotaro HORIGUCHI
Date:
At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us>
> Fujii Masao <masao.fujii@gmail.com> writes:
> > 3. Several source comments in pqcomm.c have not been updated.
> >     Some comments still use the old function name like pq_putmessage().
> 
> > Attached patch fixes the above issues.
> 
> I dunno, this seems like it's doubling down on some extremely poor
> decisions.  Why is it that you now have to flip a coin to guess whether
> the prefix is pq_ or socket_ for functions in this module?  I would
> rather see that renaming reverted.

The set of functions in PQcommMethods doesn't seem clean. They
are choosed arbitrary just so that other pq_* functions used in
parallel workers will work as expected. I suppose that it needs
some refactoring.

By the way, pq_start/endcopyout() are used only in FE protocols
below 3.0, which had already bacome obsolete as of PG7.4. While
the next dev cycle is for PG10, if there is no particular reason
to support such acient protocols, removing them would make things
easier and cleaner.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Michael Paquier
Date:
On Fri, Jul 29, 2016 at 12:18 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us>
>> Fujii Masao <masao.fujii@gmail.com> writes:
>> > 3. Several source comments in pqcomm.c have not been updated.
>> >     Some comments still use the old function name like pq_putmessage().
>>
>> > Attached patch fixes the above issues.
>>
>> I dunno, this seems like it's doubling down on some extremely poor
>> decisions.  Why is it that you now have to flip a coin to guess whether
>> the prefix is pq_ or socket_ for functions in this module?  I would
>> rather see that renaming reverted.

Yes, I agree with that. I cannot understand the intention behind
2bd9e41 to rename those routines as they are now, so getting them back
with pg_ as prefix looks like a good idea to me.

> The set of functions in PQcommMethods doesn't seem clean. They
> are chosen arbitrarily just so that other pq_* functions used in
> parallel workers will work as expected. I suppose that it needs
> some refactoring.

Any work in this area is likely 10.0 material at this point.

> By the way, pq_start/endcopyout() are used only in FE protocols
> below 3.0, which had already bacome obsolete as of PG7.4. While
> the next dev cycle is for PG10, if there is no particular reason
> to support such ancient protocols, removing them would make things
> easier and cleaner.

Remove support for protocol 2 has been in the air for some time, but
that's a separate discussion. If you want to discuss this issue
particularly, raising a new thread would be a good idea.
-- 
Michael



Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Kyotaro HORIGUCHI
Date:
At Fri, 29 Jul 2016 12:47:53 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqSSNTroRi=zGMDxYa7PzX_VSck6hbHY6eTnBBsfYaah6A@mail.gmail.com>
> On Fri, Jul 29, 2016 at 12:18 PM, Kyotaro HORIGUCHI
> <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us>
> >> Fujii Masao <masao.fujii@gmail.com> writes:
> >> > 3. Several source comments in pqcomm.c have not been updated.
> >> >     Some comments still use the old function name like pq_putmessage().
> >>
> >> > Attached patch fixes the above issues.
> >>
> >> I dunno, this seems like it's doubling down on some extremely poor
> >> decisions.  Why is it that you now have to flip a coin to guess whether
> >> the prefix is pq_ or socket_ for functions in this module?  I would
> >> rather see that renaming reverted.
> 
> Yes, I agree with that. I cannot understand the intention behind
> 2bd9e41 to rename those routines as they are now, so getting them back
> with pg_ as prefix looks like a good idea to me.

The many of the socket_* functions are required to be replaced
with mq_* functions for backgroud workers. So reverting the names
of socket_* functions should be accompanied by the need to, for
example, changing their callers to use them explicitly via
PqCommMethods, not hiding with macros, or renaming current pq_*
macros to, say, pqi_. (it stands for PQ-Indirect as a tentative)
I'm not confident on the new prefix since I'm not sure that what
the 'pq' stands for. (Postgres Query?)

Attached patch is a rush work to revert the names of socket_
functions and replace the prefix of the macros with "pqi". pq_
names no longer points to mq_ functions. Is this roughly on the
right way? Even though the prefix is not appropriate.

> > The set of functions in PQcommMethods doesn't seem clean. They
> > are chosen arbitrarily just so that other pq_* functions used in
> > parallel workers will work as expected. I suppose that it needs
> > some refactoring.
> 
> Any work in this area is likely 10.0 material at this point.
> 
> > By the way, pq_start/endcopyout() are used only in FE protocols
> > below 3.0, which had already bacome obsolete as of PG7.4. While
> > the next dev cycle is for PG10, if there is no particular reason
> > to support such ancient protocols, removing them would make things
> > easier and cleaner.
> 
> Remove support for protocol 2 has been in the air for some time, but
> that's a separate discussion. If you want to discuss this issue
> particularly, raising a new thread would be a good idea.

Thanks. I saw in somewhere that the protocol 2 no longer
works. I'll raise a new thread later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Tom Lane
Date:
Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> At Fri, 29 Jul 2016 12:47:53 +0900, Michael Paquier <michael.paquier@gmail.com> wrote in
<CAB7nPqSSNTroRi=zGMDxYa7PzX_VSck6hbHY6eTnBBsfYaah6A@mail.gmail.com>
>>> At Thu, 28 Jul 2016 10:46:00 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <4313.1469717160@sss.pgh.pa.us>
>>> I dunno, this seems like it's doubling down on some extremely poor
>>> decisions.  Why is it that you now have to flip a coin to guess whether
>>> the prefix is pq_ or socket_ for functions in this module?  I would
>>> rather see that renaming reverted.

>> Yes, I agree with that. I cannot understand the intention behind
>> 2bd9e41 to rename those routines as they are now, so getting them back
>> with pg_ as prefix looks like a good idea to me.

> The many of the socket_* functions are required to be replaced
> with mq_* functions for backgroud workers. So reverting the names
> of socket_* functions should be accompanied by the need to, for
> example, changing their callers to use them explicitly via
> PqCommMethods, not hiding with macros, or renaming current pq_*
> macros to, say, pqi_. (it stands for PQ-Indirect as a tentative)
> I'm not confident on the new prefix since I'm not sure that what
> the 'pq' stands for. (Postgres Query?)

Hmm.  Of course the problem with this approach is that we end up touching
a whole bunch of call sites, which sort of negates the point of having
installed a compatibility macro layer.

[ spends awhile longer looking around at this code ]

Maybe the real problem here is that the abstraction layer is so incomplete
and apparently haphazard, so that it's not very obvious where you should
use a pq_xxx name and where you should refer to socket_xxx.  For some of
the functions in pqcomm.c, such as internal_flush, it's impossible to tell
which side of the abstraction boundary they're supposed to be on.
(I suspect that that particular function has simply been ignored on the
undocumented assumption that a bgworker could never call it, but that's
the kind of half-baked work that I don't find acceptable.)

I think the core work that needs to be done here is to clearly identify
each function in pqcomm.c as either above or below the PqCommMethods
abstraction boundary.  Maybe we should split one set or the other out
to a new source file.  In any case, the naming convention ought to
reflect that hierarchy.

I withdraw the idea that we should try to revert all these functions back
to their old names, since that would obscure the layering distinction
between socket-specific and general-usage functions.  I now think that
the problem is that that distinction hasn't been drawn sharply enough.

> The set of functions in PQcommMethods doesn't seem clean. They
> are chosen arbitrarily just so that other pq_* functions used in
> parallel workers will work as expected. I suppose that it needs
> some refactoring.

Yeah, exactly.

> Any work in this area is likely 10.0 material at this point.

I'm not really happy with that, since refactoring it again will create
back-patch hazards.  But I see that a lot of the mess here was created
in 9.5, which means we're probably stuck with back-patch hazards anyway.
        regards, tom lane

PS: "PQ" comes from PostQUEL, I believe.



Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Tom Lane
Date:
I wrote:
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
>> Any work in this area is likely 10.0 material at this point.

> I'm not really happy with that, since refactoring it again will create
> back-patch hazards.  But I see that a lot of the mess here was created
> in 9.5, which means we're probably stuck with back-patch hazards anyway.

I've pushed Kyotaro-san's original patch, which is clearly a bug fix.
I think the additional changes discussed later in this thread are
cosmetic, and probably should wait for a more general review of the
layering decisions in pqcomm.c.
        regards, tom lane



Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Kyotaro HORIGUCHI
Date:
At Fri, 29 Jul 2016 13:00:50 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <29430.1469811650@sss.pgh.pa.us>
> I wrote:
> > Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> >> Any work in this area is likely 10.0 material at this point.
> 
> > I'm not really happy with that, since refactoring it again will create
> > back-patch hazards.  But I see that a lot of the mess here was created
> > in 9.5, which means we're probably stuck with back-patch hazards anyway.
> 
> I've pushed Kyotaro-san's original patch, which is clearly a bug fix.
> I think the additional changes discussed later in this thread are
> cosmetic, and probably should wait for a more general review of the
> layering decisions in pqcomm.c.

Agreed. It's not an abstraction or API, but it actually works
with no problem and changing it is an issue obviously for later
discussion.

Anyway thank you for committing this.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Kyotaro HORIGUCHI
Date:
At Fri, 29 Jul 2016 11:58:04 -0400, Tom Lane <tgl@sss.pgh.pa.us> wrote in <14846.1469807884@sss.pgh.pa.us>
> Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> writes:
> > The many of the socket_* functions are required to be replaced
> > with mq_* functions for backgroud workers. So reverting the names
> > of socket_* functions should be accompanied by the need to, for
> > example, changing their callers to use them explicitly via
> > PqCommMethods, not hiding with macros, or renaming current pq_*
> > macros to, say, pqi_. (it stands for PQ-Indirect as a tentative)
> > I'm not confident on the new prefix since I'm not sure that what
> > the 'pq' stands for. (Postgres Query?)
> 
> Hmm.  Of course the problem with this approach is that we end up touching
> a whole bunch of call sites, which sort of negates the point of having
> installed a compatibility macro layer.
> 
> [ spends awhile longer looking around at this code ]
> 
> Maybe the real problem here is that the abstraction layer is so incomplete
> and apparently haphazard, so that it's not very obvious where you should
> use a pq_xxx name and where you should refer to socket_xxx.

Yes, 'haphazard' is the word I was seeking for.

> For some of
> the functions in pqcomm.c, such as internal_flush, it's impossible to tell
> which side of the abstraction boundary they're supposed to be on.
> (I suspect that that particular function has simply been ignored on the
> undocumented assumption that a bgworker could never call it, but that's
> the kind of half-baked work that I don't find acceptable.)
> 
> I think the core work that needs to be done here is to clearly identify
> each function in pqcomm.c as either above or below the PqCommMethods
> abstraction boundary.  Maybe we should split one set or the other out
> to a new source file.  In any case, the naming convention ought to
> reflect that hierarchy.

Agreed. I'm not sure if there's a clean boundary, though.

> I withdraw the idea that we should try to revert all these functions back
> to their old names, since that would obscure the layering distinction
> between socket-specific and general-usage functions.  I now think that
> the problem is that that distinction hasn't been drawn sharply enough.
> 
> > The set of functions in PQcommMethods doesn't seem clean. They
> > are chosen arbitrarily just so that other pq_* functions used in
> > parallel workers will work as expected. I suppose that it needs
> > some refactoring.
> 
> Yeah, exactly.
> 
> > Any work in this area is likely 10.0 material at this point.
> 
> I'm not really happy with that, since refactoring it again will create
> back-patch hazards.  But I see that a lot of the mess here was created
> in 9.5, which means we're probably stuck with back-patch hazards anyway.

I think we have an alternative not to backpatch that
refactoring. 9.5 works with it happily and won't get further
improovement in this area.

>             regards, tom lane
> 
> PS: "PQ" comes from PostQUEL, I believe.

Thanks, I've learned something new:)

https://en.wikipedia.org/wiki/QUEL_query_languages

> range of E is EMPLOYEE
> retrieve into W
> (COMP = E.Salary / (E.Age - 18))
> where E.Name = "Jones"

It looks more methematical or programming-language-like to me
than SQL.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center





Re: Wrong defeinition of pq_putmessage_noblock since 9.5

From
Robert Haas
Date:
On Fri, Jul 29, 2016 at 11:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Maybe the real problem here is that the abstraction layer is so incomplete
> and apparently haphazard, so that it's not very obvious where you should
> use a pq_xxx name and where you should refer to socket_xxx.  For some of
> the functions in pqcomm.c, such as internal_flush, it's impossible to tell
> which side of the abstraction boundary they're supposed to be on.
> (I suspect that that particular function has simply been ignored on the
> undocumented assumption that a bgworker could never call it, but that's
> the kind of half-baked work that I don't find acceptable.)
>
> I think the core work that needs to be done here is to clearly identify
> each function in pqcomm.c as either above or below the PqCommMethods
> abstraction boundary.  Maybe we should split one set or the other out
> to a new source file.  In any case, the naming convention ought to
> reflect that hierarchy.
>
> I withdraw the idea that we should try to revert all these functions back
> to their old names, since that would obscure the layering distinction
> between socket-specific and general-usage functions.  I now think that
> the problem is that that distinction hasn't been drawn sharply enough.

If memory serves, there was some discussion of this at the time the
patch that changed this was originally submitted.  The original patch,
IIRC, just installed one or two hooks and it was argued that I should
instead create some kind of abstraction layer.  The present coding is
the result of my attempt to do just that.  I have to admit that I
wasn't very eager to churn the contents of this file more than
necessary.  It seems like old, crufty code to me.  I don't object if
you want to refactor it, but I'm not sure what problem it solves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company