Thread: Enumize logical replication message actions

Enumize logical replication message actions

From
Ashutosh Bapat
Date:
Hi All,
 Logical replication protocol uses single byte character to identify
different chunks of logical repliation messages. The code uses
character literals for the same. These literals are used as bare
constants in code as well. That's true for almost all the code that
deals with wire protocol. With that it becomes difficult to identify
the code which deals with a particular message. For example code that
deals with message type 'B'. In various protocol 'B' has different
meaning and it gets difficult and time consuming to differentiate one
usage from other and find all places which deal with one usage. Here's
a patch simplifying that for top level logical replication messages.

I think I have covered the places that need change. But I might have
missed something, given that these literals are used at several other
places (a problem this patch tries to fix :)).

Initially I had used #define for the same, but Peter E suggested using
Enums so that switch cases can detect any remaining items along with
stronger type checks.

Pavan offleast suggested to create a wrapper
pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
I will change that as well. Comments/suggestions welcome.

-- 
Best Wishes,
Ashutosh Bapat

Attachment

Re: Enumize logical replication message actions

From
Li Japin
Date:
> On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> 
> Hi All,
> Logical replication protocol uses single byte character to identify
> different chunks of logical repliation messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
> 
> I think I have covered the places that need change. But I might have
> missed something, given that these literals are used at several other
> places (a problem this patch tries to fix :)).
> 
> Initially I had used #define for the same, but Peter E suggested using
> Enums so that switch cases can detect any remaining items along with
> stronger type checks.
> 
> Pavan offleast suggested to create a wrapper
> pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> I will change that as well. Comments/suggestions welcome.
> 
> -- 
> Best Wishes,
> Ashutosh Bapat
> <0001-Enumize-top-level-logical-replication-actions.patch>

What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old key follows?
Those are also logical replication protocol message, I think.

--
Best regards
Japin Li


Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <japinli@hotmail.com> wrote in 
> 
> > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
> > 
> > Hi All,
> > Logical replication protocol uses single byte character to identify
> > different chunks of logical repliation messages. The code uses
> > character literals for the same. These literals are used as bare
> > constants in code as well. That's true for almost all the code that
> > deals with wire protocol. With that it becomes difficult to identify
> > the code which deals with a particular message. For example code that
> > deals with message type 'B'. In various protocol 'B' has different
> > meaning and it gets difficult and time consuming to differentiate one
> > usage from other and find all places which deal with one usage. Here's
> > a patch simplifying that for top level logical replication messages.
> > 
> > I think I have covered the places that need change. But I might have
> > missed something, given that these literals are used at several other
> > places (a problem this patch tries to fix :)).
> > 
> > Initially I had used #define for the same, but Peter E suggested using
> > Enums so that switch cases can detect any remaining items along with
> > stronger type checks.
> > 
> > Pavan offleast suggested to create a wrapper
> > pg_send_logical_rep_message() on top of pg_sendbyte(), similarly for
> > pg_getmsgbyte(). I wanted to see if this change is acceptable. If so,
> > I will change that as well. Comments/suggestions welcome.
> > 
> > -- 
> > Best Wishes,
> > Ashutosh Bapat
> > <0001-Enumize-top-level-logical-replication-actions.patch>
> 
> What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old key follows?
> Those are also logical replication protocol message, I think.

They are flags stored in a message so they can be seen as different
from the message type letters.

Anyway if the values are determined after some meaning, I'm not sure
enumerize them is good thing or not.  In other words, 'U' conveys
almost same amount of information with LOGICAL_REP_MSG_UPDATE in the
context of logical replcation protocol.

We have the same code pattern in PostgresMain and perhaps we don't
going to change them into enums.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 16 Oct 2020 at 14:06, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Fri, 16 Oct 2020 08:08:40 +0000, Li Japin <japinli@hotmail.com> wrote in
>
> > On Oct 16, 2020, at 3:25 PM, Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:

>
> What about ’N’ for new tuples, ‘O’ for old tuple follows, ‘K’ for old key follows?
> Those are also logical replication protocol message, I think.

They are flags stored in a message so they can be seen as different
from the message type letters.

I think we converting those into macros/enums will help but for now I have tackled only the top level message types.
 

Anyway if the values are determined after some meaning, I'm not sure
enumerize them is good thing or not.  In other words, 'U' conveys
almost same amount of information with LOGICAL_REP_MSG_UPDATE in the
context of logical replcation protocol.

We have the same code pattern in PostgresMain and perhaps we don't
going to change them into enums.

That's exactly the problem I am trying to solve. Take for example 'B' as I have mentioned before. That string literal appears in 64 different places in the master branch. Which of those are the ones related to a "BEGIN" message in logical replication protocol is not clear, unless I thumb through each of those. In PostgresMain it's used to indicate a BIND message. Which of those 64 instances are also using 'B' to mean a bind message? Using enums or macros makes it clear. Just look up LOGICAL_REP_MSG_BEGIN. Converting all 'B' to their respective macros will help but might be problematic for back-patching. So that's arguable. But doing that in something as new as logical replication will be helpful, before it gets too late to change that.

Further logical repliation protocol is using the same literal e.g. 'O' to mean origin in some places and old tuple in some other. While comments there help, it's not easy to locate all the code that deals with one meaning or the other. This change will help with that. Another reason as to why logical replication.
--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 16, 2020 at 12:55 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Hi All,
>  Logical replication protocol uses single byte character to identify
> different chunks of logical repliation messages. The code uses
> character literals for the same. These literals are used as bare
> constants in code as well. That's true for almost all the code that
> deals with wire protocol. With that it becomes difficult to identify
> the code which deals with a particular message. For example code that
> deals with message type 'B'. In various protocol 'B' has different
> meaning and it gets difficult and time consuming to differentiate one
> usage from other and find all places which deal with one usage. Here's
> a patch simplifying that for top level logical replication messages.
>

+1. I think this will make the code easier to read and understand. I
think it would be good to do this in some other parts as well but
starting with logical replication is a good idea as that area is still
evolving.

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Andres Freund
Date:
Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> Here's a patch simplifying that for top level logical replication
> messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type.


>  void
>  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
>  {
> -    pq_sendbyte(out, 'B');        /* BEGIN */
> +    pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);        /* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.


Greetings,

Andres Freund



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:
Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your comments.

On Tue, 20 Oct 2020 at 04:57, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> Here's a patch simplifying that for top level logical replication
> messages.

I think that's a good plan. One big benefit for me is that it's much
easier to search for an enum than for a single letter
constant. Including searching for all the places that deal with any sort
of logical rep message type. 


>  void
>  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
>  {
> -     pq_sendbyte(out, 'B');          /* BEGIN */
> +     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */

I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.

Yes. Fixed all places.

I have attached two places - 0001 which is previous 0001 patch with your comments addressed.

0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and receive a logical replication message type respectively. These wrappers add more protection to make sure that the enum definitions fit one byte. This also removes the default case from apply_dispatch() so that we can detect any LogicalRepMsgType not handled by that function.

These two patches are intended to be committed together as a single commit. For now the second one is separate so that it's easy to remove the changes if they are not acceptable.

--
Best Wishes,
Ashutosh
Attachment

Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Thu, 22 Oct 2020 12:13:40 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in 
> Thanks Andres for your review. Thanks Li, Horiguchi-san and Amit for your
> comments.
> 
> On Tue, 20 Oct 2020 at 04:57, Andres Freund <andres@anarazel.de> wrote:
> 
> > Hi,
> >
> > On 2020-10-16 12:55:26 +0530, Ashutosh Bapat wrote:
> > > Here's a patch simplifying that for top level logical replication
> > > messages.
> >
> > I think that's a good plan. One big benefit for me is that it's much
> > easier to search for an enum than for a single letter
> > constant. Including searching for all the places that deal with any sort
> > of logical rep message type.
> 
> 
> >
> > >  void
> > >  logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
> > >  {
> > > -     pq_sendbyte(out, 'B');          /* BEGIN */
> > > +     pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);                /* BEGIN */
> >
> > I think if we have the LOGICAL_REP_MSG_BEGIN we don't need the /* BEGIN */.
> >
> 
> Yes. Fixed all places.
> 
> I have attached two places - 0001 which is previous 0001 patch with your
> comments addressed.

We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
    {
       ....
        case LOGICAL_REP_MSG_STREAM_COMMIT(s);
           apply_handle_stream_commit(s);
           return;
    }

    ereport(ERROR, ...);
}    

> 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
> receive a logical replication message type respectively. These wrappers add
> more protection to make sure that the enum definitions fit one byte. This
> also removes the default case from apply_dispatch() so that we can detect
> any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
something like that we shouldn't do this refactoring, I think.

pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared aganst every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?


> These two patches are intended to be committed together as a single commit.
> For now the second one is separate so that it's easy to remove the changes
> if they are not acceptable.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:


We shouldn't have the default: in the switch() block in
apply_dispatch(). That prevents compilers from checking
completeness. The content of the default: should be moved out to after
the switch() block.

apply_dispatch()
{
    switch (action)
        {
           ....
            case LOGICAL_REP_MSG_STREAM_COMMIT(s);
                   apply_handle_stream_commit(s);
                   return;
    }

    ereport(ERROR, ...);
}   

> 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send and
> receive a logical replication message type respectively. These wrappers add
> more protection to make sure that the enum definitions fit one byte. This
> also removes the default case from apply_dispatch() so that we can detect
> any LogicalRepMsgType not handled by that function.

pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
something like that we shouldn't do this refactoring, I think.

Enum is an integer, and we want to send byte. The function asserts that the enum fits a byte. If there's a way to declare byte long enums I would use that. But I didn't find a way to do that.


pg_get_logicalrep_msg_type() seems doing the same check (that the
value is compared aganst every keyword value) with
apply_dispatch(). Why do we need that function separately from
apply_dispatch()?


The second patch removes the default case you quoted above. I think that's important to detect any unhandled case at compile time rather than at run time. But we need some way to detect whether the values we get from wire are legit. pg_get_logicalrep_msg_type() does that. Further that function can be used at places other than apply_dispatch() if required without each of those places having their own validation.

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in 
> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:
> 
> >
> >
> > We shouldn't have the default: in the switch() block in
> > apply_dispatch(). That prevents compilers from checking
> > completeness. The content of the default: should be moved out to after
> > the switch() block.
> >
> > apply_dispatch()
> > {
> >     switch (action)
> >         {
> >            ....
> >             case LOGICAL_REP_MSG_STREAM_COMMIT(s);
> >                    apply_handle_stream_commit(s);
> >                    return;
> >     }
> >
> >     ereport(ERROR, ...);
> > }
> >
> > > 0002 adds wrappers on top of pq_sendbyte() and pq_getmsgbyte() to send
> > and
> > > receive a logical replication message type respectively. These wrappers
> > add
> > > more protection to make sure that the enum definitions fit one byte. This
> > > also removes the default case from apply_dispatch() so that we can detect
> > > any LogicalRepMsgType not handled by that function.
> >
> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
> >
> 
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

That way of defining enums can contain two different symbols with the
same value. If we need to check the values are actually in the range
of char, checking duplicate values has more importance from the
standpoint of likelihood.

AFAICS there're two instances of this kind of enums, CoreceionMethod
and TypeCat. None of them are not checked for width nor duplicates
when they are used.

Even if we need such a kind of check, it souldn't be a wrapper
function that adds costs on non-assertion builds, but a replacing of
pq_sendbyte() done only on USE_ASSERT_CHECKING.

> pg_get_logicalrep_msg_type() seems doing the same check (that the
> > value is compared aganst every keyword value) with
> > apply_dispatch(). Why do we need that function separately from
> > apply_dispatch()?
> >
> >
> The second patch removes the default case you quoted above. I think that's
> important to detect any unhandled case at compile time rather than at run
> time. But we need some way to detect whether the values we get from wire
> are legit. pg_get_logicalrep_msg_type() does that. Further that function
> can be used at places other than apply_dispatch() if required without each
> of those places having their own validation.

Even if that enum contains out-of-range values, that "command" is sent
having truncated to uint8 and on the receiver side apply_dispatch()
doesn't identify the command and raises an error.  That is equivalent
to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
point that symbols that are not used in regression are not checked.)

reagrds.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Fri, 23 Oct 2020 10:08:44 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> At Thu, 22 Oct 2020 16:37:18 +0530, Ashutosh Bapat <ashutosh.bapat@2ndquadrant.com> wrote in 
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > wrote:
> > pg_get_logicalrep_msg_type() seems doing the same check (that the
> > > value is compared aganst every keyword value) with
> > > apply_dispatch(). Why do we need that function separately from
> > > apply_dispatch()?
> > >
> > >
> > The second patch removes the default case you quoted above. I think that's
> > important to detect any unhandled case at compile time rather than at run
> > time. But we need some way to detect whether the values we get from wire
> > are legit. pg_get_logicalrep_msg_type() does that. Further that function
> > can be used at places other than apply_dispatch() if required without each
> > of those places having their own validation.
> 
> Even if that enum contains out-of-range values, that "command" is sent
> having truncated to uint8 and on the receiver side apply_dispatch()
> doesn't identify the command and raises an error.  That is equivalent
> to what pq_send_logicalrep_msg_type() does. (Also equivalent on the
> point that symbols that are not used in regression are not checked.)

Sorry, this is about pg_send_logicalrep_msg_type(), not
pg_get..(). And I forgot to mention pg_get_logicalrep_msg_type().

For the pg_get_logicalrep_msg_type(), It is just a repetion of what
apply_displatch() does in switch().

If I flattened the code, it looks like:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);
  bool pass = false;

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
       pass = true;
  }
  if (!pass)
     ereport(ERROR, (errmsg("invalid logical replication message type"..

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
        apply_handle_begin();
        break;
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
        apply_handle_begin();
        break;
  }   
}     

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
        apply_handle_begin();
!       return;
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
        apply_handle_begin();
!       return;
  }

  ereport(ERROR, (errmsg("invalid logical replication message type"..
}     
     
which is smaller and fast.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

From
Alvaro Herrera
Date:
On 2020-Oct-22, Ashutosh Bapat wrote:

> On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> wrote:

> > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > something like that we shouldn't do this refactoring, I think.
> 
> Enum is an integer, and we want to send byte. The function asserts that the
> enum fits a byte. If there's a way to declare byte long enums I would use
> that. But I didn't find a way to do that.

I didn't look at the code, but maybe it's sufficient to add a
StaticAssert?



Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> On 2020-Oct-22, Ashutosh Bapat wrote:
> 
> > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > wrote:
> 
> > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > something like that we shouldn't do this refactoring, I think.
> > 
> > Enum is an integer, and we want to send byte. The function asserts that the
> > enum fits a byte. If there's a way to declare byte long enums I would use
> > that. But I didn't find a way to do that.
> 
> I didn't look at the code, but maybe it's sufficient to add a
> StaticAssert?

That check needs to visit all symbols in a enum and confirm that each
of them is in a certain range.

I thought of StaticAssert, but it cannot run a code and I don't know
of a syntax that loops through all symbols in a enumeration so I think
we needs to write a static assertion on every symbol in the
enumeration, which seems to be a kind of stupid.

enum hoge
{
  a = '1',
  b = '2',
  c = '3'
};

StaticAssertDecl((unsigned int)(a | b | c ...) <= 0xff, "too large symbol value");

I didn't come up with a way to apply static assertion on each symbol
definition line.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

From
Peter Smith
Date:
On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.

The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

The comment in the LogicalRepMsgType enum will sufficiently ensure
nobody is going to accidentally add any bad replication message codes.
And it's not like these are going to be changed often.

Why not simply downcast your enums when calling pq_sendbyte?
There are only a few of them.

e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

Kind Regards.
Peter Smith
Fujitsu Australia.



Re: Enumize logical replication message actions

From
Kyotaro Horiguchi
Date:
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <smithpb2250@gmail.com> wrote in 
> On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts that the
> > > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > > that. But I didn't find a way to do that.
> 
> The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

> The comment in the LogicalRepMsgType enum will sufficiently ensure
> nobody is going to accidentally add any bad replication message codes.
> And it's not like these are going to be changed often.

Agreed.

> Why not simply downcast your enums when calling pq_sendbyte?
> There are only a few of them.
> 
> e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about compiler warning, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 23 Oct 2020 at 06:50, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:

Those two switch()es are apparently redundant. That code is exactly
equivalent to:

apply_dispatch(s)
{
  LogicalRepMsgType msgtype = pq_getmsgtype(s);

  switch (msgtype)
  {
     case LOGICAL_REP_MSG_BEGIN:
        apply_handle_begin();
!       return;
     ...
     case LOGICAL_REP_MSG_STREAM_COMMIT:
        apply_handle_begin();
!       return;
  }

  ereport(ERROR, (errmsg("invalid logical replication message type"..
}     

which is smaller and fast.

Good idea. Implemented in the latest patch posted with the next mail. 

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 23 Oct 2020 at 17:02, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Fri, 23 Oct 2020 19:53:00 +1100, Peter Smith <smithpb2250@gmail.com> wrote in
> On Fri, Oct 23, 2020 at 5:20 PM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > > On 2020-Oct-22, Ashutosh Bapat wrote:
> > >
> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > > wrote:
> > >
> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > > something like that we shouldn't do this refactoring, I think.
> > > >
> > > > Enum is an integer, and we want to send byte. The function asserts that the
> > > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > > that. But I didn't find a way to do that.
>
> The pq_send_logicalrep_msg_type() function seemed a bit overkill to me.

Ah, yes, it is what I meant. I didn't come up with the word "overkill".

> The comment in the LogicalRepMsgType enum will sufficiently ensure
> nobody is going to accidentally add any bad replication message codes.
> And it's not like these are going to be changed often.

Agreed.

> Why not simply downcast your enums when calling pq_sendbyte?
> There are only a few of them.
>
> e.g. pq_sendbyte(out, (uint8)LOGICAL_REP_MSG_STREAM_COMMIT);

If you are worried about compiler warning, that explicit cast is not
required. Even if the symbol is larger than 0xff, the upper bytes are
silently truncated off.


I agree with Peter that the prologue of  LogicalRepMsgType is enough.

I also agree with Kyotaro, that explicit cast is unnecessary.

All this together makes the second patch useless. Removed it. Instead used Kyotaro's idea in previous mail.

PFA updated patch.

--
Best Wishes,
Ashutosh
Attachment

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.
> >
> > I didn't look at the code, but maybe it's sufficient to add a
> > StaticAssert?
>
> That check needs to visit all symbols in a enum and confirm that each
> of them is in a certain range.
>

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 23 Oct 2020 at 18:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> > On 2020-Oct-22, Ashutosh Bapat wrote:
> >
> > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
> > > wrote:
> >
> > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
> > > > something like that we shouldn't do this refactoring, I think.
> > >
> > > Enum is an integer, and we want to send byte. The function asserts that the
> > > enum fits a byte. If there's a way to declare byte long enums I would use
> > > that. But I didn't find a way to do that.
> >
> > I didn't look at the code, but maybe it's sufficient to add a
> > StaticAssert?
>
> That check needs to visit all symbols in a enum and confirm that each
> of them is in a certain range.
>

Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
indicating this is a fake message and must be the last one) as the
last and just check that?


I don't think that's required once I applied suggestions from Kyotaro and Peter. Please check the latest patch. 
Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols. None of that is necessary here. Do you see any other use?

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 23, 2020 at 6:26 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
>
>
>
> On Fri, 23 Oct 2020 at 18:23, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>> On Fri, Oct 23, 2020 at 11:50 AM Kyotaro Horiguchi
>> <horikyota.ntt@gmail.com> wrote:
>> >
>> > At Thu, 22 Oct 2020 22:31:41 -0300, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
>> > > On 2020-Oct-22, Ashutosh Bapat wrote:
>> > >
>> > > > On Thu, 22 Oct 2020 at 14:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com>
>> > > > wrote:
>> > >
>> > > > > pg_send_logicalrep_msg_type() looks somewhat too-much.  If we need
>> > > > > something like that we shouldn't do this refactoring, I think.
>> > > >
>> > > > Enum is an integer, and we want to send byte. The function asserts that the
>> > > > enum fits a byte. If there's a way to declare byte long enums I would use
>> > > > that. But I didn't find a way to do that.
>> > >
>> > > I didn't look at the code, but maybe it's sufficient to add a
>> > > StaticAssert?
>> >
>> > That check needs to visit all symbols in a enum and confirm that each
>> > of them is in a certain range.
>> >
>>
>> Can we define something like LOGICAL_REP_MSG_LAST (also add a comment
>> indicating this is a fake message and must be the last one) as the
>> last and just check that?
>>
>
> I don't think that's required once I applied suggestions from Kyotaro and Peter. Please check the latest patch.
> Usually LAST is added to an enum when we need to cap the number of symbols or want to find the number of symbols.
Noneof that is necessary here. Do you see any other use?
 
>

You mentioned in the beginning that you prefer to use Enum instead of
define so that switch cases can detect any remaining items but I have
tried adding extra enum value at the end and didn't handle that in
switch case but I didn't get any compilation warning or error. Do we
need something else to detect that at compile time?

Some comments assuming we want to use enum either because I am missing
something or due to some other reason we have not discussed yet.

1.
+ LOGICAL_REP_MSG_STREAM_ABORT = 'A',
+} LogicalRepMsgType;

There is no need for a comma after the last message.

2.
+/*
+ * Logical message types
+ *
+ * Used by logical replication wire protocol.
+ *
+ * Note: though this is an enum it should fit a single byte and should be a
+ * printable character.
+ */
+typedef enum
+{

I think we can expand the comments to probably say why we need these
to fit in a single byte or what problem it can cause if that rule is
disobeyed. This is to make the next person clear why we are imposing
such a rule.

3.
+typedef enum
+{
..
+} LogicalRepMsgType;

There are places in code where we use the enum name
(LogicalRepMsgType) both in the start and end. See TypeCat,
CoercionMethod, CoercionCodes, etc. I see places where we use the way
you have in the code. I would prefer the way we have used at places
like TypeCat as that makes it easier to read.

4.
  switch (action)
  {
- /* BEGIN */
- case 'B':
+ case LOGICAL_REP_MSG_BEGIN:
  apply_handle_begin(s);
- break;
- /* COMMIT */
- case 'C':
+ return;

I think we can simply use 'return apply_handle_begin;' instead of
adding return in another line. Again, I think we changed this handling
in apply_dispatch() to improve the case where we can detect at the
compile time any missing enum but at this stage it is not clear to me
if that is true.

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Peter Smith
Date:
On Fri, Oct 30, 2020 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

Hi Amit

> You mentioned in the beginning that you prefer to use Enum instead of
> define so that switch cases can detect any remaining items but I have
> tried adding extra enum value at the end and didn't handle that in
> switch case but I didn't get any compilation warning or error. Do we
> need something else to detect that at compile time?

See [1] some GCC compiler options that can expose missing cases like those.

e.g. use -Wswitch or -Wswitch-enum
Detection depends if the switch has a default case or not.

> 4.
>   switch (action)
>   {
> - /* BEGIN */
> - case 'B':
> + case LOGICAL_REP_MSG_BEGIN:
>   apply_handle_begin(s);
> - break;
> - /* COMMIT */
> - case 'C':
> + return;
>
> I think we can simply use 'return apply_handle_begin;' instead of
> adding return in another line. Again, I think we changed this handling
> in apply_dispatch() to improve the case where we can detect at the
> compile time any missing enum but at this stage it is not clear to me
> if that is true.

IIUC getting rid of the default from the switch can make the missing
enum detection easier because then you can use -Wswitch option to
expose the problem (instead of -Wswitch-enum which may give lots of
false positives as well)

===

[1]  https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Wswitch

Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 10:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Fri, Oct 30, 2020 at 2:46 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Hi Amit
>
> > You mentioned in the beginning that you prefer to use Enum instead of
> > define so that switch cases can detect any remaining items but I have
> > tried adding extra enum value at the end and didn't handle that in
> > switch case but I didn't get any compilation warning or error. Do we
> > need something else to detect that at compile time?
>
> See [1] some GCC compiler options that can expose missing cases like those.
>

Thanks, I am able to see the warnings now.

> e.g. use -Wswitch or -Wswitch-enum
> Detection depends if the switch has a default case or not.
>
> > 4.
> >   switch (action)
> >   {
> > - /* BEGIN */
> > - case 'B':
> > + case LOGICAL_REP_MSG_BEGIN:
> >   apply_handle_begin(s);
> > - break;
> > - /* COMMIT */
> > - case 'C':
> > + return;
> >
> > I think we can simply use 'return apply_handle_begin;' instead of
> > adding return in another line. Again, I think we changed this handling
> > in apply_dispatch() to improve the case where we can detect at the
> > compile time any missing enum but at this stage it is not clear to me
> > if that is true.
>
> IIUC getting rid of the default from the switch can make the missing
> enum detection easier because then you can use -Wswitch option to
> expose the problem (instead of -Wswitch-enum which may give lots of
> false positives as well)
>

Fair enough. So, it makes sense to move the default out of the switch case.

Ashutosh, see if we can add in comments (or may be commit message) why
we preferred to use enum for these messages.

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 30, 2020 at 10:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > IIUC getting rid of the default from the switch can make the missing
> > enum detection easier because then you can use -Wswitch option to
> > expose the problem (instead of -Wswitch-enum which may give lots of
> > false positives as well)
> >
>
> Fair enough. So, it makes sense to move the default out of the switch case.
>

One more thing I was thinking about this patch was whether it has any
impact w.r.t to Endianness as we are using four-bytes to represent
one-byte and it seems there is no issue with that because pq_sendbyte
accepts just one-byte and sends that over the network. So, we could
see a problem only if we use any enum value which is more than
one-byte which we are anyway adding a warning message along with the
definition of enum. So, we are safe here. Does that make sense?

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 30 Oct 2020 at 09:16, Amit Kapila <amit.kapila16@gmail.com> wrote

1.
+ LOGICAL_REP_MSG_STREAM_ABORT = 'A',
+} LogicalRepMsgType;

There is no need for a comma after the last message.

Done. Thanks for noticing it. 
 
2.
+/*
+ * Logical message types
+ *
+ * Used by logical replication wire protocol.
+ *
+ * Note: though this is an enum it should fit a single byte and should be a
+ * printable character.
+ */
+typedef enum
+{

I think we can expand the comments to probably say why we need these
to fit in a single byte or what problem it can cause if that rule is
disobeyed. This is to make the next person clear why we are imposing
such a rule.

Done. Please check.
 

3.
+typedef enum
+{
..
+} LogicalRepMsgType;

There are places in code where we use the enum name
(LogicalRepMsgType) both in the start and end. See TypeCat,
CoercionMethod, CoercionCodes, etc. I see places where we use the way
you have in the code. I would prefer the way we have used at places
like TypeCat as that makes it easier to read.

Not my favourite style since changing the type name requires changing enum name to keep those consistent. But anyway done.
 

4.
  switch (action)
  {
- /* BEGIN */
- case 'B':
+ case LOGICAL_REP_MSG_BEGIN:
  apply_handle_begin(s);
- break;
- /* COMMIT */
- case 'C':
+ return;

I think we can simply use 'return apply_handle_begin;' instead of
adding return in another line. Again, I think we changed this handling
in apply_dispatch() to improve the case where we can detect at the
compile time any missing enum but at this stage it is not clear to me
if that is true.
 
I don't see much value in writing it like "return apply_handle_begin()"; gives an impression that apply_handle_begin() and apply_dispatch() are returning something which they are not. I would prefer return on separate line unless there's something more than style improvement.

I have added rationale behind Enum in the commit message as you suggested in one of the later mails.

PFA patch addressing your comments.
--
Best Wishes,
Ashutosh
Attachment

Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 30 Oct 2020 at 14:59, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Fri, Oct 30, 2020 at 10:37 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > IIUC getting rid of the default from the switch can make the missing
> > enum detection easier because then you can use -Wswitch option to
> > expose the problem (instead of -Wswitch-enum which may give lots of
> > false positives as well)
> >
>
> Fair enough. So, it makes sense to move the default out of the switch case.
>

One more thing I was thinking about this patch was whether it has any
impact w.r.t to Endianness as we are using four-bytes to represent
one-byte and it seems there is no issue with that because pq_sendbyte
accepts just one-byte and sends that over the network. So, we could
see a problem only if we use any enum value which is more than
one-byte which we are anyway adding a warning message along with the
definition of enum. So, we are safe here. Does that make sense?


yes. Endian-ness should be handled by the compiler transparently. 

--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 5:05 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
>
>
>
> On Fri, 30 Oct 2020 at 09:16, Amit Kapila <amit.kapila16@gmail.com> wrote
>>
>> I think we can simply use 'return apply_handle_begin;' instead of
>> adding return in another line. Again, I think we changed this handling
>> in apply_dispatch() to improve the case where we can detect at the
>> compile time any missing enum but at this stage it is not clear to me
>> if that is true.
>
>
> I don't see much value in writing it like "return apply_handle_begin()"; gives an impression that
apply_handle_begin()and apply_dispatch() are returning something which they are not. I would prefer return on separate
lineunless there's something more than style improvement. 
>

Fair enough.

> I have added rationale behind Enum in the commit message as you suggested in one of the later mails.
>
> PFA patch addressing your comments.
>

I don't like the word 'Enumize' in commit message. How about changing
it to something like: (a) Add defines for logical replication protocol
messages, or (b) Associate names with logical replication protocol
messages.

+ 2. It's easy to locate the code handling a given type.

In the above instead of 'type', shouldn't it be 'message'.

Other than that the patch looks good to me.

--
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Fri, 30 Oct 2020 at 17:37, Amit Kapila <amit.kapila16@gmail.com> wrote:

I don't like the word 'Enumize' in commit message. How about changing
it to something like: (a) Add defines for logical replication protocol
messages, or (b) Associate names with logical replication protocol
messages.

I have used  "Use Enum for top level logical replication message types" in the attached patch. But please free to use (a) if you feel so.
 

+ 2. It's easy to locate the code handling a given type.

In the above instead of 'type', shouldn't it be 'message'.

Used "message type". But please feel free to use "message" if you think that's appropriate.
 

Other than that the patch looks good to me.


Patch with updated commit message and also the list of reviewers  

--
Best Wishes,
Ashutosh
Attachment

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Fri, Oct 30, 2020 at 5:52 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
>
> On Fri, 30 Oct 2020 at 17:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>>
>> Other than that the patch looks good to me.
>>
>
> Patch with updated commit message and also the list of reviewers
>

Thanks, pushed!

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:
Thanks Amit.

On Mon, 2 Nov 2020 at 14:15, Amit Kapila <amit.kapila16@gmail.com> wrote:
On Fri, Oct 30, 2020 at 5:52 PM Ashutosh Bapat
<ashutosh.bapat@2ndquadrant.com> wrote:
>
> On Fri, 30 Oct 2020 at 17:37, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>
>>
>> Other than that the patch looks good to me.
>>
>
> Patch with updated commit message and also the list of reviewers
>

Thanks, pushed!

--
With Regards,
Amit Kapila.


--
Best Wishes,
Ashutosh

Re: Enumize logical replication message actions

From
Peter Smith
Date:
Hi Hackers.

Last month there was a commit [1] for replacing logical replication
message type characters with enums of equivalent values.

I was revisiting this code recently and I think due to oversight that
initial patch was incomplete. IIUC there are several more enum
substitutions which should have been made.

PSA my patch which adds those missing substitutions.

---

[1] https://github.com/postgres/postgres/commit/644f0d7cc9c2cb270746f2024c706554e0fbec82

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Wed, Nov 25, 2020 at 2:26 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> Hi Hackers.
>
> Last month there was a commit [1] for replacing logical replication
> message type characters with enums of equivalent values.
>
> I was revisiting this code recently and I think due to oversight that
> initial patch was incomplete. IIUC there are several more enum
> substitutions which should have been made.
>
> PSA my patch which adds those missing substitutions.
>

Good catch. I'll review it in a day or so.

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Amit Kapila
Date:
On Wed, Nov 25, 2020 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 25, 2020 at 2:26 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Hackers.
> >
> > Last month there was a commit [1] for replacing logical replication
> > message type characters with enums of equivalent values.
> >
> > I was revisiting this code recently and I think due to oversight that
> > initial patch was incomplete. IIUC there are several more enum
> > substitutions which should have been made.
> >
> > PSA my patch which adds those missing substitutions.
> >
>
> Good catch. I'll review it in a day or so.
>

The patch looks good to me and I have pushed it.

-- 
With Regards,
Amit Kapila.



Re: Enumize logical replication message actions

From
Ashutosh Bapat
Date:


On Thu, Nov 26, 2020 at 10:15 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Wed, Nov 25, 2020 at 2:52 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Nov 25, 2020 at 2:26 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > Hi Hackers.
> >
> > Last month there was a commit [1] for replacing logical replication
> > message type characters with enums of equivalent values.
> >
> > I was revisiting this code recently and I think due to oversight that
> > initial patch was incomplete. IIUC there are several more enum
> > substitutions which should have been made.
> >
> > PSA my patch which adds those missing substitutions.
> >
>
> Good catch. I'll review it in a day or so.
>

The patch looks good to me and I have pushed it.

Thanks Peter and Amit for noticing the missing substitutions and fixing those.  

--
Best Wishes,
Ashutosh