Thread: Simplify some codes in pgoutput

Simplify some codes in pgoutput

From
"houzj.fnst@fujitsu.com"
Date:
Hi,

I noticed that there are some duplicated codes in pgoutput_change() function
which can be simplified, and here is an attempt to do that.

Best Regards,
Hou Zhijie


Attachment

Re: Simplify some codes in pgoutput

From
Amit Kapila
Date:
On Wed, Mar 15, 2023 at 2:00 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> I noticed that there are some duplicated codes in pgoutput_change() function
> which can be simplified, and here is an attempt to do that.
>

For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
this patch, we will still send BEGIN and do OutputPluginWrite, etc.
Also, it will try to perform row_filter when none of old_slot or
new_slot is set. I don't know for which particular case we have s
handling missing old tuples for deletes but that might require changes
in your proposed patch.

--
With Regards,
Amit Kapila.



Re: Simplify some codes in pgoutput

From
Peter Smith
Date:
On Wed, Mar 15, 2023 at 7:30 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> Hi,
>
> I noticed that there are some duplicated codes in pgoutput_change() function
> which can be simplified, and here is an attempt to do that.
>
> Best Regards,
> Hou Zhijie

Hi Hou-san.

I had a quick look at the 0001 patch.

Here are some first comments.

======

1.
+ if (relentry->attrmap)
+ old_slot = execute_attr_map_slot(relentry->attrmap, old_slot,
+ MakeTupleTableSlot(RelationGetDescr(targetrel),
+ &TTSOpsVirtual));

1a.
IMO maybe it was more readable before when there was a separate
'tupdesc' variable, instead of trying to squeeze too much into one
statement.

1b.
Should you retain the old comments that said "/* Convert tuple if needed. */"

~~~

2.
- if (old_slot)
- old_slot = execute_attr_map_slot(relentry->attrmap,
- old_slot,
- MakeTupleTableSlot(tupdesc, &TTSOpsVirtual));

The original code for REORDER_BUFFER_CHANGE_UPDATE was checking "if
(old_slot)" but that check seems no longer present. Is it OK?

~~~

3.
- /*
- * Send BEGIN if we haven't yet.
- *
- * We send the BEGIN message after ensuring that we will actually
- * send the change. This avoids sending a pair of BEGIN/COMMIT
- * messages for empty transactions.
- */

That original longer comment has been replaced with just "/* Send
BEGIN if we haven't yet */". Won't it be better to retain the more
informative longer comment?

~~~

4.
+
+cleanup:
  if (RelationIsValid(ancestor))
  {
  RelationClose(ancestor);

~

Since you've introduced a new label 'cleanup:' then IMO you can remove
that old comment "/* Cleanup */".

------
Kind Regards,
Peter Smith.
Fujitsu Australia



RE: Simplify some codes in pgoutput

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, March 16, 2023 12:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:

> 
> On Wed, Mar 15, 2023 at 2:00 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > I noticed that there are some duplicated codes in pgoutput_change()
> function
> > which can be simplified, and here is an attempt to do that.
> >
> 
> For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
> this patch, we will still send BEGIN and do OutputPluginWrite, etc.
> Also, it will try to perform row_filter when none of old_slot or
> new_slot is set. I don't know for which particular case we have s
> handling missing old tuples for deletes but that might require changes
> in your proposed patch.

I researched this a bit. I think the old tuple will be null only if the
modified table doesn't have PK or RI when the DELETE happens (referred to
the heap_delete()), but in that case the DELETE won't be allowed to be
replicated(e.g. the DELETE will either error out or be filtered by table level
filter in pgoutput_change).

I also checked this for system table and in that case it is null but
reorderbuffer doesn't forward it. For user_catalog_table, similarily, the
DELETE should be filtered by table filter in pgoutput_change as well.

So, I think we can remove this check and log.
And here is the new version patch which removes that for now.

Best Regards,
Hou zj

Attachment

RE: Simplify some codes in pgoutput

From
"houzj.fnst@fujitsu.com"
Date:
On Friday, March 17, 2023 11:49 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> On Wed, Mar 15, 2023 at 7:30 PM houzj.fnst@fujitsu.com
> <houzj.fnst@fujitsu.com> wrote:
> >
> > Hi,
> >
> > I noticed that there are some duplicated codes in pgoutput_change()
> > function which can be simplified, and here is an attempt to do that.
> 
> Hi Hou-san.
> 
> I had a quick look at the 0001 patch.
> 
> Here are some first comments.

Thanks for the comments.

> 
> ======
> 
> 1.
> + if (relentry->attrmap)
> + old_slot = execute_attr_map_slot(relentry->attrmap, old_slot,
> + MakeTupleTableSlot(RelationGetDescr(targetrel),
> + &TTSOpsVirtual));
> 
> 1a.
> IMO maybe it was more readable before when there was a separate 'tupdesc'
> variable, instead of trying to squeeze too much into one statement.
> 
> 1b.
> Should you retain the old comments that said "/* Convert tuple if needed. */"

Added.

> ~~~
> 
> 2.
> - if (old_slot)
> - old_slot = execute_attr_map_slot(relentry->attrmap,
> - old_slot,
> - MakeTupleTableSlot(tupdesc, &TTSOpsVirtual));
> 
> The original code for REORDER_BUFFER_CHANGE_UPDATE was checking "if
> (old_slot)" but that check seems no longer present. Is it OK?

I think the logic is the same.

> 
> ~~~
> 
> 3.
> - /*
> - * Send BEGIN if we haven't yet.
> - *
> - * We send the BEGIN message after ensuring that we will actually
> - * send the change. This avoids sending a pair of BEGIN/COMMIT
> - * messages for empty transactions.
> - */
> 
> That original longer comment has been replaced with just "/* Send BEGIN if we
> haven't yet */". Won't it be better to retain the more informative longer
> comment?

Added.

> ~~~
> 
> 4.
> +
> +cleanup:
>   if (RelationIsValid(ancestor))
>   {
>   RelationClose(ancestor);
> 
> ~
> 
> Since you've introduced a new label 'cleanup:' then IMO you can remove that
> old comment "/* Cleanup */".
> 
Removed.

Best Regards,
Hou zj

RE: Simplify some codes in pgoutput

From
"houzj.fnst@fujitsu.com"
Date:
On Monday, March 20, 2023 5:20  PMhouzj.fnst@fujitsu.com wrote:
> 
> On Thursday, March 16, 2023 12:30 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> 
> >
> > On Wed, Mar 15, 2023 at 2:00 PM houzj.fnst@fujitsu.com
> > <houzj.fnst@fujitsu.com> wrote:
> > >
> > > I noticed that there are some duplicated codes in pgoutput_change()
> > function
> > > which can be simplified, and here is an attempt to do that.
> > >
> >
> > For REORDER_BUFFER_CHANGE_DELETE, when the old tuple is missing, after
> > this patch, we will still send BEGIN and do OutputPluginWrite, etc.
> > Also, it will try to perform row_filter when none of old_slot or
> > new_slot is set. I don't know for which particular case we have s
> > handling missing old tuples for deletes but that might require changes
> > in your proposed patch.
> 
> I researched this a bit. I think the old tuple will be null only if the modified table
> doesn't have PK or RI when the DELETE happens (referred to the heap_delete()),
> but in that case the DELETE won't be allowed to be replicated(e.g. the DELETE
> will either error out or be filtered by table level filter in pgoutput_change).
> 
> I also checked this for system table and in that case it is null but reorderbuffer
> doesn't forward it. For user_catalog_table, similarily, the DELETE should be
> filtered by table filter in pgoutput_change as well.
> 
> So, I think we can remove this check and log.
> And here is the new version patch which removes that for now.

After rethinking about this, it seems better leave this check for now. Although
it may be unnecessary, but we can remove that later as a separate patch when we
are sure about this. So, here is a patch that add this check back.

Best Regards,
Hou zj


Attachment

Re: Simplify some codes in pgoutput

From
Peter Smith
Date:
Hi Hou-san,

I tried to compare the logic of patch v3-0001 versus the original HEAD code.

IMO this patch logic is not exactly the same as before -- there are
some subtle differences. I am not sure if these differences represent
real problems or not.

Below are all my review comments:

======

1.
/* Switch relation if publishing via root. */
if (relentry->publish_as_relid != RelationGetRelid(relation))
{
    Assert(relation->rd_rel->relispartition);
    ancestor = RelationIdGetRelation(relentry->publish_as_relid);
    targetrel = ancestor;
}

~

The "switch relation if publishing via root" logic is now happening
first, whereas the original code was doing this after the slot
assignments. AFAIK it does not matter, it's just a small point of
difference.

======

2.
/* Convert tuple if needed. */
if (relentry->attrmap)
{
    ...
}

The "Convert tuple if needed." logic looks the same, but when it is
executed is NOT the same. It could be a problem.

Previously, the conversion would only happen within the "Switch
relation if publishing via root." condition. But the patch no longer
has that extra condition -- now I think it attempts conversion every
time regardless of "publishing via root".

I would expect the "publish via root" case to be less common, so even
if the current code works, by omitting that check won't this patch
have an unnecessary performance hit due to the extra conversions?

~~

3.
if (old_slot)
    old_slot = execute_attr_map_slot(relentry->attrmap,old_slot,MakeTupleTableSlot(tupdesc,
&TTSOpsVirtual));

~

The previous conversion code for UPDATE (shown above) was checking
"if (old_slot)". Actually, I don't know why that check was even
necessary before but it seems to have been accounting for a
possibility that UPDATE might not have "oldtuple".

But this combination (if indeed it was possible) is not handled
anymore with the patch code because the old_slot is unconditionally
assigned in the same block doing this conversion. Perhaps that
original HEAD extra check was just overkill?  TAP tests obviously
still are passing with the patch, but anyway, this is yet another
small point of difference for the refactored patch code.

======

4.
AFAIK, the "if (change->data.tp.newtuple)" can only be true for INSERT
or UPDATE, so the code would be better to include a sanity Assert.

SUGGESTION
if (change->data.tp.newtuple)
{
    Assert(action == REORDER_BUFFER_CHANGE_INSERT || action ==
REORDER_BUFFER_CHANGE_UPDATE);
...
}

======

5.
AFAIK, the "if (change->data.tp.oldtuple)" can only be true for UPDATE
or DELETE, so the code would be better to include a sanity Assert.

SUGGESTION
if (change->data.tp.oldtuple)
{
    Assert(action == REORDER_BUFFER_CHANGE_UPDATE || action ==
REORDER_BUFFER_CHANGE_DELETE);
...
}

======

6.
I suggest moving the "change->data.tp.oldtuple" check before the
"change->data.tp.newtuple" check. I don't think it makes any
difference, but it seems more natural IMO to have old before new.


------
Kind Regards,
Peter Smith



RE: Simplify some codes in pgoutput

From
"houzj.fnst@fujitsu.com"
Date:
On Thursday, March 30, 2023 9:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
> 
> Hi Hou-san,
> 
> I tried to compare the logic of patch v3-0001 versus the original HEAD code.
> 
> IMO this patch logic is not exactly the same as before -- there are
> some subtle differences. I am not sure if these differences represent
> real problems or not.
> 
> Below are all my review comments:

Thanks for the check and comments.

> 
> ======
> 
> 1.
> /* Switch relation if publishing via root. */
> if (relentry->publish_as_relid != RelationGetRelid(relation))
> {
>     Assert(relation->rd_rel->relispartition);
>     ancestor = RelationIdGetRelation(relentry->publish_as_relid);
>     targetrel = ancestor;
> }
> 
> ~
> 
> The "switch relation if publishing via root" logic is now happening
> first, whereas the original code was doing this after the slot
> assignments. AFAIK it does not matter, it's just a small point of
> difference.

I also think it doesn't matter.

> ======
> 
> 2.
> /* Convert tuple if needed. */
> if (relentry-> attrmap)
> {
>     ...
> }
> 
> The "Convert tuple if needed." logic looks the same, but when it is
> executed is NOT the same. It could be a problem.
> 
> Previously, the conversion would only happen within the "Switch
> relation if publishing via root." condition. But the patch no longer
> has that extra condition -- now I think it attempts conversion every
> time regardless of "publishing via root".
> 
> I would expect the "publish via root" case to be less common, so even
> if the current code works, by omitting that check won't this patch
> have an unnecessary performance hit due to the extra conversions?

No, the conversions won't happen in normal cases because "if (relentry-> attrmap)"
will pass only if we need to switch relation(publish via root).

> ~~
> 
> 3.
> if (old_slot)
>     old_slot =
> execute_attr_map_slot(relentry->attrmap,old_slot,MakeTupleTableSlot(tupde
> sc,
> &TTSOpsVirtual));
> 
> ~
> 
> The previous conversion code for UPDATE (shown above) was checking
> "if (old_slot)". Actually, I don't know why that check was even
> necessary before but it seems to have been accounting for a
> possibility that UPDATE might not have "oldtuple".

If the RI key wasn't updated, then it's possible the old tuple is null.

> 
> But this combination (if indeed it was possible) is not handled
> anymore with the patch code because the old_slot is unconditionally
> assigned in the same block doing this conversion.

I think this case is handled by the generic old slot conversion in the patch.

> ======
> 
> 4.
> AFAIK, the "if (change->data.tp.newtuple)" can only be true for INSERT
> or UPDATE, so the code would be better to include a sanity Assert.
> 
> SUGGESTION
> if (change->data.tp.newtuple)
> {
>     Assert(action == REORDER_BUFFER_CHANGE_INSERT || action ==
> REORDER_BUFFER_CHANGE_UPDATE);
> ...
> }
> 
> ======
> 
> 5.
> AFAIK, the "if (change->data.tp.oldtuple)" can only be true for UPDATE
> or DELETE, so the code would be better to include a sanity Assert.
> 
> SUGGESTION
> if (change->data.tp.oldtuple)
> {
>     Assert(action == REORDER_BUFFER_CHANGE_UPDATE || action ==
> REORDER_BUFFER_CHANGE_DELETE);
> ...
> 

It might be fine but I am not sure if it's necessary to add this in this
patch as we don't have such assertion before.

> 
> ======
> 
> 6.
> I suggest moving the "change->data.tp.oldtuple" check before the
> "change->data.tp.newtuple" check. I don't think it makes any
> difference, but it seems more natural IMO to have old before new.

Changed.

Best Regards,
Hou zj


Attachment

Re: Simplify some codes in pgoutput

From
Peter Smith
Date:
Hi Hou-san,

I looked again at v4-0001.

On Thu, Mar 30, 2023 at 2:01 PM houzj.fnst@fujitsu.com
<houzj.fnst@fujitsu.com> wrote:
>
> On Thursday, March 30, 2023 9:15 AM Peter Smith <smithpb2250@gmail.com> wrote:
> >
...
> >
> > 2.
> > /* Convert tuple if needed. */
> > if (relentry-> attrmap)
> > {
> >     ...
> > }
> >
> > The "Convert tuple if needed." logic looks the same, but when it is
> > executed is NOT the same. It could be a problem.
> >
> > Previously, the conversion would only happen within the "Switch
> > relation if publishing via root." condition. But the patch no longer
> > has that extra condition -- now I think it attempts conversion every
> > time regardless of "publishing via root".
> >
> > I would expect the "publish via root" case to be less common, so even
> > if the current code works, by omitting that check won't this patch
> > have an unnecessary performance hit due to the extra conversions?
>
> No, the conversions won't happen in normal cases because "if (relentry-> attrmap)"
> will pass only if we need to switch relation(publish via root).
>

OK.

> > ~~
> >
> > 3.
> > if (old_slot)
> >     old_slot =
> > execute_attr_map_slot(relentry->attrmap,old_slot,MakeTupleTableSlot(tupde
> > sc,
> > &TTSOpsVirtual));
> >
> > ~
> >
> > The previous conversion code for UPDATE (shown above) was checking
> > "if (old_slot)". Actually, I don't know why that check was even
> > necessary before but it seems to have been accounting for a
> > possibility that UPDATE might not have "oldtuple".
>
> If the RI key wasn't updated, then it's possible the old tuple is null.
>
> >
> > But this combination (if indeed it was possible) is not handled
> > anymore with the patch code because the old_slot is unconditionally
> > assigned in the same block doing this conversion.
>
> I think this case is handled by the generic old slot conversion in the patch.

Yeah, I think you are right. Sorry, this was my mistake when reading v3.

>
> > ======
> >
> > 4.
> > AFAIK, the "if (change->data.tp.newtuple)" can only be true for INSERT
> > or UPDATE, so the code would be better to include a sanity Assert.
> >
> > SUGGESTION
> > if (change->data.tp.newtuple)
> > {
> >     Assert(action == REORDER_BUFFER_CHANGE_INSERT || action ==
> > REORDER_BUFFER_CHANGE_UPDATE);
> > ...
> > }
> >
> > ======
> >
> > 5.
> > AFAIK, the "if (change->data.tp.oldtuple)" can only be true for UPDATE
> > or DELETE, so the code would be better to include a sanity Assert.
> >
> > SUGGESTION
> > if (change->data.tp.oldtuple)
> > {
> >     Assert(action == REORDER_BUFFER_CHANGE_UPDATE || action ==
> > REORDER_BUFFER_CHANGE_DELETE);
> > ...
> >
>
> It might be fine but I am not sure if it's necessary to add this in this
> patch as we don't have such assertion before.

The Asserts are just for sanity and self-documentation regarding what
actions can get into this logic. IMO including them does no harm,
rather it does some small amount of good, so why not do it?

You can't really use the fact they were not there before as a reason
to not add them now --  There were no Asserts in the original code
because this same logic was duplicated multiple times and was always
within obvious scope of a particular switch (action) case:

~

Apart from the question of the Asserts, I have no more review comments
for this patch.

(FYI - patch v4 applied cleanly and the regression tests and TAP
subscription tests all pass OK)

------
Kind Regards,
Peter Smith.



Re: Simplify some codes in pgoutput

From
Amit Kapila
Date:
On Thu, Mar 30, 2023 at 11:12 AM Peter Smith <smithpb2250@gmail.com> wrote:
>
> > >
> > > 5.
> > > AFAIK, the "if (change->data.tp.oldtuple)" can only be true for UPDATE
> > > or DELETE, so the code would be better to include a sanity Assert.
> > >
> > > SUGGESTION
> > > if (change->data.tp.oldtuple)
> > > {
> > >     Assert(action == REORDER_BUFFER_CHANGE_UPDATE || action ==
> > > REORDER_BUFFER_CHANGE_DELETE);
> > > ...
> > >
> >
> > It might be fine but I am not sure if it's necessary to add this in this
> > patch as we don't have such assertion before.
>
> The Asserts are just for sanity and self-documentation regarding what
> actions can get into this logic. IMO including them does no harm,
> rather it does some small amount of good, so why not do it?
>
> You can't really use the fact they were not there before as a reason
> to not add them now --  There were no Asserts in the original code
> because this same logic was duplicated multiple times and was always
> within obvious scope of a particular switch (action) case:
>

I see your point but like Hou-San I am also not really sure if these
new Asserts will be better. The patch looks good to me, so will push
in some time.

--
With Regards,
Amit Kapila.