Thread: Add common function ReplicationOriginName.

Add common function ReplicationOriginName.

From
Peter Smith
Date:
Hi.

There are already multiple places that are building the subscription
origin name, and there are more coming with some new patches [1] doing
the same:

e.g.
snprintf(originname, sizeof(originname), "pg_%u", subid);

~~

IMO it is better to encapsulate this name formatting in common code
instead of the format string being scattered around the place.

PSA a patch to add a common function ReplicationOriginName. This is
the equivalent of a similar function for tablesync
(ReplicationOriginNameForTablesync) which already existed.

------
[1] https://www.postgresql.org/message-id/flat/CAA4eK1%2BwyN6zpaHUkCLorEWNx75MG0xhMwcFhvjqm2KURZEAGw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Add common function ReplicationOriginName.

From
Aleksander Alekseev
Date:
Hi Peter,

> PSA a patch to add a common function ReplicationOriginName

The patch looks good to me.

One nitpick I have is that the 2nd argument of snprintf is size_t
while we are passing int's. Your patch is consistent with the current
implementation of ReplicationOriginNameForTablesync() and similar
functions in tablesync.c. However I would like to mention this in case
the committer will be interested in replacing ints with Size's while
on it.


-- 
Best regards,
Aleksander Alekseev



Re: Add common function ReplicationOriginName.

From
Amit Kapila
Date:
On Mon, Sep 19, 2022 at 2:27 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Peter,
>
> > PSA a patch to add a common function ReplicationOriginName
>
> The patch looks good to me.
>
> One nitpick I have is that the 2nd argument of snprintf is size_t
> while we are passing int's. Your patch is consistent with the current
> implementation of ReplicationOriginNameForTablesync() and similar
> functions in tablesync.c.
>

I think it is better to use Size. Even though, it may not fail now as
the size of names for origin will always be much lesser but it is
better if we are consistent. If we agree with this, then as a first
patch, we can make it to use Size in existing places and then
introduce this new function.

-- 
With Regards,
Amit Kapila.



Re: Add common function ReplicationOriginName.

From
Aleksander Alekseev
Date:
Hi Amit,

> I think it is better to use Size. Even though, it may not fail now as
> the size of names for origin will always be much lesser but it is
> better if we are consistent. If we agree with this, then as a first
> patch, we can make it to use Size in existing places and then
> introduce this new function.

OK, here is the updated patchset.

* 0001 replaces int's with Size's in the existing code
* 0002 applies Peter's patch on top of 0001

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: Add common function ReplicationOriginName.

From
Peter Smith
Date:
On Tue, Sep 20, 2022 at 6:36 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Amit,
>
> > I think it is better to use Size. Even though, it may not fail now as
> > the size of names for origin will always be much lesser but it is
> > better if we are consistent. If we agree with this, then as a first
> > patch, we can make it to use Size in existing places and then
> > introduce this new function.
>
> OK, here is the updated patchset.
>
> * 0001 replaces int's with Size's in the existing code
> * 0002 applies Peter's patch on top of 0001
>

LGTM. Thanks!

-----
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Add common function ReplicationOriginName.

From
Amit Kapila
Date:
On Tue, Sep 20, 2022 at 2:06 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Amit,
>
> > I think it is better to use Size. Even though, it may not fail now as
> > the size of names for origin will always be much lesser but it is
> > better if we are consistent. If we agree with this, then as a first
> > patch, we can make it to use Size in existing places and then
> > introduce this new function.
>
> OK, here is the updated patchset.
>
> * 0001 replaces int's with Size's in the existing code
>

Pushed this one.

> * 0002 applies Peter's patch on top of 0001
>

Can't we use the existing function ReplicationOriginNameForTablesync()
by passing relid as InvalidOid for this purpose? We need a check
inside to decide which name to construct, otherwise, it should be
fine. If we agree with this, then we can change the name of the
function to something like ReplicationOriginNameForLogicalRep or
ReplicationOriginNameForLogicalRepWorkers.

-- 
With Regards,
Amit Kapila.



Re: Add common function ReplicationOriginName.

From
Peter Smith
Date:
On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
...

> Can't we use the existing function ReplicationOriginNameForTablesync()
> by passing relid as InvalidOid for this purpose? We need a check
> inside to decide which name to construct, otherwise, it should be
> fine. If we agree with this, then we can change the name of the
> function to something like ReplicationOriginNameForLogicalRep or
> ReplicationOriginNameForLogicalRepWorkers.
>

This suggestion attaches special meaning to the reild param.

Won't it seem a bit strange for the non-tablesync callers (who
probably have a perfectly valid 'relid') to have to pass an InvalidOid
relid just so they can format the correct origin name?

------
Kind Regards,
Peter Smith.
Fujitsu Australia



Re: Add common function ReplicationOriginName.

From
Amit Kapila
Date:
On Wed, Sep 21, 2022 at 3:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
> On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> ...
>
> > Can't we use the existing function ReplicationOriginNameForTablesync()
> > by passing relid as InvalidOid for this purpose? We need a check
> > inside to decide which name to construct, otherwise, it should be
> > fine. If we agree with this, then we can change the name of the
> > function to something like ReplicationOriginNameForLogicalRep or
> > ReplicationOriginNameForLogicalRepWorkers.
> >
>
> This suggestion attaches special meaning to the reild param.
>
> Won't it seem a bit strange for the non-tablesync callers (who
> probably have a perfectly valid 'relid') to have to pass an InvalidOid
> relid just so they can format the correct origin name?
>

For non-tablesync workers, relid should always be InvalidOid. See, how
we launch apply workers in ApplyLauncherMain(). Do you see any case
for non-tablesync workers where relid is not InvalidOid?


-- 
With Regards,
Amit Kapila.



Re: Add common function ReplicationOriginName.

From
Peter Smith
Date:
On Wed, Sep 21, 2022 at 8:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Sep 21, 2022 at 3:09 PM Peter Smith <smithpb2250@gmail.com> wrote:
> >
> > On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > ...
> >
> > > Can't we use the existing function ReplicationOriginNameForTablesync()
> > > by passing relid as InvalidOid for this purpose? We need a check
> > > inside to decide which name to construct, otherwise, it should be
> > > fine. If we agree with this, then we can change the name of the
> > > function to something like ReplicationOriginNameForLogicalRep or
> > > ReplicationOriginNameForLogicalRepWorkers.
> > >
> >
> > This suggestion attaches special meaning to the reild param.
> >
> > Won't it seem a bit strange for the non-tablesync callers (who
> > probably have a perfectly valid 'relid') to have to pass an InvalidOid
> > relid just so they can format the correct origin name?
> >
>
> For non-tablesync workers, relid should always be InvalidOid. See, how
> we launch apply workers in ApplyLauncherMain(). Do you see any case
> for non-tablesync workers where relid is not InvalidOid?
>

Hmm, my mistake. I was thinking more of all the calls coming from the
subscriptioncmds.c, but now that I look at it maybe none of those has
any relid either.

OK, I can unify the 2 functions as you suggested. I will post another
patch in a few days.

------
Kind Regards,
Peter Smith,
Fujitsu Australia.



Re: Add common function ReplicationOriginName.

From
Peter Smith
Date:
On Wed, Sep 21, 2022 at 8:22 PM Peter Smith <smithpb2250@gmail.com> wrote:
>
...
> > > On Wed, Sep 21, 2022 at 3:23 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > ...
> > >
> > > > Can't we use the existing function ReplicationOriginNameForTablesync()
> > > > by passing relid as InvalidOid for this purpose? We need a check
> > > > inside to decide which name to construct, otherwise, it should be
> > > > fine. If we agree with this, then we can change the name of the
> > > > function to something like ReplicationOriginNameForLogicalRep or
> > > > ReplicationOriginNameForLogicalRepWorkers.
> > > >
...
>
> OK, I can unify the 2 functions as you suggested. I will post another
> patch in a few days.
>

PSA patch v3 to combine the different replication origin name
formatting in a single function ReplicationOriginNameForLogicalRep as
suggested.

------
Kind Regards,
Peter Smith.
Fujitsu Australia

Attachment

Re: Add common function ReplicationOriginName.

From
Peter Smith
Date:
On Tue, Sep 20, 2022 at 6:36 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Amit,
>
> > I think it is better to use Size. Even though, it may not fail now as
> > the size of names for origin will always be much lesser but it is
> > better if we are consistent. If we agree with this, then as a first
> > patch, we can make it to use Size in existing places and then
> > introduce this new function.
>
> OK, here is the updated patchset.
>
> * 0001 replaces int's with Size's in the existing code
> * 0002 applies Peter's patch on top of 0001
>

Hi Aleksander.

FYI - although it is outside the scope of this thread, I did notice at
least one other example where you might want to substitute Size for
int in the same way as your v2-0001 patch did.

e.g. Just searching code for 'snprintf' where there is some parameter
for the size I quickly found:

File: src/bin/pg_dump/pg_dump_sort.c:

static void
describeDumpableObject(DumpableObject *obj, char *buf, int bufsize)

caller:
describeDumpableObject(loop[i], buf, sizeof(buf));

~~

I expect you can find more like just this if you look harder than I did.

------
Kind Regards,
Peter Smith.
Fujitsu Australia.



Re: Add common function ReplicationOriginName.

From
Aleksander Alekseev
Date:
Hi Peter,

> PSA patch v3 to combine the different replication origin name
> formatting in a single function ReplicationOriginNameForLogicalRep as
> suggested.

LGTM except for minor issues with the formatting; fixed.

> I expect you can find more like just this if you look harder than I did.

Thanks. You are right, there are more places that pass int as the
second argument of *nprintf(). I used a command:

$ grep -r nprintf ./ | perl -lne 'print if($_ !~
/nprintf\([^\,]+,\s*(sizeof|([0-9A-Z_ \-]+\,))/ )' > found.txt

... and then re-checked the results manually. This excludes patterns
like *nprintf(..., sizeof(...)) and *nprintf(..., MACRO) and leaves
only something like *nprintf(..., variable). The cases where we
subtract an integer from a Size, etc were ignored.

I don't have a strong opinion on whether we should be really worried
by this. But in case we do, here is the patch. The order of 0001 and
0002 doesn't matter.

As I understand, ecpg uses size_t rather than Size, so for this
library I used size_t. Not 100% sure if the changes I made to
src/backend/utils/adt/jsonb.c add much value. I leave this to the
committer to decide.

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: Add common function ReplicationOriginName.

From
Amit Kapila
Date:
On Tue, Sep 27, 2022 at 5:04 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi Peter,
>
> > PSA patch v3 to combine the different replication origin name
> > formatting in a single function ReplicationOriginNameForLogicalRep as
> > suggested.
>
> LGTM except for minor issues with the formatting; fixed.
>

LGTM as well. I'll push this tomorrow unless there are any more comments.


-- 
With Regards,
Amit Kapila.



Re: Add common function ReplicationOriginName.

From
Amit Kapila
Date:
On Mon, Oct 10, 2022 at 7:09 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Sep 27, 2022 at 5:04 PM Aleksander Alekseev
> <aleksander@timescale.com> wrote:
> >
> > Hi Peter,
> >
> > > PSA patch v3 to combine the different replication origin name
> > > formatting in a single function ReplicationOriginNameForLogicalRep as
> > > suggested.
> >
> > LGTM except for minor issues with the formatting; fixed.
> >
>
> LGTM as well. I'll push this tomorrow unless there are any more comments.
>

Pushed.

-- 
With Regards,
Amit Kapila.



Re: Add common function ReplicationOriginName.

From
Aleksander Alekseev
Date:
Hi Amit,

> Pushed.

Thanks!

> I don't have a strong opinion on whether we should be really worried
> by this. But in case we do, here is the patch.

This leaves us one patch to deal with.

-- 
Best regards,
Aleksander Alekseev

Attachment

Re: Add common function ReplicationOriginName.

From
Tom Lane
Date:
Aleksander Alekseev <aleksander@timescale.com> writes:
> This leaves us one patch to deal with.
> [ v4-0001-Pass-Size-size_t-as-a-2nd-argument-of-snprintf.patch ]

I looked at this and am inclined to reject it.  None of these
places realistically need to deal with strings longer than
MAXPATHLEN or so, let alone multiple gigabytes.  So it's just
code churn, creating backpatch hazards (admittedly not big ones)
for no real gain.

            regards, tom lane



Re: Add common function ReplicationOriginName.

From
Aleksander Alekseev
Date:
Hi Tom,

> I looked at this and am inclined to reject it. [...]

OK, thanks. Then we are done with this thread. I closed the
corresponding CF entry.

-- 
Best regards,
Aleksander Alekseev