Thread: Logical Replication - detail message with names of missing columns

Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Hi,

I observed that, in logical replication when a subscriber is missing
some columns, it currently emits an error message that says "some"
columns are missing(see logicalrep_rel_open()), but it doesn't specify
what the missing column names are. The comment there also says that
finding the missing column names is a todo item(/* TODO, detail
message with names of missing columns */).

I propose a patch to find the missing columns on the subscriber
relation using the publisher relation columns and show them in the
error message which can make the error more informative to the user.

Here's a snapshot how the error looks with the patch:
2020-09-04 01:00:36.721 PDT [843128] ERROR:  logical replication
target relation "public.test_1" is missing "b1, d1" replicated columns
2020-09-04 01:00:46.784 PDT [843132] ERROR:  logical replication
target relation "public.test_1" is missing "b1" replicated columns
2020-09-06 21:24:53.645 PDT [902945] ERROR:  logical replication
target relation "public.test_1" is missing "a1, c1, d1, b1" replicated
columns

Thoughts?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Kyotaro Horiguchi
Date:
Thank you for working on this.

At Mon, 7 Sep 2020 16:30:59 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Hi,
> 
> I observed that, in logical replication when a subscriber is missing
> some columns, it currently emits an error message that says "some"
> columns are missing(see logicalrep_rel_open()), but it doesn't specify
> what the missing column names are. The comment there also says that
> finding the missing column names is a todo item(/* TODO, detail
> message with names of missing columns */).
> 
> I propose a patch to find the missing columns on the subscriber
> relation using the publisher relation columns and show them in the
> error message which can make the error more informative to the user.

+1 for objective. However, that can be done simpler way that doesn't
need additional loops by using bitmapset to hold missing remote
attribute numbers. This also make the variable "found" useless.

> Here's a snapshot how the error looks with the patch:
> 2020-09-04 01:00:36.721 PDT [843128] ERROR:  logical replication
> target relation "public.test_1" is missing "b1, d1" replicated columns
> 2020-09-04 01:00:46.784 PDT [843132] ERROR:  logical replication
> target relation "public.test_1" is missing "b1" replicated columns
> 2020-09-06 21:24:53.645 PDT [902945] ERROR:  logical replication
> target relation "public.test_1" is missing "a1, c1, d1, b1" replicated
> columns
> 
> Thoughts?

FWIW, I would prefer that the message be like

 logical replication target relation "public.test_1" is missing
 replicated columns: "a1", "c1"

I'm not sure we need to have separate messages for singlar and plural.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
On Tue, Sep 8, 2020 at 6:50 AM Kyotaro Horiguchi
<horikyota.ntt@gmail.com> wrote:
>
> +1 for objective. However, that can be done simpler way that doesn't
> need additional loops by using bitmapset to hold missing remote
> attribute numbers. This also make the variable "found" useless.
>

Thanks. I will look into it and post a v2 patch soon.

>
> > Here's a snapshot how the error looks with the patch:
> > 2020-09-04 01:00:36.721 PDT [843128] ERROR:  logical replication
> > target relation "public.test_1" is missing "b1, d1" replicated columns
> > 2020-09-04 01:00:46.784 PDT [843132] ERROR:  logical replication
> > target relation "public.test_1" is missing "b1" replicated columns
> > 2020-09-06 21:24:53.645 PDT [902945] ERROR:  logical replication
> > target relation "public.test_1" is missing "a1, c1, d1, b1" replicated
> > columns
> >
> > Thoughts?
>
> FWIW, I would prefer that the message be like
>
>  logical replication target relation "public.test_1" is missing
>  replicated columns: "a1", "c1"
>

This looks fine, I will change that.

>
> I'm not sure we need to have separate messages for singular and plural.
>

I don't think we need to have separate messages. To keep it simple,
let's have plural form.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Thanks for the comments, v2 patch is attached.

>
> On Tue, Sep 8, 2020 at 6:50 AM Kyotaro Horiguchi
> <horikyota.ntt@gmail.com> wrote:
> >
> > +1 for objective. However, that can be done simpler way that doesn't
> > need additional loops by using bitmapset to hold missing remote
> > attribute numbers. This also make the variable "found" useless.
> >
>
> Thanks. I will look into it and post a v2 patch soon.
>

Changed.

> >
> > FWIW, I would prefer that the message be like
> >
> >  logical replication target relation "public.test_1" is missing
> >  replicated columns: "a1", "c1"
> >
>
> This looks fine, I will change that.
>

Changed. Now the error looks like as shown below:

ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"d1"
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"c1","d1"
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"a1","c1","d1"
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"a1","b1","c1","d1"
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"a1","b1","c1","d1","e1"



With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Alvaro Herrera
Date:
On 2020-Sep-08, Bharath Rupireddy wrote:

> +    /* Find the remote attributes that are missing in the local relation. */
> +    for (i = 0; i < remoterel->natts; i++)
> +    {
> +        if (!bms_is_member(i, localattnums))
> +        {
> +            if (missingatts->len == 0)
> +            {
> +                appendStringInfoChar(missingatts, '"');
> +                appendStringInfoString(missingatts, remoterel->attnames[i]);
> +            }
> +            else if (missingatts->len > 0)
> +            {
> +                appendStringInfoChar(missingatts, ',');
> +                appendStringInfoChar(missingatts, '"');
> +                appendStringInfo(missingatts, "%s", remoterel->attnames[i]);
> +            }
> +
> +            appendStringInfoChar(missingatts, '"');
> +        }

This looks a bit fiddly.  Would it be less cumbersome to use
quote_identifier here instead?


>              ereport(ERROR,
>                      (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>                       errmsg("logical replication target relation \"%s.%s\" is missing "
> -                            "some replicated columns",
> -                            remoterel->nspname, remoterel->relname)));
> +                            "replicated columns:%s",
> +                            remoterel->nspname, remoterel->relname,
> +                            missingatts.data)));

Please do use errmsg_plural -- have the new function return the number
of missing columns.  Should be pretty straightforward.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Thanks for the comments. Attaching the v3 patch.

On Tue, Sep 8, 2020 at 8:19 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> This looks a bit fiddly.  Would it be less cumbersome to use
> quote_identifier here instead?
>

Changed. quote_identifier() adds quotes wherever necessary.

>
> Please do use errmsg_plural -- have the new function return the number
> of missing columns.  Should be pretty straightforward.
>

Changed. Now the error message looks as below:

CREATE TABLE test_tbl1(a1 int, b1 text, c1 int, d1 real, e1 int);
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated column:c1
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:b1,c1
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:b1,c1,e1

CREATE TABLE test_tbl1("!A1" int, "%b1" text, "@C1" int, d1 real, E1 int);
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated column:"@C1"
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"@C1",d1
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"!A1","@C1",d1
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"!A1","@C1",d1,e1
ERROR:  logical replication target relation "public.test_tbl1" is
missing replicated columns:"!A1","%b1","@C1",d1,e1

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Added this to the commitfest - https://commitfest.postgresql.org/30/2727/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical Replication - detail message with names of missing columns

From
Kyotaro Horiguchi
Date:
Thanks for the revised version.

At Tue, 8 Sep 2020 22:36:17 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> Thanks for the comments. Attaching the v3 patch.
> 
> On Tue, Sep 8, 2020 at 8:19 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > This looks a bit fiddly.  Would it be less cumbersome to use
> > quote_identifier here instead?
> >
> 
> Changed. quote_identifier() adds quotes wherever necessary.
> 
> >
> > Please do use errmsg_plural -- have the new function return the number
> > of missing columns.  Should be pretty straightforward.
> >
> 
> Changed. Now the error message looks as below:

^^;

I don't think the logicalrep_find_missing_attrs worth a separate
function. The detection code would be short enough to be embedded in
the checking loop in logicalrep_rel_open. Note that remoterel doesn't
have missing columns since they are already removed when it is
constructed.  See fetch_remote_table_info and
logicalrep_rel_att_by_name is relying on that fact.  As the result
this code could be reduced to something like the following.

+ /* remoterel doesn't contain dropped attributes, see .... */
- found = 0;
+ missing_attr = bms_add_range(NULL, 0, remoterel->natts - 1);
  for (i = 0; i < desc->natts; i++)
    if (attnum >= 0)
-     found++;
+     missing_attrs = bms_del_member(missing_attrs, attnum);
- if (found < remoterel->natts)
+ if (!bms_is_empty(missing_attrs))
+ {
+   while ((i = bms_first_memeber(missing_attrs)) >= 0)
+   {
+      if (not_first) appendStringInfo(<delimter>);
+      appendStringInfo(str, remoterel->attnames[i])
+   }
-   erreport("some attrs missing");
+   ereport(ERROR, <blah blah>);
+ }

> ERROR:  logical replication target relation "public.test_tbl1" is
> missing replicated columns:b1,c1,e1

I think we always double-quote identifiers in error messages. For
example:

./catalog/index.c�254: errmsg("primary key column \"%s\" is not marked NOT NULL",
./catalog/heap.c�614:  errmsg("column \"%s\" has pseudo-type %s",
./catalog/heap.c�706:  errmsg("no collation was derived for column \"%s\" with collatable type %s",

And we need a space after the semicolon and commas in the message
string.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Thanks for the review comments. Attaching v4 patch.

On Fri, Sep 11, 2020 at 6:44 AM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
>
> + /* remoterel doesn't contain dropped attributes, see .... */
> - found = 0;
> + missing_attr = bms_add_range(NULL, 0, remoterel->natts - 1);
>   for (i = 0; i < desc->natts; i++)
>     if (attnum >= 0)
> -     found++;
> +     missing_attrs = bms_del_member(missing_attrs, attnum);
> - if (found < remoterel->natts)
> + if (!bms_is_empty(missing_attrs))
> + {
> +   while ((i = bms_first_memeber(missing_attrs)) >= 0)
> +   {
> +      if (not_first) appendStringInfo(<delimter>);
> +      appendStringInfo(str, remoterel->attnames[i])
> +   }
> -   erreport("some attrs missing");
> +   ereport(ERROR, <blah blah>);
> + }
>

+1. Yes, the remoterel doesn't contain dropped attributes.

>
> > ERROR:  logical replication target relation "public.test_tbl1" is
> > missing replicated columns:b1,c1,e1
>
> I think we always double-quote identifiers in error messages. For
> example:
>
> ./catalog/index.c 254: errmsg("primary key column \"%s\" is not marked NOT NULL",
> ./catalog/heap.c 614:  errmsg("column \"%s\" has pseudo-type %s",
> ./catalog/heap.c 706:  errmsg("no collation was derived for column \"%s\" with collatable type %s",
>

Double-quoting identifiers such as database object names helps in clearly identifying them from the normal error message text. Seems like everywhere we double-quote columns in the error messages. One exception I found though, for relation names(there may be more places where we are not double-quoting) is in errmsg("could not open parent table of index %s", RelationGetRelationName(indrel).

How about quoting all the individual columns? Looks like quote_identifier() doesn't serve our purpose here as it selectively quotes or quotes all identifiers only in case quote_all_identifiers config variable is set.

CREATE TABLE t1(a1 int, b1 text, c1 real, d1 int, e1 int);
2020-09-10 23:08:26.737 PDT [217404] ERROR:  logical replication target relation "public.t1" is missing replicated column: "c1"
2020-09-10 23:08:31.768 PDT [217406] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "c1", "d1"
2020-09-10 23:08:51.898 PDT [217417] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "c1", "d1", "e1"

CREATE TABLE t1("!A1" int, "%b1" text, "@C1" int, d1 real, E1 int);
2020-09-10 23:12:29.768 PDT [217501] ERROR:  logical replication target relation "public.t1" is missing replicated column: "!A1"
2020-09-10 23:12:56.640 PDT [217515] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "!A1", "@C1"
2020-09-10 23:13:31.848 PDT [217533] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "!A1", "@C1", "d1"

>
> And we need a space after the semicolon and commas in the message
> string.
>

Changed.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Logical Replication - detail message with names of missing columns

From
Tom Lane
Date:
Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes:
>> I think we always double-quote identifiers in error messages. For
>> example:
>> ./catalog/heap.c 706:  errmsg("no collation was derived for column \"%s\"
>> with collatable type %s",

Right.  Anything in this patch that is not doing that needs to be fixed.
(As this example shows, type names are exempt from the rule.)

> How about quoting all the individual columns? Looks like quote_identifier()
> doesn't serve our purpose here as it selectively quotes or quotes all
> identifiers only in case quote_all_identifiers config variable is set.

NO.  The convention is to write \"...\" in the translatable message.
Not all languages use double quote symbols for this purpose, so that
way lets translators replace them with something else.

Yeah, this means that messages containing names that contain double
quotes might be a bit ambiguous.  We live with it.

            regards, tom lane



Re: Logical Replication - detail message with names of missing columns

From
Alvaro Herrera
Date:
On 2020-Sep-11, Tom Lane wrote:

> > How about quoting all the individual columns? Looks like quote_identifier()
> > doesn't serve our purpose here as it selectively quotes or quotes all
> > identifiers only in case quote_all_identifiers config variable is set.
> 
> NO.  The convention is to write \"...\" in the translatable message.
> Not all languages use double quote symbols for this purpose, so that
> way lets translators replace them with something else.
> 
> Yeah, this means that messages containing names that contain double
> quotes might be a bit ambiguous.  We live with it.

There is a problem here though, which is that the quoted strings in
question are part of a list of columns.  There's no way to keep that
list as a translatable string, so that approach doesn't work here.
What appears in the translatable string is:

logical replication target relation "%s" is missing replicated columns: %s

Or in the singular case:
logical replication target relation "%s" is missing replicated columns: %s

where the second %s is the list of columns.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication - detail message with names of missing columns

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-11, Tom Lane wrote:
>> NO.  The convention is to write \"...\" in the translatable message.

> There is a problem here though, which is that the quoted strings in
> question are part of a list of columns.  There's no way to keep that
> list as a translatable string, so that approach doesn't work here.
> What appears in the translatable string is:

> logical replication target relation "%s" is missing replicated columns: %s

Check, but you could imagine that the column-list string is constructed
with code along the lines of

    if (first)
        appendStringInfo(buf, _("\"%s\""), colname);
    else
        appendStringInfo(buf, _(", \"%s\""), colname);

thus allowing a translator to replace the quote marks.  Might not be
worth the trouble.  In any case, the point here is that we're not
trying to construct valid SQL so quote_identifier is not the right
tool for the job.

            regards, tom lane



Re: Logical Replication - detail message with names of missing columns

From
Alvaro Herrera
Date:
On 2020-Sep-11, Tom Lane wrote:

> Check, but you could imagine that the column-list string is constructed
> with code along the lines of
> 
>     if (first)
>         appendStringInfo(buf, _("\"%s\""), colname);
>     else
>         appendStringInfo(buf, _(", \"%s\""), colname);
> thus allowing a translator to replace the quote marks.

This works OK for my language at least.  I couldn't find any guidance on
whether there's a problem doing things this way for RTL languages etc,
but +1 for doing it this way.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Logical Replication - detail message with names of missing columns

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> This works OK for my language at least.  I couldn't find any guidance on
> whether there's a problem doing things this way for RTL languages etc,
> but +1 for doing it this way.

Hmm ... fortunately, there's not any large semantic significance to the
order in which the columns are mentioned here.  Maybe an RTL speaker
would read the column names in the opposite order than we do, but I
don't think it's a problem.

            regards, tom lane



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
On Fri, Sep 11, 2020 at 9:05 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>
> On 2020-Sep-11, Tom Lane wrote:
>
> > Check, but you could imagine that the column-list string is constructed
> > with code along the lines of
> >
> >       if (first)
> >           appendStringInfo(buf, _("\"%s\""), colname);
> >       else
> >           appendStringInfo(buf, _(", \"%s\""), colname);
> > thus allowing a translator to replace the quote marks.
>
> This works OK for my language at least.  I couldn't find any guidance on
> whether there's a problem doing things this way for RTL languages etc,
> but +1 for doing it this way.
>

Thanks for the comments. I changed the patch to use the string
preparation in below fashion. Attaching the v5 patch. Please let me
know if there are any further inputs.

+                if (missingattcnt == 1)
+                    appendStringInfo(&missingattsbuf, _("\"%s\""),
+                                     remoterel->attnames[i]);
+                else
+                    appendStringInfo(&missingattsbuf, _(", \"%s\""),
+                                     remoterel->attnames[i]);

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Attaching v6 patch, rebased because of a recent commit
3d65b0593c5578014f62e09d4008006f1783f64d. Please consider this for
further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Amit Kapila
Date:
On Wed, Sep 16, 2020 at 9:58 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Attaching v6 patch, rebased because of a recent commit
> 3d65b0593c5578014f62e09d4008006f1783f64d. Please consider this for
> further review.
>

Few comments:
==============
1.
+ /* Report error with names of the missing localrel column(s). */
+ if (!bms_is_empty(missingatts))
+ {
+ StringInfoData missingattsbuf;
+ int    missingattcnt = 0;
+
+ initStringInfo(&missingattsbuf);
+ while ((i = bms_first_member(missingatts)) >= 0)
+ {
+ missingattcnt++;
+ if (missingattcnt == 1)
+ appendStringInfo(&missingattsbuf, _("\"%s\""),
+ remoterel->attnames[i]);
+ else
+ appendStringInfo(&missingattsbuf, _(", \"%s\""),
+ remoterel->attnames[i]);
+ }
+
+ bms_free(missingatts);
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("logical replication target relation \"%s.%s\" is missing "
- "some replicated columns",
- remoterel->nspname, remoterel->relname)));
+ errmsg_plural("logical replication target relation \"%s.%s\" is missing "
+ "replicated column: %s",
+ "logical replication target relation \"%s.%s\" is missing "
+ "replicated columns: %s",
+ missingattcnt,
+ remoterel->nspname,
+ remoterel->relname,
+ missingattsbuf.data)));
+ }

I think it is better to move the above code in a separate function
(say logicalrep_report_missing_attrs or something like that).

2. I think we always need to call bms_free(missingatts) because it is
possible that there is no missing attribute and in that case, we won't
free the memory allocated in bms_add_range.

3. The patch doesn't seem to be freeing the memory allocated for missingattsbuf.

4.
  ereport(ERROR,
  (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
- errmsg("logical replication target relation \"%s.%s\" is missing "
- "some replicated columns",
- remoterel->nspname, remoterel->relname)));
+ errmsg_plural("logical replication target relation \"%s.%s\" is missing "
+ "replicated column: %s",
+ "logical replication target relation \"%s.%s\" is missing "
+ "replicated columns: %s",
+ missingattcnt,
+ remoterel->nspname,
+ remoterel->relname,
+ missingattsbuf.data)));

From the second line onwards, the message lines are not aligned in
errmsg_plural.

5. Also, in the above message, keep the error string in a single line.
For ex. see one of the existing messages:
errmsg_plural("WAL segment size must be a power of two between 1 MB
and 1 GB, but the control file specifies %d byte", .. I think it will
be easy to read that way. I know this is not exactly related to your
patch but improving it while changing this message seems fine.

6. I think we should add one test case for this in the existing
regression test (src/test/subscription/t/008_diff_schema).


-- 
With Regards,
Amit Kapila.



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Thanks Amit for the review comments. I will post an updated patch soon.

On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> 6. I think we should add one test case for this in the existing
> regression test (src/test/subscription/t/008_diff_schema).
>

This patch logs the missing column names message in subscriber server
logs. Is there a way to see the logs in these tests and use that as
expected result for the test case?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical Replication - detail message with names of missing columns

From
Amit Kapila
Date:
On Mon, Oct 5, 2020 at 8:00 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Thanks Amit for the review comments. I will post an updated patch soon.
>
> On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > 6. I think we should add one test case for this in the existing
> > regression test (src/test/subscription/t/008_diff_schema).
> >
>
> This patch logs the missing column names message in subscriber server
> logs. Is there a way to see the logs in these tests and use that as
> expected result for the test case?
>

I don't think there is any direct way to achieve that. What we can do
is to check that the data is not replicated in such a case but I don't
think it is worth to add a test for that behavior. So, I think we can
skip adding a test for this unless someone else has a better idea to
do the same.

-- 
With Regards,
Amit Kapila.



Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
Thanks Amit for the review comments.

On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> Few comments:
> ==============
> 1.
> + /* Report error with names of the missing localrel column(s). */
> + if (!bms_is_empty(missingatts))
> + {
> + StringInfoData missingattsbuf;
> + int    missingattcnt = 0;
> + remoterel->nspname,
> + remoterel->relname,
> + missingattsbuf.data)));
> + }
>
> I think it is better to move the above code in a separate function
> (say logicalrep_report_missing_attrs or something like that).
>

Added a new function logicalrep_report_missing_attrs().

>
> 2. I think we always need to call bms_free(missingatts) because it is
> possible that there is no missing attribute and in that case, we won't
> free the memory allocated in bms_add_range.
>

Done. Yes we palloc memory for missingatts bitmap irrespective of missing attributes. Added bms_free() out of if(!bms_is_empty(missingatts)) as well. I also kept bms_free() before ereport(ERROR,..) to free up before throwing the error. In anycase, only one bms_free() would get hit.

if (!bms_is_empty(missingatts))
{
StringInfoData missingattsbuf;
int   missingattcnt = 0;

bms_free(missingatts);
ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
errmsg_plural("logical replication target relation \"%s.%s\" is missing replicated column: %s",
 "logical replication target relation \"%s.%s\" is missing replicated columns: %s",
 missingattcnt,
 remoterel->nspname,
 remoterel->relname,
 missingattsbuf.data)));
}
bms_free(missingatts);

>
> 3. The patch doesn't seem to be freeing the memory allocated for missingattsbuf.
>

I don't think we need to do that. We are passing missingattsbuf.data to ereport and we are safe without freeing up missingattsbuf(we don't reach the code after ereprot(ERROR,...)as the table sync worker anyways goes away after throwing missing attributes error( if (sigsetjmp(local_sigjmp_buf, 1) != 0) in StartBackgroundWorker and then proc_exit(1)). Each time a new table sync bg worker is spawned.

2020-10-06 10:18:27.063 IST [1599963] ERROR:  logical replication target relation "public.t1" is missing replicated column: "@C1"
2020-10-06 10:18:47.179 IST [1600134] ERROR:  logical replication target relation "public.t1" is missing replicated column: "@C1"
2020-10-06 10:18:57.234 IST [1600214] ERROR:  logical replication target relation "public.t1" is missing replicated column: "@C1"

2020-10-06 10:19:27.415 IST [1600458] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "%b1", "@C1"
2020-10-06 10:19:42.506 IST [1600588] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "%b1", "@C1"
2020-10-06 10:19:52.565 IST [1600669] ERROR:  logical replication target relation "public.t1" is missing replicated columns: "%b1", "@C1"

>
> 4.
>   ereport(ERROR,
>   (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> - errmsg("logical replication target relation \"%s.%s\" is missing "
>
> From the second line onwards, the message lines are not aligned in
> errmsg_plural.
>

Done.

>
> 5. Also, in the above message, keep the error string in a single line.
> For ex. see one of the existing messages:
> errmsg_plural("WAL segment size must be a power of two between 1 MB
> and 1 GB, but the control file specifies %d byte", .. I think it will
> be easy to read that way. I know this is not exactly related to your
> patch but improving it while changing this message seems fine.
>

Done.

Attaching v7 patch please consider it for further review.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Attachment

Re: Logical Replication - detail message with names of missing columns

From
Amit Kapila
Date:
On Tue, Oct 6, 2020 at 12:14 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> >
> > 3. The patch doesn't seem to be freeing the memory allocated for missingattsbuf.
> >
>
> I don't think we need to do that. We are passing missingattsbuf.data to ereport and we are safe without freeing up
missingattsbuf(wedon't reach the code after ereprot(ERROR,...)as the table sync worker anyways goes away after throwing
missingattributes error( if (sigsetjmp(local_sigjmp_buf, 1) != 0) in StartBackgroundWorker and then proc_exit(1)). Each
timea new table sync bg worker is spawned. 
>

Okay, by that logic, we don't even need to free memory for missingatts.

I have made a few changes, (a) moved free of missingatts in the caller
where we are allocating it. (b) added/edited/removed comments, (c) ran
pgindent.

Shall we backpatch this? I don't see any particular need to backpatch
this as this is a minor error message improvement and nobody has
reported any problem due to this. What do you think?

--
With Regards,
Amit Kapila.

Attachment

Re: Logical Replication - detail message with names of missing columns

From
Bharath Rupireddy
Date:
On Tue, Oct 6, 2020 at 5:24 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Tue, Oct 6, 2020 at 12:14 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > On Mon, Oct 5, 2020 at 9:39 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > >
> > > 3. The patch doesn't seem to be freeing the memory allocated for missingattsbuf.
> > >
> >
> > I don't think we need to do that. We are passing missingattsbuf.data to ereport and we are safe without freeing up
missingattsbuf(wedon't reach the code after ereprot(ERROR,...)as the table sync worker anyways goes away after throwing
missingattributes error( if (sigsetjmp(local_sigjmp_buf, 1) != 0) in StartBackgroundWorker and then proc_exit(1)). Each
timea new table sync bg worker is spawned. 
> >
>
> Okay, by that logic, we don't even need to free memory for missingatts.
>
> I have made a few changes, (a) moved free of missingatts in the caller
> where we are allocating it. (b) added/edited/removed comments, (c) ran
> pgindent.
>

Thanks Amit. v8 patch looks good to me.

>
> Shall we backpatch this? I don't see any particular need to backpatch
> this as this is a minor error message improvement and nobody has
> reported any problem due to this. What do you think?
>

IMO, no backpatch is required as this is not a bug or something
reported by anyone.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: Logical Replication - detail message with names of missing columns

From
Alvaro Herrera
Date:
On 2020-Oct-06, Amit Kapila wrote:

> I have made a few changes, (a) moved free of missingatts in the caller
> where we are allocating it. (b) added/edited/removed comments, (c) ran
> pgindent.

This is committed as f07707099c17.

Thanks