Thread: BUG #18203: Logical Replication initial sync failure
The following bug has been logged on the website: Bug reference: 18203 Logged by: Justin G Email address: zzzzz.graf@gmail.com PostgreSQL version: 15.2 Operating system: Ubuntu 20 Description: We’ve found an edge case that breaks logical replication. Procedure: alter system set wal_level to 'logical' ; create database source; --pg 11 create database destin; --pg 15 create table no_col(); -- pg 11 insert into no_col default values; insert into no_col default values; insert into no_col default values; insert into no_col default values; insert into no_col default values; CREATE PUBLICATION no_col_pub for table no_col; create table no_col(); --pg 15 CREATE SUBSCRIPTION no_col_sub --pg15 CONNECTION 'host=<hostname/ip> port=5432 user=<set username> dbname=source password=<set the password>' PUBLICATION no_col_pub WITH ( CREATE_SLOT = TRUE, ENABLED = TRUE, COPY_DATA = TRUE ); We now have a table with 5 null rows. This is a valid table, pg_dump will dump it and restore it. count(*) will return 5. The dump file produced by pg_dump version 15 generates this SQL COPY public.no_col FROM stdin; However, the copy command from PG 15 using logical replication includes (), which normally has a list of columns between the parentheses. COPY public.no_col () TO STDOUT Logical replication worker returns with the following error: 2023-11-17 20:40:16.141 UTC [151084] ERROR: could not start initial contents copy for table "public.no_col": ERROR: syntax error at or near ")" LINE 1: COPY public.no_col () TO STDOUT If the direction of replication is reversed from PG15 to PG14, the above error is not thrown, and the 5 rows are copied over. Version tested Publishers PG 11.14, 14.2, 15.2 Subscriber PG 14.2, 15.2, , RDS 14.7 The error has been observed in 15.2 and in RDS PostgreSQL 14.7.
On Sat, 18 Nov 2023 at 18:10, PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 18203 > Logged by: Justin G > Email address: zzzzz.graf@gmail.com > PostgreSQL version: 15.2 > Operating system: Ubuntu 20 > Description: > > We’ve found an edge case that breaks logical replication. > > Procedure: > > alter system set wal_level to 'logical' ; > > create database source; --pg 11 > create database destin; --pg 15 > > create table no_col(); -- pg 11 > > insert into no_col default values; > insert into no_col default values; > insert into no_col default values; > insert into no_col default values; > insert into no_col default values; > > CREATE PUBLICATION no_col_pub for table no_col; > > create table no_col(); --pg 15 > > CREATE SUBSCRIPTION no_col_sub --pg15 > CONNECTION > 'host=<hostname/ip> > port=5432 > user=<set username> > dbname=source > password=<set the password>' > PUBLICATION no_col_pub > WITH ( > CREATE_SLOT = TRUE, > ENABLED = TRUE, > COPY_DATA = TRUE > ); > > > We now have a table with 5 null rows. This is a valid table, pg_dump will > dump it and restore it. count(*) will return 5. > > The dump file produced by pg_dump version 15 generates this SQL > COPY public.no_col FROM stdin; > > However, the copy command from PG 15 using logical replication includes (), > which normally has a list of columns between the parentheses. > > COPY public.no_col () TO STDOUT > > Logical replication worker returns with the following error: > 2023-11-17 20:40:16.141 UTC [151084] ERROR: could not start initial > contents copy for table "public.no_col": ERROR: syntax error at or near > ")" > LINE 1: COPY public.no_col () TO STDOUT Thanks for reporting this issue, I was able to reproduce the issue. This issue is happening because we are trying to specify the column list while the table sync worker is copying data for the table. I felt we should not specify the column list when the table has no columns. Attached patch has the changes to handle the same. Regards, Vignesh
Attachment
On Mon, Nov 20, 2023 at 2:53 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > Logical replication worker returns with the following error: > > 2023-11-17 20:40:16.141 UTC [151084] ERROR: could not start initial > > contents copy for table "public.no_col": ERROR: syntax error at or near > > ")" > > LINE 1: COPY public.no_col () TO STDOUT > > Thanks for reporting this issue, I was able to reproduce the issue. > This issue is happening because we are trying to specify the column > list while the table sync worker is copying data for the table. > I felt we should not specify the column list when the table has no columns. > Attached patch has the changes to handle the same. > I think we can add a test for such cases which tests both initial sync and regular replication on a table with no column. -- With Regards, Amit Kapila.
On Mon, Nov 20, 2023 at 8:23 PM vignesh C <vignesh21@gmail.com> wrote: > ... > Thanks for reporting this issue, I was able to reproduce the issue. > This issue is happening because we are trying to specify the column > list while the table sync worker is copying data for the table. > I felt we should not specify the column list when the table has no columns. > Attached patch has the changes to handle the same. > Hi Vignesh, One small comment about the patch: - appendStringInfo(&cmd, "COPY %s (", + appendStringInfo(&cmd, "COPY %s ", quote_qualified_identifier(lrel.nspname, lrel.relname)); - /* - * XXX Do we need to list the columns in all cases? Maybe we're - * replicating all columns? - */ - for (int i = 0; i < lrel.natts; i++) + /* If the table has columns, then specify the columns */ + if (lrel.natts) { - if (i > 0) - appendStringInfoString(&cmd, ", "); + appendStringInfoString(&cmd, "("); For consistent whitespace handling, and to prevent double-spacing when there are no columns, I think that the trailing space of "COPY %s " should be removed/replaced by a leading space for the "(". ====== Kind Regards, Peter Smith. Fujitsu Australia
On Mon, 20 Nov 2023 at 18:38, Amit Kapila <amit.kapila16@gmail.com> wrote: > > On Mon, Nov 20, 2023 at 2:53 PM vignesh C <vignesh21@gmail.com> wrote: > > > > > > > > Logical replication worker returns with the following error: > > > 2023-11-17 20:40:16.141 UTC [151084] ERROR: could not start initial > > > contents copy for table "public.no_col": ERROR: syntax error at or near > > > ")" > > > LINE 1: COPY public.no_col () TO STDOUT > > > > Thanks for reporting this issue, I was able to reproduce the issue. > > This issue is happening because we are trying to specify the column > > list while the table sync worker is copying data for the table. > > I felt we should not specify the column list when the table has no columns. > > Attached patch has the changes to handle the same. > > > > I think we can add a test for such cases which tests both initial sync > and regular replication on a table with no column. I have added a test for the same and tested upto PG15 where the changes were added. The patches for the same are attached. Regards, Vignesh
Attachment
On Tue, 21 Nov 2023 at 03:38, Peter Smith <smithpb2250@gmail.com> wrote: > > On Mon, Nov 20, 2023 at 8:23 PM vignesh C <vignesh21@gmail.com> wrote: > > > ... > > Thanks for reporting this issue, I was able to reproduce the issue. > > This issue is happening because we are trying to specify the column > > list while the table sync worker is copying data for the table. > > I felt we should not specify the column list when the table has no columns. > > Attached patch has the changes to handle the same. > > > > Hi Vignesh, > > One small comment about the patch: > > - appendStringInfo(&cmd, "COPY %s (", > + appendStringInfo(&cmd, "COPY %s ", > quote_qualified_identifier(lrel.nspname, lrel.relname)); > > - /* > - * XXX Do we need to list the columns in all cases? Maybe we're > - * replicating all columns? > - */ > - for (int i = 0; i < lrel.natts; i++) > + /* If the table has columns, then specify the columns */ > + if (lrel.natts) > { > - if (i > 0) > - appendStringInfoString(&cmd, ", "); > + appendStringInfoString(&cmd, "("); > > For consistent whitespace handling, and to prevent double-spacing when > there are no columns, I think that the trailing space of "COPY %s " > should be removed/replaced by a leading space for the "(". This is fixed in the v2 version patch attached at: https://www.postgresql.org/message-id/CALDaNm2kSJCVree6R1iKxjbA%3DSg4raFi3Fi_%2BwfF2xn2roxqCg%40mail.gmail.com Regards, Vignesh
The v2 patch LGTM except for the commit message. a) wording - "for tables have no columns too" b) the plural of "parenthesis" is "parentheses" ====== Kind Regards, Peter Smith. Fujitsu Australia
On Tue, 21 Nov 2023 at 13:31, Peter Smith <smithpb2250@gmail.com> wrote: > > The v2 patch LGTM except for the commit message. > > a) wording - "for tables have no columns too" > b) the plural of "parenthesis" is "parentheses" Thanks, these are fixed in the attached v3 version patch. Regards, Vignesh