Re: Add support for specifying tables in pg_createsubscriber. - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Add support for specifying tables in pg_createsubscriber.
Date
Msg-id CAHut+PtkKLHZp5q3e-Tpmg_TmB=uCMfAzYO6SA7b5YyodwDkCg@mail.gmail.com
Whole thread Raw
In response to Re: Add support for specifying tables in pg_createsubscriber.  (Shubham Khanna <khannashubham1197@gmail.com>)
List pgsql-hackers
Hi Shubham,

Some review comments for patch v2-0001

======
1. General - compile errors!

Patch applies OK, but I cannot build pg_createsubscriber. e.g.

pg_createsubscriber.c: In function ‘main’:
pg_createsubscriber.c:2237:5: error: a label can only be part of a
statement and a declaration is not a statement
     TableListPerDB * newdb;
     ^
pg_createsubscriber.c:2324:5: error: a label can only be part of a
statement and a declaration is not a statement
     TableSpec * ts = pg_malloc0(sizeof(TableSpec));
     ^
pg_createsubscriber.c:2325:5: error: expected expression before ‘PQExpBuffer’
     PQExpBuffer dbbuf;
     ^
pg_createsubscriber.c:2326:5: warning: ISO C90 forbids mixed
declarations and code [-Wdeclaration-after-statement]
     PQExpBuffer schemabuf;
     ^
pg_createsubscriber.c:2335:5: error: ‘dbbuf’ undeclared (first use in
this function)
     dbbuf = createPQExpBuffer();

I'm not sure how this can be working for you (???). You cannot declare
variables without introducing a code block in the scope of the switch
case.

======
doc/src/sgml/ref/pg_createsubscriber.sgml

2.
--table is missing from synopsis

~~~

3.
+     <term><option>--table=<replaceable
class="parameter">table</replaceable></option></term>
+     <listitem>
+      <para>
+       Adds a table to be included in the publication for the most recently
+       specified database. Can be repeated multiple times. The syntax
+       supports optional column lists and WHERE clauses.
+      </para>
+     </listitem>
+    </varlistentry>

Lacks detail, so I can't tell how to use this. e.g:
- What does the syntax actually look like?
- code suggests you can specify DB, but this doc says it only applies
to the most recent DB
- how supports column lists and WHERE clause (needs examples)
- needs rules FOR ALL TABLES, etc.
- allowed in combination with --all?
- etc.

======
src/bin/pg_basebackup/pg_createsubscriber.c

4.
+typedef struct TableListPerDB
+{
+ char    *dbname;
+ TableSpec  *tables;
+ struct TableListPerDB *next;
+} TableListPerDB;

I didn't understand the need for this "per-DB" structure.

Later, you declare "TableSpec  *tables;" within the "LogicalRepInfo"
structure (which is per-DB) so you already have the "per-DB" table
list right there. Even if you need to maintain some global static
list, then I imagine you could just put a 'dbname' member in
TableSpec. You don't need a whole new structure to do it.

~~~

create_publication:

5.
+ if (dbinfo->tables == NULL)
+ appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES", ipubname_esc);
+ else
+ {
+ bool first = true;
+
+ appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR TABLE ", ipubname_esc);
+ for (TableSpec * tbl = dbinfo->tables; tbl != NULL; tbl = tbl->next)

What if '*' is specified for the table name? Should that cause a
"CREATE PUBLICATION ... FOR TABLES IN SCHEMA ..." instead of making a
publication with 100s or more tables in a FOR TABLES?

~~~

6.
+ for (int i = 0; i < PQntuples(tres); i++)
+ {
+ char    *escaped_schema = PQescapeIdentifier(conn, PQgetvalue(tres, i, 0),
+ strlen(PQgetvalue(tres, i, 0)));
+ char    *escaped_table = PQescapeIdentifier(conn, PQgetvalue(tres, i, 1),
+    strlen(PQgetvalue(tres, i, 1)));
+
+ appendPQExpBuffer(str, "%s%s.%s", first ? "" : ", ",
+   escaped_schema, escaped_table);
+
+ PQfreemem(escaped_schema);
+ PQfreemem(escaped_table);
+ first = false;
+ }

6a.
How about some other simple variables to avoid all the repeated PQgetvalue?

e.g.
char *sch = PQgetvalue(tres, i, 0);
char *tbl = PQgetvalue(tres, i, 1);
char *escaped_schema = PQescapeIdentifier(conn, sch, strlen(sch));
char *escaped_table = PQescapeIdentifier(conn, tbl, strlen(tbl));

~

6b.
Variable 'first' is redundant. Same as checking 'i == 0'.

~~~

7.
+ if (dry_run)
+ {
+ res = PQexec(conn, "BEGIN");
+ if (PQresultStatus(res) != PGRES_COMMAND_OK)
+ {

Would it be better to use if/else instead of:

if (dry_run)
if (!dry_run)

~~~

main:

8.
+ TableListPerDB * newdb;
  if (!simple_string_list_member(&opt.database_names, optarg))
  {
  simple_string_list_append(&opt.database_names, optarg);


8a.
Compile error. Need {} scope to declare that variable!

~

8b.
This seems a strange muddle of 'd' which represents DATABASE and
TableListPerDB, which is a list of tables (not a database). I have
doubts that most of this linked list code is even necessary for the
'd' case.

~~~

9.
+ case 6:
+ TableSpec * ts = pg_malloc0(sizeof(TableSpec));
+ PQExpBuffer dbbuf;

Compile error. Need {} scope to declare that variable!

~~~

10.
+ if (dotcnt == 2)
+ {
+ ts->pattern_db_regex = NULL;
+ ts->pattern_schema_regex = pg_strdup(schemabuf->data);
+ ts->pattern_table_regex = pg_strdup(namebuf->data);
+ }
+ else if (dotcnt == 1)
+ {
+ ts->pattern_db_regex = NULL;
+ ts->pattern_schema_regex = pg_strdup(dbbuf->data);
+ ts->pattern_table_regex = pg_strdup(schemabuf->data);
+ }
+ else
+ pg_fatal("invalid --table specification: %s", optarg);

10a.
This code seems quite odd to me:
- The DB becomes the schema?
- The schema becomes the table?

Rather than fudging all the names of the --table parts, if you don't
know what they represent up-fron,t probably it is better to call them
just part1,part2,part3.

~~~

10b.
Why is pattern_db_regex always NULL? If it is always NULL why have it at all?


======
.../t/040_pg_createsubscriber.pl

11.
+# Test: Table-level publication creation
+$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)");
+$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)");
+$node_p->safe_psql($db4,
+ "CREATE TABLE public.t3 (id int, val text, extra int)");
+

IIUC, the schema name is part of the table syntax. So, you should
include test cases for different schemas.

~~~

12.
+# Run pg_createsubscriber with table-level options
+command_ok(
+ [
+ 'pg_createsubscriber',
+ '--verbose',
+ '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
+ '--pgdata' => $node_s2->data_dir,
+ '--publisher-server' => $node_p->connstr($db3),
+ '--socketdir' => $node_s2->host,
+ '--subscriber-port' => $node_s2->port,
+ '--database' => $db3,
+ '--table' => "$db3.public.t1",
+ '--table' => "$db3.public.t2",
+ '--database' => $db4,
+ '--table' => "$db4.public.t3",
+ ],
+ 'pg_createsubscriber runs with table-level publication (existing nodes)');


12a.
This is not really testing the same as what the commit message
describes. e.g. what about a test case where --table does not mention
the db explicitly, so relies on the most recent.

~

12b.
What should happen if the explicitly named DB in --table is not the
same as the most recent --database, even though it is otherwise
correct?

e.g.
'--database' => $db3,
'--table' => "$db3.public.t1",
'--database' => $db4,
'--table' => "$db3.public.t2",
'--table' => "$db4.public.t3",
I quickly tried it and AFAICT this was silently accepted and then the
test failed because it gave unexpected results. It doesn't seem good
behaviour.

~

12c.

(related to 12b/12c).

I became suspicious that the DB name part of the --table option is
completely bogus. And it is. The test still passes OK even after I
write junk in the --table database part, like below.

command_ok(
[
'pg_createsubscriber',
'--verbose',
'--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default,
'--pgdata' => $node_s2->data_dir,
'--publisher-server' => $node_p->connstr($db3),
'--socketdir' => $node_s2->host,
'--subscriber-port' => $node_s2->port,
'--database' => $db3,
'--table' => "$db3.public.t1",
'--table' => "REALLY???.public.t2",
'--database' => $db4,
'--table' => "$db4.public.t3",
],
'pg_createsubscriber runs with table-level publication (existing nodes)');
~~~

13.
+# Get the publication name created by pg_createsubscriber for db3
+my $pubname1 = $node_p->safe_psql(
+ $db3, qq(
+    SELECT pubname FROM pg_publication
+    WHERE pubname LIKE 'pg_createsubscriber_%'
+    ORDER BY pubname LIMIT 1
+));
+

Why don't you just name the publication explicitly in the command?
Then you don't need any of this code to discover the publication name
here.

~~~

14.
+# Check publication tables for db3
+my $actual1 = $node_p->safe_psql(
+ $db3, qq(
+ SELECT pubname || '|public|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname1'
+ ORDER BY tablename
+));
+is($actual1, "$pubname1|public|t1\n$pubname1|public|t2",
+ 'single publication for both tables created successfully on database db3'
+);

What is the point of hardwiring the 'public' in the concatenated
string, and then verifying that it is still there in the result? Why
not hardwire 'banana' instead of 'public' -- it passes the test just
the same.

~~~

15.
+# Get the publication name created by pg_createsubscriber for db4
+my $pubname2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname FROM pg_publication
+ WHERE pubname LIKE 'pg_createsubscriber_%'
+ ORDER BY pubname LIMIT 1
+));
+

(same as #13 before)

Why don't you simply name the publication explicitly in the command?
Then you don't need any of this code to discover the publication name
here.

~~~

16.
+# Check publication tables for db4
+my $actual2 = $node_p->safe_psql(
+ $db4, qq(
+ SELECT pubname || '|public|' || tablename
+ FROM pg_publication_tables
+ WHERE pubname = '$pubname2'
+ ORDER BY tablename
+));
+is($actual2, "$pubname2|public|t3",
+ 'single publication for t3 created successfully on database db4');
+

(same as #14 before)

What is the point of the hardwired  'public'?

======
Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Compilation issues for HASH_STATISTICS and HASH_DEBUG options
Next
From: Kirill Reshke
Date:
Subject: Re: Sequence Access Methods, round two