Re: Add support for specifying tables in pg_createsubscriber. - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Add support for specifying tables in pg_createsubscriber. |
Date | |
Msg-id | CAHut+PtkKLHZp5q3e-Tpmg_TmB=uCMfAzYO6SA7b5YyodwDkCg@mail.gmail.com Whole thread Raw |
In response to | Re: Add support for specifying tables in pg_createsubscriber. (Shubham Khanna <khannashubham1197@gmail.com>) |
List | pgsql-hackers |
Hi Shubham, Some review comments for patch v2-0001 ====== 1. General - compile errors! Patch applies OK, but I cannot build pg_createsubscriber. e.g. pg_createsubscriber.c: In function ‘main’: pg_createsubscriber.c:2237:5: error: a label can only be part of a statement and a declaration is not a statement TableListPerDB * newdb; ^ pg_createsubscriber.c:2324:5: error: a label can only be part of a statement and a declaration is not a statement TableSpec * ts = pg_malloc0(sizeof(TableSpec)); ^ pg_createsubscriber.c:2325:5: error: expected expression before ‘PQExpBuffer’ PQExpBuffer dbbuf; ^ pg_createsubscriber.c:2326:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] PQExpBuffer schemabuf; ^ pg_createsubscriber.c:2335:5: error: ‘dbbuf’ undeclared (first use in this function) dbbuf = createPQExpBuffer(); I'm not sure how this can be working for you (???). You cannot declare variables without introducing a code block in the scope of the switch case. ====== doc/src/sgml/ref/pg_createsubscriber.sgml 2. --table is missing from synopsis ~~~ 3. + <term><option>--table=<replaceable class="parameter">table</replaceable></option></term> + <listitem> + <para> + Adds a table to be included in the publication for the most recently + specified database. Can be repeated multiple times. The syntax + supports optional column lists and WHERE clauses. + </para> + </listitem> + </varlistentry> Lacks detail, so I can't tell how to use this. e.g: - What does the syntax actually look like? - code suggests you can specify DB, but this doc says it only applies to the most recent DB - how supports column lists and WHERE clause (needs examples) - needs rules FOR ALL TABLES, etc. - allowed in combination with --all? - etc. ====== src/bin/pg_basebackup/pg_createsubscriber.c 4. +typedef struct TableListPerDB +{ + char *dbname; + TableSpec *tables; + struct TableListPerDB *next; +} TableListPerDB; I didn't understand the need for this "per-DB" structure. Later, you declare "TableSpec *tables;" within the "LogicalRepInfo" structure (which is per-DB) so you already have the "per-DB" table list right there. Even if you need to maintain some global static list, then I imagine you could just put a 'dbname' member in TableSpec. You don't need a whole new structure to do it. ~~~ create_publication: 5. + if (dbinfo->tables == NULL) + appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR ALL TABLES", ipubname_esc); + else + { + bool first = true; + + appendPQExpBuffer(str, "CREATE PUBLICATION %s FOR TABLE ", ipubname_esc); + for (TableSpec * tbl = dbinfo->tables; tbl != NULL; tbl = tbl->next) What if '*' is specified for the table name? Should that cause a "CREATE PUBLICATION ... FOR TABLES IN SCHEMA ..." instead of making a publication with 100s or more tables in a FOR TABLES? ~~~ 6. + for (int i = 0; i < PQntuples(tres); i++) + { + char *escaped_schema = PQescapeIdentifier(conn, PQgetvalue(tres, i, 0), + strlen(PQgetvalue(tres, i, 0))); + char *escaped_table = PQescapeIdentifier(conn, PQgetvalue(tres, i, 1), + strlen(PQgetvalue(tres, i, 1))); + + appendPQExpBuffer(str, "%s%s.%s", first ? "" : ", ", + escaped_schema, escaped_table); + + PQfreemem(escaped_schema); + PQfreemem(escaped_table); + first = false; + } 6a. How about some other simple variables to avoid all the repeated PQgetvalue? e.g. char *sch = PQgetvalue(tres, i, 0); char *tbl = PQgetvalue(tres, i, 1); char *escaped_schema = PQescapeIdentifier(conn, sch, strlen(sch)); char *escaped_table = PQescapeIdentifier(conn, tbl, strlen(tbl)); ~ 6b. Variable 'first' is redundant. Same as checking 'i == 0'. ~~~ 7. + if (dry_run) + { + res = PQexec(conn, "BEGIN"); + if (PQresultStatus(res) != PGRES_COMMAND_OK) + { Would it be better to use if/else instead of: if (dry_run) if (!dry_run) ~~~ main: 8. + TableListPerDB * newdb; if (!simple_string_list_member(&opt.database_names, optarg)) { simple_string_list_append(&opt.database_names, optarg); 8a. Compile error. Need {} scope to declare that variable! ~ 8b. This seems a strange muddle of 'd' which represents DATABASE and TableListPerDB, which is a list of tables (not a database). I have doubts that most of this linked list code is even necessary for the 'd' case. ~~~ 9. + case 6: + TableSpec * ts = pg_malloc0(sizeof(TableSpec)); + PQExpBuffer dbbuf; Compile error. Need {} scope to declare that variable! ~~~ 10. + if (dotcnt == 2) + { + ts->pattern_db_regex = NULL; + ts->pattern_schema_regex = pg_strdup(schemabuf->data); + ts->pattern_table_regex = pg_strdup(namebuf->data); + } + else if (dotcnt == 1) + { + ts->pattern_db_regex = NULL; + ts->pattern_schema_regex = pg_strdup(dbbuf->data); + ts->pattern_table_regex = pg_strdup(schemabuf->data); + } + else + pg_fatal("invalid --table specification: %s", optarg); 10a. This code seems quite odd to me: - The DB becomes the schema? - The schema becomes the table? Rather than fudging all the names of the --table parts, if you don't know what they represent up-fron,t probably it is better to call them just part1,part2,part3. ~~~ 10b. Why is pattern_db_regex always NULL? If it is always NULL why have it at all? ====== .../t/040_pg_createsubscriber.pl 11. +# Test: Table-level publication creation +$node_p->safe_psql($db3, "CREATE TABLE public.t1 (id int, val text)"); +$node_p->safe_psql($db3, "CREATE TABLE public.t2 (id int, val text)"); +$node_p->safe_psql($db4, + "CREATE TABLE public.t3 (id int, val text, extra int)"); + IIUC, the schema name is part of the table syntax. So, you should include test cases for different schemas. ~~~ 12. +# Run pg_createsubscriber with table-level options +command_ok( + [ + 'pg_createsubscriber', + '--verbose', + '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, + '--pgdata' => $node_s2->data_dir, + '--publisher-server' => $node_p->connstr($db3), + '--socketdir' => $node_s2->host, + '--subscriber-port' => $node_s2->port, + '--database' => $db3, + '--table' => "$db3.public.t1", + '--table' => "$db3.public.t2", + '--database' => $db4, + '--table' => "$db4.public.t3", + ], + 'pg_createsubscriber runs with table-level publication (existing nodes)'); 12a. This is not really testing the same as what the commit message describes. e.g. what about a test case where --table does not mention the db explicitly, so relies on the most recent. ~ 12b. What should happen if the explicitly named DB in --table is not the same as the most recent --database, even though it is otherwise correct? e.g. '--database' => $db3, '--table' => "$db3.public.t1", '--database' => $db4, '--table' => "$db3.public.t2", '--table' => "$db4.public.t3", I quickly tried it and AFAICT this was silently accepted and then the test failed because it gave unexpected results. It doesn't seem good behaviour. ~ 12c. (related to 12b/12c). I became suspicious that the DB name part of the --table option is completely bogus. And it is. The test still passes OK even after I write junk in the --table database part, like below. command_ok( [ 'pg_createsubscriber', '--verbose', '--recovery-timeout' => $PostgreSQL::Test::Utils::timeout_default, '--pgdata' => $node_s2->data_dir, '--publisher-server' => $node_p->connstr($db3), '--socketdir' => $node_s2->host, '--subscriber-port' => $node_s2->port, '--database' => $db3, '--table' => "$db3.public.t1", '--table' => "REALLY???.public.t2", '--database' => $db4, '--table' => "$db4.public.t3", ], 'pg_createsubscriber runs with table-level publication (existing nodes)'); ~~~ 13. +# Get the publication name created by pg_createsubscriber for db3 +my $pubname1 = $node_p->safe_psql( + $db3, qq( + SELECT pubname FROM pg_publication + WHERE pubname LIKE 'pg_createsubscriber_%' + ORDER BY pubname LIMIT 1 +)); + Why don't you just name the publication explicitly in the command? Then you don't need any of this code to discover the publication name here. ~~~ 14. +# Check publication tables for db3 +my $actual1 = $node_p->safe_psql( + $db3, qq( + SELECT pubname || '|public|' || tablename + FROM pg_publication_tables + WHERE pubname = '$pubname1' + ORDER BY tablename +)); +is($actual1, "$pubname1|public|t1\n$pubname1|public|t2", + 'single publication for both tables created successfully on database db3' +); What is the point of hardwiring the 'public' in the concatenated string, and then verifying that it is still there in the result? Why not hardwire 'banana' instead of 'public' -- it passes the test just the same. ~~~ 15. +# Get the publication name created by pg_createsubscriber for db4 +my $pubname2 = $node_p->safe_psql( + $db4, qq( + SELECT pubname FROM pg_publication + WHERE pubname LIKE 'pg_createsubscriber_%' + ORDER BY pubname LIMIT 1 +)); + (same as #13 before) Why don't you simply name the publication explicitly in the command? Then you don't need any of this code to discover the publication name here. ~~~ 16. +# Check publication tables for db4 +my $actual2 = $node_p->safe_psql( + $db4, qq( + SELECT pubname || '|public|' || tablename + FROM pg_publication_tables + WHERE pubname = '$pubname2' + ORDER BY tablename +)); +is($actual2, "$pubname2|public|t3", + 'single publication for t3 created successfully on database db4'); + (same as #14 before) What is the point of the hardwired 'public'? ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: