Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHut+PuYOE3C_uv1MAD+k7K7wO0Aj1ffOtRAAeoy4FrhD-2T9g@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Shlok Kyal <shlok.kyal.oss@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
|
List | pgsql-hackers |
Here are some review comments for the patch v10-0002. ====== Commit Message 1. Note that we don't copy columns when the subscriber-side column is also generated. Those will be filled as normal with the subscriber-side computed or default data. ~ Now this patch also introduced some errors etc, so I think that patch comment should be written differently to explicitly spell out behaviour of every combination, something like the below: Summary when (include_generated_column = true) * publisher not-generated column => subscriber not-generated column: This is just normal logical replication (not changed by this patch). * publisher not-generated column => subscriber generated column: This will give ERROR. * publisher generated column => subscriber not-generated column: The publisher generated column value is copied. * publisher generated column => subscriber generated column: The publisher generated column value is not copied. The subscriber generated column will be filled with the subscriber-side computed or default data. when (include_generated_columns = false) * publisher not-generated column => subscriber not-generated column: This is just normal logical replication (not changed by this patch). * publisher not-generated column => subscriber generated column: This will give ERROR. * publisher generated column => subscriber not-generated column: This will replicate nothing. Publisher generate-column is not replicated. The subscriber column will be filled with the subscriber-side default data. * publisher generated column => subscriber generated column: This will replicate nothing. Publisher generate-column is not replicated. The subscriber generated column will be filled with the subscriber-side computed or default data. ====== src/backend/replication/logical/relation.c 2. logicalrep_rel_open: I tested some of the "missing column" logic, and got the following results: Scenario A: PUB test_pub=# create table t2(a int, b int); test_pub=# create publication pub2 for table t2; SUB test_sub=# create table t2(a int, b int generated always as (a*2) stored); test_sub=# create subscription sub2 connection 'dbname=test_pub' publication pub2 with (include_generated_columns = false); Result: ERROR: logical replication target relation "public.t2" is missing replicated column: "b" ~ Scenario B: PUB/SUB identical to above, but subscription sub2 created "with (include_generated_columns = true);" Result: ERROR: logical replication target relation "public.t2" has a generated column "b" but corresponding column on source relation is not a generated column ~~~ 2a. Question Why should we get 2 different error messages for what is essentially the same problem according to whether the 'include_generated_columns' is false or true? Isn't the 2nd error message the more correct and useful one for scenarios like this involving generated columns? Thoughts? ~ 2b. Missing tests? I also noticed there seems no TAP test for the current "missing replicated column" message. IMO there should be a new test introduced for this because the loop involved too much bms logic to go untested... ====== src/backend/replication/logical/tablesync.c make_copy_attnamelist: NITPICK - minor comment tweak NITPICK - add some spaces after "if" code 3. Should you pfree the gencollist at the bottom of this function when you no longer need it, for tidiness? ~~~ 4. static void -fetch_remote_table_info(char *nspname, char *relname, +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist, LogicalRepRelation *lrel, List **qual) { WalRcvExecResult *res; StringInfoData cmd; TupleTableSlot *slot; Oid tableRow[] = {OIDOID, CHAROID, CHAROID}; - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID}; + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID}; Oid qualRow[] = {TEXTOID}; bool isnull; + bool *remotegenlist_res; IMO the names 'remotegenlist' and 'remotegenlist_res' should be swapped the other way around, because it is the function parameter that is the "result", whereas the 'remotegenlist_res' is just the local working var for it. ~~~ 5. fetch_remote_table_info Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple places, I think it will be better to assign this to a 'server_version' variable to be used everywhere instead of having multiple function calls. ~~~ 6. "SELECT a.attnum," " a.attname," " a.atttypid," - " a.attnum = ANY(i.indkey)" + " a.attnum = ANY(i.indkey)," + " a.attgenerated != ''" " FROM pg_catalog.pg_attribute a" " LEFT JOIN pg_catalog.pg_index i" " ON (i.indexrelid = pg_get_replica_identity_index(%u))" " WHERE a.attnum > 0::pg_catalog.int2" - " AND NOT a.attisdropped %s" + " AND NOT a.attisdropped", lrel->remoteid); + + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 && + walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) || + !MySubscription->includegencols) + appendStringInfo(&cmd, " AND a.attgenerated = ''"); + If the server version is < PG12 then AFAIK there was no such thing as "a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''" part also be guarded by some version checking condition like in the WHERE? Otherwise won't it cause an ERROR for old servers? ~~~ 7. /* - * For non-tables and tables with row filters, we need to do COPY - * (SELECT ...), but we can't just do SELECT * because we need to not - * copy generated columns. For tables with any row filters, build a - * SELECT query with OR'ed row filters for COPY. + * For non-tables and tables with row filters and when + * 'include_generated_columns' is specified as 'true', we need to do + * COPY (SELECT ...), as normal COPY of generated column is not + * supported. For tables with any row filters, build a SELECT query + * with OR'ed row filters for COPY. */ NITPICK. I felt this was not quite right. AFAIK the reasons for using this COPY (SELECT ...) syntax is different for row-filters and generated-columns. Anyway, I updated the comment slightly in my nitpicks attachment. Please have a look at it to see if you agree with the suggestions. Maybe I am wrong. ~~~ 8. - for (int i = 0; i < lrel.natts; i++) + foreach_ptr(String, att_name, attnamelist) I'm not 100% sure, but isn't foreach_node the macro to use here, rather than foreach_ptr? ====== src/test/subscription/t/011_generated.pl 9. Please discuss with Shubham how to make all the tab1, tab2, tab3, tab4, tab5, tab6 comments use the same kind of style/wording. Currently, the patches 0001 and 0002 test comments are a bit inconsistent. ~~~ 10. Related to above -- now that patch 0002 supports copy_data=true I don't see why we need to test generated columns *both* for copy_data=false and also for copy_data=true. IOW, is it really necessary to have so many tables/tests? For example, I am thinking some of those tests from patch 0001 can be re-used or just removed now that copy_data=true works. ~~~ NITPICK - minor comment tweak ~~~ 11. For tab4 and tab6 I saw the initial sync and normal replication data tests are all merged together, but I had expected to see the initial sync and normal replication data tests separated so it would be consistent with the earlier tab1, tab2, tab3 tests. ====== 99. Also, I have attached a nitpicks diff for some of the cosmetic review comments mentioned above. Please apply whatever of these that you agree with. ====== Kind Regards, Peter Smith. Fujitsu Australia
Attachment
pgsql-hackers by date: