Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Shlok Kyal |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CANhcyEW95M_usF1OJDudeejs0L0+YOEb=dXmt_4Hs-70=CXa-g@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
|
List | pgsql-hackers |
On Tue, 25 Jun 2024 at 11:56, Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for the patch v10-0002. > > ====== > Commit Message > > 1. > Note that we don't copy columns when the subscriber-side column is also > generated. Those will be filled as normal with the subscriber-side computed or > default data. > > ~ > > Now this patch also introduced some errors etc, so I think that patch > comment should be written differently to explicitly spell out > behaviour of every combination, something like the below: > > Summary > > when (include_generated_column = true) > > * publisher not-generated column => subscriber not-generated column: > This is just normal logical replication (not changed by this patch). > > * publisher not-generated column => subscriber generated column: This > will give ERROR. > > * publisher generated column => subscriber not-generated column: The > publisher generated column value is copied. > > * publisher generated column => subscriber generated column: The > publisher generated column value is not copied. The subscriber > generated column will be filled with the subscriber-side computed or > default data. > > when (include_generated_columns = false) > > * publisher not-generated column => subscriber not-generated column: > This is just normal logical replication (not changed by this patch). > > * publisher not-generated column => subscriber generated column: This > will give ERROR. > > * publisher generated column => subscriber not-generated column: This > will replicate nothing. Publisher generate-column is not replicated. > The subscriber column will be filled with the subscriber-side default > data. > > * publisher generated column => subscriber generated column: This > will replicate nothing. Publisher generate-column is not replicated. > The subscriber generated column will be filled with the > subscriber-side computed or default data. Modified > ====== > src/backend/replication/logical/relation.c > > 2. > logicalrep_rel_open: > > I tested some of the "missing column" logic, and got the following results: > > Scenario A: > PUB > test_pub=# create table t2(a int, b int); > test_pub=# create publication pub2 for table t2; > SUB > test_sub=# create table t2(a int, b int generated always as (a*2) stored); > test_sub=# create subscription sub2 connection 'dbname=test_pub' > publication pub2 with (include_generated_columns = false); > Result: > ERROR: logical replication target relation "public.t2" is missing > replicated column: "b" > > ~ > > Scenario B: > PUB/SUB identical to above, but subscription sub2 created "with > (include_generated_columns = true);" > Result: > ERROR: logical replication target relation "public.t2" has a > generated column "b" but corresponding column on source relation is > not a generated column > > ~~~ > > 2a. Question > > Why should we get 2 different error messages for what is essentially > the same problem according to whether the 'include_generated_columns' > is false or true? Isn't the 2nd error message the more correct and > useful one for scenarios like this involving generated columns? > > Thoughts? Did the modification to give same error in both cases > ~ > > 2b. Missing tests? > > I also noticed there seems no TAP test for the current "missing > replicated column" message. IMO there should be a new test introduced > for this because the loop involved too much bms logic to go > untested... Added the tests 004_sync.pl > ====== > src/backend/replication/logical/tablesync.c > > make_copy_attnamelist: > NITPICK - minor comment tweak > NITPICK - add some spaces after "if" code Applied the changes > 3. > Should you pfree the gencollist at the bottom of this function when > you no longer need it, for tidiness? Fixed > ~~~ > > 4. > static void > -fetch_remote_table_info(char *nspname, char *relname, > +fetch_remote_table_info(char *nspname, char *relname, bool **remotegenlist, > LogicalRepRelation *lrel, List **qual) > { > WalRcvExecResult *res; > StringInfoData cmd; > TupleTableSlot *slot; > Oid tableRow[] = {OIDOID, CHAROID, CHAROID}; > - Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID}; > + Oid attrRow[] = {INT2OID, TEXTOID, OIDOID, BOOLOID, BOOLOID}; > Oid qualRow[] = {TEXTOID}; > bool isnull; > + bool *remotegenlist_res; > > IMO the names 'remotegenlist' and 'remotegenlist_res' should be > swapped the other way around, because it is the function parameter > that is the "result", whereas the 'remotegenlist_res' is just the > local working var for it. Fixed > ~~~ > > 5. fetch_remote_table_info > > Now walrcv_server_version(LogRepWorkerWalRcvConn) is used in multiple > places, I think it will be better to assign this to a 'server_version' > variable to be used everywhere instead of having multiple function > calls. Fixed > ~~~ > > 6. > "SELECT a.attnum," > " a.attname," > " a.atttypid," > - " a.attnum = ANY(i.indkey)" > + " a.attnum = ANY(i.indkey)," > + " a.attgenerated != ''" > " FROM pg_catalog.pg_attribute a" > " LEFT JOIN pg_catalog.pg_index i" > " ON (i.indexrelid = pg_get_replica_identity_index(%u))" > " WHERE a.attnum > 0::pg_catalog.int2" > - " AND NOT a.attisdropped %s" > + " AND NOT a.attisdropped", lrel->remoteid); > + > + if ((walrcv_server_version(LogRepWorkerWalRcvConn) >= 120000 && > + walrcv_server_version(LogRepWorkerWalRcvConn) <= 160000) || > + !MySubscription->includegencols) > + appendStringInfo(&cmd, " AND a.attgenerated = ''"); > + > > If the server version is < PG12 then AFAIK there was no such thing as > "a.attgenerated", so shouldn't that SELECT " a.attgenerated != ''" > part also be guarded by some version checking condition like in the > WHERE? Otherwise won't it cause an ERROR for old servers? Fixed > ~~~ > > 7. > /* > - * For non-tables and tables with row filters, we need to do COPY > - * (SELECT ...), but we can't just do SELECT * because we need to not > - * copy generated columns. For tables with any row filters, build a > - * SELECT query with OR'ed row filters for COPY. > + * For non-tables and tables with row filters and when > + * 'include_generated_columns' is specified as 'true', we need to do > + * COPY (SELECT ...), as normal COPY of generated column is not > + * supported. For tables with any row filters, build a SELECT query > + * with OR'ed row filters for COPY. > */ > > NITPICK. I felt this was not quite right. AFAIK the reasons for using > this COPY (SELECT ...) syntax is different for row-filters and > generated-columns. Anyway, I updated the comment slightly in my > nitpicks attachment. Please have a look at it to see if you agree with > the suggestions. Maybe I am wrong. Fixed > ~~~ > > 8. > - for (int i = 0; i < lrel.natts; i++) > + foreach_ptr(String, att_name, attnamelist) > > I'm not 100% sure, but isn't foreach_node the macro to use here, > rather than foreach_ptr? Fixed > ====== > src/test/subscription/t/011_generated.pl > > 9. > Please discuss with Shubham how to make all the tab1, tab2, tab3, > tab4, tab5, tab6 comments use the same kind of style/wording. > Currently, the patches 0001 and 0002 test comments are a bit > inconsistent. Fixed > ~~~ > > 10. > Related to above -- now that patch 0002 supports copy_data=true I > don't see why we need to test generated columns *both* for > copy_data=false and also for copy_data=true. IOW, is it really > necessary to have so many tables/tests? For example, I am thinking > some of those tests from patch 0001 can be re-used or just removed now > that copy_data=true works. Fixed > ~~~ > > NITPICK - minor comment tweak Fixed > ~~~ > > 11. > For tab4 and tab6 I saw the initial sync and normal replication data > tests are all merged together, but I had expected to see the initial > sync and normal replication data tests separated so it would be > consistent with the earlier tab1, tab2, tab3 tests. Fixed > ====== > > 99. > Also, I have attached a nitpicks diff for some of the cosmetic review > comments mentioned above. Please apply whatever of these that you > agree with. Applied the relevant changes I have attached a v14 to fix the comments. Thanks and Regards, Shlok Kyal
Attachment
pgsql-hackers by date: