Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Shubham Khanna |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHv8RjJcOsk=y+vJ3y+vXhzR9ZUzUEURvS_90hQW3MNfJ5di7A@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Peter Smith <smithpb2250@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
Re: Pgoutput not capturing the generated columns |
List | pgsql-hackers |
On Thu, May 16, 2024 at 11:35 AM Peter Smith <smithpb2250@gmail.com> wrote: > > Here are some review comments for the patch v1-0001. > > ====== > GENERAL > > G.1. Use consistent names > > It seems to add unnecessary complications by having different names > for all the new options, fields and API parameters. > > e.g. sometimes 'include_generated_columns' > e.g. sometimes 'publish_generated_columns' > > Won't it be better to just use identical names everywhere for > everything? I don't mind which one you choose; I just felt you only > need one name, not two. This comment overrides everything else in this > post so whatever name you choose, make adjustments for all my other > review comments as necessary. I have updated the name to 'include_generated_columns' everywhere in the Patch. > ====== > > G.2. Is it possible to just use the existing bms? > > A very large part of this patch is adding more API parameters to > delegate the 'publish_generated_columns' flag value down to when it is > finally checked and used. e.g. > > The functions: > - logicalrep_write_insert(), logicalrep_write_update(), > logicalrep_write_delete() > ... are delegating the new parameter 'publish_generated_column' down to: > - logicalrep_write_tuple > > The functions: > - logicalrep_write_rel() > ... are delegating the new parameter 'publish_generated_column' down to: > - logicalrep_write_attrs > > AFAICT in all these places the API is already passing a "Bitmapset > *columns". I was wondering if it might be possible to modify the > "Bitmapset *columns" BEFORE any of those functions get called so that > the "columns" BMS either does or doesn't include generated cols (as > appropriate according to the option). > > Well, it might not be so simple because there are some NULL BMS > considerations also, but I think it would be worth investigating at > least, because if indeed you can find some common place (somewhere > like pgoutput_change()?) where the columns BMS can be filtered to > remove bits for generated cols then it could mean none of those other > patch API changes are needed at all -- then the patch would only be > 1/2 the size. I will analyse and reply to this in the next version. > ====== > Commit message > > 1. > Now if include_generated_columns option is specified, the generated > column information and generated column data also will be sent. > > Usage from pgoutput plugin: > SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, > 'proto_version', '1', 'publication_names', 'pub1', > 'include_generated_columns', 'true'); > > Usage from test_decoding plugin: > SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, > 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > > ~ > > I think there needs to be more background information given here. This > commit message doesn't seem to describe anything about what is the > problem and how this patch fixes it. It just jumps straight into > giving usages of a 'include_generated_columns' option. > > It also doesn't say that this is an option that was newly *introduced* > by the patch -- it refers to it as though the reader should already > know about it. > > Furthermore, your hacker's post says "Currently it is not supported as > a subscription option because table sync for the generated column is > not possible as copy command does not support getting data for the > generated column. If this feature is required we can remove this > limitation from the copy command and then add it as a subscription > option later." IMO that all seems like the kind of information that > ought to also be mentioned in this commit message. I have updated the Commit message mentioning the suggested changes. > ====== > contrib/test_decoding/sql/ddl.sql > > 2. > +-- check include_generated_columns option with generated column > +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS > AS (a * 2) STORED); > +INSERT INTO gencoltable (a) VALUES (1), (2), (3); > +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, > NULL, 'include-xids', '0', 'skip-empty-xacts', '1', > 'include_generated_columns', '1'); > +DROP TABLE gencoltable; > + > > 2a. > Perhaps you should include both option values to demonstrate the > difference in behaviour: > > 'include_generated_columns', '0' > 'include_generated_columns', '1' Added the other option values to demonstrate the difference in behaviour: > 2b. > I think you maybe need to include more some test combinations where > there is and isn't a COLUMN LIST, because I am not 100% sure I > understand the current logic/expectations for all combinations. > > e.g. When the generated column is in a column list but > 'publish_generated_columns' is false then what should happen? etc. > Also if there are any special rules then those should be mentioned in > the commit message. Test case is added and the same is mentioned in the documentation. > ====== > src/backend/replication/logical/proto.c > > 3. > For all the API changes the new parameter name should be plural. > > /publish_generated_column/publish_generated_columns/ Updated the name to 'include_generated_columns' > 4. logical_rep_write_tuple: > > - if (att->attisdropped || att->attgenerated) > + if (att->attisdropped) > continue; > > - if (!column_in_column_list(att->attnum, columns)) > + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) > + continue; > + > + if (att->attgenerated && !publish_generated_column) > continue; > That code seems confusing. Shouldn't the logic be exactly as also in > logicalrep_write_attrs()? > > e.g. Shouldn't they both look like this: > > if (att->attisdropped) > continue; > > if (att->attgenerated && !publish_generated_column) > continue; > > if (!column_in_column_list(att->attnum, columns)) > continue; Fixed. > ====== > src/backend/replication/pgoutput/pgoutput.c > > 5. > static void send_relation_and_attrs(Relation relation, TransactionId xid, > LogicalDecodingContext *ctx, > - Bitmapset *columns); > + Bitmapset *columns, > + bool publish_generated_column); > > Use plural. /publish_generated_column/publish_generated_columns/ Updated the name to 'include_generated_columns' > 6. parse_output_parameters > > bool origin_option_given = false; > + bool generate_column_option_given = false; > > data->binary = false; > data->streaming = LOGICALREP_STREAM_OFF; > data->messages = false; > data->two_phase = false; > + data->publish_generated_column = false; > > I think the 1st var should be 'include_generated_columns_option_given' > for consistency with the name of the actual option that was given. Updated the name to 'include_generated_columns_option_given' > ====== > src/include/replication/logicalproto.h > > 7. > (Same as a previous review comment) > > For all the API changes the new parameter name should be plural. > > /publish_generated_column/publish_generated_columns/ Updated the name to 'include_generated_columns' > ====== > src/include/replication/pgoutput.h > > 8. > bool publish_no_origin; > + bool publish_generated_column; > } PGOutputData; > > /publish_generated_column/publish_generated_columns/ Updated the name to 'include_generated_columns' The attached Patch contains the suggested changes. Thanks and Regards, Shubham Khanna.
Attachment
pgsql-hackers by date: