Re: Pgoutput not capturing the generated columns - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Pgoutput not capturing the generated columns |
Date | |
Msg-id | CAHut+PsuJfcaeg6zst=6PE5uyJv_UxVRHU3ck7W2aHb1uQYKng@mail.gmail.com Whole thread Raw |
In response to | Re: Pgoutput not capturing the generated columns (Shubham Khanna <khannashubham1197@gmail.com>) |
Responses |
Re: Pgoutput not capturing the generated columns
|
List | pgsql-hackers |
Here are some review comments for the patch v1-0001. ====== GENERAL G.1. Use consistent names It seems to add unnecessary complications by having different names for all the new options, fields and API parameters. e.g. sometimes 'include_generated_columns' e.g. sometimes 'publish_generated_columns' Won't it be better to just use identical names everywhere for everything? I don't mind which one you choose; I just felt you only need one name, not two. This comment overrides everything else in this post so whatever name you choose, make adjustments for all my other review comments as necessary. ====== G.2. Is it possible to just use the existing bms? A very large part of this patch is adding more API parameters to delegate the 'publish_generated_columns' flag value down to when it is finally checked and used. e.g. The functions: - logicalrep_write_insert(), logicalrep_write_update(), logicalrep_write_delete() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_tuple The functions: - logicalrep_write_rel() ... are delegating the new parameter 'publish_generated_column' down to: - logicalrep_write_attrs AFAICT in all these places the API is already passing a "Bitmapset *columns". I was wondering if it might be possible to modify the "Bitmapset *columns" BEFORE any of those functions get called so that the "columns" BMS either does or doesn't include generated cols (as appropriate according to the option). Well, it might not be so simple because there are some NULL BMS considerations also, but I think it would be worth investigating at least, because if indeed you can find some common place (somewhere like pgoutput_change()?) where the columns BMS can be filtered to remove bits for generated cols then it could mean none of those other patch API changes are needed at all -- then the patch would only be 1/2 the size. ====== Commit message 1. Now if include_generated_columns option is specified, the generated column information and generated column data also will be sent. Usage from pgoutput plugin: SELECT * FROM pg_logical_slot_peek_binary_changes('slot1', NULL, NULL, 'proto_version', '1', 'publication_names', 'pub1', 'include_generated_columns', 'true'); Usage from test_decoding plugin: SELECT data FROM pg_logical_slot_get_changes('slot2', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); ~ I think there needs to be more background information given here. This commit message doesn't seem to describe anything about what is the problem and how this patch fixes it. It just jumps straight into giving usages of a 'include_generated_columns' option. It also doesn't say that this is an option that was newly *introduced* by the patch -- it refers to it as though the reader should already know about it. Furthermore, your hacker's post says "Currently it is not supported as a subscription option because table sync for the generated column is not possible as copy command does not support getting data for the generated column. If this feature is required we can remove this limitation from the copy command and then add it as a subscription option later." IMO that all seems like the kind of information that ought to also be mentioned in this commit message. ====== contrib/test_decoding/sql/ddl.sql 2. +-- check include_generated_columns option with generated column +CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED); +INSERT INTO gencoltable (a) VALUES (1), (2), (3); +SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include_generated_columns', '1'); +DROP TABLE gencoltable; + 2a. Perhaps you should include both option values to demonstrate the difference in behaviour: 'include_generated_columns', '0' 'include_generated_columns', '1' ~ 2b. I think you maybe need to include more some test combinations where there is and isn't a COLUMN LIST, because I am not 100% sure I understand the current logic/expectations for all combinations. e.g. When the generated column is in a column list but 'publish_generated_columns' is false then what should happen? etc. Also if there are any special rules then those should be mentioned in the commit message. ====== src/backend/replication/logical/proto.c 3. For all the API changes the new parameter name should be plural. /publish_generated_column/publish_generated_columns/ ~~~ 4. logical_rep_write_tuple: - if (att->attisdropped || att->attgenerated) + if (att->attisdropped) continue; - if (!column_in_column_list(att->attnum, columns)) + if (!column_in_column_list(att->attnum, columns) && !att->attgenerated) + continue; + + if (att->attgenerated && !publish_generated_column) continue; That code seems confusing. Shouldn't the logic be exactly as also in logicalrep_write_attrs()? e.g. Shouldn't they both look like this: if (att->attisdropped) continue; if (att->attgenerated && !publish_generated_column) continue; if (!column_in_column_list(att->attnum, columns)) continue; ====== src/backend/replication/pgoutput/pgoutput.c 5. static void send_relation_and_attrs(Relation relation, TransactionId xid, LogicalDecodingContext *ctx, - Bitmapset *columns); + Bitmapset *columns, + bool publish_generated_column); Use plural. /publish_generated_column/publish_generated_columns/ ~~~ 6. parse_output_parameters bool origin_option_given = false; + bool generate_column_option_given = false; data->binary = false; data->streaming = LOGICALREP_STREAM_OFF; data->messages = false; data->two_phase = false; + data->publish_generated_column = false; I think the 1st var should be 'include_generated_columns_option_given' for consistency with the name of the actual option that was given. ====== src/include/replication/logicalproto.h 7. (Same as a previous review comment) For all the API changes the new parameter name should be plural. /publish_generated_column/publish_generated_columns/ ====== src/include/replication/pgoutput.h 8. bool publish_no_origin; + bool publish_generated_column; } PGOutputData; /publish_generated_column/publish_generated_columns/ ====== Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: