Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: Allow logical replication to copy tables in binary format |
Date | |
Msg-id | CAHut+PsAS8HpjdbDv+RM-YUJaLO0UC3f5be+qN296+GrewsGXg@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical replication to copy tables in binary format (Melih Mutlu <m.melihmutlu@gmail.com>) |
Responses |
Re: Allow logical replication to copy tables in binary format
Re: Allow logical replication to copy tables in binary format |
List | pgsql-hackers |
Here are some review comments for patch v12-0001 ====== General 1. There is no new test code. Are we sure that there are already sufficient TAP tests doing binary testing with/without copy_data and covering all the necessary combinations? ====== Commit Message 2. Without this patch, table are copied in text format even if the subscription is created with binary option enabled. This patch allows logical replication to perform in binary format starting from initial sync. When binary format is beneficial to use, allowing the subscription to copy tables in binary in table sync phase may reduce the time spent on copy depending on column types. ~ a. "table" -> "tables" b. I don't think need to keep referring to the initial table synchronization many times. SUGGESTION Without this patch, table synchronization COPY uses text format even if the subscription is created with the binary option enabled. Copying tables in binary format may reduce the time spent depending on column types. ====== doc/src/sgml/logical-replication.sgml 3. @@ -241,10 +241,11 @@ types of the columns do not need to match, as long as the text representation of the data can be converted to the target type. For example, you can replicate from a column of type <type>integer</type> to a - column of type <type>bigint</type>. The target table can also have - additional columns not provided by the published table. Any such columns - will be filled with the default value as specified in the definition of the - target table. + column of type <type>bigint</type>. However, replication in binary format is + type specific and does not allow to replicate data between different types + according to its restrictions. The target table can also have additional + columns not provided by the published table. Any such columns will be filled + with the default value as specified in the definition of the target table. </para> I am not sure if there is enough information here about the binary restrictions. - e.g. does the column order matter for tablesync COPY binary? - e.g. no mention of the send/receive function requirements of tablesync COPY. But maybe here is not the place to write all such details anyway; Instead of duplicating information IMO here should give a link to the CREATE SUBSCRIPTION notes -- something like: SUGGESTION Note that replication in binary format is more restrictive. See CREATE SUBSCRIPTION binary subscription parameter for details. ====== doc/src/sgml/ref/create_subscription.sgml 4. @@ -189,11 +189,17 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl <term><literal>binary</literal> (<type>boolean</type>)</term> <listitem> <para> - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). - The default is <literal>false</literal>. - Even when this option is enabled, only data types having - binary send and receive functions will be transferred in binary. + Specifies whether the subscription will both copy the initial data to + synchronize relations and request the publisher to send the data in + binary format (as opposed to text). The default is <literal>false</literal>. + Binary format can be faster than the text format, but it is less portable + across machine architectures and PostgreSQL versions. Binary format is + also very data type specific, it will not allow copying between different + column types as opposed to text format. Even when this option is enabled, + only data types having binary send and receive functions will be + transferred in binary. Note that the initial synchronization requires + all data types to have binary send and receive functions, otherwise + the synchronization will fail. </para> There seems to be a small ambiguity because this wording comes more from our code-level understanding, rather than what the user sees. e.g. I think "will be transferred" could mean also the COPY phase as far as the user is concerned. Maybe some slight rewording can help. There is also some use of "copy" (e.g. "will not allow copying") which can be confused with the initial tablesync phase which is not what was intended. SUGGESTION Specifies whether the subscription will request the publisher to send the data in binary format (as opposed to text). The default is <literal>false</literal>. Any initial table synchronization copy [link to copy_data] also uses the same format. Using binary format can be faster than the text format, but it is less portable across machine architectures and PostgreSQL versions. Binary format is also data type specific, it will not allow transfer between different column types as opposed to text format. Even when the binary option is enabled, only data types having binary send/receive functions can be transferred in binary format. If these functions don't exist then the publisher send will revert to sending text format. Note that the binary initial table synchronization copy requires all data types to have binary send/receive functions, otherwise it will fail. ====== src/backend/replication/logical/tablesync.c 5. + + /* + * If the publisher is v14 or later, copy data in the required data format. + * If the publisher version is earlier, it doesn't support COPY with binary + * option. + */ + if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 140000 && + MySubscription->binary) + { + appendStringInfo(&cmd, " WITH (FORMAT binary)"); + options = lappend(options, makeDefElem("format", (Node *) makeString("binary"), -1)); + } + 5a. I didn't think you need to say "copy data in the required data format". Can’t you just say like: SUGGESTION If the publisher version is earlier than v14, it COPY command doesn't support the binary option. ~ 5b. Does this also need to be mentioned as a note on the CREATE SUBSCRIPTION docs page? e.g. COPY binary from server versions < v14 will work because it will just be skipping anyway and use text. ====== doc/src/sgml/ref/alter_subscription.sgml 6. The v12 patch does not update the ALTER SUBSCRIPTION DOCS, but I thought perhaps there should be some reference from the ALTER copy_data back to the main CREATE SUBSCRIPTION page because if the user leaves the default (copy_data=true) then the ALTER might cause some unexpected errors is the subscription was already using binary format. ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: