Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers

From Peter Smith
Subject Re: Allow logical replication to copy tables in binary format
Date
Msg-id CAHut+Pvq82pkYV+vh3MaMD2rAGBZzyjQBH00OVg021G173mcww@mail.gmail.com
Whole thread Raw
In response to Re: Allow logical replication to copy tables in binary format  (Melih Mutlu <m.melihmutlu@gmail.com>)
Responses Re: Allow logical replication to copy tables in binary format  (Amit Kapila <amit.kapila16@gmail.com>)
Re: Allow logical replication to copy tables in binary format  (Melih Mutlu <m.melihmutlu@gmail.com>)
List pgsql-hackers
Here are some review comments for v13-0001

======
doc/src/sgml/logical-replication.sgml

1.
@@ -241,10 +241,13 @@
    types of the columns do not need to match, as long as the text
    representation of the data can be converted to the target type.  For
    example, you can replicate from a column of type <type>integer</type> to a
-   column of type <type>bigint</type>.  The target table can also have
-   additional columns not provided by the published table.  Any such columns
-   will be filled with the default value as specified in the definition of the
-   target table.
+   column of type <type>bigint</type>.  However, replication in
binary format is
+   type specific and does not allow to replicate data between different types
+   according to its restrictions (See <literal>binary</literal> option of
+   <link linkend="sql-createsubscription"><command>CREATE
SUBSCRIPTION</command></link>
+   for details).  The target table can also have additional columns
not provided
+   by the published table.  Any such columns will be filled with the default
+   value as specified in the definition of the target table.
   </para>

I don’t really think we should mention details of what the binary
problems are here, because then:
i) it is just duplicating information already on the CREATE SUBSCRIPTION page
ii) you will miss some restrictions. (e.g. here you said something
about "type specific" but didn't mention send/receive functions would
be mandatory for the copy_data option)

That's why in the previous v12 review [1] (comment #3) I suggested
that this page should just say something quite generic like "However,
replication in binary format is more restrictive", and link back to
the other page which has all the gory details.

======
doc/src/sgml/ref/alter_subscription.sgml

2.
My previous v12 review [1] (comment #6) suggested maybe updating this
page. But it was not done in v13. Did you accidentally miss the review
comment, or chose not to do it?

======
doc/src/sgml/ref/create_subscription.sgml

3.
          <para>
-          Specifies whether the subscription will request the publisher to
-          send the data in binary format (as opposed to text).
-          The default is <literal>false</literal>.
-          Even when this option is enabled, only data types having
-          binary send and receive functions will be transferred in binary.
+          Specifies whether the subscription will request the publisher to send
+          the data in binary format (as opposed to text). The default is
+          <literal>false</literal>. Any initial table synchronization copy
+          (see <literal>copy_data</literal>) also uses the same format. Binary
+          format can be faster than the text format, but it is less portable
+          across machine architectures and PostgreSQL versions.
Binary format is
+          also very data type specific, it will not allow copying
between different
+          column types as opposed to text format. Even when this
option is enabled,
+          only data types having binary send and receive functions will be
+          transferred in binary. Note that the initial synchronization requires
+          all data types to have binary send and receive functions, otherwise
+          the synchronization will fail.
          </para>


BEFORE
Binary format is also very data type specific, it will not allow
copying between different column types as opposed to text format.

SUGGESTION (worded more similar to what is already on the COPY page [2])
Binary format is very data type specific; for example, it will not
allow copying from a smallint column to an integer column, even though
that would work fine in text format.


~~~

4.

+         <para>
+          If the publisher is a <productname>PostgreSQL</productname> version
+          before 14, then any initial table synchronization will use
text format
+          even if this option is enabled.
+         </para>

IMO it will be clearer to explicitly say the option instead of 'this option'.

SUGGESTION
If the publisher is a <productname>PostgreSQL</productname> version
before 14, then any initial table synchronization will use text format
even if <literal>binary = true</literal>.


======
src/backend/replication/logical/tablesync.c

5.
+
+ /*
+ * If the publisher version is earlier than v14, it COPY command doesn't
+ * support the binary option.
+ */
+ if (walrcv_server_version(LogRepWorkerWalRcvConn) >= 140000 &&
+ MySubscription->binary)
+ {
+ appendStringInfo(&cmd, " WITH (FORMAT binary)");
+ options = lappend(options, makeDefElem("format", (Node *)
makeString("binary"), -1));
+ }

Sorry, I gave a poor review comment for this previously. Now I have
revisited all the thread discussions about version checking. I feel
that some explanation should be given in the code comment so that
future readers of this code can understand why you decided to use v14
checking.

Something like this:

SUGGESTION
If the publisher version is earlier than v14, we use text format COPY.
Note - In fact COPY syntax "WITH (FORMAT binary)" has existed since
v9, but since the logical replication binary mode transfer was not
introduced until v14 it was decided to check using the later version.

------
[1] PS v12 review -
https://www.postgresql.org/message-id/CAHut%2BPsAS8HpjdbDv%2BRM-YUJaLO0UC3f5be%2BqN296%2BGrewsGXg%40mail.gmail.com
[2] pg docs COPY - https://www.postgresql.org/docs/current/sql-copy.html
[3] pg docs COPY v9.0 - https://www.postgresql.org/docs/9.0/sql-copy.html

Kind Regards,
Peter Smith.
Fujitsu Australia



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Add macros for ReorderBufferTXN toptxn
Next
From: Amit Kapila
Date:
Subject: Re: Allow logical replication to copy tables in binary format