Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Allow logical replication to copy tables in binary format |
Date | |
Msg-id | CALj2ACU47pbGUUxyxVbGg3RAb8JsZhUuK+KogPyiWxYZbGVrxg@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical replication to copy tables in binary format (Melih Mutlu <m.melihmutlu@gmail.com>) |
Responses |
Re: Allow logical replication to copy tables in binary format
|
List | pgsql-hackers |
On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote: > > Hi, > > Thanks for your reviews. Thanks. I have some comments: 1. The performance numbers posted upthread [1] look impressive for the use-case tried, that's a 2.25X improvement or 55.6% reduction in execution times. However, it'll be great to run a few more varied tests to confirm the benefit. 2. It'll be great to capture the perf report to see if the time spent in copy_table() is reduced with the patch. 3. I think blending initial table sync's binary copy option with data-type level binary send/receive is not good. Moreover, data-type level binary send/receive has its own restrictions per 9de77b5453. IMO, it'll be good to have a new option, say copy_data_format synonyms with COPY command's FORMAT option but with only text (default) and binary values. 4. Why to add tests to existing 002_types.pl? Can we add a new file with all the data types covered? 5. It's not clear to me as to why these rows were removed in the patch? - (1, '{1, 2, 3}'), - (2, '{2, 3, 1}'), (3, '{3, 2, 1}'), (4, '{4, 3, 2}'), (5, '{5, NULL, 3}'); -- test_tbl_arrays INSERT INTO tst_arrays (a, b, c, d) VALUES - ('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 day", "2 days", "3 days"}'), - ('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 minutes", "3 minutes", "1 minute"}'), 6. BTW, the subbinary description is missing in pg_subscription docs https://www.postgresql.org/docs/devel/catalog-pg-subscription.html? - Specifies whether the subscription will request the publisher to - send the data in binary format (as opposed to text). + Specifies whether the subscription will copy the initial data to + synchronize relations in binary format and also request the publisher + to send the data in binary format too (as opposed to text). 7. A nitpick - space is needed after >= before 160000. + if (walrcv_server_version(LogRepWorkerWalRcvConn) >=160000 && 8. Note that the COPY binary format isn't portable across platforms (Windows to Linux for instance) or major versions https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical replication is https://www.postgresql.org/docs/devel/logical-replication.html. I don't see any handling of such cases in copy_table but only a check for the publisher version. I think we need to account for all the cases - allow binary COPY only when publisher and subscriber are of same versions, architectures, platforms. The trick here is how we identify if the subscriber is of the same type and taste (architectures and platforms) as the publisher. Looking for PG_VERSION_STR of publisher and subscriber might be naive, but I'm not sure if there's a better way to do it. Also, the COPY binary format is very data type specific - per the docs "for example it will not work to output binary data from a smallint column and read it into an integer column, even though that would work fine in text format.". I did a small experiment [2], the logical replication works with compatible data types (int -> smallint, even int -> text), whereas the COPY binary format doesn't. I think it'll complicate things a bit to account for the above cases and allow COPY with binary format for logical replication. [1] https://www.postgresql.org/message-id/CAGPVpCQEKDVKQPf6OFQ-9WiRYB1YRejm--YJTuwgzuvj1LEJ2A%40mail.gmail.com [2] DROP TABLE foo; DROP PUBLICATION mypub; CREATE TABLE foo(c1 bigint, c2 int, c3 smallint); INSERT INTO foo SELECT i , i+1, i+2 FROM generate_series(1, 5) i; CREATE PUBLICATION mypub FOR TABLE foo; SELECT COUNT(*) FROM foo; SELECT * FROM foo; DROP SUBSCRIPTION mysub; DROP TABLE foo; CREATE TABLE foo(c1 smallint, c2 smallint, c3 smallint); -- works without any problem -- OR CREATE TABLE foo(c1 smallint, c2 text, c3 smallint); -- works without any problem CREATE SUBSCRIPTION mysub CONNECTION 'port=5432 dbname=postgres user=ubuntu' PUBLICATION mypub; SELECT COUNT(*) FROM foo; SELECT * FROM foo; drop table foo; create table foo(c1 bigint, c2 int, c3 smallint); insert into foo select i, i+1, i+2 from generate_series(1, 10) i; copy foo(c1, c2, c3) to '/home/ubuntu/postgres/inst/bin/data/foo.text' with (format 'text'); copy foo(c1, c2, c3) to '/home/ubuntu/postgres/inst/bin/data/foo.binary' with (format 'binary'); drop table bar; create table bar(c1 smallint, c2 smallint, c3 smallint); -- or create table bar(c1 smallint, c2 text, c3 smallint); copy bar(c1, c2, c3) from '/home/ubuntu/postgres/inst/bin/data/foo.text' with (format 'text'); copy bar(c1, c2, c3) from '/home/ubuntu/postgres/inst/bin/data/foo.binary' with (format 'binary'); -- produces "ERROR: incorrect binary data format" -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: