Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Allow logical replication to copy tables in binary format |
Date | |
Msg-id | CAA4eK1J7et6UGJVgfwi-Vj=2LAMDDWRMMb=7Wj8F=UzeaU8kOg@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical replication to copy tables in binary format (Melih Mutlu <m.melihmutlu@gmail.com>) |
Responses |
RE: Allow logical replication to copy tables in binary format
Re: Allow logical replication to copy tables in binary format |
List | pgsql-hackers |
On Mon, Jan 30, 2023 at 4:19 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote: > > Hi Bharath, > > Thanks for reviewing. > > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>, 18 Oca 2023 Çar, 10:17 tarihinde şunu yazdı: >> >> On Thu, Jan 12, 2023 at 1:53 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote: >> 1. The performance numbers posted upthread [1] look impressive for the >> use-case tried, that's a 2.25X improvement or 55.6% reduction in >> execution times. However, it'll be great to run a few more varied >> tests to confirm the benefit. > > > Sure, do you have any specific test case or suggestion in mind? > >> >> 2. It'll be great to capture the perf report to see if the time spent >> in copy_table() is reduced with the patch. > > > Will provide that in another email soon. > >> >> 3. I think blending initial table sync's binary copy option with >> data-type level binary send/receive is not good. Moreover, data-type >> level binary send/receive has its own restrictions per 9de77b5453. >> IMO, it'll be good to have a new option, say copy_data_format synonyms >> with COPY command's FORMAT option but with only text (default) and >> binary values. > > > Added a "copy_format" option for subscriptions with text as default value. So it would be possible to create a binary subscriptionbut copy tables in text format to avoid restrictions that you're concerned about. > >> >> 4. Why to add tests to existing 002_types.pl? Can we add a new file >> with all the data types covered? > > > Since 002_types.pl is where the replication of data types are covered. I thought it would be okay to test replication withthe binary option in that file. > Sure, we can add a new file with different data types for testing subscriptions with binary option. But then I feel likeit would have too many duplicated lines with 002_types.pl. > If you think that 002_types.pl lacks some data types needs to be tested, then we should add those into 002_types.pl toowhether we test subscription with binary option in that file or some other place, right? > >> >> 5. It's not clear to me as to why these rows were removed in the patch? >> - (1, '{1, 2, 3}'), >> - (2, '{2, 3, 1}'), >> (3, '{3, 2, 1}'), >> (4, '{4, 3, 2}'), >> (5, '{5, NULL, 3}'); >> >> -- test_tbl_arrays >> INSERT INTO tst_arrays (a, b, c, d) VALUES >> - ('{1, 2, 3}', '{"a", "b", "c"}', '{1.1, 2.2, 3.3}', '{"1 >> day", "2 days", "3 days"}'), >> - ('{2, 3, 1}', '{"b", "c", "a"}', '{2.2, 3.3, 1.1}', '{"2 >> minutes", "3 minutes", "1 minute"}'), > > > Previously, it wasn't actually testing the initial table sync since all tables were empty when subscription was created. > I just simply split the data initially inserted to test initial table sync. > > With this patch, it inserts the first two rows for all data types before subscriptions get created. > You can see these lines: >> >> +# Insert initial test data >> +$node_publisher->safe_psql( >> + 'postgres', qq( >> + -- test_tbl_one_array_col >> + INSERT INTO tst_one_array (a, b) VALUES >> + (1, '{1, 2, 3}'), >> + (2, '{2, 3, 1}'); >> + >> + -- test_tbl_arrays >> + INSERT INTO tst_arrays (a, b, c, d) VALUES > > > >> >> 6. BTW, the subbinary description is missing in pg_subscription docs >> https://www.postgresql.org/docs/devel/catalog-pg-subscription.html? >> - Specifies whether the subscription will request the publisher to >> - send the data in binary format (as opposed to text). >> + Specifies whether the subscription will copy the initial data to >> + synchronize relations in binary format and also request the publisher >> + to send the data in binary format too (as opposed to text). > > > Done. > >> >> 7. A nitpick - space is needed after >= before 160000. >> + if (walrcv_server_version(LogRepWorkerWalRcvConn) >=160000 && > > > Done. > >> >> 8. Note that the COPY binary format isn't portable across platforms >> (Windows to Linux for instance) or major versions >> https://www.postgresql.org/docs/devel/sql-copy.html, whereas, logical >> replication is https://www.postgresql.org/docs/devel/logical-replication.html. >> I don't see any handling of such cases in copy_table but only a check >> for the publisher version. I think we need to account for all the >> cases - allow binary COPY only when publisher and subscriber are of >> same versions, architectures, platforms. The trick here is how we >> identify if the subscriber is of the same type and taste >> (architectures and platforms) as the publisher. Looking for >> PG_VERSION_STR of publisher and subscriber might be naive, but I'm not >> sure if there's a better way to do it. > > > I think having the "copy_format" option with text as default, like I replied to your 3rd review above, will keep thingsas they are now. > The patch now will only allow users to choose binary copy as well, if they want it and acknowledge the limitations thatcome with binary copy. > COPY command's portability issues shouldn't be an issue right now, since the patch still supports text format. Right? > One thing that is not completely clear from above is whether we will have any problem if the subscription uses binary mode for copying across the server versions. Do we use binary file during the copy used in logical replication? >> >> Also, the COPY binary format is very data type specific - per the docs >> "for example it will not work to output binary data from a smallint >> column and read it into an integer column, even though that would work >> fine in text format.". I did a small experiment [2], the logical >> replication works with compatible data types (int -> smallint, even >> int -> text), whereas the COPY binary format doesn't. > > > Logical replication between different types like int and smallint is already not working properly on HEAD too. > Yes, the scenario you shared looks like working. But you didn't create the subscription with binary=true. The patch didnot change subscription with binary=false case. I believe what you should experiment is binary=true case which alreadyfails in the apply phase on HEAD. > > Well, with this patch, it will begin to fail in the table copy phase. But I don't think this is a problem because logicalreplication in binary format is already broken for replications between different data types. > So, doesn't this mean that there is no separate failure mode during the initial copy? I am clarifying this to see if the patch really needs a separate copy_format option for initial sync? -- With Regards, Amit Kapila.
pgsql-hackers by date: