Re: Allow logical replication to copy tables in binary format - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Allow logical replication to copy tables in binary format |
Date | |
Msg-id | CALj2ACUfE08ZNjKK-nK9JiwGhwUMRLM+qRhNKTVM9HipFk7Fow@mail.gmail.com Whole thread Raw |
In response to | Re: Allow logical replication to copy tables in binary format (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses |
Re: Allow logical replication to copy tables in binary format
|
List | pgsql-hackers |
On Tue, Feb 21, 2023 at 7:18 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Mon, Feb 20, 2023 at 5:17 PM Melih Mutlu <m.melihmutlu@gmail.com> wrote: > > > > Thanks for letting me know. > > Attached the fixed version of the patch. > > Thanks. I have few comments on v9 patch: > > 1. > + /* Do not allow binary = false with copy_format = binary */ > + if (!opts.binary && > + sub->copyformat == LOGICALREP_COPY_AS_BINARY && > + !IsSet(opts.specified_opts, SUBOPT_COPY_FORMAT)) > + ereport(ERROR, > + > (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot set %s for a > subscription with %s", > + "binary = false", > "copy_format = binary"))); > > I don't understand why we'd need to tie an option (binary) that deals > with data types at column-level with another option (copy_format) that > requests the entire table data to be in binary. This'd essentially > make one to set binary = true to use copy_format = binary, no? IMHO, > this inter-dependency is not good for better usability. > > 2. Why can't the tests that this patch adds be simple? Why would it > need to change the existing tests at all? I'm thinking to create a new > 00X_binary_copy_format.pl or such and setting up logical replication > with copy_format = binary and letting table sync worker request > publisher in binary format - you can verify this via publisher server > logs - look for COPY with BINARY option. If required, have the table > with different data types. This greatly reduces the patch's footprint. I've done performance testing with the v9 patch. I can constantly observe 1.34X improvement or 25% improvement in table sync/copy performance with the patch: HEAD binary = false Time: 214398.637 ms (03:34.399) PATCHED binary = true, copy_format = binary: Time: 160509.262 ms (02:40.509) There's a clear reduction (5.68% to 4.49%) in the CPU cycles spent in copy_table with the patch: perf report HEAD: - 5.68% 0.00% postgres postgres [.] copy_table - copy_table - 5.67% CopyFrom - 4.26% NextCopyFrom - 2.16% NextCopyFromRawFields - 1.55% CopyReadLine - 1.52% CopyReadLineText - 0.76% CopyLoadInputBuf 0.50% CopyConvertBuf 0.60% CopyReadAttributesText - 1.93% InputFunctionCall 0.69% timestamptz_in 0.53% byteain - 0.73% CopyMultiInsertInfoFlush - 0.73% CopyMultiInsertBufferFlush - 0.66% table_multi_insert 0.65% heap_multi_insert perf report PATCHED: - 4.49% 0.00% postgres postgres [.] copy_table - copy_table - 4.48% CopyFrom - 2.36% NextCopyFrom - 1.81% CopyReadBinaryAttribute 1.47% ReceiveFunctionCall - 1.21% CopyMultiInsertInfoFlush - 1.21% CopyMultiInsertBufferFlush - 1.11% table_multi_insert - 1.09% heap_multi_insert - 0.71% RelationGetBufferForTuple - 0.63% ReadBufferBI - 0.62% ReadBufferExtended 0.62% ReadBuffer_common I've tried to choose the table columns such that the binary send/recv vs non-binary/plain/text copy has some benefit. The table has 100mn rows, and is of 11GB size. I've measured the benefit using Melih's helper function wait_for_rep(). Note that I had to compile source code with -ggdb3 -O0 for perf report, otherwise with -O3 for performance numbers: wal_level = 'logical' shared_buffers = '8GB' wal_buffers = '1GB' max_wal_size = '32GB' create table foo(i int, n numeric, v varchar, b bytea, t timestamp with time zone default current_timestamp); insert into foo select i, i+1, md5(i::text), md5(i::text)::bytea from generate_series(1, 100000000) i; CREATE OR REPLACE PROCEDURE wait_for_rep() LANGUAGE plpgsql AS $$ BEGIN WHILE (SELECT count(*) != 0 FROM pg_subscription_rel WHERE srsubstate <> 'r') LOOP COMMIT; END LOOP; END; $$; -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: