Re: Reduce useless changes before reassembly during logical replication - Mailing list pgsql-hackers
From | Bharath Rupireddy
---|---
Subject | Re: Reduce useless changes before reassembly during logical replication
Date |
Msg-id | CALj2ACXYGQ2nYF0LOYi5u7xBqH+MEb8eWsTEvJrx_QrC+107Zw@mail.gmail.com
In response to | Reduce useless changes before reassembly during logical replication (li jie <ggysxcq@gmail.com>)
Responses | Re: Reduce useless changes before reassembly during logical replication
List | pgsql-hackers
On Wed, Jan 17, 2024 at 11:45 AM li jie <ggysxcq@gmail.com> wrote:
>
> Hi hackers,
>
> During logical replication, if there is a large write transaction, some
> spill files will be written to disk, depending on the setting of
> logical_decoding_work_mem.
>
> This behavior can effectively avoid OOM, but if the transaction
> generates a lot of changes before commit, a large number of files may
> fill the disk. For example, you can update a TB-level table.
>
> However, I found an inelegant phenomenon. If the modified large table is not
> published, its changes will also be written with a large number of spill files.
> Look at an example below:

Thanks. I agree that decoding and queuing the changes of unpublished
tables into the reorder buffer is unnecessary work for the walsender. It
takes processing effort (CPU overhead), consumes disk space, and uses the
memory configured via logical_decoding_work_mem for a replication
connection inefficiently.

> Later you will see a large number of spill files in the
>
> We can see that table tbl_t1 is not published in mypub. It also won't be sent
> downstream because it's not subscribed.
> After the transaction is reorganized, the pgoutput decoding plugin filters out
> changes to these unpublished relations when sending logical changes.
> See function pgoutput_change.

Right. Here's my testing [1].

> Most importantly, if we filter out changes related to unpublished
> relations after constructing the changes but before queuing the changes
> into a transaction, will it reduce the workload of logical decoding
> and avoid disk or memory growth as much as possible?

Right. It can.

> The patch in the attachment is a prototype, which can effectively reduce the
> memory and disk space usage during logical replication.
>
> Design:
> 1. Added a callback LogicalDecodeFilterByRelCB for the output plugin.
>
> 2. Added this callback function pgoutput_table_filter for the pgoutput plugin.
> Its main implementation is based on the table filter in the
> pgoutput_change function. Its main job is to determine whether a change
> needs to be published based on the parameters of the publication, and
> if not, to filter it out.
>
> 3. After constructing a change and before queuing it into a transaction,
> use RelidByRelfilenumber to obtain the relation associated with the change,
> just like obtaining the relation in the ReorderBufferProcessTXN function.
>
> 4. The relation may be a TOAST relation, and there is no good way to get
> the real table relation from the TOAST relation. Here, I get the real
> table OID through the TOAST relname, and then get the real table relation.
>
> 5. This filtering takes INSERT/UPDATE/DELETE into account. Other
> changes have not been considered yet and can be expanded in the future.

The design of this patch follows the principle of filtering things out
early on in logical decoding, and it looks very similar to
filter_prepare_cb_wrapper/pg_decode_filter_prepare and
filter_by_origin_cb/pgoutput_origin_filter. Per my understanding, this
design looks okay, unless I'm missing anything.
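To make the parallel with the existing origin filter concrete, here is a
minimal sketch of what such a callback could look like if modeled on
LogicalDecodeFilterByOriginCB/pgoutput_origin_filter. This is only an
illustration of the shape, not the patch's actual code: the names
LogicalDecodeFilterByRelCB and pgoutput_table_filter come from the design
notes above, while the signature, the body, and the "return true to filter
out" convention are assumptions borrowed from the origin filter.

#include "postgres.h"

#include "catalog/pg_publication.h"
#include "replication/logical.h"
#include "utils/rel.h"
#include "utils/relcache.h"

/* Assumed callback type: return true if changes on this relation should
 * be filtered out before they are queued into the reorder buffer. */
typedef bool (*LogicalDecodeFilterByRelCB) (struct LogicalDecodingContext *ctx,
											Oid relid);

/* Assumed pgoutput implementation: skip relations that can never be
 * published; the real patch would additionally consult the publication
 * parameters, as pgoutput_change does today. */
static bool
pgoutput_table_filter(LogicalDecodingContext *ctx, Oid relid)
{
	Relation	relation = RelationIdGetRelation(relid);
	bool		filter;

	filter = !is_publishable_relation(relation);

	RelationClose(relation);

	return filter;
}

Presumably the callback would be registered in OutputPluginCallbacks next
to filter_by_origin_cb, and the decoding code would consult it before
calling ReorderBufferQueueChange.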
> Test:
> 1. Added a test case 034_table_filter.pl
> 2. Like the case above, create two tables, the published table tbl_pub and
> the non-published table tbl_t1
> 3. Insert 10,000 rows of toast data into tbl_t1 on the publisher, and use
> pg_ls_replslotdir to record the total size of the slot directory
> every second.
> 4. Compare the size of the slot directory at the beginning of the
> transaction (size1), the size at the end of the transaction (size2),
> and the average size of the entire process (size3).
> 5. Assert(size1==size2==size3)

I bet the above test with 10K rows is going to take noticeable time on
some buildfarm members (it took 6 seconds on my dev system, an AWS EC2
instance), and it can also get flaky. Therefore, IMO, the concrete way of
testing this feature is to look in the server log for the following
message using PostgreSQL::Test::Cluster's log_contains():

+filter_done:
+
+	if (result && RelationIsValid(relation))
+		elog(DEBUG1, "logical filter change by table %s", RelationGetRelationName(relation));
+

Here are some comments on the v1 patch:

1.
@@ -1415,9 +1419,6 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn,
     TupleTableSlot *old_slot = NULL;
     TupleTableSlot *new_slot = NULL;

-    if (!is_publishable_relation(relation))
-        return;
-

Instead of removing is_publishable_relation from pgoutput_change, I think
it can just be turned into an assertion,
Assert(is_publishable_relation(relation));, no?

2.
+    switch (change->action)
+    {
+        /* intentionally fall through */

Perhaps it must use /* FALLTHROUGH */ just like elsewhere in the code;
otherwise a compiler warning is thrown. (A sketch of comments 1 and 2
follows at the end of this message.)

3. From the commit message:

    Most of the code in the FilterByTable function is transplanted from
    the ReorderBufferProcessTXN function, which can be called before the
    ReorderBufferQueueChange function. It is

I think the above note can just be placed above the FilterByTable function
for better understanding.

+static bool
+FilterByTable(LogicalDecodingContext *ctx, ReorderBufferChange *change)
+{

4. Why is the FilterByTable(ctx, change) call placed after DecodeXLogTuple
in DecodeInsert, DecodeUpdate and DecodeDelete? Does the new callback
filter_by_table_cb make any use of the tuples decoded by DecodeXLogTuple?
If not, can we move the FilterByTable call before DecodeXLogTuple to avoid
some extra processing?

5. Why is ReorderBufferChange needed as a parameter to FilterByTable and
filter_by_table_cb? Wouldn't the LogicalDecodingContext, the relation name
and the change action be enough to decide whether the table is publishable
or not? If done that way, it can avoid some more processing, no?

6. Please run pgindent and pgperltidy on the new source code and the new
TAP test file, respectively.

[1]
HEAD:
postgres=# BEGIN;
BEGIN
Time: 0.110 ms
postgres=*# insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba', i),repeat('dfds', i) from generate_series(0,99999) i;
INSERT 0 100000
Time: 379488.265 ms (06:19.488)
postgres=*#

ubuntu:~/postgres/pg17/bin$ du -sh /home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
837M	/home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
ubuntu:~/postgres/pg17/bin$ du -sh /home/ubuntu/postgres/pg17/bin/db17
2.6G	/home/ubuntu/postgres/pg17/bin/db17

PATCHED:
postgres=# BEGIN;
BEGIN
Time: 0.105 ms
postgres=*# insert into tbl_t1 select i,repeat('xyzzy', i),repeat('abcba', i),repeat('dfds', i) from generate_series(0,99999) i;
INSERT 0 100000
Time: 380044.554 ms (06:20.045)

ubuntu:~/postgres$ du -sh /home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
8.0K	/home/ubuntu/postgres/pg17/bin/db17/pg_replslot/mysub
ubuntu:~/postgres$ du -sh /home/ubuntu/postgres/pg17/bin/db17
1.8G	/home/ubuntu/postgres/pg17/bin/db17

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
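As a concrete illustration of comments 1 and 2 above, here is a minimal
sketch of what I have in mind. Apart from is_publishable_relation and the
REORDER_BUFFER_CHANGE_* actions, the surrounding code is paraphrased from
the patch description rather than copied from it.

/* Comment 1: since the new filter is expected to have skipped
 * non-publishable relations already, pgoutput_change can assert the
 * invariant instead of silently returning. */
Assert(is_publishable_relation(relation));

/* Comment 2: annotate intentional fall-through the way the rest of the
 * tree does, so that -Wimplicit-fallthrough does not warn. */
switch (change->action)
{
	case REORDER_BUFFER_CHANGE_INSERT:
		/* insert-specific handling would go here */
		/* FALLTHROUGH */
	case REORDER_BUFFER_CHANGE_UPDATE:
	case REORDER_BUFFER_CHANGE_DELETE:
		/* handling shared by all three filtered actions */
		break;
	default:
		/* other change types are not filtered for now */
		break;
}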