Re: logical replication empty transactions - Mailing list pgsql-hackers
From | Peter Smith |
---|---|
Subject | Re: logical replication empty transactions |
Date | |
Msg-id | CAHut+PtMX9x8ssTsJof5_nJyfJC+NygvkTFWjnkq2HcA0f1LUg@mail.gmail.com Whole thread Raw |
In response to | Re: logical replication empty transactions (Ajin Cherian <itsajin@gmail.com>) |
Responses |
Re: logical replication empty transactions
|
List | pgsql-hackers |
On Fri, Apr 23, 2021 at 3:46 PM Ajin Cherian <itsajin@gmail.com> wrote: > > > > On Mon, Apr 19, 2021 at 6:22 PM Peter Smith <smithpb2250@gmail.com> wrote: >> >> >> Here are a some review comments: >> >> ------ >> >> 1. The patch v3 applied OK but with whitespace warnings >> >> [postgres@CentOS7-x64 oss_postgres_2PC]$ git apply >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98: >> indent with spaces. >> /* output BEGIN if we haven't yet, avoid for streaming and >> non-transactional messages */ >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99: >> indent with spaces. >> if (!data->xact_wrote_changes && !in_streaming && transactional) >> warning: 2 lines add whitespace errors. >> >> ------ > > > Fixed. > >> >> >> 2. Please create a CF entry in [1] for this patch. >> >> ------ >> >> 3. Patch comment >> >> The comment describes the problem and then suddenly just says >> "Postpone the BEGIN message until the first change." >> >> I suggest changing it to say more like... "(blank line) This patch >> addresses the above problem by postponing the BEGIN message until the >> first change." >> >> ------ >> > > Updated. > >> >> 4. pgoutput.h >> >> Maybe for consistency with the context member, the comment for the new >> member should be to the right instead of above it? >> >> @@ -20,6 +20,9 @@ typedef struct PGOutputData >> MemoryContext context; /* private memory context for transient >> * allocations */ >> >> + /* flag indicating whether messages have previously been sent */ >> + bool xact_wrote_changes; >> + >> >> ------ >> >> 5. pgoutput.h >> >> + /* flag indicating whether messages have previously been sent */ >> >> "previously been sent" --> "already been sent" ?? >> >> ------ >> >> 6. pgoutput.h - misleading member name >> >> Actually, now that I have read all the rest of the code and how this >> member is used I feel that this name is very misleading. e.g. For >> "streaming" case then you still are writing changes but are not >> setting this member at all - therefore it does not always mean what it >> says. >> >> I feel a better name for this would be something like >> "sent_begin_txn". Then if you have sent BEGIN it is true. If you >> haven't sent BEGIN it is false. It eliminates all ambiguity naming it >> this way instead. >> >> (This makes my feedback #5 redundant because the comment will be a bit >> different if you do this). >> >> ------ > > > Fixed above comments. >> >> >> 7. pgoutput.c - function pgoutput_begin_txn >> >> @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, >> OutputPluginOptions *opt, >> static void >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) >> { >> >> I guess that you still needed to pass the txn because that is how the >> API is documented, right? >> >> But I am wondering if you ought to flag it as unused so you wont get >> some BF machine giving warnings about it. >> >> e.g. Syntax like this? >> >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) { >> (void)txn; >> ... > > > Updated. >> >> ------ >> >> 8. pgoutput.c - function pgoutput_begin_txn >> >> @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, >> OutputPluginOptions *opt, >> static void >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) >> { >> + PGOutputData *data = ctx->output_plugin_private; >> + >> + /* >> + * Don't send BEGIN message here. Instead, postpone it until the first >> + * change. In logical replication, a common scenario is to replicate a set >> + * of tables (instead of all tables) and transactions whose changes were on >> + * table(s) that are not published will produce empty transactions. These >> + * empty transactions will send BEGIN and COMMIT messages to subscribers, >> + * using bandwidth on something with little/no use for logical replication. >> + */ >> + data->xact_wrote_changes = false; >> + elog(LOG,"Holding of begin"); >> +} >> >> Why is this loglevel LOG? Looks like leftover debugging. > > > Removed. >> >> >> ------ >> >> 9. pgoutput.c - function pgoutput_commit_txn >> >> @@ -384,8 +401,14 @@ static void >> pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, >> XLogRecPtr commit_lsn) >> { >> + PGOutputData *data = ctx->output_plugin_private; >> + >> OutputPluginUpdateProgress(ctx); >> >> + /* skip COMMIT message if nothing was sent */ >> + if (!data->xact_wrote_changes) >> + return; >> + >> >> In the case where you decided to do nothing does it make sense that >> you still called the function OutputPluginUpdateProgress(ctx); ? >> I thought perhaps that your new check should come first so this call >> would never happen. > > > Even though the empty transaction is not sent, the LSN is tracked as decoded, hence the progress needs to be updated. > >> >> ------ >> >> 10. pgoutput.c - variable declarations without casts >> >> + PGOutputData *data = ctx->output_plugin_private; >> >> I noticed the new stack variable you declare have no casts. >> >> This differs from the existing code which always looks like: >> PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; >> >> There are a couple of examples of this so please search new code to >> find them all. >> >> ----- > > > Fixed. > >> >> 11. pgoutput.c - function pgoutput_change >> >> @@ -551,6 +574,13 @@ pgoutput_change(LogicalDecodingContext *ctx, >> ReorderBufferTXN *txn, >> Assert(false); >> } >> >> + /* output BEGIN if we haven't yet */ >> + if (!data->xact_wrote_changes && !in_streaming) >> + { >> + pgoutput_begin(ctx, txn); >> + data->xact_wrote_changes = true; >> + } >> >> If the variable is renamed as previously suggested then the assignment >> data->sent_BEGIN_txn = true; can be assigned in just 1 common place >> INSIDE the pgoutput_begin function. >> >> ------ > > > Updated. >> >> >> 12. pgoutput.c - pgoutput_truncate function >> >> @@ -693,6 +723,13 @@ pgoutput_truncate(LogicalDecodingContext *ctx, >> ReorderBufferTXN *txn, >> >> if (nrelids > 0) >> { >> + /* output BEGIN if we haven't yet */ >> + if (!data->xact_wrote_changes && !in_streaming) >> + { >> + pgoutput_begin(ctx, txn); >> + data->xact_wrote_changes = true; >> + } >> >> (same comment as above) >> >> If the variable is renamed as previously suggested then the assignment >> data->sent_BEGIN_txn = true; can be assigned in just 1 common place >> INSIDE the pgoutput_begin function. >> >> 13. pgoutput.c - pgoutput_message >> >> @@ -725,6 +762,13 @@ pgoutput_message(LogicalDecodingContext *ctx, >> ReorderBufferTXN *txn, >> if (in_streaming) >> xid = txn->xid; >> >> + /* output BEGIN if we haven't yet, avoid for streaming and >> non-transactional messages */ >> + if (!data->xact_wrote_changes && !in_streaming && transactional) >> + { >> + pgoutput_begin(ctx, txn); >> + data->xact_wrote_changes = true; >> + } >> >> (same comment as above) >> >> If the variable is renamed as previously suggested then the assignment >> data->sent_BEGIN_txn = true; can be assigned in just 1 common place >> INSIDE the pgoutput_begin function. >> >> ------ > > > Fixed. >> >> >> 14. Test Code. >> >> I noticed that there is no test code specifically for seeing if empty >> transactions get sent or not. Is it possible to write such a test or >> is this traffic improvement only observable using the debugger? >> > > The 020_messages.pl actually has a test case for tracking empty messages even though it is part of the messages test. > > regards, > Ajin Cherian > Fujitsu Australia Thanks for addressing my v3 review comments above. I tested the latest v4. The v4 patch applied cleanly. make check-world completed successfully. So this patch v4 looks LGTM, apart from the following 2 nitpick comments: ====== 1. Suggest to add a blank line after the (void)txn; ? @@ -345,10 +345,29 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; + + (void)txn; /* keep compiler quiet */ + /* + * Don't send BEGIN message here. Instead, postpone it until the first ====== 2. Unnecessary statement blocks? AFAIK those { } are not the usual PG code-style when there is only one statement, so suggest to remove them. Appies to 3 places: @@ -551,6 +576,12 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Assert(false); } + /* output BEGIN if we haven't yet */ + if (!data->sent_begin_txn && !in_streaming) + { + pgoutput_begin(ctx, txn); + } @@ -693,6 +724,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, if (nrelids > 0) { + /* output BEGIN if we haven't yet */ + if (!data->sent_begin_txn && !in_streaming) + { + pgoutput_begin(ctx, txn); + } @@ -725,6 +762,12 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, if (in_streaming) xid = txn->xid; + /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */ + if (!data->sent_begin_txn && !in_streaming && transactional) + { + pgoutput_begin(ctx, txn); + } ------ Kind Regards, Peter Smith. Fujitsu Australia
pgsql-hackers by date: