Thread: Gripes about walsender command processing
While trying to understand a recent buildfarm failure [1], I got annoyed by the fact that a walsender exposes its last SQL command in pg_stat_activity even when that was a long time ago and what it's been doing since then is replication commands. This leads to *totally* misleading reports, for instance in [1] we read 2020-09-13 19:10:09.632 CEST [5f5e526d.242415:4] LOG: server process (PID 2368853) was terminated by signal 11: Segmentationfault 2020-09-13 19:10:09.632 CEST [5f5e526d.242415:5] DETAIL: Failed process was running: SELECT pg_catalog.set_config('search_path','', false); even though the process's own log messages paint a much better picture of reality: 2020-09-13 19:10:07.302 CEST [5f5e526f.242555:1] LOG: connection received: host=[local] 2020-09-13 19:10:07.303 CEST [5f5e526f.242555:2] LOG: replication connection authorized: user=bf application_name=sub2 2020-09-13 19:10:07.303 CEST [5f5e526f.242555:3] LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 2020-09-13 19:10:07.334 CEST [5f5e526f.242555:4] LOG: received replication command: IDENTIFY_SYSTEM 2020-09-13 19:10:07.335 CEST [5f5e526f.242555:5] LOG: received replication command: START_REPLICATION SLOT "sub2" LOGICAL0/0 (proto_version '2', publication_names '"pub2"') 2020-09-13 19:10:07.335 CEST [5f5e526f.242555:6] LOG: starting logical decoding for slot "sub2" 2020-09-13 19:10:07.335 CEST [5f5e526f.242555:7] DETAIL: Streaming transactions committing after 0/159EB38, reading WALfrom 0/159EB00. 2020-09-13 19:10:07.335 CEST [5f5e526f.242555:8] LOG: logical decoding found consistent point at 0/159EB00 2020-09-13 19:10:07.335 CEST [5f5e526f.242555:9] DETAIL: There are no running transactions. I think that walsenders ought to report their current replication command as though it were a SQL command. They oughta set debug_query_string, too. While trying to fix this, I also observed that exec_replication_command fails to clean up the temp context it made for parsing the command string, if that turns out to be a SQL command. This very accidentally fails to lead to a long-term memory leak, because that context will be a child of MessageContext so it'll be cleaned out in the next iteration of PostgresMain's main loop. But it's still unacceptably sloppy coding IMHO, and it's closely related to a lot of other randomness in the function; it'd be better to have a separate early-exit path for SQL commands. Attached find a proposed fix for these things. What I'd really like to do is adjust pgstat_report_activity so that callers are *required* to provide some kind of command string when transitioning into STATE_RUNNING state, ie something like Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL); However, before we could do that, we'd have to clean up the rat's nest of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity calls in replication/logical/worker.c. I couldn't make heads or tails of what is going on there, so I did not try. BTW, while I didn't change it here, isn't the second SnapBuildClearExportedSnapshot call in exec_replication_command just useless? We already did that before parsing the command. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03 diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 3f756b470a..9a2b154797 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1545,6 +1545,9 @@ exec_replication_command(const char *cmd_string) CHECK_FOR_INTERRUPTS(); + /* + * Parse the command. + */ cmd_context = AllocSetContextCreate(CurrentMemoryContext, "Replication command context", ALLOCSET_DEFAULT_SIZES); @@ -1560,15 +1563,38 @@ exec_replication_command(const char *cmd_string) cmd_node = replication_parse_result; + /* + * If it's a SQL command, just clean up our mess and return false; the + * caller will take care of executing it. + */ + if (IsA(cmd_node, SQLCmd)) + { + if (MyDatabaseId == InvalidOid) + ereport(ERROR, + (errmsg("cannot execute SQL commands in WAL sender for physical replication"))); + + MemoryContextSwitchTo(old_context); + MemoryContextDelete(cmd_context); + + /* Tell the caller that this wasn't a WalSender command. */ + return false; + } + + /* + * Report query to various monitoring facilities. For this purpose, we + * report replication commands just like SQL commands. + */ + debug_query_string = cmd_string; + + pgstat_report_activity(STATE_RUNNING, cmd_string); + /* * Log replication command if log_replication_commands is enabled. Even * when it's disabled, log the command with DEBUG1 level for backward - * compatibility. Note that SQL commands are not logged here, and will be - * logged later if log_statement is enabled. + * compatibility. */ - if (cmd_node->type != T_SQLCmd) - ereport(log_replication_commands ? LOG : DEBUG1, - (errmsg("received replication command: %s", cmd_string))); + ereport(log_replication_commands ? LOG : DEBUG1, + (errmsg("received replication command: %s", cmd_string))); /* * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was @@ -1578,10 +1604,9 @@ exec_replication_command(const char *cmd_string) SnapBuildClearExportedSnapshot(); /* - * For aborted transactions, don't allow anything except pure SQL, the - * exec_simple_query() will handle it correctly. + * Disallow replication commands in aborted transaction blocks. */ - if (IsAbortedTransactionBlockState() && !IsA(cmd_node, SQLCmd)) + if (IsAbortedTransactionBlockState()) ereport(ERROR, (errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION), errmsg("current transaction is aborted, " @@ -1597,9 +1622,6 @@ exec_replication_command(const char *cmd_string) initStringInfo(&reply_message); initStringInfo(&tmpbuf); - /* Report to pgstat that this process is running */ - pgstat_report_activity(STATE_RUNNING, NULL); - switch (cmd_node->type) { case T_IdentifySystemCmd: @@ -1651,17 +1673,6 @@ exec_replication_command(const char *cmd_string) } break; - case T_SQLCmd: - if (MyDatabaseId == InvalidOid) - ereport(ERROR, - (errmsg("cannot execute SQL commands in WAL sender for physical replication"))); - - /* Report to pgstat that this process is now idle */ - pgstat_report_activity(STATE_IDLE, NULL); - - /* Tell the caller that this wasn't a WalSender command. */ - return false; - default: elog(ERROR, "unrecognized replication command node tag: %u", cmd_node->type); @@ -1677,6 +1688,7 @@ exec_replication_command(const char *cmd_string) /* Report to pgstat that this process is now idle */ pgstat_report_activity(STATE_IDLE, NULL); + debug_query_string = NULL; return true; }
I wrote: > I think that walsenders ought to report their current replication command > as though it were a SQL command. They oughta set debug_query_string, too. I also notice that a walsender does not maintain its ps status: postgres 921109 100 0.0 32568 11880 ? Rs 18:57 0:51 postgres: publisher: walsender postgres [local] idle I don't mind if we decide we don't need to reflect the walsender's true activity, but let's not have it showing an outright lie. regards, tom lane
On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote: > While trying to fix this, I also observed that exec_replication_command > fails to clean up the temp context it made for parsing the command string, > if that turns out to be a SQL command. This very accidentally fails to > lead to a long-term memory leak, because that context will be a child of > MessageContext so it'll be cleaned out in the next iteration of > PostgresMain's main loop. But it's still unacceptably sloppy coding > IMHO, and it's closely related to a lot of other randomness in the > function; it'd be better to have a separate early-exit path for SQL > commands. Looks fine seen from here. +1. > What I'd really like to do is adjust pgstat_report_activity so that > callers are *required* to provide some kind of command string when > transitioning into STATE_RUNNING state, ie something like > > Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL); > > However, before we could do that, we'd have to clean up the rat's nest > of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity > calls in replication/logical/worker.c. I couldn't make heads or tails > of what is going on there, so I did not try. For this one, I don't actually agree. For now the state and the query string, actually the activity string, are completely independent. So external modules like bgworkers can mix the state enum and the activity string as they wish. I think that this flexibility is useful to keep. > BTW, while I didn't change it here, isn't the second > SnapBuildClearExportedSnapshot call in exec_replication_command just > useless? We already did that before parsing the command. Indeed. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote: >> While trying to fix this, I also observed that exec_replication_command >> fails to clean up the temp context it made for parsing the command string, >> if that turns out to be a SQL command. This very accidentally fails to >> lead to a long-term memory leak, because that context will be a child of >> MessageContext so it'll be cleaned out in the next iteration of >> PostgresMain's main loop. But it's still unacceptably sloppy coding >> IMHO, and it's closely related to a lot of other randomness in the >> function; it'd be better to have a separate early-exit path for SQL >> commands. > Looks fine seen from here. +1. Pushed, thanks for looking. >> What I'd really like to do is adjust pgstat_report_activity so that >> callers are *required* to provide some kind of command string when >> transitioning into STATE_RUNNING state, ie something like >> Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL); > For this one, I don't actually agree. For now the state and the query > string, actually the activity string, are completely independent. So > external modules like bgworkers can mix the state enum and the > activity string as they wish. I think that this flexibility is useful > to keep. Meh ... but I'm not excited enough about the point to argue. I looked into the other point about ps status always claiming the walsender is "idle". This turns out to be something PostgresMain does automatically, so unless we want to uglify that logic, we have to make walsenders set the field honestly. So I propose the attached. (Is there a better way to get the name of a replication command? I didn't look very hard for one.) regards, tom lane diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c index 67093383e6..a34c5745de 100644 --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1619,19 +1619,23 @@ exec_replication_command(const char *cmd_string) switch (cmd_node->type) { case T_IdentifySystemCmd: + set_ps_display("IDENTIFY_SYSTEM"); IdentifySystem(); break; case T_BaseBackupCmd: + set_ps_display("BASE_BACKUP"); PreventInTransactionBlock(true, "BASE_BACKUP"); SendBaseBackup((BaseBackupCmd *) cmd_node); break; case T_CreateReplicationSlotCmd: + set_ps_display("CREATE_REPLICATION_SLOT"); CreateReplicationSlot((CreateReplicationSlotCmd *) cmd_node); break; case T_DropReplicationSlotCmd: + set_ps_display("DROP_REPLICATION_SLOT"); DropReplicationSlot((DropReplicationSlotCmd *) cmd_node); break; @@ -1639,6 +1643,7 @@ exec_replication_command(const char *cmd_string) { StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node; + set_ps_display("START_REPLICATION"); PreventInTransactionBlock(true, "START_REPLICATION"); if (cmd->kind == REPLICATION_KIND_PHYSICAL) @@ -1651,6 +1656,7 @@ exec_replication_command(const char *cmd_string) } case T_TimeLineHistoryCmd: + set_ps_display("TIMELINE_HISTORY"); PreventInTransactionBlock(true, "TIMELINE_HISTORY"); SendTimeLineHistory((TimeLineHistoryCmd *) cmd_node); break; @@ -1660,6 +1666,7 @@ exec_replication_command(const char *cmd_string) DestReceiver *dest = CreateDestReceiver(DestRemoteSimple); VariableShowStmt *n = (VariableShowStmt *) cmd_node; + set_ps_display("SHOW"); /* syscache access needs a transaction environment */ StartTransactionCommand(); GetPGVariable(n->name, dest); @@ -1680,8 +1687,11 @@ exec_replication_command(const char *cmd_string) SetQueryCompletion(&qc, CMDTAG_SELECT, 0); EndCommand(&qc, DestRemote, true); - /* Report to pgstat that this process is now idle */ - pgstat_report_activity(STATE_IDLE, NULL); + /* + * We need not update ps display or pg_stat_activity, because PostgresMain + * will reset those to "idle". But we must reset debug_query_string to + * ensure it doesn't become a dangling pointer. + */ debug_query_string = NULL; return true;
On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote: > I looked into the other point about ps status always claiming the > walsender is "idle". This turns out to be something PostgresMain > does automatically, so unless we want to uglify that logic, we have > to make walsenders set the field honestly. So I propose the attached. > (Is there a better way to get the name of a replication command? > I didn't look very hard for one.) Wouldn't that just be in cmdtaglist.h, but extended for nodes used for replication commands? Then you could just call CreateCommandTag() to get the command string to print as postgres.c does. There is already one for repslot drop, in some way. This would have the advantage to just call once set_ps_display() before the switch split. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote: >> (Is there a better way to get the name of a replication command? >> I didn't look very hard for one.) > Wouldn't that just be in cmdtaglist.h, but extended for nodes used for > replication commands? Then you could just call CreateCommandTag() to > get the command string to print as postgres.c does. There is already > one for repslot drop, in some way. This would have the advantage to > just call once set_ps_display() before the switch split. Mmm, not sure whether having CreateCommandTag know about replication commands is a good thing or not. We certainly could do it like that, since there's only one namespace of NodeTag values, but conceptually it feels a bit weird to me. There's a lot of other infrastructure for SQL command nodes that we're surely never going to build out for replication commands, so should we do it in CreateCommandTag? Anybody else have an opinion about it? (I don't quite follow your comment about repslot drop, btw.) regards, tom lane
On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote: > (I don't quite follow your comment about repslot drop, btw.) There is already a command tag equivalent to DROP_REPLICATION_SLOT: $ git grep REPLICATION -- */cmdtaglist.h src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT, "DROP REPLICATION SLOT", false, false, false) -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote: >> (I don't quite follow your comment about repslot drop, btw.) > There is already a command tag equivalent to DROP_REPLICATION_SLOT: > $ git grep REPLICATION -- */cmdtaglist.h > src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT, > "DROP REPLICATION SLOT", false, false, false) [ blink ... ] So why is it that DropReplicationSlot does SetQueryCompletion(&qc, CMDTAG_DROP_REPLICATION_SLOT, 0); EndCommand(&qc, DestRemote, false); when the caller will immediately after that do SetQueryCompletion(&qc, CMDTAG_SELECT, 0); EndCommand(&qc, DestRemote, true); StartLogicalReplication has a similar weirdness. The more I look at this code, the more broken it seems. Anyway, independently of whether walsender should be sending multiple command-complete messages, I don't think I really approve of including replication commands in the CommandTag enum. It still seems like a type pun. However, it looks like we'd have to duplicate SetQueryCompletion/EndCommand if we don't want to do that, so maybe I'd better just hold my nose and do it that way. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-15, Alvaro Herrera wrote: >> I overlooked this in 2f9661311b83. From this perspective, I agree it >> looks wrong. We still have to send *some* completion tag (the 'C' >> message), but maybe we can invent a separate entry point in dest.c for >> that -- EndReplicationCommand() or some such -- that takes values from a >> separate enum? > It seems simpler than that actually; we don't need to build a lot of > infrastructure. This looks moderately reasonable to me. However, with the process title reporting I want to add, we're going to end up with a switch that looks like case T_IdentifySystemCmd: + set_ps_display("IDENTIFY_SYSTEM"); IdentifySystem(); + EndReplicationCommand("IDENTIFY_SYSTEM"); break; case T_BaseBackupCmd: + set_ps_display("BASE_BACKUP"); PreventInTransactionBlock(true, "BASE_BACKUP"); SendBaseBackup((BaseBackupCmd *) cmd_node); + EndReplicationCommand("BASE_BACKUP"); break; which is starting to look a bit repetitive and copy-pasteo-prone. I don't see an easy way to improve on it though. The only obvious alternative would be to put another switch before the main one that just fills a "const char *cmdtag" variable, but that seems ugly. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-16, Tom Lane wrote: >> I don't see an easy way to improve on it though. The only obvious >> alternative would be to put another switch before the main one that >> just fills a "const char *cmdtag" variable, but that seems ugly. > The alternative of doing the assignment in each case of the same switch > does not look too terrible: > case T_IdentifySystemCmd: > + cmdtag = "IDENTIFY_SYSTEM"; > + set_ps_display(cmdtag); > IdentifySystem(); > + EndReplicationCommand(cmdtag); > break; > case T_BaseBackupCmd: > + cmdtag = "BASE_BACKUP"; > + set_ps_display(cmdtag); > PreventInTransactionBlock(true, cmdtag); > SendBaseBackup((BaseBackupCmd *) cmd_node); > + EndReplicationCommand(cmdtag); > break; Yeah, that works for me. It doesn't allow for having just one set_ps_display() call ahead of the switch, but that isn't that big a loss. We cannot merge the EndReplicationCommand calls to after the switch, because some of the cases don't want one here; so that partial duplication is inescapable. Note that your changes need to be backpatched into v13, because AFAICS this code is violating the FE/BE protocol right now --- it's just luck that libpq isn't moaning about extra CommandComplete messages. But I don't think we want to change the ps title behavior in v13 at this late date, so that part should be HEAD-only. regards, tom lane
Alvaro Herrera <alvherre@2ndquadrant.com> writes: > On 2020-Sep-16, Tom Lane wrote: >> Note that your changes need to be backpatched into v13, >> because AFAICS this code is violating the FE/BE protocol >> right now --- it's just luck that libpq isn't moaning >> about extra CommandComplete messages. But I don't think >> we want to change the ps title behavior in v13 at this >> late date, so that part should be HEAD-only. > I pushed to 13, but it's because of the removal of the command tag; the > duplicate tags we've been sending since 9.3. ¯\_(ツ)_/¯ Ugh. Well, fixing v13 is an improvement anyway. Thanks! regards, tom lane