Thread: Gripes about walsender command processing

Gripes about walsender command processing

From
Tom Lane
Date:
While trying to understand a recent buildfarm failure [1], I got annoyed
by the fact that a walsender exposes its last SQL command in
pg_stat_activity even when that was a long time ago and what it's been
doing since then is replication commands.  This leads to *totally*
misleading reports, for instance in [1] we read

2020-09-13 19:10:09.632 CEST [5f5e526d.242415:4] LOG:  server process (PID 2368853) was terminated by signal 11:
Segmentationfault 
2020-09-13 19:10:09.632 CEST [5f5e526d.242415:5] DETAIL:  Failed process was running: SELECT
pg_catalog.set_config('search_path','', false); 

even though the process's own log messages paint a much better picture of
reality:

2020-09-13 19:10:07.302 CEST [5f5e526f.242555:1] LOG:  connection received: host=[local]
2020-09-13 19:10:07.303 CEST [5f5e526f.242555:2] LOG:  replication connection authorized: user=bf application_name=sub2
2020-09-13 19:10:07.303 CEST [5f5e526f.242555:3] LOG:  statement: SELECT pg_catalog.set_config('search_path', '',
false);
2020-09-13 19:10:07.334 CEST [5f5e526f.242555:4] LOG:  received replication command: IDENTIFY_SYSTEM
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:5] LOG:  received replication command: START_REPLICATION SLOT "sub2"
LOGICAL0/0 (proto_version '2', publication_names '"pub2"') 
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:6] LOG:  starting logical decoding for slot "sub2"
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:7] DETAIL:  Streaming transactions committing after 0/159EB38, reading
WALfrom 0/159EB00. 
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:8] LOG:  logical decoding found consistent point at 0/159EB00
2020-09-13 19:10:07.335 CEST [5f5e526f.242555:9] DETAIL:  There are no running transactions.

I think that walsenders ought to report their current replication command
as though it were a SQL command.  They oughta set debug_query_string, too.

While trying to fix this, I also observed that exec_replication_command
fails to clean up the temp context it made for parsing the command string,
if that turns out to be a SQL command.  This very accidentally fails to
lead to a long-term memory leak, because that context will be a child of
MessageContext so it'll be cleaned out in the next iteration of
PostgresMain's main loop.  But it's still unacceptably sloppy coding
IMHO, and it's closely related to a lot of other randomness in the
function; it'd be better to have a separate early-exit path for SQL
commands.

Attached find a proposed fix for these things.

What I'd really like to do is adjust pgstat_report_activity so that
callers are *required* to provide some kind of command string when
transitioning into STATE_RUNNING state, ie something like

    Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);

However, before we could do that, we'd have to clean up the rat's nest
of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity
calls in replication/logical/worker.c.  I couldn't make heads or tails
of what is going on there, so I did not try.

BTW, while I didn't change it here, isn't the second
SnapBuildClearExportedSnapshot call in exec_replication_command just
useless?  We already did that before parsing the command.

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 3f756b470a..9a2b154797 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1545,6 +1545,9 @@ exec_replication_command(const char *cmd_string)

     CHECK_FOR_INTERRUPTS();

+    /*
+     * Parse the command.
+     */
     cmd_context = AllocSetContextCreate(CurrentMemoryContext,
                                         "Replication command context",
                                         ALLOCSET_DEFAULT_SIZES);
@@ -1560,15 +1563,38 @@ exec_replication_command(const char *cmd_string)

     cmd_node = replication_parse_result;

+    /*
+     * If it's a SQL command, just clean up our mess and return false; the
+     * caller will take care of executing it.
+     */
+    if (IsA(cmd_node, SQLCmd))
+    {
+        if (MyDatabaseId == InvalidOid)
+            ereport(ERROR,
+                    (errmsg("cannot execute SQL commands in WAL sender for physical replication")));
+
+        MemoryContextSwitchTo(old_context);
+        MemoryContextDelete(cmd_context);
+
+        /* Tell the caller that this wasn't a WalSender command. */
+        return false;
+    }
+
+    /*
+     * Report query to various monitoring facilities.  For this purpose, we
+     * report replication commands just like SQL commands.
+     */
+    debug_query_string = cmd_string;
+
+    pgstat_report_activity(STATE_RUNNING, cmd_string);
+
     /*
      * Log replication command if log_replication_commands is enabled. Even
      * when it's disabled, log the command with DEBUG1 level for backward
-     * compatibility. Note that SQL commands are not logged here, and will be
-     * logged later if log_statement is enabled.
+     * compatibility.
      */
-    if (cmd_node->type != T_SQLCmd)
-        ereport(log_replication_commands ? LOG : DEBUG1,
-                (errmsg("received replication command: %s", cmd_string)));
+    ereport(log_replication_commands ? LOG : DEBUG1,
+            (errmsg("received replication command: %s", cmd_string)));

     /*
      * CREATE_REPLICATION_SLOT ... LOGICAL exports a snapshot. If it was
@@ -1578,10 +1604,9 @@ exec_replication_command(const char *cmd_string)
         SnapBuildClearExportedSnapshot();

     /*
-     * For aborted transactions, don't allow anything except pure SQL, the
-     * exec_simple_query() will handle it correctly.
+     * Disallow replication commands in aborted transaction blocks.
      */
-    if (IsAbortedTransactionBlockState() && !IsA(cmd_node, SQLCmd))
+    if (IsAbortedTransactionBlockState())
         ereport(ERROR,
                 (errcode(ERRCODE_IN_FAILED_SQL_TRANSACTION),
                  errmsg("current transaction is aborted, "
@@ -1597,9 +1622,6 @@ exec_replication_command(const char *cmd_string)
     initStringInfo(&reply_message);
     initStringInfo(&tmpbuf);

-    /* Report to pgstat that this process is running */
-    pgstat_report_activity(STATE_RUNNING, NULL);
-
     switch (cmd_node->type)
     {
         case T_IdentifySystemCmd:
@@ -1651,17 +1673,6 @@ exec_replication_command(const char *cmd_string)
             }
             break;

-        case T_SQLCmd:
-            if (MyDatabaseId == InvalidOid)
-                ereport(ERROR,
-                        (errmsg("cannot execute SQL commands in WAL sender for physical replication")));
-
-            /* Report to pgstat that this process is now idle */
-            pgstat_report_activity(STATE_IDLE, NULL);
-
-            /* Tell the caller that this wasn't a WalSender command. */
-            return false;
-
         default:
             elog(ERROR, "unrecognized replication command node tag: %u",
                  cmd_node->type);
@@ -1677,6 +1688,7 @@ exec_replication_command(const char *cmd_string)

     /* Report to pgstat that this process is now idle */
     pgstat_report_activity(STATE_IDLE, NULL);
+    debug_query_string = NULL;

     return true;
 }

Re: Gripes about walsender command processing

From
Tom Lane
Date:
I wrote:
> I think that walsenders ought to report their current replication command
> as though it were a SQL command.  They oughta set debug_query_string, too.

I also notice that a walsender does not maintain its ps status:

postgres  921109  100  0.0  32568 11880 ?        Rs   18:57   0:51 postgres: publisher: walsender postgres [local] idle

I don't mind if we decide we don't need to reflect the walsender's
true activity, but let's not have it showing an outright lie.

            regards, tom lane



Re: Gripes about walsender command processing

From
Michael Paquier
Date:
On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote:
> While trying to fix this, I also observed that exec_replication_command
> fails to clean up the temp context it made for parsing the command string,
> if that turns out to be a SQL command.  This very accidentally fails to
> lead to a long-term memory leak, because that context will be a child of
> MessageContext so it'll be cleaned out in the next iteration of
> PostgresMain's main loop.  But it's still unacceptably sloppy coding
> IMHO, and it's closely related to a lot of other randomness in the
> function; it'd be better to have a separate early-exit path for SQL
> commands.

Looks fine seen from here.  +1.

> What I'd really like to do is adjust pgstat_report_activity so that
> callers are *required* to provide some kind of command string when
> transitioning into STATE_RUNNING state, ie something like
>
>     Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);
>
> However, before we could do that, we'd have to clean up the rat's nest
> of seemingly-inserted-with-the-aid-of-a-dartboard pgstat_report_activity
> calls in replication/logical/worker.c.  I couldn't make heads or tails
> of what is going on there, so I did not try.

For this one, I don't actually agree.  For now the state and the query
string, actually the activity string, are completely independent.  So
external modules like bgworkers can mix the state enum and the
activity string as they wish.  I think that this flexibility is useful
to keep.

> BTW, while I didn't change it here, isn't the second
> SnapBuildClearExportedSnapshot call in exec_replication_command just
> useless?  We already did that before parsing the command.

Indeed.
--
Michael

Attachment

Re: Gripes about walsender command processing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Sun, Sep 13, 2020 at 03:47:51PM -0400, Tom Lane wrote:
>> While trying to fix this, I also observed that exec_replication_command
>> fails to clean up the temp context it made for parsing the command string,
>> if that turns out to be a SQL command.  This very accidentally fails to
>> lead to a long-term memory leak, because that context will be a child of
>> MessageContext so it'll be cleaned out in the next iteration of
>> PostgresMain's main loop.  But it's still unacceptably sloppy coding
>> IMHO, and it's closely related to a lot of other randomness in the
>> function; it'd be better to have a separate early-exit path for SQL
>> commands.

> Looks fine seen from here.  +1.

Pushed, thanks for looking.

>> What I'd really like to do is adjust pgstat_report_activity so that
>> callers are *required* to provide some kind of command string when
>> transitioning into STATE_RUNNING state, ie something like
>> Assert(state == STATE_RUNNING ? cmd_str != NULL : cmd_str == NULL);

> For this one, I don't actually agree.  For now the state and the query
> string, actually the activity string, are completely independent.  So
> external modules like bgworkers can mix the state enum and the
> activity string as they wish.  I think that this flexibility is useful
> to keep.

Meh ... but I'm not excited enough about the point to argue.

I looked into the other point about ps status always claiming the
walsender is "idle".  This turns out to be something PostgresMain
does automatically, so unless we want to uglify that logic, we have
to make walsenders set the field honestly.  So I propose the attached.
(Is there a better way to get the name of a replication command?
I didn't look very hard for one.)

            regards, tom lane

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 67093383e6..a34c5745de 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1619,19 +1619,23 @@ exec_replication_command(const char *cmd_string)
     switch (cmd_node->type)
     {
         case T_IdentifySystemCmd:
+            set_ps_display("IDENTIFY_SYSTEM");
             IdentifySystem();
             break;

         case T_BaseBackupCmd:
+            set_ps_display("BASE_BACKUP");
             PreventInTransactionBlock(true, "BASE_BACKUP");
             SendBaseBackup((BaseBackupCmd *) cmd_node);
             break;

         case T_CreateReplicationSlotCmd:
+            set_ps_display("CREATE_REPLICATION_SLOT");
             CreateReplicationSlot((CreateReplicationSlotCmd *) cmd_node);
             break;

         case T_DropReplicationSlotCmd:
+            set_ps_display("DROP_REPLICATION_SLOT");
             DropReplicationSlot((DropReplicationSlotCmd *) cmd_node);
             break;

@@ -1639,6 +1643,7 @@ exec_replication_command(const char *cmd_string)
             {
                 StartReplicationCmd *cmd = (StartReplicationCmd *) cmd_node;

+                set_ps_display("START_REPLICATION");
                 PreventInTransactionBlock(true, "START_REPLICATION");

                 if (cmd->kind == REPLICATION_KIND_PHYSICAL)
@@ -1651,6 +1656,7 @@ exec_replication_command(const char *cmd_string)
             }

         case T_TimeLineHistoryCmd:
+            set_ps_display("TIMELINE_HISTORY");
             PreventInTransactionBlock(true, "TIMELINE_HISTORY");
             SendTimeLineHistory((TimeLineHistoryCmd *) cmd_node);
             break;
@@ -1660,6 +1666,7 @@ exec_replication_command(const char *cmd_string)
                 DestReceiver *dest = CreateDestReceiver(DestRemoteSimple);
                 VariableShowStmt *n = (VariableShowStmt *) cmd_node;

+                set_ps_display("SHOW");
                 /* syscache access needs a transaction environment */
                 StartTransactionCommand();
                 GetPGVariable(n->name, dest);
@@ -1680,8 +1687,11 @@ exec_replication_command(const char *cmd_string)
     SetQueryCompletion(&qc, CMDTAG_SELECT, 0);
     EndCommand(&qc, DestRemote, true);

-    /* Report to pgstat that this process is now idle */
-    pgstat_report_activity(STATE_IDLE, NULL);
+    /*
+     * We need not update ps display or pg_stat_activity, because PostgresMain
+     * will reset those to "idle".  But we must reset debug_query_string to
+     * ensure it doesn't become a dangling pointer.
+     */
     debug_query_string = NULL;

     return true;

Re: Gripes about walsender command processing

From
Michael Paquier
Date:
On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote:
> I looked into the other point about ps status always claiming the
> walsender is "idle".  This turns out to be something PostgresMain
> does automatically, so unless we want to uglify that logic, we have
> to make walsenders set the field honestly.  So I propose the attached.
> (Is there a better way to get the name of a replication command?
> I didn't look very hard for one.)

Wouldn't that just be in cmdtaglist.h, but extended for nodes used for
replication commands?  Then you could just call CreateCommandTag() to
get the command string to print as postgres.c does.  There is already
one for repslot drop, in some way.  This would have the advantage to
just call once set_ps_display() before the switch split.
--
Michael

Attachment

Re: Gripes about walsender command processing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Sep 14, 2020 at 12:51:39PM -0400, Tom Lane wrote:
>> (Is there a better way to get the name of a replication command?
>> I didn't look very hard for one.)

> Wouldn't that just be in cmdtaglist.h, but extended for nodes used for
> replication commands?  Then you could just call CreateCommandTag() to
> get the command string to print as postgres.c does.  There is already
> one for repslot drop, in some way.  This would have the advantage to
> just call once set_ps_display() before the switch split.

Mmm, not sure whether having CreateCommandTag know about replication
commands is a good thing or not.  We certainly could do it like that,
since there's only one namespace of NodeTag values, but conceptually
it feels a bit weird to me.  There's a lot of other infrastructure
for SQL command nodes that we're surely never going to build out for
replication commands, so should we do it in CreateCommandTag?

Anybody else have an opinion about it?

(I don't quite follow your comment about repslot drop, btw.)

            regards, tom lane



Re: Gripes about walsender command processing

From
Michael Paquier
Date:
On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote:
> (I don't quite follow your comment about repslot drop, btw.)

There is already a command tag equivalent to DROP_REPLICATION_SLOT:
$ git grep REPLICATION -- */cmdtaglist.h
src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT,
"DROP REPLICATION SLOT", false, false, false)
--
Michael

Attachment

Re: Gripes about walsender command processing

From
Tom Lane
Date:
Michael Paquier <michael@paquier.xyz> writes:
> On Mon, Sep 14, 2020 at 11:34:44PM -0400, Tom Lane wrote:
>> (I don't quite follow your comment about repslot drop, btw.)

> There is already a command tag equivalent to DROP_REPLICATION_SLOT:
> $ git grep REPLICATION -- */cmdtaglist.h
> src/include/tcop/cmdtaglist.h:PG_CMDTAG(CMDTAG_DROP_REPLICATION_SLOT,
> "DROP REPLICATION SLOT", false, false, false)

[ blink ... ]  So why is it that DropReplicationSlot does

    SetQueryCompletion(&qc, CMDTAG_DROP_REPLICATION_SLOT, 0);
    EndCommand(&qc, DestRemote, false);

when the caller will immediately after that do

    SetQueryCompletion(&qc, CMDTAG_SELECT, 0);
    EndCommand(&qc, DestRemote, true);

StartLogicalReplication has a similar weirdness.
The more I look at this code, the more broken it seems.

Anyway, independently of whether walsender should be sending multiple
command-complete messages, I don't think I really approve of including
replication commands in the CommandTag enum.  It still seems like a
type pun.  However, it looks like we'd have to duplicate
SetQueryCompletion/EndCommand if we don't want to do that, so maybe
I'd better just hold my nose and do it that way.

            regards, tom lane



Re: Gripes about walsender command processing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-15, Alvaro Herrera wrote:
>> I overlooked this in 2f9661311b83.  From this perspective, I agree it
>> looks wrong.  We still have to send *some* completion tag (the 'C'
>> message), but maybe we can invent a separate entry point in dest.c for
>> that -- EndReplicationCommand() or some such -- that takes values from a
>> separate enum?

> It seems simpler than that actually; we don't need to build a lot of
> infrastructure.

This looks moderately reasonable to me.  However, with the process
title reporting I want to add, we're going to end up with a switch
that looks like

         case T_IdentifySystemCmd:
+            set_ps_display("IDENTIFY_SYSTEM");
             IdentifySystem();
+            EndReplicationCommand("IDENTIFY_SYSTEM");
             break;
 
         case T_BaseBackupCmd:
+            set_ps_display("BASE_BACKUP");
             PreventInTransactionBlock(true, "BASE_BACKUP");
             SendBaseBackup((BaseBackupCmd *) cmd_node);
+            EndReplicationCommand("BASE_BACKUP");
             break;

which is starting to look a bit repetitive and copy-pasteo-prone.
I don't see an easy way to improve on it though.  The only obvious
alternative would be to put another switch before the main one that
just fills a "const char *cmdtag" variable, but that seems ugly.

            regards, tom lane



Re: Gripes about walsender command processing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-16, Tom Lane wrote:
>> I don't see an easy way to improve on it though.  The only obvious
>> alternative would be to put another switch before the main one that
>> just fills a "const char *cmdtag" variable, but that seems ugly.

> The alternative of doing the assignment in each case of the same switch
> does not look too terrible:

>          case T_IdentifySystemCmd:
> +            cmdtag = "IDENTIFY_SYSTEM";
> +            set_ps_display(cmdtag);
>              IdentifySystem();
> +            EndReplicationCommand(cmdtag);
>              break;
 
>          case T_BaseBackupCmd:
> +            cmdtag = "BASE_BACKUP";
> +            set_ps_display(cmdtag);
>              PreventInTransactionBlock(true, cmdtag);
>              SendBaseBackup((BaseBackupCmd *) cmd_node);
> +            EndReplicationCommand(cmdtag);
>              break;

Yeah, that works for me.  It doesn't allow for having just one
set_ps_display() call ahead of the switch, but that isn't that
big a loss.  We cannot merge the EndReplicationCommand calls to
after the switch, because some of the cases don't want one here;
so that partial duplication is inescapable.

Note that your changes need to be backpatched into v13,
because AFAICS this code is violating the FE/BE protocol
right now --- it's just luck that libpq isn't moaning
about extra CommandComplete messages.  But I don't think
we want to change the ps title behavior in v13 at this
late date, so that part should be HEAD-only.

            regards, tom lane



Re: Gripes about walsender command processing

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> On 2020-Sep-16, Tom Lane wrote:
>> Note that your changes need to be backpatched into v13,
>> because AFAICS this code is violating the FE/BE protocol
>> right now --- it's just luck that libpq isn't moaning
>> about extra CommandComplete messages.  But I don't think
>> we want to change the ps title behavior in v13 at this
>> late date, so that part should be HEAD-only.

> I pushed to 13, but it's because of the removal of the command tag; the
> duplicate tags we've been sending since 9.3.  ¯\_(ツ)_/¯

Ugh.  Well, fixing v13 is an improvement anyway.  Thanks!

            regards, tom lane