Re: Basebackups reported as idle - Mailing list pgsql-hackers

From Magnus Hagander
Subject Re: Basebackups reported as idle
Date
Msg-id CABUevEz-u+2ux+OuGepQYchgz6XUDCRAvKRyuK0upP1R1KFY-w@mail.gmail.com
Whole thread Raw
In response to Re: Basebackups reported as idle  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: Basebackups reported as idle
List pgsql-hackers
On Wed, Dec 20, 2017 at 6:11 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Dec 19, 2017 at 11:50 PM, David Steele <david@pgmasters.net> wrote:
> On 12/19/17 4:56 AM, Magnus Hagander wrote:
>> AFAICT, base backups running on the replication protocol are always
>> reported as "idle" in pg_stat_activity. This seems to have been an
>> oversight in the "include walsender backends in pg_stat_activity" in 10,
>> which does include it for walsenders in general, just not for the ones
>> sending base backups. (and was then improved on later with the "include
>> all non-standard backends" patch).
>>
>> Unlike the regular walsender it also has to set it back to IDLE, since
>> you can actually finish a base backup without disconnecting.
>>
>> PFA a patch that fixes this. I think this is bugfix-for-backpatch, I
>> don't think it has a large risk of breaking things. Thoughts?
>
> +1 for this being a bug, albeit a minor one.

+1 for adding calls to pgstat_report_activity() in WAL senders and
tracking the activity of those processes.  Now I don't think that this
is the correct location as what matters is tracking if replication
commands are running or not, and not only BASE_BACKUP. So those calls
should be instead in exec_replication_command().

Hmm. That does make sense. I guess I got stuck looking at the old code, which is then *also* in the wrong place.

What about the attached?

Also, I noticed that the docs for exec_replication_command() says " * Returns true if the cmd_string was recognized as WalSender command, false
 * if not.". But it doesn't actually do that, it ereport(ERROR):s if it's not. There is one "return false", but it's after an ERROR.

So all other exit points of the procedure will be ERROR, in which case we should set back to idle automatically.

Independent of this change, that function should perhaps be made to return void.


>> Also, in setting this, there is no real way to differentiate between a
>> regular walsender and a basebackup walsender, other than looking at the
>> wait events. They're both listed as walsenders. Should there be?  (That
>> might not be as easily backpatchable, so keeping that as a separate one)
>
> Maybe something like "walsender [backup]" or just "basebackup" since
> walsender is pretty misleading?  It think it would be nice to be able to
> tell them apart, though I don't think it should be back-patched.  People
> might be relying on the name in the current versions.

You can already do a join with pg_stat_replication.pid and look for
the WAL sender state which is marked as "backup" in this case. So I am
-1 for making the current code more complicated. Please note as well
that pg_stat_activity.backend_type is set depending on the process
type, not based on what the process is doing so you would need to
invent a new logic.

Good point. I'm not sure why I didn't think of that.

It's still quite a bit weird that we call this process "walsender" when it does other things as well. But the ship sailed on that many years ago, changing that completely now would not be worth the breakage. 

--
Attachment

pgsql-hackers by date:

Previous
From: Huong Dangminh
Date:
Subject: RE: User defined data types in Logical Replication
Next
From: Christoph Berg
Date:
Subject: Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl