Thread: [GENERAL] Monitoring of a hot standby with a largely idle master
I think that none of the recovery information functions (https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE) can distinguish a hot standby which is connected to an idle master, versus one which is disconnected. For example, because the master has crashed, or someone has changed the firewall rules.
Is there a way to monitor from SQL the last time the standby was able to contact the master and initiate streaming with it? Other than trying to write a function that parses it out of pg_log?
Cheers,
Jeff
On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote: > > I think that none of the recovery information functions > (https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE) > can distinguish a hot standby which is connected to an idle master, versus > one which is disconnected. For example, because the master has crashed, or > someone has changed the firewall rules. > > Is there a way to monitor from SQL the last time the standby was able to > contact the master and initiate streaming with it? Other than trying to > write a function that parses it out of pg_log? Not directly I am afraid. One way I can think about is to poll periodically the state of pg_stat_replication on the primary or pg_stat_wal_receiver on the standby and save it in a custom table. The past information is not persistent as any replication-related data in catalogs is based on the shared memory state of the WAL senders and the WAL receiver, and those are wiped out at reconnection. -- Michael
On Thu, Jul 13, 2017 at 1:15 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> I think that none of the recovery information functions
> (https://www.postgresql.org/docs/9.6/static/functions-admin. html#FUNCTIONS-RECOVERY-INFO- TABLE)
> can distinguish a hot standby which is connected to an idle master, versus
> one which is disconnected. For example, because the master has crashed, or
> someone has changed the firewall rules.
>
> Is there a way to monitor from SQL the last time the standby was able to
> contact the master and initiate streaming with it? Other than trying to
> write a function that parses it out of pg_log?
Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.
Thanks, that looks like what I want (or will be, once I get the other side to upgrade to 9.6).
I think that pg_stat_wal_receiver should be crossreferenced in https://www.postgresql.org/ docs/9.6/static/hot-standby. html, near the same place which it crossreferences table 9-79. That would make it more discoverable.
Cheers,
Jeff
On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Jul 13, 2017 at 1:15 AM, Michael Paquier <michael.paquier@gmail.com> > wrote: >> >> On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote: >> > >> > I think that none of the recovery information functions >> > >> > (https://www.postgresql.org/docs/9.6/static/functions-admin.html#FUNCTIONS-RECOVERY-INFO-TABLE) >> > can distinguish a hot standby which is connected to an idle master, >> > versus >> > one which is disconnected. For example, because the master has crashed, >> > or >> > someone has changed the firewall rules. >> > >> > Is there a way to monitor from SQL the last time the standby was able to >> > contact the master and initiate streaming with it? Other than trying to >> > write a function that parses it out of pg_log? >> >> Not directly I am afraid. One way I can think about is to poll >> periodically the state of pg_stat_replication on the primary or >> pg_stat_wal_receiver on the standby and save it in a custom table. The >> past information is not persistent as any replication-related data in >> catalogs is based on the shared memory state of the WAL senders and >> the WAL receiver, and those are wiped out at reconnection. > > > Thanks, that looks like what I want (or will be, once I get the other side > to upgrade to 9.6). > > I think that pg_stat_wal_receiver should be crossreferenced in > https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the same > place which it crossreferences table 9-79. That would make it more > discoverable. Hm. Hot standby may not involve streaming replication. What about a paragraph here instead? https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication In the monitoring subsection, we could tell that on a standby the WAL receiver status can be retrieved from this view when changes are streamed. What do you think? -- Michael
On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
Hm. Hot standby may not involve streaming replication. What about aOn Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> I think that pg_stat_wal_receiver should be crossreferenced in
> https://www.postgresql.org/docs/9.6/static/hot-standby. html, near the same
> place which it crossreferences table 9-79. That would make it more
> discoverable.
paragraph here instead?
https://www.postgresql.org/docs/devel/static/warm- standby.html#streaming- replication
In the monitoring subsection, we could tell that on a standby the WAL
receiver status can be retrieved from this view when changes are
streamed. What do you think?
That works for me.
Cheers,
Jeff
2017-07-13 20:15 GMT+12:00 Michael Paquier <michael.paquier@gmail.com>:
On Thu, Jul 13, 2017 at 5:26 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>
> I think that none of the recovery information functions
> (https://www.postgresql.org/docs/9.6/static/functions- admin.html#FUNCTIONS-RECOVERY- INFO-TABLE)
> can distinguish a hot standby which is connected to an idle master, versus
> one which is disconnected. For example, because the master has crashed, or
> someone has changed the firewall rules.
>
> Is there a way to monitor from SQL the last time the standby was able to
> contact the master and initiate streaming with it? Other than trying to
> write a function that parses it out of pg_log?
Not directly I am afraid. One way I can think about is to poll
periodically the state of pg_stat_replication on the primary or
pg_stat_wal_receiver on the standby and save it in a custom table. The
past information is not persistent as any replication-related data in
catalogs is based on the shared memory state of the WAL senders and
the WAL receiver, and those are wiped out at reconnection.
--
Michael
That works for me too! I do this way... cron job runs that every X minutes and if the replication lag is higher than 1 second it sends me an email..
It works pretty well and I used bash.
Lucas
On Fri, Jul 14, 2017 at 9:11 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> >> On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote: >> > >> > I think that pg_stat_wal_receiver should be crossreferenced in >> > https://www.postgresql.org/docs/9.6/static/hot-standby.html, near the >> > same >> > place which it crossreferences table 9-79. That would make it more >> > discoverable. >> >> Hm. Hot standby may not involve streaming replication. What about a >> paragraph here instead? >> >> https://www.postgresql.org/docs/devel/static/warm-standby.html#streaming-replication >> >> In the monitoring subsection, we could tell that on a standby the WAL >> receiver status can be retrieved from this view when changes are >> streamed. What do you think? > > > That works for me. What do you think about the patch attached? <para> You can retrieve a list of WAL sender processes via the - <link linkend="monitoring-stats-views-table"> + <link linkend="monitoring-stats-dynamic-views-table"> <literal>pg_stat_replication</></link> view. Large differences between In the previous paragraph I have noticed that the link reference is incorrect. pg_stat_replication is listed under monitoring-stats-dynamic-views-table. -- Michael
Attachment
On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Fri, Jul 14, 2017 at 9:11 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Thu, Jul 13, 2017 at 10:38 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>>
>> On Thu, Jul 13, 2017 at 7:23 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> >
>> > I think that pg_stat_wal_receiver should be crossreferenced in
>> > https://www.postgresql.org/docs/9.6/static/hot-standby. html, near the
>> > same
>> > place which it crossreferences table 9-79. That would make it more
>> > discoverable.
>>
>> Hm. Hot standby may not involve streaming replication. What about a
>> paragraph here instead?
>>
>> https://www.postgresql.org/docs/devel/static/warm- standby.html#streaming- replication
>>
>> In the monitoring subsection, we could tell that on a standby the WAL
>> receiver status can be retrieved from this view when changes are
>> streamed. What do you think?
>
>
> That works for me.
What do you think about the patch attached?
Looks OK. Should it mention specifically "On a hot standby" rather than "On a standby"? Otherwise people might be left confused on how they are supposed to do this on a generic standby. It is the kind of thing which is obvious once you know it, but confusing the first time you encounter it.
<para>
You can retrieve a list of WAL sender processes via the
- <link linkend="monitoring-stats-views-table">
+ <link linkend="monitoring-stats-dynamic-views-table">
<literal>pg_stat_replication</></link> view. Large differences between
In the previous paragraph I have noticed that the link reference is
incorrect. pg_stat_replication is listed under
monitoring-stats-dynamic-views-table.
Yes, that is clearly wrong. But why not link directly to the description of the view itself, pg-stat-replication-view, rather than the correct table which mentions the view? Is that the accepted docs style to link to the more generic place? (Same thing applies to your patch, it could link directly to pg-stat-wal-receiver-view.
Sorry for the delay, it took me awhile to get the new doc build system to work (solution seems to be, "Don't use CentOS6 anymore")
Cheers,
Jeff
On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote: > On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com> > wrote: >> What do you think about the patch attached? > > Looks OK. Should it mention specifically "On a hot standby" rather than "On > a standby"? Otherwise people might be left confused on how they are > supposed to do this on a generic standby. It is the kind of thing which is > obvious once you know it, but confusing the first time you encounter it. Yes, right. Let's update as you suggest. >> <para> >> You can retrieve a list of WAL sender processes via the >> - <link linkend="monitoring-stats-views-table"> >> + <link linkend="monitoring-stats-dynamic-views-table"> >> <literal>pg_stat_replication</></link> view. Large differences >> between >> In the previous paragraph I have noticed that the link reference is >> incorrect. pg_stat_replication is listed under >> monitoring-stats-dynamic-views-table. > > Yes, that is clearly wrong. But why not link directly to the description of > the view itself, pg-stat-replication-view, rather than the correct table > which mentions the view? Is that the accepted docs style to link to the > more generic place? (Same thing applies to your patch, it could link > directly to pg-stat-wal-receiver-view. Yes, that's even better. > Sorry for the delay, it took me awhile to get the new doc build system to > work (solution seems to be, "Don't use CentOS6 anymore") :) Thanks for the review. -- Michael
Attachment
On Wed, Jul 26, 2017 at 1:44 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com>
> wrote:
>> What do you think about the patch attached?
>
> Looks OK. Should it mention specifically "On a hot standby" rather than "On
> a standby"? Otherwise people might be left confused on how they are
> supposed to do this on a generic standby. It is the kind of thing which is
> obvious once you know it, but confusing the first time you encounter it.
Yes, right. Let's update as you suggest.
new version looks good.
Thanks,
Jeff
On 7/27/17 16:14, Jeff Janes wrote: > On Wed, Jul 26, 2017 at 1:44 AM, Michael Paquier > <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> wrote: > > On Mon, Jul 24, 2017 at 9:08 PM, Jeff Janes <jeff.janes@gmail.com > <mailto:jeff.janes@gmail.com>> wrote: > > On Sun, Jul 16, 2017 at 8:47 AM, Michael Paquier <michael.paquier@gmail.com <mailto:michael.paquier@gmail.com>> > > wrote: > >> What do you think about the patch attached? > > > > Looks OK. Should it mention specifically "On a hot standby" rather than "On > > a standby"? Otherwise people might be left confused on how they are > > supposed to do this on a generic standby. It is the kind of thing which is > > obvious once you know it, but confusing the first time you encounter it. > > Yes, right. Let's update as you suggest. > > > new version looks good. committed I changed to links to xrefs, which automatically generated the correct target texts. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general