Re: Function to track shmem reinit time - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: Function to track shmem reinit time
Date
Msg-id 1247478e-b106-9c82-0e23-d0dedbd47a78@2ndquadrant.com
In response to Function to track shmem reinit time  (Anastasia Lubennikova <a.lubennikova@postgrespro.ru>)
List pgsql-hackers
On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
> Attached patch introduces a new function pg_shmem_init_time(),
> which returns the time shared memory was last (re)initialized.
> It is created for use by monitoring tools to track backend crashes.
> 
> Currently, if the 'restart_after_crash' option is on, postgres will
> just restart. And the only way to know that it happened is to
> regularly parse logfile or monitor it, catching restart messages.
> This approach is really inconvenient for users, who have gigabytes of
> logs.
> 
> This new function can be periodically called by a monitoring agent,
> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time/,
> we know that the server crashed and restarted, and also know exactly
> when.
> 

I don't think it really solves the problem, though. For example, if the
whole VM reboots (which can be a matter of seconds), both values are
reset together, so this check will say
"shmem_init_time == pg_postmaster_start_time" and you've not detected
anything.

IMHO pg_postmaster_start_time is the right way to monitor uptime, and
the right way to detect spurious restarts is to remember the last value
you've seen and compare it to the current one.
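
A minimal sketch of that comparison, as a monitoring agent might do it.
pg_postmaster_start_time() is the existing PostgreSQL function; the
RestartDetector class and its method names are hypothetical, made up for
illustration. How the agent runs the query and where it persists the
last value across its own restarts are left out and would have to be
filled in.

```python
from datetime import datetime, timezone
from typing import Optional

# Query the agent would run on each poll (execution not shown here).
START_TIME_QUERY = "SELECT pg_postmaster_start_time()"


class RestartDetector:
    """Hypothetical helper: remembers the last start time it was given."""

    def __init__(self) -> None:
        self.last_seen: Optional[datetime] = None

    def observe(self, start_time: datetime) -> bool:
        """Return True if start_time differs from the previous observation.

        The first observation has nothing to compare against, so it is
        never reported as a restart.
        """
        restarted = self.last_seen is not None and start_time != self.last_seen
        self.last_seen = start_time
        return restarted


if __name__ == "__main__":
    detector = RestartDetector()
    first = datetime(2018, 2, 28, 12, 0, tzinfo=timezone.utc)
    later = datetime(2018, 2, 28, 13, 11, tzinfo=timezone.utc)
    for value in (first, first, later):
        print(value.isoformat(), detector.observe(value))
```

Because the agent compares against a value it stored itself, a VM reboot
shows up as a changed start time the next time the agent polls.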

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

