Thread: Function to track shmem reinit time

Function to track shmem reinit time

From
Anastasia Lubennikova
Date:

Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will just restart.
And the only way to know that it happened is to regularly parse logfile
or monitor it, catching restart messages. This approach is really inconvenient for
users, who have gigabytes of logs.

This new function can be periodiacally called by a monitoring agent, and,
if shmem_init_time doesn't match pg_postmaster_start_time,
we know that server crashed-restarted, and also know the exact time, when.

Also, working on this patch, I noticed a bit of dead code
and some discordant comments in postmaster.c.
I see no reason to leave it as is.
So there is a small remove_dead_shmem_reinit_code_v0.patch.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment

Re: Function to track shmem reinit time

From
Grigory Smolkin
Date:

It can be used to accurately calculate server uptime, since you can`t rely on pg_postmaster_start_time() in this.

On 02/28/2018 03:11 PM, Anastasia Lubennikova wrote:

Attached patch introduces a new function pg_shmem_init_time(),
which returns the time shared memory was last (re)initialized.
It is created for use by monitoring tools to track backend crashes.

Currently, if the 'restart_after_crash' option is on, postgres will just restart.
And the only way to know that it happened is to regularly parse logfile
or monitor it, catching restart messages. This approach is really inconvenient for
users, who have gigabytes of logs.

This new function can be periodiacally called by a monitoring agent, and,
if shmem_init_time doesn't match pg_postmaster_start_time,
we know that server crashed-restarted, and also know the exact time, when.

Also, working on this patch, I noticed a bit of dead code
and some discordant comments in postmaster.c.
I see no reason to leave it as is.
So there is a small remove_dead_shmem_reinit_code_v0.patch.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Function to track shmem reinit time

From
Peter Eisentraut
Date:
On 2/28/18 07:11, Anastasia Lubennikova wrote:
> Currently, if the 'restart_after_crash' option is on, postgres will just
> restart.
> And the only way to know that it happened is to regularly parse logfile
> or monitor it, catching restart messages. This approach is really
> inconvenient for
> users, who have gigabytes of logs.

I find this premise a bit dubious.  Why have a log file if it's too big
to find anything in it?  Server crashes aren't the only thing people are
interested in.  So we'll need a function for "last $anything".

> This new function can be periodiacally called by a monitoring agent, and,
> if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
> we know that server crashed-restarted, and also know the exact time, when.

If we want to do something like this, I'd rather have a name that does
not expose implementation details.  If we restructure shared memory
management in the future, the name won't apply anymore.  Call it "last
crash" or "last reinit" or something like that.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Function to track shmem reinit time

From
Justin Pryzby
Date:
On Sat, Mar 03, 2018 at 01:00:52PM -0500, Peter Eisentraut wrote:
> On 2/28/18 07:11, Anastasia Lubennikova wrote:
> > Currently, if the 'restart_after_crash' option is on, postgres will just
> > restart.
> > And the only way to know that it happened is to regularly parse logfile
> > or monitor it, catching restart messages. This approach is really
> > inconvenient for
> > users, who have gigabytes of logs.
> 
> I find this premise a bit dubious.  Why have a log file if it's too big
> to find anything in it?  Server crashes aren't the only thing people are
> interested in.  So we'll need a function for "last $anything".

I think one can tell if it's crashed recently by comparing start time of parent
postmaster and its main children (I'm going to go put this in place for myself
now).

ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(), backend_start-pg_postmaster_start_time() FROM
pg_stat_activityORDER BY backend_start LIMIT 1;
 
    backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?     
---------------------+-------------------------------+-------------------------------+-----------------
 autovacuum launcher | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826

pryzbyj@pryzbyj:~$ ps -O lstart -upostgres # on a separate server with 9.4 which doen't have backend_type, add in v10
  PID                  STARTED S TTY          TIME COMMAND
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main
-cconfig_file=/etc/postgresql/9.4/main/postgresql.conf
 
12997 Mon Feb 19 06:03:14 2018 S ?        00:00:00 postgres: checkpointer process
                                                      
 
12998 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: writer process
                                                      
 
12999 Mon Feb 19 06:03:14 2018 S ?        00:00:08 postgres: wal writer process
                                                      
 
13000 Mon Feb 19 06:03:14 2018 S ?        00:00:19 postgres: autovacuum launcher process
                                                      
 
13001 Mon Feb 19 06:03:14 2018 S ?        00:00:45 postgres: stats collector process
                                                      
 

pryzbyj@pryzbyj:~$ sudo kill -SEGV 12997
pryzbyj@pryzbyj:~$ ps -O lstart -upostgres
  PID                  STARTED S TTY          TIME COMMAND
 8644 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: checkpointer process
                                                      
 
 8645 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: writer process
                                                      
 
 8646 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: wal writer process
                                                      
 
 8647 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: autovacuum launcher process
                                                      
 
 8648 Sat Mar  3 12:05:34 2018 S ?        00:00:00 postgres: stats collector process
                                                      
 
12982 Mon Feb 19 06:03:13 2018 S ?        00:00:33 /usr/lib/postgresql/9.4/bin/postgres -D /var/lib/postgresql/9.4/main
-cconfig_file=/etc/postgresql/9.4/main/postgresql.conf
 

Justin


Re: Function to track shmem reinit time

From
Tomas Vondra
Date:
On 02/28/2018 02:39 PM, Grigory Smolkin wrote:
> It can be used to accurately calculate server uptime, since you can`t
> rely on pg_postmaster_start_time() in this.
> 

Can you please explain why pg_postmaster_start_time can't be used for
this purpose? It seems like a pretty good match, considering it's meant
to show server start time.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Function to track shmem reinit time

From
Tomas Vondra
Date:
On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
> Attached patch introduces a new function pg_shmem_init_time(),
> which returns the time shared memory was last (re)initialized.
> It is created for use by monitoring tools to track backend crashes.
> 
> Currently, if the 'restart_after_crash' option is on, postgres will
> just restart. And the only way to know that it happened is to
> regularly parse logfile or monitor it, catching restart messages.
> This approach is really inconvenient for users, who have gigabytes of
> logs.
> 
> This new function can be periodiacally called by a monitoring agent,
> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/ 
> we know that server crashed-restarted, and also know the exact time,
> when.
> 

I don't think it really solves the problem, though. For example if the
whole VM reboots (which can be a matter of seconds), this check will say
"shmem_init_time == pg_postmaster_start_time" and you've not detected
anything.

IMHO pg_postmaster_start_time is the right way to monitor uptime, and
the right way to detect spurious restarts is to remember the last value
you've seen and compare it to the current one.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Function to track shmem reinit time

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On 02/28/2018 02:39 PM, Grigory Smolkin wrote:
>> It can be used to accurately calculate server uptime, since you can`t
>> rely on pg_postmaster_start_time() in this.

> Can you please explain why pg_postmaster_start_time can't be used for
> this purpose? It seems like a pretty good match, considering it's meant
> to show server start time.

It evidently depends on how you want to define "server uptime".  If you
get backend crashes often enough, you might feel a need to define it
as "time since last crash".  Although I would think that if that's
happening regularly in a production environment, you have a problem
you need to fix, not just measure.

My own thought about that is that if you are trying to measure
backend crashes, just knowing the time of the latest one is little help.
You want to know how often they're happening.  So this gets back to the
question of why the postmaster log isn't a useful source of that
information.  I think that if we're to do anything in this area,
improving the usefulness of the log would be more important than
providing the proposed function.

            regards, tom lane


Re: Function to track shmem reinit time

From
Tomas Vondra
Date:
On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
> 
> This new function can be periodiacally called by a monitoring agent, 
> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/ 
> we know that server crashed-restarted, and also know the exact time, 
> when.
> 

Actually, after looking at the code a bit, I think that test would not
really work anyway, because those two timestamps come from two separate
GetCurrentTimestamp calls, so you get this:

    test=# select pg_shmem_init_time(), pg_postmaster_start_time();
          pg_shmem_init_time       |   pg_postmaster_start_time
    -------------------------------+-------------------------------
     2018-03-04 17:10:59.017058+01 | 2018-03-04 17:10:59.022071+01
    (1 row)

    test=# select pg_shmem_init_time() = pg_postmaster_start_time();
     ?column?
    ----------
     f
    (1 row)

So there would have to be some sort of "fuzz" parameter when comparing
the values, etc.

Furthermore, the patch is yet another victim of fd1a421fe - fixing the
pg_proc entries is trivial, but a new version is needed.

I'd also like to see an example/explanation how this improves this
situation compared to only having pg_postmaster_start_time.

So I'm setting this as waiting on author for now.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Function to track shmem reinit time

From
Grigory Smolkin
Date:
On 03/03/2018 09:00 PM, Peter Eisentraut wrote:
> I find this premise a bit dubious.  Why have a log file if it's too big
> to find anything in it?  Server crashes aren't the only thing people are
> interested in.  So we'll need a function for "last $anything".

Thank you for your interest in this topic.
Well, on heavy loaded machine log file will be big, because you usually 
want to log every query for later analysis, and, because postgres is 
dumping everything in a single file, searching for some specific error 
will be slow. Log file is certainly needed for digging the details of a 
crash, but pg_shmem_init_time is more about online monitoring.

On 03/03/2018 09:43 PM, Justin Pryzby wrote:

> I think one can tell if it's crashed recently by comparing start time of parent
> postmaster and its main children (I'm going to go put this in place for myself
> now).
>
> ts=# SELECT backend_type, backend_start, pg_postmaster_start_time(), backend_start-pg_postmaster_start_time() FROM
pg_stat_activityORDER BY backend_start LIMIT 1;
 
>      backend_type     |         backend_start         |   pg_postmaster_start_time    |    ?column?
> ---------------------+-------------------------------+-------------------------------+-----------------
>   autovacuum launcher | 2018-03-02 00:21:11.604468-03 | 2018-03-02 00:12:46.757642-03 | 00:08:24.846826

Thank you, though we must take into account the fact that autovacuum may 
not be enabled so it`s better to use checkpointer or bgwriter. It`s 
working solution, yes, but it looks more like workaround and requires 
from user an understanding of postgres internals.


On 03/04/2018 06:56 PM, Tomas Vondra wrote:

> Can you please explain why pg_postmaster_start_time can't be used for
> this purpose? It seems like a pretty good match, considering it's meant
> to show server start time.

Because a backend crash do not reset pg_postmaster_start_time.

On 03/04/2018 07:02 PM, Tomas Vondra wrote:
> On 02/28/2018 01:11 PM, Anastasia Lubennikova wrote:
>> Attached patch introduces a new function pg_shmem_init_time(),
>> which returns the time shared memory was last (re)initialized.
>> It is created for use by monitoring tools to track backend crashes.
>>
>> Currently, if the 'restart_after_crash' option is on, postgres will
>> just restart. And the only way to know that it happened is to
>> regularly parse logfile or monitor it, catching restart messages.
>> This approach is really inconvenient for users, who have gigabytes of
>> logs.
>>
>> This new function can be periodiacally called by a monitoring agent,
>> and, if /shmem_init_time/ doesn't match /pg_postmaster_start_time,/
>> we know that server crashed-restarted, and also know the exact time,
>> when.
>>
> I don't think it really solves the problem, though. For example if the
> whole VM reboots (which can be a matter of seconds), this check will say
> "shmem_init_time == pg_postmaster_start_time" and you've not detected
> anything.
>
> IMHO pg_postmaster_start_time is the right way to monitor uptime, and
> the right way to detect spurious restarts is to remember the last value
> you've seen and compare it to the current one.

Yes, for the described case with VM restart pg_postmaster_start_time 
works fine.
pg_postmaster_start_time is essential for almost every case and 
pg_shmem_init_time only expand his usefulness, not diminish.

On 03/04/2018 07:09 PM, Tom Lane wrote:
> It evidently depends on how you want to define "server uptime".  If you
> get backend crashes often enough, you might feel a need to define it
> as "time since last crash".  Although I would think that if that's
> happening regularly in a production environment, you have a problem
> you need to fix, not just measure.

Absolutely. And for that you need to know ASAP that backend crash 
happened in the first place.

On 03/04/2018 07:09 PM, Tom Lane wrote:
> My own thought about that is that if you are trying to measure
> backend crashes, just knowing the time of the latest one is little help.
> You want to know how often they're happening.  So this gets back to the
> question of why the postmaster log isn't a useful source of that
> information.

For that purpose pg_shmem_init_time also can be used.
For example we can build a chart based on values from "select (now() - 
pg_shmem_init_time());" taken, for example, every 10 seconds.
Values around 0 =< x =<10 will signal about memory reinitialization 
which is usually a byproduct of a backend crash.

On 03/04/2018 07:09 PM, Tom Lane wrote:
>   I think that if we're to do anything in this area,
> improving the usefulness of the log would be more important than
> providing the proposed function.

The separation of maintenance messages and query messaged into different 
log files is sorely needed.
This way server errors can be identified fast and in convenient way.
But as I mentioned earlier, pg_shmem_init_time() is about online monitoring.


On 03/04/2018 07:02 PM, Tomas Vondra wrote:
> Actually, after looking at the code a bit, I think that test would not
> really work anyway, because those two timestamps come from two separate
> GetCurrentTimestamp calls, so you get this:
>
>     test=# select pg_shmem_init_time(), pg_postmaster_start_time();
>           pg_shmem_init_time       |   pg_postmaster_start_time
> -------------------------------+-------------------------------
>      2018-03-04 17:10:59.017058+01 | 2018-03-04 17:10:59.022071+01
>     (1 row)
>
>     test=# select pg_shmem_init_time() = pg_postmaster_start_time();
>      ?column?
>     ----------
>      f
>     (1 row)
>
> So there would have to be some sort of "fuzz" parameter when comparing
> the values, etc.

The difference measured in milliseconds is expected. After all, shared 
memory is initialized after postmaster start.




-- 
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Function to track shmem reinit time

From
David Steele
Date:
On 3/4/18 11:17 AM, Tomas Vondra wrote:
> 
> Furthermore, the patch is yet another victim of fd1a421fe - fixing the
> pg_proc entries is trivial, but a new version is needed.
> 
> I'd also like to see an example/explanation how this improves this
> situation compared to only having pg_postmaster_start_time.
> 
> So I'm setting this as waiting on author for now.

I'm not sure why this was set back to Needs Review since it neither
applies cleanly nor builds.

I'm setting this entry to Waiting on Author, but based on the discussion
I think it should be Returned with Feedback.

Regards,
-- 
-David
david@pgmasters.net


Re: Function to track shmem reinit time

From
Tomas Vondra
Date:

On 03/28/2018 08:55 PM, David Steele wrote:
> On 3/4/18 11:17 AM, Tomas Vondra wrote:
>>
>> Furthermore, the patch is yet another victim of fd1a421fe - fixing the
>> pg_proc entries is trivial, but a new version is needed.
>>
>> I'd also like to see an example/explanation how this improves this
>> situation compared to only having pg_postmaster_start_time.
>>
>> So I'm setting this as waiting on author for now.
> 
> I'm not sure why this was set back to Needs Review since it neither
> applies cleanly nor builds.
> 

It likely applied/built fine when it was set to NR, but got broken by
some recent commit.

> I'm setting this entry to Waiting on Author, but based on the discussion
> I think it should be Returned with Feedback.
> 

Fine with me.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Function to track shmem reinit time

From
David Steele
Date:
On 3/29/18 9:40 AM, Tomas Vondra wrote:
> On 03/28/2018 08:55 PM, David Steele wrote:
> 
>> I'm setting this entry to Waiting on Author, but based on the discussion
>> I think it should be Returned with Feedback.
>>
> 
> Fine with me.

This entry has been marked Returned with Feedback.

Regards,
-- 
-David
david@pgmasters.net


Re: Function to track shmem reinit time

From
Robert Haas
Date:
On Tue, Apr 10, 2018 at 9:07 AM, David Steele <david@pgmasters.net> wrote:
> On 3/29/18 9:40 AM, Tomas Vondra wrote:
>> On 03/28/2018 08:55 PM, David Steele wrote:
>>> I'm setting this entry to Waiting on Author, but based on the discussion
>>> I think it should be Returned with Feedback.
>>
>> Fine with me.
>
> This entry has been marked Returned with Feedback.

I guess I'm in the minority here, but I don't see why it isn't useful
to have something like this. It's a lot easier for a monitoring
process to poll for this kind of thing than it is to monitor the logs.
Not that log monitoring isn't a good idea, but this is pretty
lightweight.

OTOH, it also seems like this could live as an out-of-core extension,
so I guess if nobody else likes the idea Anastasia could always do it
that way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Function to track shmem reinit time

From
Alexander Korotkov
Date:
On Tue, Apr 10, 2018 at 8:58 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Tue, Apr 10, 2018 at 9:07 AM, David Steele <david@pgmasters.net> wrote:
> > On 3/29/18 9:40 AM, Tomas Vondra wrote:
> >> On 03/28/2018 08:55 PM, David Steele wrote:
> >>> I'm setting this entry to Waiting on Author, but based on the discussion
> >>> I think it should be Returned with Feedback.
> >>
> >> Fine with me.
> >
> > This entry has been marked Returned with Feedback.
>
> I guess I'm in the minority here, but I don't see why it isn't useful
> to have something like this. It's a lot easier for a monitoring
> process to poll for this kind of thing than it is to monitor the logs.
> Not that log monitoring isn't a good idea, but this is pretty
> lightweight.

+1,

I think the issue this patch is addressing is that PostgreSQL now
doesn't have true uptime function.  We now recommend using
pg_postmaster_start_time() for uptime calculation, but that isn't it.
When backed crashes, shmen is reinitialized and PostgreSQL is
restarted then uptime should be reset to zero, but
pg_postmaster_start_time() is not about that.

The major argument against this patch I got from the thread is that we
shouldn't introduce new functions exposing information, which could be
already fetched from logs.  However, I think following this logic we
should remove pg_postmaster_start_time() too.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: Function to track shmem reinit time

From
Alexander Korotkov
Date:
On Sun, Dec 23, 2018 at 11:14 PM Alexander Korotkov
<a.korotkov@postgrespro.ru> wrote:
> On Tue, Apr 10, 2018 at 8:58 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Tue, Apr 10, 2018 at 9:07 AM, David Steele <david@pgmasters.net> wrote:
> > > On 3/29/18 9:40 AM, Tomas Vondra wrote:
> > >> On 03/28/2018 08:55 PM, David Steele wrote:
> > >>> I'm setting this entry to Waiting on Author, but based on the discussion
> > >>> I think it should be Returned with Feedback.
> > >>
> > >> Fine with me.
> > >
> > > This entry has been marked Returned with Feedback.
> >
> > I guess I'm in the minority here, but I don't see why it isn't useful
> > to have something like this. It's a lot easier for a monitoring
> > process to poll for this kind of thing than it is to monitor the logs.
> > Not that log monitoring isn't a good idea, but this is pretty
> > lightweight.
>
> +1,
>
> I think the issue this patch is addressing is that PostgreSQL now
> doesn't have true uptime function.  We now recommend using
> pg_postmaster_start_time() for uptime calculation, but that isn't it.
> When backed crashes, shmen is reinitialized and PostgreSQL is
> restarted then uptime should be reset to zero, but
> pg_postmaster_start_time() is not about that.
>
> The major argument against this patch I got from the thread is that we
> shouldn't introduce new functions exposing information, which could be
> already fetched from logs.  However, I think following this logic we
> should remove pg_postmaster_start_time() too.

I've revised the patchset.

0001-remove_dead_shmem_reinit_code-v1.patch
 * -n option is removed from getopt() argument, initdb help and documentation
 * some comments are slightly revised

0002-shmem_init_time_v1.patch
 * detailed function description is added to the documentation

From my point of view criticism of this patch was addressed by
argument, that pg_shmem_init_time() allows to calculate the server
uptime [1].  This is very basic information, which is reasonable to
get without log files parsing.  It's more than year since [1] is
unanswered.  So, I'm going to push this if no objections.

Links
1. https://www.postgresql.org/message-id/CAPpHfds3_CgNgK4nfcOBo2BfPMFS7yrDXDCPKJSqdGrpqoLwoQ%40mail.gmail.com

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Attachment

Re: Function to track shmem reinit time

From
Tom Lane
Date:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> From my point of view criticism of this patch was addressed by
> argument, that pg_shmem_init_time() allows to calculate the server
> uptime [1].  This is very basic information, which is reasonable to
> get without log files parsing.  It's more than year since [1] is
> unanswered.  So, I'm going to push this if no objections.

I'm still going to object to it, on the grounds that

(1) it's exposing an implementation detail that clients should not be
concerned with, and that we might change in future.  The name isn't
even well chosen --- if the argument is that it's useful to monitor
server uptime, why isn't the name "pg_server_uptime"?

(2) if your server is crashing often enough that postmaster start
time isn't an adequate substitute, you have way worse problems than
whether you can measure it.  I'd rather see us put effort into
fixing whatever the underlying problem is.

            regards, tom lane



Re: Function to track shmem reinit time

From
Alexander Korotkov
Date:
On Sat, Feb 22, 2020 at 8:01 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> > From my point of view criticism of this patch was addressed by
> > argument, that pg_shmem_init_time() allows to calculate the server
> > uptime [1].  This is very basic information, which is reasonable to
> > get without log files parsing.  It's more than year since [1] is
> > unanswered.  So, I'm going to push this if no objections.
>
> I'm still going to object to it, on the grounds that

OK!

> (1) it's exposing an implementation detail that clients should not be
> concerned with, and that we might change in future.  The name isn't
> even well chosen --- if the argument is that it's useful to monitor
> server uptime, why isn't the name "pg_server_uptime"?

Choosing a more user-friendly name sounds like good idea for me.
pg_server_uptime() sounds like it should return an interval.  I'd like
this function to calculate a diff between timestamps.  I would rather
delegate it to user side.  What about pg_server_up_since()?

> (2) if your server is crashing often enough that postmaster start
> time isn't an adequate substitute, you have way worse problems than
> whether you can measure it.

Well, it's enough server to crash once per postmaster run to make this
measure absolutely inadequate.  We have different reasons for server
crash, not all of them are exact bugs.  OOM killer can cause a crash,
and it doesn't seem feasible we can exclude this reason completely.

> I'd rather see us put effort into
> fixing whatever the underlying problem is.

This is monitoring problem vs fixing problem tradeoff.  We had similar
for lwlock wait monitoring.  Ideally we should make our code never
stuck on lwlock, but that's not feasible.  So, we got lwlock wait
monitoring for problem diagnosis.  I think now we're discussing
similar issue.  Ideally, postgres should never crash.  But that's too
hard to achieve.  But we can easily get better monitoring on server
crashes and that's useful.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Function to track shmem reinit time

From
Robert Haas
Date:
On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'm still going to object to it, on the grounds that
>
> (1) it's exposing an implementation detail that clients should not be
> concerned with, and that we might change in future.  The name isn't
> even well chosen --- if the argument is that it's useful to monitor
> server uptime, why isn't the name "pg_server_uptime"?
>
> (2) if your server is crashing often enough that postmaster start
> time isn't an adequate substitute, you have way worse problems than
> whether you can measure it.  I'd rather see us put effort into
> fixing whatever the underlying problem is.

I don't accept the first argument, because I think the chances that
we'll change our basic approach to this problem are indistinguishable
from zero.  A few years back, you criticized some idea of mine as
tantamount to rm -rf src/backend/optimizer, which you were not ready
to endorse. That proposal was surely vastly less ambitious than some
proposal which would remove the idea of shared memory reinitialization
after an unsafe backend exit.

I don't accept the second argument either, because it's a false
dichotomy. Adding this function won't reduce the amount of energy that
gets spent fixing crashes. There are always more crashes.

I realize that several other people voted against this proposal so I
don't think it's OK to commit a patch in this area unless some more
positive votes turn up, but I'm still +1 on the idea. Granted, we
don't want an infinite amount of clutter in the system catalogs, but I
have a hard time believing that this proposed function would get any
less use than the hyperbolic trig functions.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Function to track shmem reinit time

From
David Steele
Date:
On 2/24/20 10:57 PM, Robert Haas wrote:
> On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm still going to object to it, on the grounds that
>>
>> (1) it's exposing an implementation detail that clients should not be
>> concerned with, and that we might change in future.  The name isn't
>> even well chosen --- if the argument is that it's useful to monitor
>> server uptime, why isn't the name "pg_server_uptime"?
>>
>> (2) if your server is crashing often enough that postmaster start
>> time isn't an adequate substitute, you have way worse problems than
>> whether you can measure it.  I'd rather see us put effort into
>> fixing whatever the underlying problem is.
> 
> I don't accept the first argument, because I think the chances that
> we'll change our basic approach to this problem are indistinguishable
> from zero.  A few years back, you criticized some idea of mine as
> tantamount to rm -rf src/backend/optimizer, which you were not ready
> to endorse. That proposal was surely vastly less ambitious than some
> proposal which would remove the idea of shared memory reinitialization
> after an unsafe backend exit.
> 
> I don't accept the second argument either, because it's a false
> dichotomy. Adding this function won't reduce the amount of energy that
> gets spent fixing crashes. There are always more crashes.
> 
> I realize that several other people voted against this proposal so I
> don't think it's OK to commit a patch in this area unless some more
> positive votes turn up, but I'm still +1 on the idea. Granted, we
> don't want an infinite amount of clutter in the system catalogs, but I
> have a hard time believing that this proposed function would get any
> less use than the hyperbolic trig functions.

Marking this Returned with Feedback again since we are still at an impasse.

Regards,
-- 
-David
david@pgmasters.net



Re: Function to track shmem reinit time

From
Kyotaro Horiguchi
Date:
FWIW, I don't object for adding the feature like this (in other words
+1), since I see the advantages Alex mentioned is valid.  (Even though
most of our customers notices server restart by client
disconnections..)

At Wed, 8 Apr 2020 11:49:11 -0400, David Steele <david@pgmasters.net> wrote in 
> On 2/24/20 10:57 PM, Robert Haas wrote:
> > On Sat, Feb 22, 2020 at 10:31 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm still going to object to it, on the grounds that
> >>
> >> (1) it's exposing an implementation detail that clients should not be
> >> concerned with, and that we might change in future.  The name isn't
> >> even well chosen --- if the argument is that it's useful to monitor
> >> server uptime, why isn't the name "pg_server_uptime"?
> >>
> >> (2) if your server is crashing often enough that postmaster start
> >> time isn't an adequate substitute, you have way worse problems than
> >> whether you can measure it.  I'd rather see us put effort into
> >> fixing whatever the underlying problem is.
> > I don't accept the first argument, because I think the chances that
> > we'll change our basic approach to this problem are indistinguishable
> > from zero.  A few years back, you criticized some idea of mine as
> > tantamount to rm -rf src/backend/optimizer, which you were not ready
> > to endorse. That proposal was surely vastly less ambitious than some
> > proposal which would remove the idea of shared memory reinitialization
> > after an unsafe backend exit.

I'm not sure the name is good, mainly for the reason of
user-friendliness. Alexander's proposal "pg_server_up_since()"
returning absolute time makes sense to me.

> > I don't accept the second argument either, because it's a false
> > dichotomy. Adding this function won't reduce the amount of energy that
> > gets spent fixing crashes. There are always more crashes.
> > I realize that several other people voted against this proposal so I
> > don't think it's OK to commit a patch in this area unless some more
> > positive votes turn up, but I'm still +1 on the idea. Granted, we
> > don't want an infinite amount of clutter in the system catalogs, but I
> > have a hard time believing that this proposed function would get any
> > less use than the hyperbolic trig functions.

I'm on Robert and Alexander side about this.  I don't think
introducing this necessarily means we are obliged to accept all
requests for "last $anything" for other events, like introducing the
hyperbolic functions doesn't mean to accept all new functions alike,
maybe.

> Marking this Returned with Feedback again since we are still at an
> impasse.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center