Thread: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why?

Hello,

I have noticed recently that our databases (ie most but not all) have their stats reset on daily basis:

select count(case when stats_reset > (now() - interval '1 day') then 1 else null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
 db_reset_cnt | db_total_cnt
--------------+--------------
           77 |          157

Furthermore, the archiver stats also get reset: 

select stats_reset from pg_stat_archiver;
          stats_reset
-------------------------------
 2024-11-21 13:52:35.864855+00

Some facts:
- PostgreSQL v13.16 on Linux Ubuntu 20.04
- There's no evidence in the logs that any pg_stat_reset* functions have been executed
            name            | setting
----------------------------+---------
 log_min_duration_statement | 0
 log_statement              | all
- we don't have any user procs with a pg_stat_reset call 
- the timestamps of the reset across databases are sometimes close to each other, but not always
 
Anyone has any insight on what could be doing this daily stats reset, or perhaps how to find the root cause?

Regards,
Steeve 

On 11/21/24 09:29, Steeve Boulanger wrote:
> Hello,
> 
> I have noticed recently that our databases (ie most but not all) have 
> their stats reset on daily basis:
> 
> select count(case when stats_reset > (now() - interval '1 day') then 1 
> else null end) db_reset_cnt, count(*) db_total_cnt from pg_stat_database;
>   db_reset_cnt | db_total_cnt
> --------------+--------------
>             77 |          157
> 
> Furthermore, the archiver stats also get reset:
> 
> select stats_reset from pg_stat_archiver;
>            stats_reset
> -------------------------------
>   2024-11-21 13:52:35.864855+00
> 
> Some facts:
> - PostgreSQL v13.16 on Linux Ubuntu 20.04
> - There's no evidence in the logs that any pg_stat_reset* functions have 
> been executed
>              name            | setting
> ----------------------------+---------
>   log_min_duration_statement | 0
>   log_statement              | all
> - we don't have any user procs with a pg_stat_reset call
> - the timestamps of the reset across databases are sometimes close to 
> each other, but not always
> Anyone has any insight on what could be doing this daily stats reset, or 
> perhaps how to find the root cause?

https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-SETUP

"Cumulative statistics are collected in shared memory. Every PostgreSQL 
process collects statistics locally, then updates the shared data at 
appropriate intervals. When a server, including a physical replica, 
shuts down cleanly, a permanent copy of the statistics data is stored in 
the pg_stat subdirectory, so that statistics can be retained across 
server restarts. In contrast, when starting from an unclean shutdown 
(e.g., after an immediate shutdown, a server crash, starting from a base 
backup, and point-in-time recovery), all statistics counters are reset."


> 
> Regards,
> Steeve
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 11/21/24 12:34, Steeve Boulanger wrote:

Please reply to list also.

Ccing list.


> Thanks Adrian for taking the time to respond. I will review the 
> documentation once more, just in case I missed anything.
> 
> My apologies - I forgot to mention in my original post, that our last 
> cluster shutdown was over 15 days ago, thus an "unclean" 
> shutdown would not explain these daily stats reset that we are seeing. 
> It might also be relevant to mentioned

Then something is resetting the statistics.

1) What is log_min_error_statement set to?

2) Did you reload the server when changing?:

log_min_duration_statement | 0
log_statement              | all

> 
> -Steeve
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com





> Please reply to list also.

My apologies - I thought I did a "Reply all", but apparently not. I'm a little bit of a noob with email distrib lists. 

> 1) What is log_min_error_statement set to?

          name           | setting | pending_restart
-------------------------+---------+-----------------
 log_min_error_statement | error   | f

> 2) Did you reload the server when changing?:

yes - pg_reload_conf()

-Steeve

On Thu, Nov 21, 2024 at 2:49 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 12:34, Steeve Boulanger wrote:

Please reply to list also.

Ccing list.


> Thanks Adrian for taking the time to respond. I will review the
> documentation once more, just in case I missed anything.
>
> My apologies - I forgot to mention in my original post, that our last
> cluster shutdown was over 15 days ago, thus an "unclean"
> shutdown would not explain these daily stats reset that we are seeing.
> It might also be relevant to mentioned

Then something is resetting the statistics.

1) What is log_min_error_statement set to?

2) Did you reload the server when changing?:

log_min_duration_statement | 0
log_statement              | all

>
> -Steeve
>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

On 11/21/24 12:57, Steeve Boulanger wrote:
> 
>  > Please reply to list also.
> 
> My apologies - I thought I did a "Reply all", but apparently not. I'm a 
> little bit of a noob with email distrib lists.
> 
>  > 1) What is log_min_error_statement set to?
> 
>            name           | setting | pending_restart
> -------------------------+---------+-----------------
>   log_min_error_statement | error   | f
> 
>  > 2) Did you reload the server when changing?:
> 
> yes - pg_reload_conf()

All I can think to do is look at the logs  around the stats_reset times 
for the databases and see if there is anything relevant.

> 
> -Steeve


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 12:57, Steeve Boulanger wrote:
>
>  > Please reply to list also.
>
> My apologies - I thought I did a "Reply all", but apparently not. I'm a
> little bit of a noob with email distrib lists.
>
>  > 1) What is log_min_error_statement set to?
>
>            name           | setting | pending_restart
> -------------------------+---------+-----------------
>   log_min_error_statement | error   | f
>
>  > 2) Did you reload the server when changing?:
>
> yes - pg_reload_conf()

All I can think to do is look at the logs  around the stats_reset times
for the databases and see if there is anything relevant.
 
Also, "SELECT pg_postmaster_start_time();" to show the real Postgresql start time, instead of what you think it is.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
>  All I can think to do is look at the logs  around the stats_reset times
> for the databases and see if there is anything relevant.

That was already done, but nothing relevant was found unfortunately. 

-Steeve

On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 12:57, Steeve Boulanger wrote:
>
>  > Please reply to list also.
>
> My apologies - I thought I did a "Reply all", but apparently not. I'm a
> little bit of a noob with email distrib lists.
>
>  > 1) What is log_min_error_statement set to?
>
>            name           | setting | pending_restart
> -------------------------+---------+-----------------
>   log_min_error_statement | error   | f
>
>  > 2) Did you reload the server when changing?:
>
> yes - pg_reload_conf()

All I can think to do is look at the logs  around the stats_reset times
for the databases and see if there is anything relevant.

>
> -Steeve


--
Adrian Klaver
adrian.klaver@aklaver.com

Hello Ron,

> Also, "SELECT pg_postmaster_start_time();" to show the real Postgresql start time, instead of what you think it is.

Yes you're right - I should have been more explicit. 

[local]:5432 postgres@postgres=# select current_timestamp - pg_postmaster_start_time() as uptime;
         uptime
-------------------------
 15 days 10:03:44.048383

-Steeve

On Thu, Nov 21, 2024 at 3:30 PM Ron Johnson <ronljohnsonjr@gmail.com> wrote:
On Thu, Nov 21, 2024 at 4:12 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 12:57, Steeve Boulanger wrote:
>
>  > Please reply to list also.
>
> My apologies - I thought I did a "Reply all", but apparently not. I'm a
> little bit of a noob with email distrib lists.
>
>  > 1) What is log_min_error_statement set to?
>
>            name           | setting | pending_restart
> -------------------------+---------+-----------------
>   log_min_error_statement | error   | f
>
>  > 2) Did you reload the server when changing?:
>
> yes - pg_reload_conf()

All I can think to do is look at the logs  around the stats_reset times
for the databases and see if there is anything relevant.
 
Also, "SELECT pg_postmaster_start_time();" to show the real Postgresql start time, instead of what you think it is.

--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
On 11/21/24 13:31, Steeve Boulanger wrote:
>  > All I can think to do is look at the logs  around the stats_reset times
>  > for the databases and see if there is anything relevant.
> 
> That was already done, but nothing relevant was found unfortunately.

Unless it was not recognized as relevant. Since for the time being I am 
eliminating magic as the cause, something concrete is causing this and 
it should be leaving a trace. In your post you had this affecting 77 out 
of 157 databases in the cluster.

1) Do the 77 share some trait the other 80 don't.

2) Do the OS system logs reveal anything?

3) What was happening in the databases just prior to the time the stats 
reset?

4) Do you have external tools accessing these databases?

5) Is the cluster directly open to the world?

> 
> -Steeve
> 
> On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com 
> <mailto:adrian.klaver@aklaver.com>> wrote:
> 
>     On 11/21/24 12:57, Steeve Boulanger wrote:
>      >
>      >  > Please reply to list also.
>      >
>      > My apologies - I thought I did a "Reply all", but apparently not.
>     I'm a
>      > little bit of a noob with email distrib lists.
>      >
>      >  > 1) What is log_min_error_statement set to?
>      >
>      >            name           | setting | pending_restart
>      > -------------------------+---------+-----------------
>      >   log_min_error_statement | error   | f
>      >
>      >  > 2) Did you reload the server when changing?:
>      >
>      > yes - pg_reload_conf()
> 
>     All I can think to do is look at the logs  around the stats_reset times
>     for the databases and see if there is anything relevant.
> 
>      >
>      > -Steeve
> 
> 
>     -- 
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




>  1) Do the 77 share some trait the other 80 don't.

No pattern found yet .. but still verifying a few things

> 2) Do the OS system logs reveal anything?

Nothing found in syslog 

> 3) What was happening in the databases just prior to the time the stats
reset?

Here's an example (log extracts) for a stats reset occurrence: 

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3  limit 1;


    datname     |          stats_reset          |   since_reset
----------------+-------------------------------+-----------------
 MyDB           | 2024-11-21 13:48:34.332785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1] client=[host1] app=[[unknown]]LOG:  connection authorized: user=user1 database=MyDB applicatio
n_name=app1 <..>

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1] client=[host1] app=[app1]LOG:  duration: 1.071 ms  parse <unnamed>: SELECT <..>
<--LOGS-->

As you can see from above, the stats for MyDB were reset at  ".332" . The only logs before/after for the db was the connection (at .324), and then the parse (at .336).  NB: I also checked the logs at ".333" in case there would have been a rounding up, but nothing relevant was found. With that said, I only verified one occurence - tomorrow I'll check a few more just to validate. 


> 4) Do you have external tools accessing these databases?

We have internal micro-services accessing the databases, as well as a monitoring tool (Netdata), and some of the Devs use pgAdmin. I discarded the scenario where someone would inadvertently do a "pg_stat_reset" via pgAdmin, just because a lot of databases have their stats reset within a short period of time. 

On the other hand, Netdata does connect to most (if not all) databases frequently by its nature - so as a test, I stopped the Netdata service today to see if tomorrow we're still seeing the stats reset or not. I can report back tomorrow on this. 

> 5) Is the cluster directly open to the world?

No. It's an on-premise installation. Only local applications can connect to it. 


-Steeve

On Thu, Nov 21, 2024 at 4:32 PM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 13:31, Steeve Boulanger wrote:
>  > All I can think to do is look at the logs  around the stats_reset times
>  > for the databases and see if there is anything relevant.
>
> That was already done, but nothing relevant was found unfortunately.

Unless it was not recognized as relevant. Since for the time being I am
eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out
of 157 databases in the cluster.

1) Do the 77 share some trait the other 80 don't.

2) Do the OS system logs reveal anything?

3) What was happening in the databases just prior to the time the stats
reset?

4) Do you have external tools accessing these databases?

5) Is the cluster directly open to the world?

>
> -Steeve
>
> On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.klaver@aklaver.com
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>     On 11/21/24 12:57, Steeve Boulanger wrote:
>      >
>      >  > Please reply to list also.
>      >
>      > My apologies - I thought I did a "Reply all", but apparently not.
>     I'm a
>      > little bit of a noob with email distrib lists.
>      >
>      >  > 1) What is log_min_error_statement set to?
>      >
>      >            name           | setting | pending_restart
>      > -------------------------+---------+-----------------
>      >   log_min_error_statement | error   | f
>      >
>      >  > 2) Did you reload the server when changing?:
>      >
>      > yes - pg_reload_conf()
>
>     All I can think to do is look at the logs  around the stats_reset times
>     for the databases and see if there is anything relevant.
>
>      >
>      > -Steeve
>
>
>     --
>     Adrian Klaver
>     adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>
>

--
Adrian Klaver
adrian.klaver@aklaver.com

On 11/21/24 15:50, Steeve Boulanger wrote:
>  > 1) Do the 77 share some trait the other 80 don't.
> 
> No pattern found yet .. but still verifying a few things
> 
>  > 2) Do the OS system logs reveal anything?
> 
> Nothing found in syslog
> 
>  > 3) What was happening in the databases just prior to the time the stats
> reset?
> 
> Here's an example (log extracts) for a stats reset occurrence:
> 
> select datname, stats_reset, now()-stats_reset as since_reset
> from pg_stat_database
> where ( now()-stats_reset ) < interval '1 day'
> order by 3  limit 1;
> 
>      datname     |          stats_reset          |   since_reset
> ----------------+-------------------------------+-----------------
> MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
> 
> <--LOGS-->
> 2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1] 
> client=[host1] app=[[unknown]]LOG: connection authorized: user=user1 
> database=MyDB applicatio
> n_name=app1 <..>

What is the [2] referring to?

> 
> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set 
the log timestamp. Whereas the stats_reset value is the timestamp when 
the stats system actually did the reset.

> 
> 2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1] 
> client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>: 
> SELECT <..>

The above is some garden variety select?



-- 
Adrian Klaver
adrian.klaver@aklaver.com




What is the [2] referring to?

Number of the log line for each session or process, starting at 1

My guess is the difference in time it takes to log the action and set
> the log timestamp. Whereas the stats_reset value is the timestamp when
> the stats system actually did the reset.

Very plausible. I thought the same too.

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application"  .. and yes it is.  

-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 15:50, Steeve Boulanger wrote:
>  > 1) Do the 77 share some trait the other 80 don't.
>
> No pattern found yet .. but still verifying a few things
>
>  > 2) Do the OS system logs reveal anything?
>
> Nothing found in syslog
>
>  > 3) What was happening in the databases just prior to the time the stats
> reset?
>
> Here's an example (log extracts) for a stats reset occurrence:
>
> select datname, stats_reset, now()-stats_reset as since_reset
> from pg_stat_database
> where ( now()-stats_reset ) < interval '1 day'
> order by 3  limit 1;
>
>      datname     |          stats_reset          |   since_reset
> ----------------+-------------------------------+-----------------
> MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
>
> <--LOGS-->
> 2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
> client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
> database=MyDB applicatio
> n_name=app1 <..>

What is the [2] referring to?

>
> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

>
> 2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
> client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
> SELECT <..>

The above is some garden variety select?



--
Adrian Klaver
adrian.klaver@aklaver.com

On 23/11/2024 13:06, Steeve Boulanger wrote:

The above is some garden variety select?

Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select from an in-house application"  .. and yes it is. 

Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

Ray.




-Steeve

On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 15:50, Steeve Boulanger wrote:
>  > 1) Do the 77 share some trait the other 80 don't.
>
> No pattern found yet .. but still verifying a few things
>
>  > 2) Do the OS system logs reveal anything?
>
> Nothing found in syslog
>
>  > 3) What was happening in the databases just prior to the time the stats
> reset?
>
> Here's an example (log extracts) for a stats reset occurrence:
>
> select datname, stats_reset, now()-stats_reset as since_reset
> from pg_stat_database
> where ( now()-stats_reset ) < interval '1 day'
> order by 3  limit 1;
>
>      datname     |          stats_reset          |   since_reset
> ----------------+-------------------------------+-----------------
> MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
>
> <--LOGS-->
> 2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
> client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
> database=MyDB applicatio
> n_name=app1 <..>

What is the [2] referring to?

>
> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

My guess is the difference in time it takes to log the action and set
the log timestamp. Whereas the stats_reset value is the timestamp when
the stats system actually did the reset.

>
> 2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
> client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
> SELECT <..>

The above is some garden variety select?



--
Adrian Klaver
adrian.klaver@aklaver.com



-- 
Raymond O'Donnell // Galway // Ireland
ray@rodonnell.ie
> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)

I'm afraid that my Irish dialect is limited to "sláinte" only ;-)  In
any case, thanks for taking the time to help with this issue. I'm
still investigating, but I think that calling the "ghostbusters" is
moving up the list now lol.


On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
>
> On 23/11/2024 13:06, Steeve Boulanger wrote:
>
>
> > The above is some garden variety select?
>
> Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select
froman in-house application"  .. and yes it is. 
>
>
> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
>
> Ray.
>
>
>
>
> -Steeve
>
> On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>
>> On 11/21/24 15:50, Steeve Boulanger wrote:
>> >  > 1) Do the 77 share some trait the other 80 don't.
>> >
>> > No pattern found yet .. but still verifying a few things
>> >
>> >  > 2) Do the OS system logs reveal anything?
>> >
>> > Nothing found in syslog
>> >
>> >  > 3) What was happening in the databases just prior to the time the stats
>> > reset?
>> >
>> > Here's an example (log extracts) for a stats reset occurrence:
>> >
>> > select datname, stats_reset, now()-stats_reset as since_reset
>> > from pg_stat_database
>> > where ( now()-stats_reset ) < interval '1 day'
>> > order by 3  limit 1;
>> >
>> >      datname     |          stats_reset          |   since_reset
>> > ----------------+-------------------------------+-----------------
>> > MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
>> >
>> > <--LOGS-->
>> > 2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
>> > client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
>> > database=MyDB applicatio
>> > n_name=app1 <..>
>>
>> What is the [2] referring to?
>>
>> >
>> > <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
>>
>> My guess is the difference in time it takes to log the action and set
>> the log timestamp. Whereas the stats_reset value is the timestamp when
>> the stats system actually did the reset.
>>
>> >
>> > 2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
>> > client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
>> > SELECT <..>
>>
>> The above is some garden variety select?
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.klaver@aklaver.com
>>
>
>
> --
> Raymond O'Donnell // Galway // Ireland
> ray@rodonnell.ie



On 11/23/24 05:16, Steeve Boulanger wrote:
>> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
> 
> I'm afraid that my Irish dialect is limited to "sláinte" only ;-)  In
> any case, thanks for taking the time to help with this issue. I'm
> still investigating, but I think that calling the "ghostbusters" is
> moving up the list now lol.

I'm guessing shutting down Netdata did not stop the resets?

You might try setting log_min_messages to info to see if that catches 
anything.


> 
> 
> On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
>>
>> On 23/11/2024 13:06, Steeve Boulanger wrote:
>>
>>


-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 11/23/24 05:16, Steeve Boulanger wrote:
>> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
> 
> I'm afraid that my Irish dialect is limited to "sláinte" only ;-)  In
> any case, thanks for taking the time to help with this issue. I'm
> still investigating, but I think that calling the "ghostbusters" is
> moving up the list now lol.

One possible scenario:


log_min_messages = info
log_min_error_statement = info
log_statement = 'all'


psql -d test -U postgres -p 5432

CREATE OR REPLACE FUNCTION public.admin_func()
  RETURNS void
  LANGUAGE plpgsql
AS $function$
BEGIN
     PERFORM pg_stat_reset();
     RAISE NOTICE 'Reset statistics';
END;
$function$


  select datname, stats_reset from pg_stat_database;
    datname    |          stats_reset
--------------+-------------------------------
  NULL         | NULL
  postgres     | NULL
  template1    | NULL
  template0    | NULL
  test         | 2024-11-23 09:21:49.421552-08
  task_manager | NULL
  test_psql    | NULL
  production   | NULL
  pp_archive   | NULL
  farm_db      | NULL

select admin_func();
NOTICE:  Reset statistics
  admin_func
------------

(1 row)

select datname, stats_reset from pg_stat_database;
    datname    |          stats_reset
--------------+-------------------------------
  NULL         | NULL
  postgres     | NULL
  template1    | NULL
  template0    | NULL
  test         | 2024-11-23 09:26:30.749257-08
  task_manager | NULL
  test_psql    | NULL
  production   | NULL
  pp_archive   | NULL
  farm_db      | NULL


2024-11-23 09:26:30.749 PST [14501] postgres@test LOG:  statement: 
select admin_func();
2024-11-23 09:26:30.749 PST [14501] postgres@test NOTICE:  Reset statistics
2024-11-23 09:26:30.749 PST [14501] postgres@test CONTEXT:  PL/pgSQL 
function admin_func() line 4 at RAISE
2024-11-23 09:26:30.749 PST [14501] postgres@test STATEMENT:  select 
admin_func();

The issue being that the pg_stat_reset() is buried in a function and 
does not show up on its own. The RAISE NOTICE alerts in my logs just so 
I could find the function easily. It could be there is a function or 
functions in your setup doing something similar.

> 
> 
> On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
>>
>> On 23/11/2024 13:06, Steeve Boulanger wrote:
>>
>>
>>> The above is some garden variety select?
>>
>> Not 100% sure what the expression "garden variety select" means lol, but I'll take a guess that it means an "select
froman in-house application"  .. and yes it is.
 
>>
>>
>> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
>>
>> Ray.
>>
>>
>>
>>
>> -Steeve
>>
>> On Fri, Nov 22, 2024 at 11:18 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>>>
>>> On 11/21/24 15:50, Steeve Boulanger wrote:
>>>>   > 1) Do the 77 share some trait the other 80 don't.
>>>>
>>>> No pattern found yet .. but still verifying a few things
>>>>
>>>>   > 2) Do the OS system logs reveal anything?
>>>>
>>>> Nothing found in syslog
>>>>
>>>>   > 3) What was happening in the databases just prior to the time the stats
>>>> reset?
>>>>
>>>> Here's an example (log extracts) for a stats reset occurrence:
>>>>
>>>> select datname, stats_reset, now()-stats_reset as since_reset
>>>> from pg_stat_database
>>>> where ( now()-stats_reset ) < interval '1 day'
>>>> order by 3  limit 1;
>>>>
>>>>       datname     |          stats_reset          |   since_reset
>>>> ----------------+-------------------------------+-----------------
>>>> MyDB           | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304
>>>>
>>>> <--LOGS-->
>>>> 2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
>>>> client=[host1] app=[[unknown]]LOG: connection authorized: user=user1
>>>> database=MyDB applicatio
>>>> n_name=app1 <..>
>>>
>>> What is the [2] referring to?
>>>
>>>>
>>>> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>
>>>
>>> My guess is the difference in time it takes to log the action and set
>>> the log timestamp. Whereas the stats_reset value is the timestamp when
>>> the stats system actually did the reset.
>>>
>>>>
>>>> 2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
>>>> client=[host1] app=[app1]LOG:  duration: 1.071 ms parse <unnamed>:
>>>> SELECT <..>
>>>
>>> The above is some garden variety select?
>>>
>>>
>>>
>>> --
>>> Adrian Klaver
>>> adrian.klaver@aklaver.com
>>>
>>
>>
>> --
>> Raymond O'Donnell // Galway // Ireland
>> ray@rodonnell.ie

-- 
Adrian Klaver
adrian.klaver@aklaver.com




As a superuser, rename pg_stat_reset inside one of the commonly affected databases:

alter function pg_stat_reset rename to hey_stop_running_pg_stat_reset_already;

Then see who starts complaining. Additionally, your server log will get helpful entries like this:

ERROR:  function pg_stat_reset() does not exist

Cheers,
Greg

Apologies for the late reply.

> I'm guessing shutting down Netdata did not stop the resets?

There was a hiccup yesterday, so I had to redo the test and wait until today to see if there were any stats resets ... and NONE today!!

    reset_status    | cnt
--------------------+-----
 reset before today | 100
 no reset           |  57
 reset today        |   0

This is quite interesting, and a little unexpected... but in order to confirm this theory, I have done the following just now: 

1) Restart Netdata service
2) rename the "pg_stat_reset()" function , as you suggested in another post  - very clever btw! :-)  

Now let's wait until tomorrow to see if we have any stats reset and/or some relevant errors in the PG logs! 

Quick note: I did do a quick check in any user functions for any "pg_stat_reset" calls , but didn't find any. With that said, we have 150+ databases in our cluster, so perhaps I missed something. I also did a grep on the netdata directories, but nothing as well. 

-Steeve


On Sat, Nov 23, 2024 at 10:19 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/23/24 05:16, Steeve Boulanger wrote:
>> Here (Ireland) we sometimes say "common-or-garden variety".... It means a normal, everyday variety. :-)
>
> I'm afraid that my Irish dialect is limited to "sláinte" only ;-)  In
> any case, thanks for taking the time to help with this issue. I'm
> still investigating, but I think that calling the "ghostbusters" is
> moving up the list now lol.

I'm guessing shutting down Netdata did not stop the resets?

You might try setting log_min_messages to info to see if that catches
anything.


>
>
> On Sat, Nov 23, 2024 at 7:09 AM Ray O'Donnell <ray@rodonnell.ie> wrote:
>>
>> On 23/11/2024 13:06, Steeve Boulanger wrote:
>>
>>


--
Adrian Klaver
adrian.klaver@aklaver.com

On 11/23/24 16:20, Steeve Boulanger wrote:
> Apologies for the late reply.
> 
>  > I'm guessing shutting down Netdata did not stop the resets?
> 
> There was a hiccup yesterday, so I had to redo the test and wait until 
> today to see if there were any stats resets ... and NONE today!!
> 
>      reset_status    | cnt
> --------------------+-----
>   reset before today | 100
>   no reset           |  57
>   reset today        |   0

Getting closer.

> 
> This is quite interesting, and a little unexpected... but in order to 
> confirm this theory, I have done the following just now:
> 
> 1) Restart Netdata service
> 2) rename the "pg_stat_reset()" function , as you suggested in another 
> post  - very clever btw! :-)

Not sure who 'you' refers to, but just to be clear that was Greg Sabino 
Mullane.

> 
> Now let's wait until tomorrow to see if we have any stats reset and/or 
> some relevant errors in the PG logs!
> 
> Quick note: I did do a quick check in any user functions for any 
> "pg_stat_reset" calls , but didn't find any. With that said, we have 
> 150+ databases in our cluster, so perhaps I missed something. I also did 
> a grep on the netdata directories, but nothing as well.
> 
> -Steeve
> 
> 


-- 
Adrian Klaver
adrian.klaver@aklaver.com




>  As a superuser, rename pg_stat_reset inside one of the commonly affected databases:
> alter function pg_stat_reset rename to hey_stop_running_pg_stat_reset_already;
> Then see who starts complaining. Additionally, your server log will get helpful entries like this:

Thanks for the tip Greg - that's very clever! I've done this just now, and will check back tomorrow to see if there's any corresponding error message. 

NB:  I hadn't originally realized that the system functions (ie pg_*) were created for each database, so I had renamed only the one in the postgres database, thinking it was global (ie I'll blame this on my Oracle days ;-)). 

Now I renamed the function under one database that had been verified as getting its stats reset on a daily basis. Hopefully tomorrow we'll get more interesting info! 

-Steeve

On Sat, Nov 23, 2024 at 11:50 AM Greg Sabino Mullane <htamfids@gmail.com> wrote:
As a superuser, rename pg_stat_reset inside one of the commonly affected databases:

alter function pg_stat_reset rename to hey_stop_running_pg_stat_reset_already;

Then see who starts complaining. Additionally, your server log will get helpful entries like this:

ERROR:  function pg_stat_reset() does not exist

Cheers,
Greg