Thread: BUG #18046: stats collection behaviour change is affecting the usability of information.

BUG #18046: stats collection behaviour change is affecting the usability of information.

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18046
Logged by:          Jobin Augustine
Email address:      jobinau@gmail.com
PostgreSQL version: 15.3
Operating system:   CentOS/RHEL
Description:

After stats collection changes in PG 15, The behaviour stats_reset
information is changed. which is adversely affecting the usability of data
presented in the view.
```
select stats_reset from pg_stat_database;
```
Before PG15, stats_reset was always populated so that the user knows the
cumulative values presented are of how many days.
But unfortunately, PG 15 keeps it null unless there is an explicit reset
(pg_stat_reset). The value goes again null if there is a reset due to a
crash. 
So on a regular system, the User can't understand the cumulative values
presented in the view.




On Tue, 1 Aug 2023 at 22:01, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18046
Logged by:          Jobin Augustine
Email address:      jobinau@gmail.com
PostgreSQL version: 15.3
Operating system:   CentOS/RHEL
Description:       

After stats collection changes in PG 15, The behaviour stats_reset
information is changed. which is adversely affecting the usability of data
presented in the view.
```
select stats_reset from pg_stat_database;
```
Before PG15, stats_reset was always populated so that the user knows the
cumulative values presented are of how many days.
But unfortunately, PG 15 keeps it null unless there is an explicit reset
(pg_stat_reset). The value goes again null if there is a reset due to a
crash.
So on a regular system, the User can't understand the cumulative values
presented in the view.


Thank you for the bug report Jobin.

IMHO, this is a valid concern. As per the documentation, the "stats_reset" column tracks the last time the stats were reset. There is no mention of this being timestamp for manual reset only.

Internally, the server calls pgstat_reset_after_failure whenever it cannot find the stats file or if the server is recovering from a crash. In case of a crash, the stats may no longer be valid. Therefore, internally the stats are dropped, and a new file is written.

Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that when the database stats are being written to disk and the stats_reset is not set, it adds the current timestamp to it. Since a new file is written at initdb and when the server is recovering from a crash, this works as expected.
 
Attachment
Thank you Hamid for working on this and coming with a fix.

On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote:

Thank you for the bug report Jobin.

IMHO, this is a valid concern. As per the documentation, the "stats_reset" column tracks the last time the stats were reset. There is no mention of this being timestamp for manual reset only.

Without this base info, users don't have the option to understand the cumulative statistics in the stats view
 

Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that when the database stats are being written to disk and the stats_reset is not set, it adds the current timestamp to it. Since a new file is written at initdb and when the server is recovering from a crash, this works as expected.
 

I can confirm that this patch fixes the problem.
I could find simple steps to reproduce the original problem independently.


Step 1 : Create a new database
CREATE DATABASE db1;

Step 2. Create a table in the database
\c db1
CREATE TABLE t1 (id INT);

Step 3. Check the timestamp of the start of database-level statistics
db1=# SELECT datname,stats_reset FROM pg_stat_database;

Expected behaviour(works in all versions upto and including PostgreSQL 14)
  datname  |          stats_reset          
-----------+-------------------------------
           | 2023-08-02 06:41:15.777135+00
 postgres  | 2023-08-02 06:41:15.777108+00
 template1 | 
 template0 | 
 db1       | 2023-08-02 11:02:54.954363+00
(5 rows)

The problem in PostgreSQL 15 and above
  datname  |         stats_reset          
-----------+------------------------------
           | 
 postgres  | 
 db1       | 
 template1 | 
 template0 | 
(5 rows)

Once again, Thank you for the fix.
Jobin Augustine.

Andres, can you comment on this thread?  I see you had a commit to PG
15 in this area:

    commit 5cd1c40b3c
    Author: Andres Freund <andres@anarazel.de>
    Date:   Thu Apr 14 17:40:25 2022 -0700
    
        pgstat: set timestamps of fixed-numbered stats after a crash.
    
        When not loading stats at startup (i.e. pgstat_discard_stats() getting
        called), reset timestamps of fixed numbered stats would be left at
        0. Oversight in 5891c7a8ed8.
    
        Instead use pgstat_reset_after_failure() and add tests verifying that
        fixed-numbered reset timestamps are set appropriately.
    
        Reported-By: "David G. Johnston" <david.g.johnston@gmail.com>
        Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com

Thanks.

---------------------------------------------------------------------------

On Fri, Aug  4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote:
> Thank you Hamid for working on this and coming with a fix.
> 
> On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote:
> 
> 
>     Thank you for the bug report Jobin.
> 
>     IMHO, this is a valid concern. As per the documentation, the "stats_reset"
>     column tracks the last time the stats were reset. There is no mention of
>     this being timestamp for manual reset only.
> 
> 
> Without this base info, users don't have the option to understand the
> cumulative statistics in the stats view
>  
> 
> 
>     Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that
>     when the database stats are being written to disk and the stats_reset is
>     not set, it adds the current timestamp to it. Since a new file is written
>     at initdb and when the server is recovering from a crash, this works as
>     expected.
>      
> 
> 
> I can confirm that this patch fixes the problem.
> I could find simple steps to reproduce the original problem independently.
> 
> 
> Step 1 : Create a new database
> CREATE DATABASE db1;
> 
> Step 2. Create a table in the database
> \c db1
> CREATE TABLE t1 (id INT);
> 
> Step 3. Check the timestamp of the start of database-level statistics
> db1=# SELECT datname,stats_reset FROM pg_stat_database;
> 
> Expected behaviour(works in all versions upto and including PostgreSQL 14)
> 
>   datname  |          stats_reset          
> -----------+-------------------------------
>            | 2023-08-02 06:41:15.777135+00
>  postgres  | 2023-08-02 06:41:15.777108+00
>  template1 | 
>  template0 | 
>  db1       | 2023-08-02 11:02:54.954363+00
> (5 rows)
> 
> 
> The problem in PostgreSQL 15 and above
> 
>   datname  |         stats_reset          
> -----------+------------------------------
>            | 
>  postgres  | 
>  db1       | 
>  template1 | 
>  template0 | 
> (5 rows)
> 
> 
> Once again, Thank you for the fix.
> 
> Jobin Augustine.
> 
> 

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



(This email has Andres properly in the TO field.)

Andres, can you comment on this thread?  I see you had a commit to PG
15 in this area:

    commit 5cd1c40b3c
    Author: Andres Freund <andres@anarazel.de>
    Date:   Thu Apr 14 17:40:25 2022 -0700
    
        pgstat: set timestamps of fixed-numbered stats after a crash.
    
        When not loading stats at startup (i.e. pgstat_discard_stats() getting
        called), reset timestamps of fixed numbered stats would be left at
        0. Oversight in 5891c7a8ed8.
    
        Instead use pgstat_reset_after_failure() and add tests verifying that
        fixed-numbered reset timestamps are set appropriately.
    
        Reported-By: "David G. Johnston" <david.g.johnston@gmail.com>
        Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com

Thanks.

---------------------------------------------------------------------------

On Fri, Aug  4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote:
> Thank you Hamid for working on this and coming with a fix.
> 
> On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote:
> 
> 
>     Thank you for the bug report Jobin.
> 
>     IMHO, this is a valid concern. As per the documentation, the "stats_reset"
>     column tracks the last time the stats were reset. There is no mention of
>     this being timestamp for manual reset only.
> 
> 
> Without this base info, users don't have the option to understand the
> cumulative statistics in the stats view
>  
> 
> 
>     Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that
>     when the database stats are being written to disk and the stats_reset is
>     not set, it adds the current timestamp to it. Since a new file is written
>     at initdb and when the server is recovering from a crash, this works as
>     expected.
>      
> 
> 
> I can confirm that this patch fixes the problem.
> I could find simple steps to reproduce the original problem independently.
> 
> 
> Step 1 : Create a new database
> CREATE DATABASE db1;
> 
> Step 2. Create a table in the database
> \c db1
> CREATE TABLE t1 (id INT);
> 
> Step 3. Check the timestamp of the start of database-level statistics
> db1=# SELECT datname,stats_reset FROM pg_stat_database;
> 
> Expected behaviour(works in all versions upto and including PostgreSQL 14)
> 
>   datname  |          stats_reset          
> -----------+-------------------------------
>            | 2023-08-02 06:41:15.777135+00
>  postgres  | 2023-08-02 06:41:15.777108+00
>  template1 | 
>  template0 | 
>  db1       | 2023-08-02 11:02:54.954363+00
> (5 rows)
> 
> 
> The problem in PostgreSQL 15 and above
> 
>   datname  |         stats_reset          
> -----------+------------------------------
>            | 
>  postgres  | 
>  db1       | 
>  template1 | 
>  template0 | 
> (5 rows)
> 
> 
> Once again, Thank you for the fix.
> 
> Jobin Augustine.
> 
> 

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Only you can decide what is important to you.



At Wed, 6 Sep 2023 12:57:01 -0400, Bruce Momjian <bruce@momjian.us> wrote in
> > Expected behaviour(works in all versions upto and including PostgreSQL 14)
> >
> >   datname  |          stats_reset          
> > -----------+-------------------------------
> >            | 2023-08-02 06:41:15.777135+00
> >  postgres  | 2023-08-02 06:41:15.777108+00
> >  template1 | 
> >  template0 | 
> >  db1       | 2023-08-02 11:02:54.954363+00
> > (5 rows)
> >
> >
> > The problem in PostgreSQL 15 and above
> >
> >   datname  |         stats_reset          
> > -----------+------------------------------
> >            | 
> >  postgres  | 
> >  db1       | 
> >  template1 | 
> >  template0 | 
> > (5 rows)

I agree it's not ideal to reset timestamps only for fixed-numbered
stats. In 13, the timestamp is set to the creation time. There's no
practical issue with doing it that way for now, I think it's
generallynot preferable.

We could use the transaction start time as a sufficient approximation
for the timestamp in fast paths. It might not align with with the
reset time set by pg_stat_reset(), but that shouldn't be a
problem. pgstat_restore_stats runs outside of a transaction, but
GetCurrentTimestamp() is already being used there. The reason for
defining a new function is to improve robustness, although it might
not be necessary.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment
At Thu, 07 Sep 2023 14:12:32 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in 
> We could use the transaction start time as a sufficient approximation
> for the timestamp in fast paths. It might not align with with the
> reset time set by pg_stat_reset(), but that shouldn't be a
> problem. pgstat_restore_stats runs outside of a transaction, but

To be accurate, it's not "outside of a transaction", but "before any
transaction starts".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Hi,

On 2023-09-06 13:06:35 -0400, Bruce Momjian wrote:
> (This email has Andres properly in the TO field.)
> 
> Andres, can you comment on this thread?  I see you had a commit to PG
> 15 in this area:

I'll try to look at it soon. I'm only now catching up on email after being on
vacation.

Regards,

Andres



Hi,

On 2023-09-06 13:06:35 -0400, Bruce Momjian wrote:
> (This email has Andres properly in the TO field.)
> 
> Andres, can you comment on this thread?  I see you had a commit to PG
> 15 in this area:
> 
>     commit 5cd1c40b3c
>     Author: Andres Freund <andres@anarazel.de>
>     Date:   Thu Apr 14 17:40:25 2022 -0700
>     
>         pgstat: set timestamps of fixed-numbered stats after a crash.
>     
>         When not loading stats at startup (i.e. pgstat_discard_stats() getting
>         called), reset timestamps of fixed numbered stats would be left at
>         0. Oversight in 5891c7a8ed8.
>     
>         Instead use pgstat_reset_after_failure() and add tests verifying that
>         fixed-numbered reset timestamps are set appropriately.
>     
>         Reported-By: "David G. Johnston" <david.g.johnston@gmail.com>
>         Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com
> 
> Thanks.

I don't think that commit is relevant, but I still am to blame :)


> On Fri, Aug  4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote:
> > Thank you Hamid for working on this and coming with a fix.
> >     Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that
> >     when the database stats are being written to disk and the stats_reset is
> >     not set, it adds the current timestamp to it. Since a new file is written
> >     at initdb and when the server is recovering from a crash, this works as
> >     expected.

I don't think that's the right approach. This way we'll end up with a stats
reset time that can be *after* other stats have already accumulated. Which
doesn't seem great.

I think Horiguchi-san's approach is closer to what we would want.


Unfortunately, the reset time for other variable-numbered stats entries
historically has behaved differently than what you're
desiring. E.g. pg_stat_replication_slots.stats_reset isn't set unless you have
reset the stats.

I'm ok with changing the semantics to one behaviour, but I'm a bit hesitant
whacking this around further in a minor release...

Greetings,

Andres Freund







I think Horiguchi-san's approach is closer to what we would want.

Great. 

Unfortunately, the reset time for other variable-numbered stats entries
historically has behaved differently than what you're
desiring. E.g. pg_stat_replication_slots.stats_reset isn't set unless you have
reset the stats.

Conceptually, that's correct. But the Impact on users are different.
SQL statements around pg_stat_database are something every user (DBA) has in their arsenal for essential checks and troubleshooting.
The impact is big when their queries stopped working/giving wrong results from PostgeSQL 15 onwards.
 
I'm ok with changing the semantics to one behaviour,
Great. I believe everyone will appreciate a consistent behaviour
 
but I'm a bit hesitant
whacking this around further in a minor release...


Since this sudden undocumented behaviour change caught everyone by surprise,
and case this is a too big change for a "bug-fix",  is it possible to have a temporary fix for a minor version, fixing pg_stat_database alone?
 
Greetings,
Andres Freund

Regards,
Jobin Augustine