Thread: BUG #18046: stats collection behaviour change is affecting the usability of information.
BUG #18046: stats collection behaviour change is affecting the usability of information.
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18046 Logged by: Jobin Augustine Email address: jobinau@gmail.com PostgreSQL version: 15.3 Operating system: CentOS/RHEL Description: After stats collection changes in PG 15, The behaviour stats_reset information is changed. which is adversely affecting the usability of data presented in the view. ``` select stats_reset from pg_stat_database; ``` Before PG15, stats_reset was always populated so that the user knows the cumulative values presented are of how many days. But unfortunately, PG 15 keeps it null unless there is an explicit reset (pg_stat_reset). The value goes again null if there is a reset due to a crash. So on a regular system, the User can't understand the cumulative values presented in the view.
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Hamid Akhtar
Date:
On Tue, 1 Aug 2023 at 22:01, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18046
Logged by: Jobin Augustine
Email address: jobinau@gmail.com
PostgreSQL version: 15.3
Operating system: CentOS/RHEL
Description:
After stats collection changes in PG 15, The behaviour stats_reset
information is changed. which is adversely affecting the usability of data
presented in the view.
```
select stats_reset from pg_stat_database;
```
Before PG15, stats_reset was always populated so that the user knows the
cumulative values presented are of how many days.
But unfortunately, PG 15 keeps it null unless there is an explicit reset
(pg_stat_reset). The value goes again null if there is a reset due to a
crash.
So on a regular system, the User can't understand the cumulative values
presented in the view.
IMHO, this is a valid concern. As per the documentation, the "stats_reset" column tracks the last time the stats were reset. There is no mention of this being timestamp for manual reset only.
Internally, the server calls pgstat_reset_after_failure whenever it cannot find the stats file or if the server is recovering from a crash. In case of a crash, the stats may no longer be valid. Therefore, internally the stats are dropped, and a new file is written.
Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that when the database stats are being written to disk and the stats_reset is not set, it adds the current timestamp to it. Since a new file is written at initdb and when the server is recovering from a crash, this works as expected.
Attachment
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Jobin Augustine
Date:
Thank you Hamid for working on this and coming with a fix.
On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote:
Thank you for the bug report Jobin.
IMHO, this is a valid concern. As per the documentation, the "stats_reset" column tracks the last time the stats were reset. There is no mention of this being timestamp for manual reset only.
Without this base info, users don't have the option to understand the cumulative statistics in the stats view
Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that when the database stats are being written to disk and the stats_reset is not set, it adds the current timestamp to it. Since a new file is written at initdb and when the server is recovering from a crash, this works as expected.
I can confirm that this patch fixes the problem.
I could find simple steps to reproduce the original problem independently.
Step 1 : Create a new database
CREATE DATABASE db1;
Step 2. Create a table in the database
\c db1
CREATE TABLE t1 (id INT);
Step 3. Check the timestamp of the start of database-level statistics
db1=# SELECT datname,stats_reset FROM pg_stat_database;
Expected behaviour(works in all versions upto and including PostgreSQL 14)
datname | stats_reset -----------+------------------------------- | 2023-08-02 06:41:15.777135+00 postgres | 2023-08-02 06:41:15.777108+00 template1 | template0 | db1 | 2023-08-02 11:02:54.954363+00 (5 rows)
The problem in PostgreSQL 15 and above
datname | stats_reset -----------+------------------------------ | postgres | db1 | template1 | template0 | (5 rows)
Once again, Thank you for the fix.
Jobin Augustine.
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Bruce Momjian
Date:
Andres, can you comment on this thread? I see you had a commit to PG 15 in this area: commit 5cd1c40b3c Author: Andres Freund <andres@anarazel.de> Date: Thu Apr 14 17:40:25 2022 -0700 pgstat: set timestamps of fixed-numbered stats after a crash. When not loading stats at startup (i.e. pgstat_discard_stats() getting called), reset timestamps of fixed numbered stats would be left at 0. Oversight in 5891c7a8ed8. Instead use pgstat_reset_after_failure() and add tests verifying that fixed-numbered reset timestamps are set appropriately. Reported-By: "David G. Johnston" <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com Thanks. --------------------------------------------------------------------------- On Fri, Aug 4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote: > Thank you Hamid for working on this and coming with a fix. > > On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote: > > > Thank you for the bug report Jobin. > > IMHO, this is a valid concern. As per the documentation, the "stats_reset" > column tracks the last time the stats were reset. There is no mention of > this being timestamp for manual reset only. > > > Without this base info, users don't have the option to understand the > cumulative statistics in the stats view > > > > Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that > when the database stats are being written to disk and the stats_reset is > not set, it adds the current timestamp to it. Since a new file is written > at initdb and when the server is recovering from a crash, this works as > expected. > > > > I can confirm that this patch fixes the problem. > I could find simple steps to reproduce the original problem independently. > > > Step 1 : Create a new database > CREATE DATABASE db1; > > Step 2. Create a table in the database > \c db1 > CREATE TABLE t1 (id INT); > > Step 3. Check the timestamp of the start of database-level statistics > db1=# SELECT datname,stats_reset FROM pg_stat_database; > > Expected behaviour(works in all versions upto and including PostgreSQL 14) > > datname | stats_reset > -----------+------------------------------- > | 2023-08-02 06:41:15.777135+00 > postgres | 2023-08-02 06:41:15.777108+00 > template1 | > template0 | > db1 | 2023-08-02 11:02:54.954363+00 > (5 rows) > > > The problem in PostgreSQL 15 and above > > datname | stats_reset > -----------+------------------------------ > | > postgres | > db1 | > template1 | > template0 | > (5 rows) > > > Once again, Thank you for the fix. > > Jobin Augustine. > > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Bruce Momjian
Date:
(This email has Andres properly in the TO field.) Andres, can you comment on this thread? I see you had a commit to PG 15 in this area: commit 5cd1c40b3c Author: Andres Freund <andres@anarazel.de> Date: Thu Apr 14 17:40:25 2022 -0700 pgstat: set timestamps of fixed-numbered stats after a crash. When not loading stats at startup (i.e. pgstat_discard_stats() getting called), reset timestamps of fixed numbered stats would be left at 0. Oversight in 5891c7a8ed8. Instead use pgstat_reset_after_failure() and add tests verifying that fixed-numbered reset timestamps are set appropriately. Reported-By: "David G. Johnston" <david.g.johnston@gmail.com> Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com Thanks. --------------------------------------------------------------------------- On Fri, Aug 4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote: > Thank you Hamid for working on this and coming with a fix. > > On Fri, Aug 4, 2023 at 4:53 PM Hamid Akhtar <hamid.akhtar@percona.com> wrote: > > > Thank you for the bug report Jobin. > > IMHO, this is a valid concern. As per the documentation, the "stats_reset" > column tracks the last time the stats were reset. There is no mention of > this being timestamp for manual reset only. > > > Without this base info, users don't have the option to understand the > cumulative statistics in the stats view > > > > Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that > when the database stats are being written to disk and the stats_reset is > not set, it adds the current timestamp to it. Since a new file is written > at initdb and when the server is recovering from a crash, this works as > expected. > > > > I can confirm that this patch fixes the problem. > I could find simple steps to reproduce the original problem independently. > > > Step 1 : Create a new database > CREATE DATABASE db1; > > Step 2. Create a table in the database > \c db1 > CREATE TABLE t1 (id INT); > > Step 3. Check the timestamp of the start of database-level statistics > db1=# SELECT datname,stats_reset FROM pg_stat_database; > > Expected behaviour(works in all versions upto and including PostgreSQL 14) > > datname | stats_reset > -----------+------------------------------- > | 2023-08-02 06:41:15.777135+00 > postgres | 2023-08-02 06:41:15.777108+00 > template1 | > template0 | > db1 | 2023-08-02 11:02:54.954363+00 > (5 rows) > > > The problem in PostgreSQL 15 and above > > datname | stats_reset > -----------+------------------------------ > | > postgres | > db1 | > template1 | > template0 | > (5 rows) > > > Once again, Thank you for the fix. > > Jobin Augustine. > > -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Kyotaro Horiguchi
Date:
At Wed, 6 Sep 2023 12:57:01 -0400, Bruce Momjian <bruce@momjian.us> wrote in > > Expected behaviour(works in all versions upto and including PostgreSQL 14) > > > > datname | stats_reset > > -----------+------------------------------- > > | 2023-08-02 06:41:15.777135+00 > > postgres | 2023-08-02 06:41:15.777108+00 > > template1 | > > template0 | > > db1 | 2023-08-02 11:02:54.954363+00 > > (5 rows) > > > > > > The problem in PostgreSQL 15 and above > > > > datname | stats_reset > > -----------+------------------------------ > > | > > postgres | > > db1 | > > template1 | > > template0 | > > (5 rows) I agree it's not ideal to reset timestamps only for fixed-numbered stats. In 13, the timestamp is set to the creation time. There's no practical issue with doing it that way for now, I think it's generallynot preferable. We could use the transaction start time as a sufficient approximation for the timestamp in fast paths. It might not align with with the reset time set by pg_stat_reset(), but that shouldn't be a problem. pgstat_restore_stats runs outside of a transaction, but GetCurrentTimestamp() is already being used there. The reason for defining a new function is to improve robustness, although it might not be necessary. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Kyotaro Horiguchi
Date:
At Thu, 07 Sep 2023 14:12:32 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in > We could use the transaction start time as a sufficient approximation > for the timestamp in fast paths. It might not align with with the > reset time set by pg_stat_reset(), but that shouldn't be a > problem. pgstat_restore_stats runs outside of a transaction, but To be accurate, it's not "outside of a transaction", but "before any transaction starts". regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Andres Freund
Date:
Hi, On 2023-09-06 13:06:35 -0400, Bruce Momjian wrote: > (This email has Andres properly in the TO field.) > > Andres, can you comment on this thread? I see you had a commit to PG > 15 in this area: I'll try to look at it soon. I'm only now catching up on email after being on vacation. Regards, Andres
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Andres Freund
Date:
Hi, On 2023-09-06 13:06:35 -0400, Bruce Momjian wrote: > (This email has Andres properly in the TO field.) > > Andres, can you comment on this thread? I see you had a commit to PG > 15 in this area: > > commit 5cd1c40b3c > Author: Andres Freund <andres@anarazel.de> > Date: Thu Apr 14 17:40:25 2022 -0700 > > pgstat: set timestamps of fixed-numbered stats after a crash. > > When not loading stats at startup (i.e. pgstat_discard_stats() getting > called), reset timestamps of fixed numbered stats would be left at > 0. Oversight in 5891c7a8ed8. > > Instead use pgstat_reset_after_failure() and add tests verifying that > fixed-numbered reset timestamps are set appropriately. > > Reported-By: "David G. Johnston" <david.g.johnston@gmail.com> > Discussion: https://postgr.es/m/CAKFQuwamFuaQHKdhcMt4Gbw5+Hca2UE741B8gOOXoA=TtAd2Yw@mail.gmail.com > > Thanks. I don't think that commit is relevant, but I still am to blame :) > On Fri, Aug 4, 2023 at 10:40:46PM +0530, Jobin Augustine wrote: > > Thank you Hamid for working on this and coming with a fix. > > Attach is a fix for PG16 and PG15 that resolves this issue. It ensures that > > when the database stats are being written to disk and the stats_reset is > > not set, it adds the current timestamp to it. Since a new file is written > > at initdb and when the server is recovering from a crash, this works as > > expected. I don't think that's the right approach. This way we'll end up with a stats reset time that can be *after* other stats have already accumulated. Which doesn't seem great. I think Horiguchi-san's approach is closer to what we would want. Unfortunately, the reset time for other variable-numbered stats entries historically has behaved differently than what you're desiring. E.g. pg_stat_replication_slots.stats_reset isn't set unless you have reset the stats. I'm ok with changing the semantics to one behaviour, but I'm a bit hesitant whacking this around further in a minor release... Greetings, Andres Freund
Re: BUG #18046: stats collection behaviour change is affecting the usability of information.
From
Jobin Augustine
Date:
I think Horiguchi-san's approach is closer to what we would want.
Great.
Unfortunately, the reset time for other variable-numbered stats entries
historically has behaved differently than what you're
desiring. E.g. pg_stat_replication_slots.stats_reset isn't set unless you have
reset the stats.
Conceptually, that's correct. But the Impact on users are different.
SQL statements around pg_stat_database are something every user (DBA) has in their arsenal for essential checks and troubleshooting.
The impact is big when their queries stopped working/giving wrong results from PostgeSQL 15 onwards.
I'm ok with changing the semantics to one behaviour,
Great. I believe everyone will appreciate a consistent behaviour
but I'm a bit hesitant
whacking this around further in a minor release...
Since this sudden undocumented behaviour change caught everyone by surprise,
and case this is a too big change for a "bug-fix", is it possible to have a temporary fix for a minor version, fixing pg_stat_database alone?
Greetings,
Andres Freund
Regards,
Jobin Augustine