Thread: n_ins_since_vacuum stats for aborted transactions
Hi,

I came across what appears to be incorrect behavior in the
pg_stat_all_tables.n_ins_since_vacuum counter after a rollback. As shown
below, the first two inserts of 1,000 rows were rolled back, yet they are
still counted toward n_ins_since_vacuum. Consequently, they influence the
vacuum insert threshold calculation, even though such rolled-back rows are
dead tuples and should only affect the dead-tuple vacuum threshold
calculation. Notice that n_mod_since_analyze actually does the correct
thing here and does not take the rolled-back inserts into account.

Rollbacks are not common, so this may go unnoticed, but I think it should
be corrected: only committed inserts should count towards
n_ins_since_vacuum. The n_tup_ins|del|upd counters should continue to
track both committed and rolled-back rows, but I also think the
documentation [0] for these fields could be improved to clarify this
point, i.e. n_tup_ins should be documented as "Total number of rows
inserted, including those from aborted transactions" instead of just
"Total number of rows inserted".

```
DROP TABLE t;
CREATE TABLE t (id INT);
ALTER TABLE t SET (autovacuum_enabled = OFF);

BEGIN;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
ROLLBACK;

INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;

SELECT n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup,
       n_mod_since_analyze, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 't';

 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup | n_mod_since_analyze | n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+---------------------+--------------------
      5000 |         0 |         0 |       3000 |       2000 |                3000 |               5000
(1 row)
```

Thoughts, before I prepare patches for this?

[0] https://www.postgresql.org/docs/current/monitoring-stats.html

--
Sami Imseih
Amazon Web Services (AWS)
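P.S. The insert-vacuum threshold calculation I am referring to can be
sketched with a query like the one below. This is only a sketch using the
global settings; per-table reloptions can override them, and the
authoritative logic lives in autovacuum.c:

```
-- Sketch: compare the counter against the insert-vacuum threshold
-- (global settings only; reloptions ignored for brevity).
SELECT s.n_ins_since_vacuum,
       current_setting('autovacuum_vacuum_insert_threshold')::bigint
       + current_setting('autovacuum_vacuum_insert_scale_factor')::float8
         * c.reltuples AS insert_vacuum_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 't';
```

In the repro above, the 2,000 rolled-back rows move n_ins_since_vacuum
closer to that threshold even though there is nothing committed for an
insert-driven vacuum to freeze.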
On Wed, Apr 9, 2025, at 1:57 PM, Sami Imseih wrote:
I came across what appears to be incorrect behavior in the pg_stat_all_tables.n_ins_since_vacuum counter after a rollback.
This is *not* an oversight. It is by design. See commit b07642dbcd8d. The
documentation says
Estimated number of rows inserted since this table was last vacuumed
Those rows were actually inserted. They are physically in the data files. And
that's what matters for the autovacuum algorithm.
> This is *not* an oversight. It is by design. See commit b07642dbcd8d. The
> documentation says

I don't see in the commit message why inserts in an aborted transaction
must count towards n_ins_since_vacuum.

> Estimated number of rows inserted since this table was last vacuumed
>
> Those rows were actually inserted. They are physically in the data files. And
> that's what matters for the autovacuum algorithm.

Correct, but they are dead tuples that are physically in the files; they
are accounted for through n_dead_tup and are then cleaned up based on the
autovacuum_vacuum_scale_factor|threshold calculation. The purpose of
b07642dbcd8d is to trigger autovacuum for append-only/append-mostly
workloads that don't generate dead tuples.

--
Sami Imseih
Amazon Web Services (AWS)
On Wed, Apr 9, 2025 at 11:32 AM Sami Imseih <samimseih@gmail.com> wrote:
The purpose of b07642dbcd8d is to trigger autovacuum for append only/mainly
workloads that don't generate dead tuples.
Why does counting or not counting the dead tuples matter? Forget original purpose, is there presently a bug or not? Between the two options the one where we count dead tuples makes more sense on its face.
David J.
> Forget original purpose, is there presently a bug or not?

Yes, there is a bug. Accounting for rows inserted as part of an aborted
transaction in n_ins_since_vacuum is not correct, since the same rows are
already accounted for with n_dead_tup.

> Between the two options the one where we count dead tuples makes more
> sense on its face.

IIUC, I am saying the same thing: if an inserted row is rolled back, it
should count as a dead tuple (n_dead_tup) only, and not in
n_ins_since_vacuum.

--
Sami Imseih
Amazon Web Services (AWS)
Yes, there is a bug. Accounting for rows inserted as part of an aborted
transaction in n_ins_since_vacuum is not correct, since the same rows are
already accounted for with n_dead_tup.
If I create a table with autovacuum_enabled=false, insert rows (some of which abort), and check the stats, surely n_tup_ins and n_ins_since_vacuum should be the same, because all the insertions (however we count them) have happened since the nonexistent last vacuum:
CREATE TABLE n_insert_test (
i INTEGER NOT NULL PRIMARY KEY
) WITH (autovacuum_enabled = false);
INSERT INTO n_insert_test (i) VALUES (1);
INSERT INTO n_insert_test
(SELECT 1 FROM generate_series(1,100000))
ON CONFLICT
DO NOTHING;
SELECT pg_sleep(1);
pg_sleep
----------
(1 row)
SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
1 | 0 | 1 | 1
(1 row)
INSERT INTO n_insert_test
(SELECT 2 FROM generate_series(1,100000))
ON CONFLICT
DO NOTHING;
SELECT pg_sleep(1);
pg_sleep
----------
(1 row)
SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
2 | 0 | 2 | 2
(1 row)
BEGIN;
INSERT INTO n_insert_test
(SELECT * FROM generate_series(3,100000));
ROLLBACK;
SELECT pg_sleep(1);
pg_sleep
----------
(1 row)
SELECT n_live_tup, n_dead_tup, n_tup_ins, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 'n_insert_test';
n_live_tup | n_dead_tup | n_tup_ins | n_ins_since_vacuum
------------+------------+-----------+--------------------
2 | 99998 | 100000 | 100000
(1 row)
If we went with your suggestion, I think the final n_ins_since_vacuum column would be 2. Do you think the n_tup_ins should also be 2? Should those two columns differ? If so, why?
On Wed, Apr 9, 2025, 11:57 Sami Imseih <samimseih@gmail.com> wrote:
> Forget original purpose, is there presently a bug or not?
Yes, there is a bug. Accounting for rows inserted as part of an aborted
transaction in n_ins_since_vacuum is not correct, since the same rows are
already accounted for with n_dead_tup.
So why is it important we not account for the aborted insert in both n_ins_since_vacuum and n_dead_tup?
When would you ever add them together so that an actual double-counting would reflect in some total?
You aren't upset that n_live_tup and this both include the non-aborted inserts.
David J.
> If we went with your suggestion, I think the final n_ins_since_vacuum
> column would be 2. Do you think the n_tup_ins should also be 2?

n_ins_since_vacuum should be 2 and n_tup_ins should be 100000. A user
tracks how many inserts they performed with n_tup_ins to measure
load/activity on the database, so it's important to also include aborted
transactions in that metric. n_ins_since_vacuum, however, is not used to
measure database activity; it is used to drive autovacuum decisions. So it
has a different purpose.

> Should those two columns differ? If so, why?

They will differ because n_tup_ins keeps increasing, while
n_ins_since_vacuum is reset after a vacuum. The issue I see is that
n_ins_since_vacuum should only reflect the number of newly inserted rows
that are eligible for freezing, as described in pgstat_report_vacuum [0].

[0] https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_relation.c#L238-L247

--
Sami Imseih
Amazon Web Services (AWS)
> So why is it important we not account for the aborted insert in both
> n_ins_since_vacuum and n_dead_tup?

I am saying n_dead_tup should continue to account for the aborted inserts;
I am not saying it should not. What I am saying is that n_ins_since_vacuum
should not account for aborted inserts.

> When would you ever add them together so that an actual double-counting
> would reflect in some total?

I would never add them together. Autovacuum uses n_ins_since_vacuum for
vacuuming purposes only.

> You aren't upset that n_live_tup and this both include the non-aborted
> inserts.

n_live_tup only shows the non-aborted inserts. I will put a better
formatted version of the repro below. In the example, I have 2k dead
tuples from the rolled-back transaction and 3k live tuples from the
committed transactions.

```
DROP TABLE t;
CREATE TABLE t (id INT);
ALTER TABLE t SET (autovacuum_enabled = OFF);

BEGIN;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
ROLLBACK;

INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;
INSERT INTO t SELECT n FROM generate_series(1, 1000) AS n;

\x
SELECT n_tup_ins, n_tup_upd, n_tup_del, n_live_tup, n_dead_tup,
       n_mod_since_analyze, n_ins_since_vacuum
FROM pg_stat_all_tables
WHERE relname = 't';

-[ RECORD 1 ]-------+-----
n_tup_ins           | 5000
n_tup_upd           | 0
n_tup_del           | 0
n_live_tup          | 3000
n_dead_tup          | 2000
n_mod_since_analyze | 3000
n_ins_since_vacuum  | 5000
```

--
Sami Imseih
Amazon Web Services (AWS)
The issue I see is that n_ins_since_vacuum should only
reflect the number of newly inserted rows that are eligible for
freezing, as described
in pgstat_report_vacuum [0]
[0] https://github.com/postgres/postgres/blob/master/src/backend/utils/activity/pgstat_relation.c#L238-L247
For the archives, that paragraph reads:
* It is quite possible that a non-aggressive VACUUM ended up skipping
* various pages, however, we'll zero the insert counter here regardless.
* It's currently used only to track when we need to perform an "insert"
* autovacuum, which are mainly intended to freeze newly inserted tuples.
* Zeroing this may just mean we'll not try to vacuum the table again
* until enough tuples have been inserted to trigger another insert
* autovacuum. An anti-wraparound autovacuum will catch any persistent
* stragglers.
*/
What work do you believe the word "mainly" does in that paragraph? The presence of the word "mainly" rather than "only" somewhat cuts against your argument that we should only be counting tuples that get inserted without aborting.
On Wed, Apr 9, 2025, 12:56 Sami Imseih <samimseih@gmail.com> wrote:
What I am saying is n_ins_since_vacuum should not account for aborted inserts.
It does and from what I can see it should. You need to explain why it should not. More importantly, convincingly enough to change five-year-old behavior.
You should get away from showing counts and talk about system behavior. Or, if this is accounting related, explain the math.
David J.
On Wed, Apr 9, 2025, 12:39 Sami Imseih <samimseih@gmail.com> wrote:
They will differ because n_tup_ins keeps increasing, while n_ins_since_vacuum is
reset after a vacuum. The issue I see is that n_ins_since_vacuum should only
reflect the number of newly inserted rows that are eligible for
freezing, as described
in pgstat_report_vacuum [0]
Vacuuming them into oblivion is a form of freezing. It also removes the aging xid from the table.
David J.
>> What I am saying is n_ins_since_vacuum should not account for aborted inserts.
>
> It does and from what I can see it should. You need to explain why it
> should not. More importantly, convincingly enough to change five-year-old
> behavior.

n_ins_since_vacuum was introduced to trigger autovacuum based on the number
of inserts committed, and the formula does not care about dead tuples.

If I have a transaction that rolled back an insert of a million rows, I
expect autovacuum to kick in based on the fact that there are now 1 million
n_dead_tup. n_ins_since_vacuum is not relevant to the formula for this
case.

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it
intends to do.

--
Sami Imseih
On Wed, Apr 9, 2025 at 1:30 PM Sami Imseih <samimseih@gmail.com> wrote:
>> What I am saying is n_ins_since_vacuum should not account for aborted inserts.
>
> It does and from what I can see it should. You need to explain why it
> should not. More importantly, convincingly enough to change five-year-old
> behavior.

n_ins_since_vacuum was introduced to trigger autovacuum based on the number
of inserts committed, and the formula does not care about dead tuples.

If I have a transaction that rolled back an insert of a million rows, I
expect autovacuum to kick in based on the fact that there are now 1 million
n_dead_tup. n_ins_since_vacuum is not relevant to the formula for this
case.

In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it
intends to do.
Well, then the authors failed to implement what they intended. Which seems unlikely, but even accepting that to be true, you still haven't shown that the current behavior is undesirable. Just unexpected. Which means we haven't documented this well enough so people reading the docs/code get an expectation that matches reality.
So, please, go ahead with some documentation/code comment changes. But, for my part, the existing behavior seems quite reasonable: "yes, please, get rid of my bloat on this otherwise infrequently updated table"; thus leave the existing behavior and the tracking alone.
David J.
In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.
Wouldn't that result in the rather strange behavior that 1 million dead rows might trigger a vacuum due to one threshold, 1 million inserted live rows might trigger a vacuum due to another threshold, while half a million dead plus half a million live fails to meet either threshold and fails to trigger a vacuum? What is the use case for that behavior? Perhaps you have one, but until you make it explicit, it is hard for others to get behind your proposal.
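For concreteness, with invented numbers: suppose reltuples is 10,000,000 and both scale factors are 0.1, putting each threshold at roughly 1,000,000 tuples. One million dead rows alone would trigger the dead-tuple vacuum, one million committed inserts alone would trigger the insert vacuum, yet 500,000 dead plus 500,000 committed would cross neither threshold under the proposal, even though the table has accumulated a million new tuples.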
On Wednesday, April 9, 2025, Sami Imseih <samimseih@gmail.com> wrote:
In other words, the reason n_ins_since_vacuum was introduced is to freeze
(committed) rows, so it should not need to track dead rows to do what it intends
to do.
n_ins_since_vacuum was introduced to indicate how many tuples a vacuum would touch on an insert-only table should vacuum be run now. Autovacuum uses this value when determining whether a given relation should be vacuumed.
David J.
On Wed, Apr 9, 2025 at 4:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
>> In other words, the reason n_ins_since_vacuum was introduced is to freeze
>> (committed) rows, so it should not need to track dead rows to do what it
>> intends to do.
>
> Wouldn't that result in the rather strange behavior that 1 million dead
> rows might trigger a vacuum due to one threshold, 1 million inserted live
> rows might trigger a vacuum due to another threshold, while half a million
> dead plus half a million live fails to meet either threshold and fails to
> trigger a vacuum?

Vacuum already works based on different thresholds, right? A user is able
to configure different thresholds:

autovacuum_vacuum_scale_factor|threshold
autovacuum_vacuum_insert_scale_factor|threshold

> What is the use case for that behavior? Perhaps you have one, but until
> you make it explicit, it is hard for others to get behind your proposal.

The point is to ensure that the pg_stats fields that autovacuum uses are
supplied the correct values for the different thresholds they need to
calculate, as described here [0].

[0] https://www.postgresql.org/message-id/CAA5RZ0uDyGW1omWqWkxyW8NB1qzsKmXhnoMtzTBeRzSd4DMatQ%40mail.gmail.com

--
Sami Imseih
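P.S. For completeness, the dead-tuple side can be sketched the same way as
the insert-threshold query at the top of the thread (again, a sketch using
global settings only; per-table reloptions can override them):

```
-- Sketch: compare the dead-tuple counter against its vacuum threshold.
SELECT s.n_dead_tup,
       current_setting('autovacuum_vacuum_threshold')::bigint
       + current_setting('autovacuum_vacuum_scale_factor')::float8
         * c.reltuples AS dead_tuple_vacuum_threshold
FROM pg_stat_all_tables s
JOIN pg_class c ON c.oid = s.relid
WHERE s.relname = 't';
```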
Hi,

On 2025-04-09 17:05:39 -0500, Sami Imseih wrote:
> On Wed, Apr 9, 2025 at 4:23 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote:
> >> In other words, the reason n_ins_since_vacuum was introduced is to freeze
> >> (committed) rows, so it should not need to track dead rows to do what it
> >> intends to do.
> >
> > Wouldn't that result in the rather strange behavior that 1 million dead
> > rows might trigger a vacuum due to one threshold, 1 million inserted live
> > rows might trigger a vacuum due to another threshold, while half a million
> > dead plus half a million live fails to meet either threshold and fails to
> > trigger a vacuum?
>
> Vacuum already works based on different thresholds, right? A user is able
> to configure different thresholds:
> autovacuum_vacuum_scale_factor|threshold
> autovacuum_vacuum_insert_scale_factor|threshold
>
> > What is the use case for that behavior? Perhaps you have one, but until
> > you make it explicit, it is hard for others to get behind your proposal.
>
> The point is to ensure that the pg_stats fields that autovacuum uses are
> supplied the correct values for the different thresholds they need to
> calculate, as described here [0]

You so far have not outlined a single scenario where the current behaviour
causes an issue.

Greetings,

Andres Freund
> Well, then the authors failed to implement what they intended. Which
> seems unlikely, but even accepting that to be true,

At least, the reasoning behind this is explained neither in the code
comments nor in the public docs.

> you still haven't shown that the current behavior is undesirable. Just
> unexpected.

It's not expected, for sure. As for being undesirable, I don't think it is,
because rollbacks are not that common and this is not preventing or
delaying autovacuum.

I do think it's probably a good idea to add a code comment that it is OK to
include dead tuples from an aborted insert in the n_ins_since_vacuum
counter, for the above reasons. I will also update the public documentation
for the counters to mention that they include rows from aborted
transactions.

--
Sami Imseih
On Wednesday, April 9, 2025, Sami Imseih <samimseih@gmail.com> wrote:
> What is the use case for that behavior? Perhaps you have one, but until you make it explicit, it is hard for others to get behind your proposal.
The point is to ensure that the pg_stats fields that autovacuum uses
are supplied the correct values
for the different thresholds they need to calculate, as described here [0]
[0] https://www.postgresql.org/message-id/CAA5RZ0uDyGW1omWqWkxyW8NB1qzsKmXhnoMtzTBeRzSd4DMatQ%40mail.gmail.com
Except there isn’t some singular provably correct value here. Today’s behavior (considering dead tuples) is neither intrinsically wrong nor intrinsically correct, and neither is what you propose (ignoring the dead tuples). The fact that those dead tuples get removed regardless is a point in favor of counting them when deciding what to do. And it’s also the long-standing behavior. You need to make a compelling argument to change to your preference.
Inserting aborted dead tuples moves the counter closer to both autovacuum thresholds. There is no reason why that should be prohibited. I can see the argument for why one threshold should be dead tuples only and the other live tuples only - but I don’t favor that design.
David J.
> Except there isn’t some singular provably correct value here. Today’s
> behavior (considering dead tuples) is neither intrinsically wrong nor
> intrinsically correct, and neither is what you propose (ignoring the dead
> tuples). The fact that those dead tuples get removed regardless is a point
> in favor of counting them when deciding what to do. And it’s also the
> long-standing behavior. You need to make a compelling argument to change
> to your preference.
>
> Inserting aborted dead tuples moves the counter closer to both autovacuum
> thresholds. There is no reason why that should be prohibited. I can see
> the argument for why one threshold should be dead tuples only and the
> other live tuples only - but I don’t favor that design.

Fair enough, and I think I got my answers from this thread. This design
decision was not discussed in the thread for b07642dbcd8.

So, I think public documentation updates to clarify that n_tup_upd|del|ins
and n_ins_since_vacuum include aborted rows would be a good enhancement.
Also a code comment in pgstat_relation_flush_cb that explains that
ins_since_vacuum intentionally includes aborted inserts:

    tabentry->ins_since_vacuum += lstats->counts.tuples_inserted;

--
Sami Imseih
On Thu, 10 Apr 2025 at 10:54, Sami Imseih <samimseih@gmail.com> wrote:
> Fair enough, and I think I got my answers from this thread. This design
> decision was not discussed in the thread for b07642dbcd8.

This discussion has mostly settled down now, but... I also went through
that thread to see if it was mentioned and saw nothing about it.

One possible bad behaviour this could cause, if
autovacuum_vacuum_insert_scale_factor is set lower than
autovacuum_vacuum_scale_factor, is that we could end up performing a vacuum
for inserts when all we've got is dead tuples from aborted inserts. That
does not seem terrible; it's just an extra autovacuum that could happen in
a not very common case. We could fix it, but it would require adding a new
field to PgStat_TableCounts to track this, as you can't selectively update
PgStat_TableCounts.tuples_inserted only on commit -- n_tup_ins would then
be wrong.

If the above is the only misbehaviour this is causing, then I doubt it's
worth expanding PgStat_TableCounts by 8 bytes to have this work better.

> So, I think public documentation updates to clarify that
> n_tup_upd|del|ins, and n_ins_since_vacuum include
> aborted rows is a good enhancement.

A code comment might help settle future debates. If you'd arrived from a
user perspective and were confused about this, then maybe that would
warrant something going into the docs. On the other hand, if you have a
suggestion, please put it into patch form.

I've attached a small patch which adds a code comment about this, which
might save a future discussion.

David
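P.S. A minimal sketch of that scenario; the table name and the specific
threshold values here are invented purely for illustration:

```
-- Deliberately make the insert threshold fire sooner than the
-- dead-tuple threshold (illustrative values only).
ALTER TABLE t SET (autovacuum_vacuum_insert_scale_factor = 0.05,
                   autovacuum_vacuum_insert_threshold = 50,
                   autovacuum_vacuum_scale_factor = 0.4,
                   autovacuum_vacuum_threshold = 50);

-- Every inserted row is rolled back, so only dead tuples are produced,
-- yet n_ins_since_vacuum still advances by 100000 and can cross the
-- lower insert threshold before the dead-tuple threshold is reached.
BEGIN;
INSERT INTO t SELECT n FROM generate_series(1, 100000) AS n;
ROLLBACK;
```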
Attachment
> One possible bad behaviour this could cause, if
> autovacuum_vacuum_insert_scale_factor is set lower than
> autovacuum_vacuum_scale_factor, is that we could end up performing a
> vacuum for inserts when all we've got is dead tuples from aborted inserts.

Thanks for pointing this out!

> We could fix it, but it would require adding a new field to
> PgStat_TableCounts to track this, as you can't selectively update
> PgStat_TableCounts.tuples_inserted only on commit -- n_tup_ins would then
> be wrong.
>
> If the above is the only misbehaviour this is causing, then I doubt it's
> worth expanding PgStat_TableCounts by 8 bytes to have this work better.

I agree; as I mentioned at the top of the thread, rollbacks are not common
enough for this to be worth it. The code comment as you have it is good
enough.

> > So, I think public documentation updates to clarify that
> > n_tup_upd|del|ins, and n_ins_since_vacuum include
> > aborted rows is a good enhancement.

I also attached a public doc clarification of how aborted (and committed)
rows are handled for the pg_stat_all_tables fields that count row changes.
I initially thought about adding a clarification for every field, but that
felt too repetitive, so I added a Note section after pg_stat_all_tables in
the public docs to describe the behavior.

It may be better to apply your code comment patch and the public docs
patch separately.

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
I created an entry for the July CF:
https://commitfest.postgresql.org/patch/5691/

... and I realized I forgot to include David's code comment patch
yesterday. Reattaching both patches.

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Fri, 11 Apr 2025 at 02:01, Sami Imseih <samimseih@gmail.com> wrote:
> I created an entry for the July CF:
> https://commitfest.postgresql.org/patch/5691/
>
> ... and I realized I forgot to include David's code comment patch yesterday.
> Reattaching both patches.

I've pushed the code comment patch.

For the docs patch, the fact you're having to name specific fields in the
note at the bottom makes me think the details should be added to the row
for the field in question instead.

David
> On Fri, 11 Apr 2025 at 02:01, Sami Imseih <samimseih@gmail.com> wrote:
> > I created an entry for the July CF:
> > https://commitfest.postgresql.org/patch/5691/
> >
> > ... and I realized I forgot to include David's code comment patch yesterday.
> > Reattaching both patches.
>
> I've pushed the code comment patch.
>
> For the docs patch, the fact you're having to name specific fields in
> the note at the bottom makes me think the details should be added to
> the row for the field in question instead.

Here is how the per-field version will look. Either way is fine with me.

With v2, I also added the clarification to the
pg_stat_database.tup_inserted|updated|deleted fields as well.

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Thu, Apr 10, 2025 at 5:38 PM Sami Imseih <samimseih@gmail.com> wrote:
> On Fri, 11 Apr 2025 at 02:01, Sami Imseih <samimseih@gmail.com> wrote:
> > I created an entry for the July CF
> > https://commitfest.postgresql.org/patch/5691/
> >
> > ... and I realized I forgot to include David's code comment patch yesterday,
> > Reattaching both patches.
>
> I've pushed the code comment patch.
>
> For the docs patch, the fact you're having to name specific fields in
> the note at the bottom makes me think the details should be added to
> the row for the field in question instead.
Here is how the per-field version will look. Either way is fine with me.
With v2, I also added the clarification to the
pg_stat_database.tup_inserted|updated|deleted fields as well.
I'm leaning toward a hybrid. At the top:
"The tuple counters below, except where noted, are incremented even if the transaction aborts."
(We can leave it unwritten that if the description says live or dead tuples that qualifies as otherwise noted.)
The only exception I see that we need to note is: n_mod_since_analyze; which I would explain as opposed to simply noting the oddity.
"Estimated number of rows modified since this table was last analyzed. Aborted transactions are ignored here since they will not cause the statistics computed by analyze to change."
So, here are the relevant counters, with their treatment of aborted transaction tuples:
seq_tup_read - says live
idx_tup_fetch - says live
n_tup_ins - default notice
n_tup_upd - default notice
n_tup_del - default notice
n_tup_hot_upd - default notice (is this correct?)
n_tup_newpage_upd - default notice (is this correct?)
n_live_tup - says live (is this a counter?)
n_dead_tup - says dead (is this a counter?)
n_mod_since_analyze - inline reason for non-default
n_ins_since_vacuum - default notice
I'm also thinking of rewording n_tup_upd, something like:
Total number of rows updated. Subsets of these updates are also tracked in n_tup_hot_upd and n_tup_newpage_upd to facilitate performance monitoring.
David J.
P.S. I'm trying to understand what it means for rows from aborted deletes to be counted (or not...).
I spent some time thinking about this today.

> "The tuple counters below, except where noted, are incremented even if
> the transaction aborts."

I like this idea, and I think it fits well as a blurb under "27.2.2.
Viewing Statistics". I suggest a slight re-write, however:

+    <para>
+     An aborted transaction will also increment tuple-related counters,
+     unless otherwise noted.
+    </para>

> So, here are the relevant counters, with their treatment of aborted
> transaction tuples:
>
> seq_tup_read - says live
> idx_tup_fetch - says live
> n_tup_ins - default notice
> n_tup_upd - default notice
> n_tup_del - default notice
> n_mod_since_analyze - inline reason for non-default
> n_ins_since_vacuum - default notice

All the counters mentioned above will increment the number of rows
modified/accessed even in the case of an aborted transaction, except for
n_mod_since_analyze.

> n_live_tup - says live (is this a counter?)
> n_dead_tup - says dead (is this a counter?)

They are not purely incremental values. They are incremented by
insert/update/delete for committed transactions, but are also updated by
VACUUM or VACUUM FULL. So these will need some inlined description of
their behavior.

> I'm also thinking of rewording n_tup_upd, something like:
>
> Total number of rows updated. Subsets of these updates are also tracked
> in n_tup_hot_upd and n_tup_newpage_upd to facilitate performance
> monitoring.

I think the current explanation is clear enough. I am also not too thrilled
about the "...to facilitate performance monitoring", since the cumulative
stats system as a whole is known to be used to facilitate perf monitoring.

What do you think of the attached?

--
Sami Imseih
Amazon Web Services (AWS)
Attachment
On Fri, Apr 11, 2025 at 12:33 PM Sami Imseih <samimseih@gmail.com> wrote:
> I'm also thinking to reword n_tup_upd, something like:
>
> Total number of rows updated. Subsets of these updates are also tracked in n_tup_hot_upd and n_tup_newpage_upd to facilitate performance monitoring.
I think the current explanation is clear enough. I am also not too
thrilled about the "...to facilitate performance monitoring", since
the cumulative stats system as a whole is known to be used to
facilitate perf monitoring.
Yeah, it was mostly a style thing - I was trying to avoid using parentheses, but the existing does make the needed point.
What do you think of the attached?
WFM. Though is there a reason to avoid adding the "why" of the exception for n_mod_since_analyze?
David J.
> WFM. Though is there a reason to avoid adding the "why" of the exception
> for n_mod_since_analyze?

I did think about that. Since this is a field that deals with ANALYZE, I
thought it would be understood that only committed changes matter here,
and that it was not worth adding more text to the description. But maybe
it's worth it?

--
Sami
On Fri, Apr 11, 2025 at 5:19 PM Sami Imseih <samimseih@gmail.com> wrote:
> WFM. Though is there a reason to avoid adding the "why" of the exception for n_mod_since_analyze?
I did think about that. Since this is a field that deals with ANALYZE, I
thought it would be understood that only committed changes matter here,
and that it was not worth adding more text to the description. But maybe
it's worth it?
Absent field questions I'm content to think it is sufficiently obvious or discoverable for others.
David J.