Re: New criteria for autovacuum - Mailing list pgsql-hackers

From Sami Imseih
Subject Re: New criteria for autovacuum
Date
Msg-id CAA5RZ0sPur3KUt9JXW9J88Lcjosdz73m91SdJnXrFgat7Lqf1Q@mail.gmail.com
Whole thread Raw
In response to Re: New criteria for autovacuum  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
> From logical point of view I agree with you: taken in account number of inserted tuples makes sense if it allows to
markpage as all-visible.
 
> So `ins_since_vacuum` should be better renamed to `ins_all_visible_since_vacuum` and count only all-visible tuples.
> If newly inserted tuple is not visible to all, then it should not be accounted in statistic and trigger autovacuum.

cumulative stats are not updated for an in-flight transaction, so
effectively, these rows
are not accounted for by vacuum.

I did just realize however that it is strange that a rolledback
session will accumulate n_tup_ins and n_ins_since_vacuum

# session 1
test=# create table t (id int);
CREATE TABLE
test=# begin;
BEGIN
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1
test=*# insert into t values (1);
INSERT 0 1

# session 2

test=#  select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname =
't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         0 |         0 |         0 |          0 |          0 |
         0
(1 row)

# session 1
ROLLBACK

# session 2
test=#  select n_tup_ins, n_tup_upd, n_tup_del, n_live_tup,
n_dead_tup, n_ins_since_vacuum from pg_stat_all_tables where relname =
't';
 n_tup_ins | n_tup_upd | n_tup_del | n_live_tup | n_dead_tup |
n_ins_since_vacuum
-----------+-----------+-----------+------------+------------+--------------------
         3 |         0 |         0 |          0 |          3 |
         3
(1 row)

ROLLBACKs should not be the norm, but this looks like a bug to me
as it may trigger vacuum based on insert threshold more often.
Thoughts?


> So I still think that maintaining count of heap visibility check is the best alternative.
> It quite easy to implement, adds almost no overhead and this information indicates efficiency
> of index-only scan. So it seems to be useful even if not used by autovacuum.

I agree. It could aid in a user that wants to schedule some manual
vacuum for tables
that need very optimal index-only scans, or to per-table tune autovac
for those types of
tables. I actually think we should add a few columns as mentioned here [0]


>> That being said, long-running transactions are a problem for
>> autovacuum in general. Even if you track this stat you are proposing
>> about heap fetches by index only scans, you won't know if the long
>> running transaction is over and thus if it makes sense to try and
>> trigger an autovacuum for that table again anyway.

> This. It would be really useful to have some kind of a system for
> figuring out when -- in terms of XIDs -- we ought to vacuum which
> table. I think that's a hard problem, but it would help a lot of
> people.

hmm, isn't that what anti-wraparound autovac does? or, I may
have missed the point here completely


[0] https://www.postgresql.org/message-id/CAA5RZ0t1U38qtVAmg3epjh5RBbpT4VRB_Myfp0oGm_73w-UNRA%40mail.gmail.com

--
Sami Imseih
Amazon Web Services (AWS)



pgsql-hackers by date:

Previous
From: Nathan Bossart
Date:
Subject: Re: Statistics Import and Export
Next
From: Nathan Bossart
Date:
Subject: Re: Statistics Import and Export