Thread: BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する

I hope this email finds you well. I would like to apologize in advance for any language mistakes, as my English is not very good. However, I will do my best to communicate clearly.

Regarding the report "BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する" submitted from the email address "antengynnnn536(at)gmail(dot)com", I am writing to inform you that I work at the same company. Since the owner of the mentioned email address is away from work for a while, I will be corresponding on their behalf.

I have already contacted AWS Support for verification regarding this issue. Although the case was escalated to their development department, AWS stated that this behavior is due to a PostgreSQL specification and they would not address it. They recommended contacting PostgreSQL directly.

In our company's environment, we rely heavily on Autovacuum and Autoanalyze for database management due to the high load of executing many SQL queries and managing execution plans. If Autovacuum causes the reltuples value to become inaccurate, it could significantly impact SQL performance, resulting in substantial influence.

Since this issue could also cause unintended SQL delays for other users, we are hoping for an improvement in this area.

Thank you very much for your attention to this matter.

Best regards,


=======================================================================
このE-mailに含まれる情報は、このE-mailの名宛人のみが利用可能な限定された情報です。

万一、当方の誤送信等によりこのE-mailをお受け取りになった場合は、深くお詫び申し上げます。
誠に勝手ではございますが、このメールの印刷、コピー、転送その他
一切のご使用をお控えいただきますようお願いいたします。
お手数をお掛けいたしますが、直ちにこのE-mailを破棄して頂くとともに、誤送信である旨を送信者に通知いただきますようご協力をお願い申し上げます。
=======================================================================

[ex-al]
On Sun, Feb 9, 2025 at 9:10 PM 濱中 弘和 <hamanaka7767.ita@al.asahi-life.co.jp> wrote:
I hope this email finds you well. I would like to apologize in advance for any language mistakes, as my English is not very good. However, I will do my best to communicate clearly.

Regarding the report "BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する" submitted from the email address "antengynnnn536(at)gmail(dot)com", I am writing to inform you that I work at the same company.

Since this issue could also cause unintended SQL delays for other users, we are hoping for an improvement in this area.

Your English here is quite good.  Maybe you can translate the original bug report into English, or just file your own, so that others can review it and know it is reasonably accurate.  It is unusual, IME, to use a translation service to perform such conversions - especially for something as technical and precise as a bug report.

David J.

Dear David J.,

Thank you very much for your kind words.

I have created a new bug report.

Below is the detailed bug report:

----------------------------------------------------------------------------------------------------------------------------------------

1.Test Environment
① PostgreSQL 13.9 *Using Aurora PostgreSQL.
② Parameters
  autovacuum=on
  autovacuum_analyze_scale_factor=0.05
  autovacuum_analyze_threshold=50
  autovacuum_naptime=30
  autovacuum_vacuum_cost_delay=20
  autovacuum_vacuum_cost_limit=200
  autovacuum_vacuum_insert_scale_factor=0.2
  autovacuum_vacuum_insert_threshold=1000
  autovacuum_vacuum_scale_factor=0.2
  autovacuum_vacuum_threshold=50


2.Test to Confirm the Inaccuracy of pg_class.reltuples

The following operations were performed:

① Inserted 50,000 records into a truncated table.
  Confirmed that autovacuum and autoanalyze were completed.


devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 0
n_tup_hot_upd       | 0
n_live_tup          | 50000
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:24:24.507357+09
last_analyze        |
last_autoanalyze    | 2025-02-10 17:24:24.586928+09
vacuum_count        | 0
analyze_count       | 0
autovacuum_count    | 1
autoanalyze_count   | 1

devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 271
reltuples | 50000

devsmtdb=>


② Deleted the first row of the table from Transaction 1, and waited without committing.

devsmtdb=> begin;
BEGIN
devsmtdb=*> delete from aa_test WHERE a = 1;
DELETE 1
devsmtdb=*>


③ Deleted 20,000 rows from Transaction 2 and committed.
  Confirmed that autovacuum and autoanalyze were triggered afterward.
  Autovacuum did not complete and was skipped.
  Autoanalyze finished successfully. → pg_class.reltuples became 30,000.

devsmtdb=> begin;
BEGIN
devsmtdb=*> delete from aa_test WHERE a > 30000;
DELETE 20000
devsmtdb=*>
devsmtdb=*> commit;
COMMIT
devsmtdb=>


devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 20000
n_tup_hot_upd       | 0
n_live_tup          | 30000
n_dead_tup          | 20000
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:27:24.541284+09
last_analyze        |
last_autoanalyze    | 2025-02-10 17:27:24.608364+09
vacuum_count        | 0
analyze_count       | 0
autovacuum_count    | 2
autoanalyze_count   | 2

devsmtdb=>
devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 271
reltuples | 30000


④ The skipped autovacuum was triggered every 30 seconds. Each time it triggered, pg_class.reltuples decreased.
  After about 20 triggers, pg_class.reltuples became approximately 500.

devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 20000
n_tup_hot_upd       | 0
n_live_tup          | 10931
n_dead_tup          | 20000
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:28:24.572012+09
last_analyze        |
last_autoanalyze    | 2025-02-10 17:27:24.608364+09
vacuum_count        | 0
analyze_count       | 0
autovacuum_count    | 4
autoanalyze_count   | 2

devsmtdb=>
devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 271
reltuples | 10931

・・・skip over the intermediate steps・・・


devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 20000
n_tup_hot_upd       | 0
n_live_tup          | 532
n_dead_tup          | 20000
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:36:24.76081+09
last_analyze        |
last_autoanalyze    | 2025-02-10 17:27:24.608364+09
vacuum_count        | 0
analyze_count       | 0
autovacuum_count    | 20
autoanalyze_count   | 2

devsmtdb=>
devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 271
reltuples | 532


⑤ Committed Transaction 1.

⑥ Autovacuum completed. pg_class.reltuples remained approximately 530.

devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 20001
n_tup_hot_upd       | 0
n_live_tup          | 530
n_dead_tup          | 0
n_mod_since_analyze | 1
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:36:54.77512+09
last_analyze        |
last_autoanalyze    | 2025-02-10 17:27:24.608364+09
vacuum_count        | 0
analyze_count       | 0
autovacuum_count    | 21
autoanalyze_count   | 2

devsmtdb=>
devsmtdb=>
devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 163
reltuples | 530

devsmtdb=>


⑦Execution Plan (before analyze)
devsmtdb=> explain analyze select * from aa_test;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Seq Scan on aa_test  (cost=0.00..168.30 rows=530 width=11) (actual time=0.003..1.359 rows=29999 loops=1)
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Planning Time: 0.226 ms
-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Execution Time: 2.245 ms

devsmtdb=> explain analyze select * from aa_test where a > 20000;
-[ RECORD 1 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Seq Scan on aa_test  (cost=0.00..169.62 rows=177 width=11) (actual time=1.565..2.698 rows=10000 loops=1)
-[ RECORD 2 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN |   Filter: (a > '20000'::numeric)
-[ RECORD 3 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN |   Rows Removed by Filter: 19999
-[ RECORD 4 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Planning Time: 0.095 ms
-[ RECORD 5 ]--------------------------------------------------------------------------------------------------------
QUERY PLAN | Execution Time: 3.005 ms

devsmtdb=>


⑧Executed analyze. pg_class.reltuples became 29,999.

devsmtdb=> SELECT relname,n_tup_ins,n_tup_upd,n_tup_del,n_tup_hot_upd,
n_live_tup,n_dead_tup,n_mod_since_analyze,n_ins_since_vacuum,
last_vacuum,last_autovacuum,last_analyze,last_autoanalyze,
vacuum_count,analyze_count,autovacuum_count,autoanalyze_count
from pg_stat_user_tables where relname = 'aa_test';
-[ RECORD 1 ]-------+------------------------------
relname             | aa_test
n_tup_ins           | 50000
n_tup_upd           | 0
n_tup_del           | 20001
n_tup_hot_upd       | 0
n_live_tup          | 29999
n_dead_tup          | 0
n_mod_since_analyze | 0
n_ins_since_vacuum  | 0
last_vacuum         |
last_autovacuum     | 2025-02-10 17:36:54.77512+09
last_analyze        | 2025-02-10 17:40:55.471849+09
last_autoanalyze    | 2025-02-10 17:27:24.608364+09
vacuum_count        | 0
analyze_count       | 1
autovacuum_count    | 21
autoanalyze_count   | 2

devsmtdb=> SELECT relname,relpages,reltuples FROM PG_CLASS WHERE RELNAME = 'aa_test';
-[ RECORD 1 ]------
relname   | aa_test
relpages  | 163
reltuples | 29999

devsmtdb=>


⑨Execution Plan (after analyze)
devsmtdb=> explain analyze select * from aa_test;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Seq Scan on aa_test  (cost=0.00..462.99 rows=29999 width=11) (actual time=0.004..1.360 rows=29999 loops=1)
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Planning Time: 0.080 ms
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Execution Time: 2.245 ms

devsmtdb=> explain analyze select * from aa_test where a > 20000;
-[ RECORD 1 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Seq Scan on aa_test  (cost=0.00..537.99 rows=10000 width=11) (actual time=1.604..2.724 rows=10000 loops=1)
-[ RECORD 2 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN |   Filter: (a > '20000'::numeric)
-[ RECORD 3 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN |   Rows Removed by Filter: 19999
-[ RECORD 4 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Planning Time: 0.085 ms
-[ RECORD 5 ]----------------------------------------------------------------------------------------------------------
QUERY PLAN | Execution Time: 3.030 ms

devsmtdb=>


3.Background of the Issue
Suddenly, an SQL query experienced significant delay.
The cause was the joining of large tables.

The execution plan indicated a join between 1 row and 29 rows, thus choosing a nested loop.
However, in reality, it turned out to be a nested loop with 401,400 rows and 13,380 rows, resulting in a substantial delay.
The person in charge ran ANALYZE to prioritize recovery.
It has not been confirmed if the reduction in pg_class.reltuples was entirely the cause.

During the process of investigating the cause of the SQL query suddenly delaying, it was discovered that pg_class.reltuples had become inaccurate.
It is understood that the execution plan relies on information from pg_class.reltuples,
and the reduction in pg_class.reltuples might have been the cause.
In fact, the table that caused the delay had about 6000 in pg_stat_user_tables.autovacuum_count,
and about 200 in pg_stat_user_tables.autoanalyze_count, indicating that autovacuum retrials were occurring.

There is a concern that inefficient execution plans might occur in the future.



Here is an excerpt of the execution plan for the delayed SQL query.

-> Nested Loop (cost=21.66..26.67 rows=1 width=43) (actual time=214.306..3363978.394 rows=401400 loops=1)
   Join Filter: (kn06.sykinid = tb_kn06knmknrkhninf.sykinid)
   Rows Removed by Join Filter: 5405252400
   -> Hash Join (cost=5.67..7.81 rows=1 width=9) (actual time=6.436..14.643 rows=13380 loops=1)
   ~~skip~~
   -> Nested Loop (cost=15.99..18.50 rows=29 width=34) (actual time=0.012..233.294 rows=404010 loops=13380)
   ~~skip~~

----------------------------------------------------------------------------------------------------------------------------------------

Best regards,

差出人: David G. Johnston <david.g.johnston@gmail.com>
送信日時: 2025年2月10日 13:18
宛先: 濱中 弘和 <hamanaka7767.ita@al.asahi-life.co.jp>
CC: pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
件名: Re: BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する
 
On Sun, Feb 9, 2025 at 9:10 PM 濱中 弘和 <hamanaka7767.ita@al.asahi-life.co.jp> wrote:
I hope this email finds you well. I would like to apologize in advance for any language mistakes, as my English is not very good. However, I will do my best to communicate clearly.

Regarding the report "BUG #18773: オートバキュームのリトライ時にreltuplesの値が減少する" submitted from the email address "antengynnnn536(at)gmail(dot)com", I am writing to inform you that I work at the same company.

Since this issue could also cause unintended SQL delays for other users, we are hoping for an improvement in this area.

Your English here is quite good.  Maybe you can translate the original bug report into English, or just file your own, so that others can review it and know it is reasonably accurate.  It is unusual, IME, to use a translation service to perform such conversions - especially for something as technical and precise as a bug report.

David J.

=======================================================================
このE-mailに含まれる情報は、このE-mailの名宛人のみが利用可能な限定された情報です。

万一、当方の誤送信等によりこのE-mailをお受け取りになった場合は、深くお詫び申し上げます。
誠に勝手ではございますが、このメールの印刷、コピー、転送その他
一切のご使用をお控えいただきますようお願いいたします。
お手数をお掛けいたしますが、直ちにこのE-mailを破棄して頂くとともに、誤送信である旨を送信者に通知いただきますようご協力をお願い申し上げます。
=======================================================================

[ex-al]

reltuples decreasing with each autovacuum run

From
Fujii Masao
Date:

On 2025/02/10 18:47, 濱中 弘和 wrote:
> Dear David J.,
> 
> Thank you very much for your kind words.
> 
> I have created a new bug report.

I’ve updated the email subject to an English version for broader understanding.

> ④ The skipped autovacuum was triggered every 30 seconds. Each time it triggered, pg_class.reltuples decreased.
>    After about 20 triggers, pg_class.reltuples became approximately 500.

As mentioned in the original thread [1], I believe this issue comes from
the logic in vac_estimate_reltuples(). This function estimates reltuples
by summing the number of live tuples found in scanned pages with
an estimate for unscanned pages. While this seems to work well when live tuples
are evenly distributed, it can underestimate reltuples if unscanned pages
have a higher density of live tuples than scanned ones.

Commit 74388a1ac3 seems attempt to address this by keeping the old reltuples
if VACUUM scans <=2% of total pages. However, when VACUUM scans more than 2%,
the estimation may still be inaccurate, leading to a gradual decrease in reltuples.

One idea for this issue is to raise the threshold from 2% to, say, 10%,
though this would only be a partial improvement...

Alternatively, if VACUUM is likely to set incorrect reltuples, another just idea
is to introduce a storage parameter to prevent VACUUM from updating it,
allowing only ANALYZE to make changes.

Regards,

[1] https://postgr.es/m/3dec196d-72a6-447f-ad2e-f2668f2907a9@oss.nttdata.com

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION