Thread: Automatic aggressive vacuum on almost frozen table takes too long

Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan

Hello,

I have a big table in an actively working system; nothing has been written to it, and nothing read from it, for a long time. The table size is 15GB (data only), indexes 150GB.
Since the table does not change, after a while it crosses autovacuum_freeze_max_age and an aggressive vacuum is triggered. That would be fine, but vacuuming the table takes a long time, despite the fact that exactly the same scan was done a few days before and almost all pages are marked as frozen, which is confirmed by the data in the log:
        automatic aggressive vacuum of table "appdbname.appschemaname.applications": index scans: 1
            pages: 0 removed, 2013128 remain, 0 skipped due to pins, 2008230 skipped frozen
            tuples: 2120 removed, 32616340 remain, 0 are dead but not yet removable, oldest xmin: 4111875427
            buffer usage: 2005318781 hits, 19536511 misses, 23903 dirtied
            avg read rate: 4.621 MB/s, avg write rate: 0.006 MB/s
            system usage: CPU: user: 26398.27 s, system: 335.27 s, elapsed: 33029.00 s

That is, if I understand it correctly, it says that there were (and actually are) 2013128 pages, of which 2008230 were skipped, which leaves 4898 blocks to be scanned. So the allocated 1GB (autovacuum_work_mem) should be enough to handle that number of blocks and to avoid multiple scans of the indexes.
But, based on buffer usage, one can see that a huge amount of data is read, greatly exceeding not only the number of remaining unfrozen blocks but also the size of the table and indexes combined: about 2 billion blocks, more than 15TB.
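
For the record, the arithmetic behind that figure: 2005318781 hits + 19536511 misses is about 2.02 billion buffer accesses, and at 8 kB per block that is roughly 15 TB. A minimal way to cross-check the frozen-page count from the visibility map would be something like the following sketch (it assumes the contrib extension pg_visibility can be installed):

    -- sketch: report the all-visible / all-frozen page counts from the visibility map
    CREATE EXTENSION IF NOT EXISTS pg_visibility;
    SELECT * FROM pg_visibility_map_summary('appschemaname.applications');
    -- expected here: roughly 2008230 all-frozen pages out of 2013128 total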

Is this a bug in PostgreSQL or am I misinterpreting the log data?

Just in case: I'm using PostgreSQL version 11.11.
autovacuum_vacuum_cost_delay: 2ms
autovacuum_vacuum_cost_limit: 8000
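
For completeness, the aggressive-vacuum trigger can be confirmed with something like this sketch, which compares the table's relfrozenxid age with autovacuum_freeze_max_age:

    SELECT c.relname,
           age(c.relfrozenxid) AS xid_age,
           current_setting('autovacuum_freeze_max_age')::bigint AS freeze_max_age
    FROM pg_class c
    WHERE c.oid = 'appschemaname.applications'::regclass;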

Thank you.

BR,
Mikhael

Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Adrian Klaver

On 2/15/23 22:57, Mikhail Balayan wrote:
> Hello,
> 
> I have a big table in the actively working system, in which nothing is 
> written for a long time, and nothing is read from it. Table size is 15GB 
> (data only), indexes 150GB.
> Since the table does not change, after a while it crosses the 
> autovacuum_freeze_max_age and an aggressive vacuum is triggered. And it 
> would be OK, but vacuuming of the table takes a long time, despite the 
> fact that exactly the same scan was made a few days before and almost 
> all pages are marked as frozen, which is confirmed by data from the log:
>          automatic aggressive vacuum of table 
> "appdbname.appschemaname.applications": index scans: 1
>              pages: 0 removed, 2013128 remain, 0 skipped due to pins, 
> 2008230 skipped frozen
>              tuples: 2120 removed, 32616340 remain, 0 are dead but not 
> yet removable, oldest xmin: 4111875427
>              buffer usage: 2005318781 hits, 19536511 misses, 23903 dirtied
>              avg read rate: 4.621 MB/s, avg write rate: 0.006 MB/s
>              system usage: CPU: user: 26398.27 s, system: 335.27 s, 
> elapsed: 33029.00 s
> 
> That is, if I understand it correctly, it says that there were (and 
> actually are) 2013128 pages of which 2008230 were skipped, which leaves 
> 4898 blocks to be scanned. I.e. it seems that the allocated 1GB 
> (autovacuum_work_mem) should be enough to handle that amount of blocks 
> and to avoid multiple scans of the indexes.
> But, based on buffer usage, one can see that a huge amount of data is 
> read, greatly exceeding not only the number of remaining unfrozen 
> blocks, but also the size of the table and indexes taken together: 2 
> billion blocks, more than 15TB.
> 
> Is this a bug in Postgresql or am I interpreting the log data wrong?

Not sure if this applies but from:

https://www.postgresql.org/docs/11/release-11-18.html

Release 11.18

Avoid long-term memory leakage in the autovacuum launcher process (Reid 
Thompson)

The lack of field reports suggests that this problem is only latent in 
pre-v15 branches; but it's not very clear why, so back-patch the fix anyway.


> 
> Just in case, I'm using Postgresql version: 11.11.

Besides the above you are missing 8 releases of other fixes.

> autovacuum_vacuum_cost_delay: 2ms
> autovacuum_vacuum_cost_limit: 8000
> 
> Thank you.
> 
> BR,
> Mikhael

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Peter Geoghegan

On Thu, Feb 16, 2023 at 7:44 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > That is, if I understand it correctly, it says that there were (and
> > actually are) 2013128 pages of which 2008230 were skipped, which leaves
> > 4898 blocks to be scanned. I.e. it seems that the allocated 1GB
> > (autovacuum_work_mem) should be enough to handle that amount of blocks
> > and to avoid multiple scans of the indexes.

It's quite clear that there is only one pass over the indexes, since
"index scans: 1" says exactly that.

> > But, based on buffer usage, one can see that a huge amount of data is
> > read, greatly exceeding not only the number of remaining unfrozen
> > blocks, but also the size of the table and indexes taken together: 2
> > billion blocks, more than 15TB.
> >
> > Is this a bug in Postgresql or am I interpreting the log data wrong?

I think that it might be a bug in Postgres.

I addressed a similar issue in the same "buffer" instrumentation in
commit d3609dd2, but that wasn't backpatched because I imagined that
it only applied to the new VACUUM VERBOSE case (VACUUM VERBOSE only
started sharing the same instrumentation code as log_autovacuum in
Postgres 15). It's not immediately obvious how you could see a problem
like the one you've shown in the autovacuum log output. Even still,
the information about buffers that you've shown does indeed appear to
be total nonsense (while everything else we can see looks plausible).
There has to be some explanation for that.

The only other explanation I can think of is a pathological case where
an index scan by some particular ambulkdelete routine scans a number
of buffers that vastly exceeds the total size of the index. That does
seem just about possible with an access method like GIN. Do you have
any non-btree indexes on the table? Can you show us the details of the
table, including all of its indexes? In other words, can you show "\d
applications" output from psql?

-- 
Peter Geoghegan



Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan

Adrian,
thanks for pointing out the fix. We are just about to update to 11.18 next month.

Mikhael


Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan

>> Even still,
>> the information about buffers that you've shown does indeed appear to
>> be total nonsense (while everything else we can see looks plausible).
Actually, not only the buffers. The elapsed time also looks strange, and that is the first thing that drew my attention to this situation: 33029 seconds, which is more than 9 hours, to scan 4898 blocks and the related indexes. By that metric the number of buffers processed actually looks plausible (which does not rule out a bug); otherwise I cannot explain such a long vacuum of this table.
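
Back-of-the-envelope: 2005318781 + 19536511 is about 2.02 billion buffer accesses over 33029 s, i.e. roughly 61000 buffers/s, or on the order of 500 MB/s of mostly cached page accesses by a single worker, which also lines up with the 26398 s of user CPU. So the elapsed time and the buffer counts are at least consistent with each other.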

>> Do you have any non-btree indexes on the table? Can you show us the details of the
>> table, including all of its indexes? In other words, can you show "\d applications" output from psql?
Only btree indexes. Please find the full table schema below:
appdbname=> \d applications
                           Table "appschemaname.applications"
             Column             |          Type          | Collation | Nullable | Default
--------------------------------+------------------------+-----------+----------+----------
 id                             | character varying(36)  |           | not null |
 created_at                     | bigint                 |           | not null |
 updated_at                     | bigint                 |           | not null |
 deleted_at                     | bigint                 |           |          |
 tenant_id                      | character varying(36)  |           |          |
 context_tenant_id              | character varying(36)  |           |          |
 policy_id                      | character varying(36)  |           |          |
 policy_type                    | character varying(128) |           |          |
 policy_name                    | text                   |           |          |
 root_policy_id                 | character varying(36)  |           |          |
 context_id                     | character varying(36)  |           |          |
 context_type                   | character varying(128) |           |          |
 agent_id                       | character varying(36)  |           |          |
 multiple_applications_conflict | text                   |           |          |
 validation_issues              | text                   |           |          |
 deployment_state               | character varying(128) |           |          |
 deployment_errors              | text                   |           |          |
 enabled                        | boolean                |           |          |
 has_conflict_alert             | boolean                |           |          |
 running_state                  | character varying(128) |           |          |
 running_activities             | text                   |           |          |
 running_progress               | character varying(128) |           |          |
 running_blockers               | text                   |           |          |
 last_success_time              | bigint                 |           |          |
 last_run_time                  | bigint                 |           |          |
 last_activity                  | character varying(128) |           |          |
 next_run_time                  | bigint                 |           |          |
 next_activity                  | character varying(128) |           |          |
 most_severe_status_event       | text                   |           |          |
 status                         | character varying(128) |           |          |
 origin_context                 | text                   |           | not null | ''::text
 euc                            | character varying(36)  |           |          |
 event_flag                     | integer                |           |          | 1
Indexes:
    "applications2_pkey" PRIMARY KEY, btree (id)
    "chk_invariant_1_apps2" UNIQUE, btree (policy_type, context_id) WHERE multiple_applications_conflict <> ''::text AND enabled = true AND root_policy_id::text <> ''::text AND deleted_at IS NULL
    "uq_policy_id_context2" UNIQUE, btree (policy_id, context_id) WHERE deleted_at IS NULL
    "idx_applications2_agent_id" btree (agent_id)
    "idx_applications2_context_id" btree (context_id)
    "idx_applications2_context_tenant_id" btree (context_tenant_id)
    "idx_applications2_context_type" btree (context_type)
    "idx_applications2_deleted_at" btree (deleted_at)
    "idx_applications2_enabled" btree (enabled)
    "idx_applications2_euc_index" btree (euc)
    "idx_applications2_policy_id" btree (policy_id)
    "idx_applications2_policy_type" btree (policy_type)
    "idx_applications2_root_policy_id" btree (root_policy_id) WHERE deleted_at IS NULL
    "idx_applications2_status" btree (status)
    "idx_applications2_status_by_policy_fields_context_tenant_i" btree (context_tenant_id, root_policy_id, policy_type, context_type, deleted_at) WHERE deleted_at IS NULL AND root_policy_id::text = ''::text
    "idx_applications2_status_by_policy_fields_tenant_id" btree (tenant_id, root_policy_id, policy_type, context_type, deleted_at) WHERE deleted_at IS NULL AND root_policy_id::text = ''::text
    "idx_applications2_tenant_id" btree (tenant_id)
    "idx_applications_created_at" btree (created_at)
    "tmp_idx_applications2_updated_at2" btree (updated_at)


Thank you!

Mikhael



Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Peter Geoghegan

On Thu, Feb 16, 2023 at 5:40 PM Mikhail Balayan <mv.balayan@gmail.com> wrote:
>> >> Do you have any non-btree indexes on the table? Can you show us the details of the
>> >> table, including all of its indexes? In other words, can you show "\d applications" output from psql?
>
> Only btree indexes. Please find the full table schema below:

It's possible that VACUUM had to wait a long time for a cleanup lock
on one individual heap page here, which could have added a long delay.
But...that doesn't seem particularly likely.

Can you run amcheck's bt_index_check() routine against some of the
indexes you've shown? There is perhaps some chance that index
corruption exists and causes VACUUM to take a very long time to delete
index pages. This is pretty much a wild guess, though.
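
For reference, a run over every index on the table would look something like the sketch below (amcheck is a contrib extension, and bt_index_check() only needs an AccessShareLock on the index and its table):

    CREATE EXTENSION IF NOT EXISTS amcheck;
    -- run the basic structural check against each index on the table
    SELECT ci.relname, bt_index_check(ci.oid::regclass)
    FROM pg_index i
    JOIN pg_class ci ON ci.oid = i.indexrelid
    WHERE i.indrelid = 'appschemaname.applications'::regclass;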

-- 
Peter Geoghegan



Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan

>> Can you run amcheck's bt_index_check() routine against some of the
>> indexes you've shown? There is perhaps some chance that index
>> corruption exists and causes VACUUM to take a very long time to delete
>> index pages. This is pretty much a wild guess, though.

Unfortunately I can't; we haven't enabled that extension yet, and since this is production I'm not ready to turn it on right away. But I can say that this theory is unlikely, since this problem occurs on different sites. Here's an example of the output from another site, where the table size is 46.5 GB (again, data only), indexes 107GB, TOAST 62MB:
automatic aggressive vacuum of table "appdbname2.appschemaname.applications": index scans: 1
pages: 0 removed, 6091646 remain, 0 skipped due to pins, 6086395 skipped frozen
tuples: 2344 removed, 35295654 remain, 0 are dead but not yet removable, oldest xmin: 213412878
buffer usage: 251980554 hits, 14462331 misses, 18844 dirtied
avg read rate: 12.018 MB/s, avg write rate: 0.016 MB/s
system usage: CPU: user: 7734.43 s, system: 178.98 s, elapsed: 9401.36 s

Here again we see that there are 5251 blocks that need to be cleaned (6091646 - 6086395), yet buffer usage is 266461729 blocks, or roughly 2 TB, and the processing time is over 2.5 hours.

>> It's possible that VACUUM had to wait a long time for a cleanup lock
>> on one individual heap page here
If such a scenario is possible, it would make sense to add information about the lock wait time to the output. Something like:
     system usage: CPU: user: 7734.43 s, system: 178.98 s, lock_wait: 1234.56 s, elapsed: 9401.36 s
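
In the meantime, a way to at least see whether a worker is stuck on such a wait is to watch its wait events -- a sketch against pg_stat_activity; a cleanup-lock wait normally shows up with wait_event_type 'BufferPin':

    SELECT pid, wait_event_type, wait_event, state, query
    FROM pg_stat_activity
    WHERE backend_type = 'autovacuum worker';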


Mikhail


Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Mikhail Balayan

Perhaps the output of a manual VACUUM (FREEZE, VERBOSE) run will give some more information.
Table size:
appdbname3=# \dt+ appschemaname.applications
                              List of relations
    Schema     |     Name     | Type  |     Owner     |   Size   | Description
---------------+--------------+-------+---------------+----------+-------------
 appschemaname | applications | table | appschemaname | 10012 MB |

Each index size:
  tablename   |                         indexname                          | index_size | unique |
--------------+------------------------------------------------------------+------------+--------+
 applications | chk_invariant_1_apps2                                      | 8192 bytes | Y      |
 applications | applications2_pkey                                         | 1645 MB    | Y      |
 applications | idx_applications2_agent_id                                 | 2982 MB    | N      |
 applications | idx_applications2_context_id                               | 2708 MB    | N      |
 applications | idx_applications2_context_tenant_id                        | 1560 MB    | N      |
 applications | idx_applications2_context_type                             | 4963 MB    | N      |
 applications | idx_applications2_deleted_at                               | 3264 MB    | N      |
 applications | idx_applications2_enabled                                  | 4975 MB    | N      |
 applications | idx_applications2_euc_index                                | 1684 MB    | N      |
 applications | idx_applications2_policy_id                                | 3131 MB    | N      |
 applications | idx_applications2_policy_type                              | 16 GB      | N      |
 applications | idx_applications2_root_policy_id                           | 1574 MB    | N      |
 applications | idx_applications2_status                                   | 5391 MB    | N      |
 applications | idx_applications2_status_by_policy_fields_context_tenant_i | 13 MB      | N      |
 applications | idx_applications2_status_by_policy_fields_tenant_id        | 15 MB      | N      |
 applications | idx_applications2_tenant_id                                | 1670 MB    | N      |
 applications | idx_applications_created_at                                | 2855 MB    | N      |
 applications | tmp_idx_applications2_updated_at2                          | 3505 MB    | N      |
 applications | uq_policy_id_context2                                      | 1398 MB    | Y      |

And vacuum freeze output:
appdbname3=# VACUUM (FREEZE, VERBOSE) appschemaname.applications;
INFO:  aggressively vacuuming "appschemaname.applications"
INFO:  scanned index "applications2_pkey" to remove 10488 row versions
DETAIL:  CPU: user: 2.45 s, system: 1.07 s, elapsed: 5.16 s
INFO:  scanned index "tmp_idx_applications2_updated_at2" to remove 10488 row versions
DETAIL:  CPU: user: 2.53 s, system: 2.86 s, elapsed: 11.56 s
INFO:  scanned index "idx_applications2_agent_id" to remove 10488 row versions
DETAIL:  CPU: user: 3.16 s, system: 2.36 s, elapsed: 11.04 s
INFO:  scanned index "idx_applications2_context_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.22 s, system: 1.95 s, elapsed: 8.06 s
INFO:  scanned index "idx_applications2_context_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 1.86 s, system: 1.16 s, elapsed: 5.10 s
INFO:  scanned index "idx_applications2_context_type" to remove 10488 row versions
DETAIL:  CPU: user: 190.09 s, system: 6.48 s, elapsed: 220.88 s
INFO:  scanned index "idx_applications2_deleted_at" to remove 10488 row versions
DETAIL:  CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s
INFO:  scanned index "idx_applications2_enabled" to remove 10488 row versions
DETAIL:  CPU: user: 78.22 s, system: 4.74 s, elapsed: 96.96 s
INFO:  scanned index "idx_applications2_policy_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s
INFO:  scanned index "idx_applications2_policy_type" to remove 10488 row versions
DETAIL:  CPU: user: 300.45 s, system: 16.05 s, elapsed: 365.80 s
INFO:  scanned index "idx_applications2_root_policy_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.13 s, system: 1.20 s, elapsed: 6.16 s
INFO:  scanned index "idx_applications2_status" to remove 10488 row versions
DETAIL:  CPU: user: 153.79 s, system: 6.57 s, elapsed: 185.76 s
INFO:  scanned index "idx_applications2_status_by_policy_fields_context_tenant_i" to remove 10488 row versions
DETAIL:  CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.06 s
INFO:  scanned index "idx_applications2_status_by_policy_fields_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.05 s
INFO:  scanned index "idx_applications2_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.11 s, system: 1.23 s, elapsed: 6.19 s
INFO:  scanned index "idx_applications2_euc_index" to remove 10488 row versions
DETAIL:  CPU: user: 2.05 s, system: 1.27 s, elapsed: 6.05 s
INFO:  scanned index "chk_invariant_1_apps2" to remove 10488 row versions
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  scanned index "uq_policy_id_context2" to remove 10488 row versions
DETAIL:  CPU: user: 1.24 s, system: 1.00 s, elapsed: 4.40 s
INFO:  scanned index "idx_applications_created_at" to remove 10488 row versions
DETAIL:  CPU: user: 2.32 s, system: 1.97 s, elapsed: 8.46 s
INFO:  "applications": removed 10488 row versions in 7580 pages
DETAIL:  CPU: user: 0.86 s, system: 0.08 s, elapsed: 1.03 s
INFO:  index "applications2_pkey" now contains 17803283 row versions in 210561 pages
DETAIL:  0 index row versions were removed.
59 index pages have been deleted, 59 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "tmp_idx_applications2_updated_at2" now contains 17803283 row versions in 448580 pages
DETAIL:  0 index row versions were removed.
19542 index pages have been deleted, 19542 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_agent_id" now contains 17803283 row versions in 381741 pages
DETAIL:  0 index row versions were removed.
61237 index pages have been deleted, 28239 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_id" now contains 17803283 row versions in 346594 pages
DETAIL:  0 index row versions were removed.
57308 index pages have been deleted, 24105 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_tenant_id" now contains 17803283 row versions in 199743 pages
DETAIL:  0 index row versions were removed.
32830 index pages have been deleted, 14767 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_type" now contains 17803283 row versions in 635317 pages
DETAIL:  0 index row versions were removed.
43331 index pages have been deleted, 41233 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_deleted_at" now contains 17803283 row versions in 417770 pages
DETAIL:  0 index row versions were removed.
15695 index pages have been deleted, 15381 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_enabled" now contains 17803283 row versions in 636750 pages
DETAIL:  0 index row versions were removed.
26332 index pages have been deleted, 26037 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_policy_id" now contains 17803283 row versions in 400718 pages
DETAIL:  0 index row versions were removed.
55007 index pages have been deleted, 29671 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_policy_type" now contains 17803283 row versions in 2084261 pages
DETAIL:  7990 index row versions were removed.
56012 index pages have been deleted, 55365 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_root_policy_id" now contains 5015730 row versions in 201441 pages
DETAIL:  5807 index row versions were removed.
22964 index pages have been deleted, 17915 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status" now contains 17803283 row versions in 690063 pages
DETAIL:  10488 index row versions were removed.
31401 index pages have been deleted, 31113 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status_by_policy_fields_context_tenant_i" now contains 155059 row versions in 1707 pages
DETAIL:  223 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status_by_policy_fields_tenant_id" now contains 155059 row versions in 1883 pages
DETAIL:  223 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_tenant_id" now contains 17803283 row versions in 213798 pages
DETAIL:  10488 index row versions were removed.
33323 index pages have been deleted, 16730 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_euc_index" now contains 17803283 row versions in 215570 pages
DETAIL:  10488 index row versions were removed.
32261 index pages have been deleted, 16643 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "chk_invariant_1_apps2" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "uq_policy_id_context2" now contains 5015730 row versions in 178953 pages
DETAIL:  5807 index row versions were removed.
23793 index pages have been deleted, 13898 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications_created_at" now contains 17803283 row versions in 365479 pages
DETAIL:  10488 index row versions were removed.
19020 index pages have been deleted, 18938 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "applications": found 0 removable, 471321 nonremovable row versions in 30772 out of 1278613 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 4177172255
There were 1490846 unused item pointers.
Skipped 0 pages due to buffer pins, 1247841 frozen pages.
0 pages are entirely empty.
CPU: user: 816.65 s, system: 55.37 s, elapsed: 1031.96 s.
INFO:  aggressively vacuuming "pg_toast.pg_toast_35599271"
INFO:  "pg_toast_35599271": found 0 removable, 2 nonremovable row versions in 1 out of 2328 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 4178183844
There were 2 unused item pointers.
Skipped 0 pages due to buffer pins, 2327 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

What catches my eye: scanning indexes smaller than 3.1GB is fast, while larger ones are slow. For example:
idx_applications2_policy_id at 3131 MB took just 5 seconds (DETAIL:  CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s),
but idx_applications2_deleted_at at 3264 MB took 1 minute 22 seconds (DETAIL:  CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s).

In addition to this, I collected statistics during the idx_applications2_enabled scan:
# strace -c -p 26792
strace: Process 26792 attached
^Cstrace: Process 26792 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.87    2.931694           4    711882           read
  4.61    0.148573           3     55598     19503 futex
  3.01    0.097138           1     70933           lseek
  0.87    0.027926           7      4261           write
  0.44    0.014348          23       623           fdatasync
  0.20    0.006372           5      1245           kill
  0.00    0.000081           5        17           open
  0.00    0.000042           4        12           select
  0.00    0.000038           3        12           close
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         2           getrusage
------ ----------- ----------- --------- --------- ----------------

--
Mikhael


Re: Automatic aggressive vacuum on almost frozen table takes too long

From: Peter Geoghegan

On Mon, Feb 20, 2023 at 9:43 PM Mikhail Balayan <mv.balayan@gmail.com> wrote:
> What catches my eye: scanning indexes smaller than 3.1GB is fast, while larger ones are slow. For example:
> idx_applications2_policy_id at 3131 MB took just 5 seconds (DETAIL:  CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s),
> but idx_applications2_deleted_at at 3264 MB took 1 minute 22 seconds (DETAIL:  CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s).
 

I think that I know what this is.

If you delete many index pages during VACUUM, and those pages are all
full of duplicate values, the deletion operation can sometimes be
slower due to the need to relocate a downlink to each to-be-deleted
leaf page. When there are thousands of matches, you'll start to notice
O(n^2) behavior due to the way in which the B-Tree VACUUM code must
grovel through the parent level, which is full of duplicate keys.

If you were on Postgres 12+, then this wouldn't happen, because the
heap TID is treated as a part of the key space there, affecting sort
order. The implementation would immediately relocate the matching
parent downlink using a unique key (unique because heap TID would act
as a unique-ifier in that version). And if you were on 14+, things in
this area would be much better still.
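
A quick way to see how extreme the duplication is in the slow indexes is to look at the planner statistics for their leading columns -- a sketch, with the column list taken from the index definitions shown earlier:

    SELECT attname, n_distinct
    FROM pg_stats
    WHERE schemaname = 'appschemaname'
      AND tablename = 'applications'
      AND attname IN ('policy_type', 'context_type', 'enabled', 'deleted_at', 'status');

If those columns have only a handful of distinct values across ~17 million rows, that fits the picture: the indexes that were slow to scan (idx_applications2_policy_type, idx_applications2_context_type, idx_applications2_enabled, idx_applications2_deleted_at, idx_applications2_status) are exactly the ones whose keys are almost all duplicates.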

-- 
Peter Geoghegan