Re: Automatic aggressive vacuum on almost frozen table takes too long - Mailing list pgsql-general

From Mikhail Balayan
Subject Re: Automatic aggressive vacuum on almost frozen table takes too long
Date
Msg-id CAC2oM1Ze2zQDCdgc_oM_SbnSHkmnsfxCPBHkdH15cLDQ=cxgtw@mail.gmail.com
Whole thread Raw
In response to Re: Automatic aggressive vacuum on almost frozen table takes too long  (Mikhail Balayan <mv.balayan@gmail.com>)
Responses Re: Automatic aggressive vacuum on almost frozen table takes too long
List pgsql-general
Probably the result of vacuum freeze manual execution will give some more information:
Table size:
appdbname3=# \dt+ appschemaname.applications
                              List of relations
    Schema     |     Name     | Type  |     Owner     |   Size   | Description
---------------+--------------+-------+---------------+----------+-------------
 appschemaname | applications | table | appschemaname | 10012 MB |

Each index size:
  tablename   |                         indexname                          | index_size | unique |
--------------+------------------------------------------------------------+------------+--------+
 applications | chk_invariant_1_apps2                                      | 8192 bytes | Y      |
 applications | applications2_pkey                                         | 1645 MB    | Y      |
 applications | idx_applications2_agent_id                                 | 2982 MB    | N      |
 applications | idx_applications2_context_id                               | 2708 MB    | N      |
 applications | idx_applications2_context_tenant_id                        | 1560 MB    | N      |
 applications | idx_applications2_context_type                             | 4963 MB    | N      |
 applications | idx_applications2_deleted_at                               | 3264 MB    | N      |
 applications | idx_applications2_enabled                                  | 4975 MB    | N      |
 applications | idx_applications2_euc_index                                | 1684 MB    | N      |
 applications | idx_applications2_policy_id                                | 3131 MB    | N      |
 applications | idx_applications2_policy_type                              | 16 GB      | N      |
 applications | idx_applications2_root_policy_id                           | 1574 MB    | N      |
 applications | idx_applications2_status                                   | 5391 MB    | N      |
 applications | idx_applications2_status_by_policy_fields_context_tenant_i | 13 MB      | N      |
 applications | idx_applications2_status_by_policy_fields_tenant_id        | 15 MB      | N      |
 applications | idx_applications2_tenant_id                                | 1670 MB    | N      |
 applications | idx_applications_created_at                                | 2855 MB    | N      |
 applications | tmp_idx_applications2_updated_at2                          | 3505 MB    | N      |
 applications | uq_policy_id_context2                                      | 1398 MB    | Y      |

And vacuum freeze output:
appdbname3=# VACUUM (FREEZE, VERBOSE) appschemaname.applications;
INFO:  aggressively vacuuming "appschemaname.applications"
INFO:  scanned index "applications2_pkey" to remove 10488 row versions
DETAIL:  CPU: user: 2.45 s, system: 1.07 s, elapsed: 5.16 s
INFO:  scanned index "tmp_idx_applications2_updated_at2" to remove 10488 row versions
DETAIL:  CPU: user: 2.53 s, system: 2.86 s, elapsed: 11.56 s
INFO:  scanned index "idx_applications2_agent_id" to remove 10488 row versions
DETAIL:  CPU: user: 3.16 s, system: 2.36 s, elapsed: 11.04 s
INFO:  scanned index "idx_applications2_context_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.22 s, system: 1.95 s, elapsed: 8.06 s
INFO:  scanned index "idx_applications2_context_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 1.86 s, system: 1.16 s, elapsed: 5.10 s
INFO:  scanned index "idx_applications2_context_type" to remove 10488 row versions
DETAIL:  CPU: user: 190.09 s, system: 6.48 s, elapsed: 220.88 s
INFO:  scanned index "idx_applications2_deleted_at" to remove 10488 row versions
DETAIL:  CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s
INFO:  scanned index "idx_applications2_enabled" to remove 10488 row versions
DETAIL:  CPU: user: 78.22 s, system: 4.74 s, elapsed: 96.96 s
INFO:  scanned index "idx_applications2_policy_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s
INFO:  scanned index "idx_applications2_policy_type" to remove 10488 row versions
DETAIL:  CPU: user: 300.45 s, system: 16.05 s, elapsed: 365.80 s
INFO:  scanned index "idx_applications2_root_policy_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.13 s, system: 1.20 s, elapsed: 6.16 s
INFO:  scanned index "idx_applications2_status" to remove 10488 row versions
DETAIL:  CPU: user: 153.79 s, system: 6.57 s, elapsed: 185.76 s
INFO:  scanned index "idx_applications2_status_by_policy_fields_context_tenant_i" to remove 10488 row versions
DETAIL:  CPU: user: 0.02 s, system: 0.01 s, elapsed: 0.06 s
INFO:  scanned index "idx_applications2_status_by_policy_fields_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 0.03 s, system: 0.00 s, elapsed: 0.05 s
INFO:  scanned index "idx_applications2_tenant_id" to remove 10488 row versions
DETAIL:  CPU: user: 2.11 s, system: 1.23 s, elapsed: 6.19 s
INFO:  scanned index "idx_applications2_euc_index" to remove 10488 row versions
DETAIL:  CPU: user: 2.05 s, system: 1.27 s, elapsed: 6.05 s
INFO:  scanned index "chk_invariant_1_apps2" to remove 10488 row versions
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO:  scanned index "uq_policy_id_context2" to remove 10488 row versions
DETAIL:  CPU: user: 1.24 s, system: 1.00 s, elapsed: 4.40 s
INFO:  scanned index "idx_applications_created_at" to remove 10488 row versions
DETAIL:  CPU: user: 2.32 s, system: 1.97 s, elapsed: 8.46 s
INFO:  "applications": removed 10488 row versions in 7580 pages
DETAIL:  CPU: user: 0.86 s, system: 0.08 s, elapsed: 1.03 s
INFO:  index "applications2_pkey" now contains 17803283 row versions in 210561 pages
DETAIL:  0 index row versions were removed.
59 index pages have been deleted, 59 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "tmp_idx_applications2_updated_at2" now contains 17803283 row versions in 448580 pages
DETAIL:  0 index row versions were removed.
19542 index pages have been deleted, 19542 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_agent_id" now contains 17803283 row versions in 381741 pages
DETAIL:  0 index row versions were removed.
61237 index pages have been deleted, 28239 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_id" now contains 17803283 row versions in 346594 pages
DETAIL:  0 index row versions were removed.
57308 index pages have been deleted, 24105 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_tenant_id" now contains 17803283 row versions in 199743 pages
DETAIL:  0 index row versions were removed.
32830 index pages have been deleted, 14767 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_context_type" now contains 17803283 row versions in 635317 pages
DETAIL:  0 index row versions were removed.
43331 index pages have been deleted, 41233 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_deleted_at" now contains 17803283 row versions in 417770 pages
DETAIL:  0 index row versions were removed.
15695 index pages have been deleted, 15381 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_enabled" now contains 17803283 row versions in 636750 pages
DETAIL:  0 index row versions were removed.
26332 index pages have been deleted, 26037 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_policy_id" now contains 17803283 row versions in 400718 pages
DETAIL:  0 index row versions were removed.
55007 index pages have been deleted, 29671 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_policy_type" now contains 17803283 row versions in 2084261 pages
DETAIL:  7990 index row versions were removed.
56012 index pages have been deleted, 55365 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_root_policy_id" now contains 5015730 row versions in 201441 pages
DETAIL:  5807 index row versions were removed.
22964 index pages have been deleted, 17915 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status" now contains 17803283 row versions in 690063 pages
DETAIL:  10488 index row versions were removed.
31401 index pages have been deleted, 31113 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status_by_policy_fields_context_tenant_i" now contains 155059 row versions in 1707 pages
DETAIL:  223 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_status_by_policy_fields_tenant_id" now contains 155059 row versions in 1883 pages
DETAIL:  223 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_tenant_id" now contains 17803283 row versions in 213798 pages
DETAIL:  10488 index row versions were removed.
33323 index pages have been deleted, 16730 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications2_euc_index" now contains 17803283 row versions in 215570 pages
DETAIL:  10488 index row versions were removed.
32261 index pages have been deleted, 16643 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "chk_invariant_1_apps2" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "uq_policy_id_context2" now contains 5015730 row versions in 178953 pages
DETAIL:  5807 index row versions were removed.
23793 index pages have been deleted, 13898 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "idx_applications_created_at" now contains 17803283 row versions in 365479 pages
DETAIL:  10488 index row versions were removed.
19020 index pages have been deleted, 18938 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "applications": found 0 removable, 471321 nonremovable row versions in 30772 out of 1278613 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 4177172255
There were 1490846 unused item pointers.
Skipped 0 pages due to buffer pins, 1247841 frozen pages.
0 pages are entirely empty.
CPU: user: 816.65 s, system: 55.37 s, elapsed: 1031.96 s.
INFO:  aggressively vacuuming "pg_toast.pg_toast_35599271"
INFO:  "pg_toast_35599271": found 0 removable, 2 nonremovable row versions in 1 out of 2328 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 4178183844
There were 2 unused item pointers.
Skipped 0 pages due to buffer pins, 2327 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

What catches my eye: scanning indexes smaller than 3.1GB is fast, larger ones are slow. For example:
idx_applications2_policy_id is 3131 MB took just 5 seconds (DETAIL:  CPU: user: 2.99 s, system: 1.65 s, elapsed: 5.32 s)
but idx_applications2_deleted_at with 3264 MB took 1 minute 22 seconds (DETAIL:  CPU: user: 67.93 s, system: 3.41 s, elapsed: 82.75 s)

In addition to this, I collected statistics during the idx_applications2_enabled scan:
# strace -c -p 26792
strace: Process 26792 attached
^Cstrace: Process 26792 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.87    2.931694           4    711882           read
  4.61    0.148573           3     55598     19503 futex
  3.01    0.097138           1     70933           lseek
  0.87    0.027926           7      4261           write
  0.44    0.014348          23       623           fdatasync
  0.20    0.006372           5      1245           kill
  0.00    0.000081           5        17           open
  0.00    0.000042           4        12           select
  0.00    0.000038           3        12           close
  0.00    0.000000           0         1           rt_sigreturn
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         2           getrusage
------ ----------- ----------- --------- --------- ----------------

--
Mikhael

On Mon, 20 Feb 2023 at 13:29, Mikhail Balayan <mv.balayan@gmail.com> wrote:
>> Can you run amcheck's bt_index_check() routine against some of the
>> indexes you've shown? There is perhaps some chance that index
>> corruption exists and causes VACUUM to take a very long time to delete
>> index pages. This is pretty much a wild guess, though.

Unfortunately I can't, we haven't enabled this extension yet. And since this is a production, I'm not ready to turn it on right away. But I can say that this theory is unlikely, since this problem occurs on different sites. Here's an example of an output from another site where table size is 46.5 GB (again, data only), indexes 107GB, toast 62MB:
automatic aggressive vacuum of table "appdbname2.appschemaname.applications": index scans: 1
pages: 0 removed, 6091646 remain, 0 skipped due to pins, 6086395 skipped frozen
tuples: 2344 removed, 35295654 remain, 0 are dead but not yet removable, oldest xmin: 213412878
buffer usage: 251980554 hits, 14462331 misses, 18844 dirtied
avg read rate: 12.018 MB/s, avg write rate: 0.016 MB/s
system usage: CPU: user: 7734.43 s, system: 178.98 s, elapsed: 9401.36 s

Here again we see that there are 5251 blocks that need to be cleaned (6091646 - 6086395), buffer usage is 266461729 blocks or ~ 2 TB and processing time is 2.5h+.

>> It's possible that VACUUM had to wait a long time for a cleanup lock
on one individual heap page here
If such a scenario is possible, it makes sense to add information about the blocking waiting time to the output. Something like:
     system usage: CPU: user: 7734.43 s, system: 178.98 s, lock_wait: 1234.56 s, elapsed: 9401.36 s


Mikhail

On Sat, 18 Feb 2023 at 05:35, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Feb 16, 2023 at 5:40 PM Mikhail Balayan <mv.balayan@gmail.com> wrote:
>> >> Do you have any non-btree indexes on the table? Can you show us the details of the
>> >> table, including all of its indexes? In other words, can you show "\d applications" output from psql?
>
> Only btree indexes. Please find the full table schema below:

It's possible that VACUUM had to wait a long time for a cleanup lock
on one individual heap page here, which could have added a long delay.
But...that doesn't seem particularly likely.

Can you run amcheck's bt_index_check() routine against some of the
indexes you've shown? There is perhaps some chance that index
corruption exists and causes VACUUM to take a very long time to delete
index pages. This is pretty much a wild guess, though.

--
Peter Geoghegan

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: pg_dump'ed file contains "DROP DATABASE"
Next
From: Peter Geoghegan
Date:
Subject: Re: Automatic aggressive vacuum on almost frozen table takes too long