Re: Automatic aggressive vacuum on almost frozen table takes too long - Mailing list pgsql-general

From Mikhail Balayan
Subject Re: Automatic aggressive vacuum on almost frozen table takes too long
Date
Msg-id CAC2oM1aTNHFK1r5sGGOjkU7y3xRutk30H+om1h2hcd77RAMt+g@mail.gmail.com
Whole thread Raw
In response to Re: Automatic aggressive vacuum on almost frozen table takes too long  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-general
>> Even still,
>> the information about buffers that you've shown does indeed appear to
>> be total nonsense (while everything else we can see looks plausible).
Actually not only buffers. The elapsed time also looks strange. And this is the first reason why I paid attention to this situation. 33029 seconds, which is 9+ hours to scan 4898 blocks and related indexes. And by this metric, it seems that the amount of buffers processed is reliable (which does not negate the presence of the bug). Otherwise I can not explain such a long vacuuming of the table.

>> Do you have any non-btree indexes on the table? Can you show us the details of the
>> table, including all of its indexes? In other words, can you show "\d applications" output from psql?
Only btree indexes. Please find the full table schema below:
appdbname=> \d applications
                           Table "appschemaname.applications"
             Column             |          Type          | Collation | Nullable | Default
--------------------------------+------------------------+-----------+----------+----------
 id                             | character varying(36)  |           | not null |
 created_at                     | bigint                 |           | not null |
 updated_at                     | bigint                 |           | not null |
 deleted_at                     | bigint                 |           |          |
 tenant_id                      | character varying(36)  |           |          |
 context_tenant_id              | character varying(36)  |           |          |
 policy_id                      | character varying(36)  |           |          |
 policy_type                    | character varying(128) |           |          |
 policy_name                    | text                   |           |          |
 root_policy_id                 | character varying(36)  |           |          |
 context_id                     | character varying(36)  |           |          |
 context_type                   | character varying(128) |           |          |
 agent_id                       | character varying(36)  |           |          |
 multiple_applications_conflict | text                   |           |          |
 validation_issues              | text                   |           |          |
 deployment_state               | character varying(128) |           |          |
 deployment_errors              | text                   |           |          |
 enabled                        | boolean                |           |          |
 has_conflict_alert             | boolean                |           |          |
 running_state                  | character varying(128) |           |          |
 running_activities             | text                   |           |          |
 running_progress               | character varying(128) |           |          |
 running_blockers               | text                   |           |          |
 last_success_time              | bigint                 |           |          |
 last_run_time                  | bigint                 |           |          |
 last_activity                  | character varying(128) |           |          |
 next_run_time                  | bigint                 |           |          |
 next_activity                  | character varying(128) |           |          |
 most_severe_status_event       | text                   |           |          |
 status                         | character varying(128) |           |          |
 origin_context                 | text                   |           | not null | ''::text
 euc                            | character varying(36)  |           |          |
 event_flag                     | integer                |           |          | 1
Indexes:
    "applications2_pkey" PRIMARY KEY, btree (id)
    "chk_invariant_1_apps2" UNIQUE, btree (policy_type, context_id) WHERE multiple_applications_conflict <> ''::text AND enabled = true AND root_policy_id::text <> ''::text AND deleted_at IS NULL
    "uq_policy_id_context2" UNIQUE, btree (policy_id, context_id) WHERE deleted_at IS NULL
    "idx_applications2_agent_id" btree (agent_id)
    "idx_applications2_context_id" btree (context_id)
    "idx_applications2_context_tenant_id" btree (context_tenant_id)
    "idx_applications2_context_type" btree (context_type)
    "idx_applications2_deleted_at" btree (deleted_at)
    "idx_applications2_enabled" btree (enabled)
    "idx_applications2_euc_index" btree (euc)
    "idx_applications2_policy_id" btree (policy_id)
    "idx_applications2_policy_type" btree (policy_type)
    "idx_applications2_root_policy_id" btree (root_policy_id) WHERE deleted_at IS NULL
    "idx_applications2_status" btree (status)
    "idx_applications2_status_by_policy_fields_context_tenant_i" btree (context_tenant_id, root_policy_id, policy_type, context_type, deleted_at) WHERE deleted_at IS NULL AND root_policy_id::text = ''::text
    "idx_applications2_status_by_policy_fields_tenant_id" btree (tenant_id, root_policy_id, policy_type, context_type, deleted_at) WHERE deleted_at IS NULL AND root_policy_id::text = ''::text
    "idx_applications2_tenant_id" btree (tenant_id)
    "idx_applications_created_at" btree (created_at)
    "tmp_idx_applications2_updated_at2" btree (updated_at)


Thank you!

Mikhael


On Fri, 17 Feb 2023 at 03:42, Peter Geoghegan <pg@bowt.ie> wrote:
On Thu, Feb 16, 2023 at 7:44 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
> > That is, if I understand it correctly, it says that there were (and
> > actually are) 2013128 pages of which 2008230 were skipped, which leaves
> > 4898 blocks to be scanned. I.e. it seems that the allocated 1GB
> > (autovacuum_work_mem) should be enough to handle that amount of blocks
> > and to avoid multiple scans of the indexes.

It's quite clear that there is only one pass over the indexes, since
"index scans: 1" says exactly that.

> > But, based on buffer usage, one can see that a huge amount of data is
> > read, greatly exceeding not only the number of remaining unfrozen
> > blocks, but also the size of the table and indexes taken together: 2
> > billion blocks, more than 15TB.
> >
> > Is this a bug in Postgresql or am I interpreting the log data wrong?

I think that it might be a bug in Postgres.

I addressed a similar issue in the same "buffer" instrumentation in
commit d3609dd2, but that wasn't backpatched because I imagined that
it only applied to the new VACUUM VERBOSE case (VACUUM VERBOSE only
started sharing the same instrumentation code as log_autovacuum in
Postgres 15). It's not immediately obvious how you could see a problem
like the one you've shown in the autovacuum log output. Even still,
the information about buffers that you've shown does indeed appear to
be total nonsense (while everything else we can see looks plausible).
There has to be some explanation for that.

The only other explanation I can think of is a pathological case where
an index scan by some particular ambulkdelete routine scans a number
of buffers that vastly exceeds the total size of the index. That does
seem just about possible with an access method like GIN. Do you have
any non-btree indexes on the table? Can you show us the details of the
table, including all of its indexes? In other words, can you show "\d
applications" output from psql?

--
Peter Geoghegan

pgsql-general by date:

Previous
From: Mikhail Balayan
Date:
Subject: Re: Automatic aggressive vacuum on almost frozen table takes too long
Next
From: Sebastien Flaesch
Date:
Subject: PostgreSQL configuration in a VM