Thread: Vacuum Questions

Vacuum Questions

From
Leo
Date:
I have been working on AWS PostgreSQL RDS for a few years, but I am still not very experienced when it comes to performance issues.  Plus, RDS is slightly different from vanilla PostgreSQL.

I am trying to comprehend exactly how vacuum works.

Here is what I am trying to do.

I am purging old records from a table (500 million rows in total, but I am doing it in batches of 50,000,000, with an inner loop deleting 100,000 rows at a time).  That works just fine.
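For reference, a batch loop like that can be sketched as follows (the table name is from the log below, but "id" and "created_date" are placeholder column names, not the real schema; COMMIT inside a DO block requires PG11+ and that the block is not run inside an explicit transaction):

```sql
-- Hypothetical batched purge: delete rows older than 6 months,
-- 100,000 at a time, committing between batches.
DO $$
DECLARE
    n bigint;
BEGIN
    LOOP
        DELETE FROM pc_workflowlog
        WHERE id IN (
            SELECT id
            FROM pc_workflowlog
            WHERE created_date < now() - interval '6 months'
            LIMIT 100000
        );
        GET DIAGNOSTICS n = ROW_COUNT;
        EXIT WHEN n = 0;
        COMMIT;  -- release the dead tuples batch by batch (PG11+)
    END LOOP;
END $$;
```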

Because of the number of rows deleted, I disabled autovacuum for this table (I want to have control over vacuuming; autovacuum does not complete anyway due to the timeout, sizing, etc. settings that I do not want to change system-wide).  I will re-enable autovacuum once I am done, of course.
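For the record, that per-table toggle can be done with a storage parameter, leaving the system-wide autovacuum settings untouched:

```sql
-- Turn autovacuum off for this table only, for the duration of the purge.
ALTER TABLE pc_workflowlog SET (autovacuum_enabled = false);

-- After the cleanup, put the table back on the inherited default.
ALTER TABLE pc_workflowlog RESET (autovacuum_enabled);
```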

The issue is when I start vacuuming.  This table has 4 indexes and a PK that I worry about.  The PK takes about 30 minutes to vacuum and two of the indexes take about an hour each.  The problem comes in for the other 2 indexes - they take 12+ hours each to vacuum:

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  vacuuming "public.pc_workflowlog"

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  launched 4 parallel vacuum workers for index vacuuming (planned: 4)

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index "pc_workflowlog_pk" to remove 50000000 row versions

DETAIL:  CPU: user: 191.03 s, system: 12.43 s, elapsed: 1711.22 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index "workflowlo_u_publicid_g6uqp9lkn6e8" to remove 50000000 row versions

DETAIL:  CPU: user: 325.75 s, system: 19.75 s, elapsed: 2674.24 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index "workflowlo_n_workflow_2tc9k2hdtry9v" to remove 50000000 row versions

DETAIL:  CPU: user: 312.17 s, system: 16.94 s, elapsed: 3097.88 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index "workflowlo_n_frozenseti_2kjkbjgf3c6ro" to remove 50000000 row versions

DETAIL:  CPU: user: 41187.70 s, system: 216.14 s, elapsed: 42749.36 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  scanned index "workflowlo_n_userid_14kqw6qdsnndw" to remove 50000000 row versions 

DETAIL:  CPU: user: 41280.66 s, system: 216.67 s, elapsed: 42832.16 s

psql:/home/backup/leo/fws_vacuum.sql:6: INFO:  "pc_workflowlog": removed 50000000 row versions in 1129870 pages


I've increased max_parallel_maintenance_workers to 8 for the session, and vacuum launched 4 parallel workers (one for each index, I assume); the two medium indexes were done in about an hour.  What I am trying to figure out is how to force the two large indexes to be vacuumed in parallel - several workers working against a single index.  It seems this should be possible - the indexes are large enough to qualify - but I have not been able to figure it out yet.  Most of the parameters are at default values.
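For context, this is roughly how the parallel knobs interact (a sketch; the PARALLEL option was added to VACUUM in PG13, and an index only qualifies for a parallel worker if it is larger than min_parallel_index_scan_size, default 512 kB):

```sql
SET max_parallel_maintenance_workers = 8;  -- upper bound on worker count

-- PARALLEL N caps the number of workers, but each worker vacuums one
-- whole index; parallelism can never exceed the number of qualifying
-- indexes, and no setting splits one index across several workers.
VACUUM (VERBOSE, PARALLEL 8) pc_workflowlog;
```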

What am I missing?

I have a few other questions.  Does vacuum time depend only on the number of dead rows and the size of the table, or does the entire storage allocation (including dead tuples) also affect it?

Would it make more sense to drop the two large indexes, purge, vacuum, and recreate the indexes afterwards (I know it needs to be tested)?  The reason I am doing it in stages is to make sure I have enough time to vacuum, but maybe it would not take much longer to vacuum after the complete purge?

Lastly, is it better to delete all the rows (500 million) instead of doing it in smaller batches, and vacuum only once?

The current size of the table is about 1 TB, and the indexes add another 1.5 TB to it.

Truncate is not an option, as I am only deleting rows older than 6 months.  The client had not been purging for years, but will do so after this cleanup.

P.S. This is my very first post here, please advise if it is the wrong channel.  Thank you in advance.

Re: Vacuum Questions

From
Leo
Date:
Also, is there a way to estimate the vacuum execution?  Something like an explain plan - without actually vacuuming, just to see how it would be performed - e.g., the degree of parallelism?


Re: Vacuum Questions

From
John Naylor
Date:
On Fri, May 2, 2025 at 9:23 PM Leo <leo1969@gmail.com> wrote:

> I am purging old records from a table (500 million rows, but I am doing it in sets of 50,000,000 with a smaller loop of 100,000).  That works just fine.
>
> Because of the amount of data/rows deleted, I disabled the autovacuum for this table (I want to have control over vacuum, autovacuum does not complete anyway due to the timeout, sizing, etc settings that I do not want to change system wide).  I will put the autovacuum back once I am done of course.
>
> The issue is when I start vacuuming.  This table has 4 indexes and a PK that I worry about.  The PK takes about 30 minutes to vacuum and two of the indexes take about an hour each.  The problem comes in for the other 2 indexes - they take 12+ hours each to vacuum:

Do you know offhand anything special about these two? Do they have a
random key like UUID or a hash calculation? That can make index
vacuuming slower, but 12x still seems abnormal to me.

> I've increased max_parallel_maintenance_workers to 8 for the session and it used parallel 4 (one for each index I assume) to handle it and the two indexes were done in ~ an hour.

Right, one worker scans one index, simultaneously with other workers.
Here the leader process launched 4 workers and also vacuumed one index
itself.

> What I am trying to figure out is how to force the other two large indexes to be vacuumed in parallel - a few workers going against an index.  It seems it is possible to do, the index size is large enough to kick in, but I have not been able to figure it out yet.  Most of the parameters are at default values.

It is not possible to run multiple workers on a single index.

> I have a few other questions.  Does vacuum time depend on the number of dead rows only and the size of the table, or does the entire storage allocation (including dead tuples) also affect it?

The vacuum time depends largely on
1) The number of heap (=table) and index pages that are written to by
vacuum, since that correlates with WAL volume -- the DELETE query will
affect how the partial deletes are spread across the table. You only
want to delete records once for each page.
2) The number and size of the indexes, since they must be scanned in
their entirety
3) The storage allocation -- What version is this and what is
maintenance_work_mem set to? This affects how many times each index
must be vacuumed.
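
To illustrate point 3: on PG16 and earlier, vacuum collects dead tuple IDs (6 bytes each) into a single array whose effective ceiling is 1 GB, so roughly 178 million dead rows fit per pass; a 50-million-row batch fits in one pass, but only if maintenance_work_mem is raised accordingly. A sketch:

```sql
-- 1 GB is the effective ceiling for VACUUM's dead-TID storage on
-- PG16 and earlier, no matter how much higher this is set;
-- PG17 lifts that limitation.
SET maintenance_work_mem = '1GB';
VACUUM (VERBOSE) pc_workflowlog;
```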

> Would it be more beneficial to drop the two large indexes, purge, vacuum, and recreate the indexes after make more sense (I know it needs to be tested)?  The reason I am doing it in stages is to make sure I have enough time to vacuum, but maybe it would not take much longer to vacuum after the complete purge?

Could you explain your operational constraints? In my experience, WAL
volume has been a more pressing concern for clients in these cases
than elapsed time, but your needs may vary. One possible concern is
that the indexes that take a long time to vacuum may also take a long
time to recreate. Aside from that, dropping and recreating indexes may
be a good way to speed this up, depending on how critical they are for
queries.
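
A drop/recreate cycle for the two slow indexes might look like this (the column lists are guesses from the index names, not the real definitions):

```sql
DROP INDEX workflowlo_n_frozenseti_2kjkbjgf3c6ro;
DROP INDEX workflowlo_n_userid_14kqw6qdsnndw;

-- ... run the DELETE batches and a final VACUUM here ...

-- CONCURRENTLY avoids blocking writes during the rebuild, at the cost
-- of a slower build and manual cleanup if it fails midway.
CREATE INDEX CONCURRENTLY workflowlo_n_frozenseti_2kjkbjgf3c6ro
    ON pc_workflowlog (frozensetid);
CREATE INDEX CONCURRENTLY workflowlo_n_userid_14kqw6qdsnndw
    ON pc_workflowlog (userid);
```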

> Lastly, is it better to delete all the rows (500 mil) instead of doing it in smaller batches, and vacuum only once?

On PG16 and earlier, this would still require each index to be
vacuumed 3 times due to memory allocation constraints, so would be
similar to 3 batches. The drop/recreate strategy would still help in
that case.

> The current size of the table is about 1T and the indexes add another 1.5T to it.
>
> Truncate is not an option as I am only deleting rows older than 6 months.  Client was not doing purging for years, but will do it after the clean up.

What percentage of the table does the 500 million deletes represent?
If you're deleting the vast majority of the records and have the extra
disk space to spare, you might consider VACUUM FULL after completing
the deletes -- that would rewrite the table and recreate all indexes.
That's not generally recommended, since that locks the table for the
entire duration and has other disadvantages, but it is an option.

For your follow-up question:

> Also, is there a way to estimate the vacuum execution?  Something like explain plan - without actually vacuuming, just to see how it will perform it - like a degree of parallelism?

There is no way to ask the system to estimate the runtime or other
resource usage.
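
A running vacuum can at least be watched, though: the pg_stat_progress_vacuum view (available since PG9.6) reports the current phase and counters from another session; column names below are as of PG16:

```sql
SELECT pid, phase, heap_blks_scanned, heap_blks_total,
       index_vacuum_count, num_dead_tuples
FROM pg_stat_progress_vacuum;
```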

Also for future reference, please note that we discourage top-posting
(quoting an entire message in a reply).

--
John Naylor
Amazon Web Services



RE: Vacuum Questions

From
Michel SALAIS
Date:

Hi

 

It is not your question, but for situations like this you should consider using partitioning.

And closer to your question: I would not disable autovacuum, but it should not be left running with the default settings for this table.
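
A sketch of the partitioning idea, with hypothetical column names: with monthly range partitions, purging a month becomes a near-instant metadata operation instead of a huge DELETE followed by a long vacuum.

```sql
CREATE TABLE pc_workflowlog_part (
    id            bigint NOT NULL,
    created_date  timestamptz NOT NULL
    -- remaining columns of the real table
) PARTITION BY RANGE (created_date);

CREATE TABLE pc_workflowlog_2025_05 PARTITION OF pc_workflowlog_part
    FOR VALUES FROM ('2025-05-01') TO ('2025-06-01');

-- Purging a month is instant and needs no vacuum afterwards:
DROP TABLE pc_workflowlog_2024_10;
```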

 

Best regards

 

Michel SALAIS

From: Leo <leo1969@gmail.com>
Sent: Friday, May 2, 2025 4:23 PM
To: pgsql-performance@lists.postgresql.org
Subject: Vacuum Questions

 


Re: Vacuum Questions

From
Craig Jackson
Date:
For the two indexes that take 12 hours to vacuum: if you can drop and rebuild them in less time than the 12 hours it takes to vacuum them, and they can be offline during that window, then I would do that. If you can't take the indexes offline, consider reindexing online.
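
The online variant (PG12+) rebuilds an index under the same name without blocking reads or writes, at the cost of transiently needing extra disk space roughly equal to the index being rebuilt:

```sql
REINDEX INDEX CONCURRENTLY workflowlo_n_frozenseti_2kjkbjgf3c6ro;
```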

Also, if the indexes aren't needed for your delete statements, dropping them before you start the deletion work would let you avoid the overhead of maintaining them, possibly speeding up the deletes.

Craig Jackson


Re: Vacuum Questions

From
Craig Jackson
Date:
One additional point about reindexing I should have mentioned: make sure you have adequate disk space to complete the reindexing.

Craig
