Thread: Introduce some randomness to autovacuum
Hi hackers,

After watching Robert's talk[1] on autovacuum and participating in the related
workshop yesterday, it appears that people are inclined to use prioritization
to address the issues highlighted in Robert's presentation. Here I list two of
the failure modes that were discussed:

- Spinning: autovacuum runs repeatedly on the same table but doesn't
  accomplish anything useful.
- Starvation: autovacuum can't vacuum everything that needs vacuuming.
- ...

The prioritization approach needs some basic infrastructure that Postgres
doesn't have today.

I had a random thought that introducing some randomness might help mitigate
some of the issues mentioned above. Before performing vacuum on the collected
tables, we could rotate the table_oids list by a random offset in the range
[0, list_length(table_oids) - 1]. This way, every table has an equal chance of
being vacuumed first, avoiding both spinning and starvation. Even if there is
a broken table that repeatedly gets stuck, this random approach would still
give other tables a chance to be vacuumed. Eventually, the system would
converge.

The change is something like the following. I haven't tested the code; I'm
posting it here for discussion. Let me know your thoughts.

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 16756152b71..6dddd273d22 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -79,6 +79,7 @@
 #include "catalog/pg_namespace.h"
 #include "commands/dbcommands.h"
 #include "commands/vacuum.h"
 #include "common/int.h"
+#include "common/pg_prng.h"
 #include "lib/ilist.h"
 #include "libpq/pqsignal.h"
@@ -2267,6 +2268,25 @@ do_autovacuum(void)
 										  "Autovacuum Portal",
 										  ALLOCSET_DEFAULT_SIZES);
 
+	/*
+	 * Randomly rotate the list of tables to vacuum.  This avoids always
+	 * vacuuming the same table first, which could lead to spinning on the
+	 * same table or to vacuuming starvation.
+	 */
+	if (list_length(table_oids) > 1)
+	{
+		static pg_prng_state prng_state;
+		int			rand_off;
+		List	   *head_oids;
+
+		pg_prng_seed(&prng_state, (uint64) (getpid() ^ time(NULL)));
+		rand_off = (int) pg_prng_uint64_range(&prng_state, 0,
+											  list_length(table_oids) - 1);
+		if (rand_off != 0)
+		{
+			/* result: cells [rand_off .. end], then cells [0 .. rand_off - 1] */
+			head_oids = list_copy_head(table_oids, rand_off);
+			table_oids = list_copy_tail(table_oids, rand_off);
+			table_oids = list_concat(table_oids, head_oids);
+		}
+	}
+
 	/*
 	 * Perform operations on collected tables.
 	 */

[1] How Autovacuum Goes Wrong: And Can We Please Make It Stop Doing That?
https://www.youtube.com/watch?v=RfTD-Twpvac

--
Regards
Junwang Zhao
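As a sanity check of the rotation idea, here is a minimal standalone C sketch,
independent of the PostgreSQL sources (it uses rand() in place of pg_prng and
a plain array in place of a List). It illustrates the key property: rotating
by a random offset is a permutation, so every table is still visited exactly
once per pass; only the starting point changes.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Rotate the n-element oids array left by "offset" positions, in place. */
static void
rotate_oids(unsigned int *oids, int n, int offset)
{
	unsigned int *tmp = malloc(n * sizeof(unsigned int));

	for (int i = 0; i < n; i++)
		tmp[i] = oids[(i + offset) % n];
	for (int i = 0; i < n; i++)
		oids[i] = tmp[i];
	free(tmp);
}

int
main(void)
{
	unsigned int oids[] = {16384, 16385, 16386, 16387, 16388};
	int			n = 5;
	int			offset;

	srand((unsigned) time(NULL));
	offset = rand() % n;		/* random offset in [0, n - 1] */

	rotate_oids(oids, n, offset);

	/* Every OID still appears exactly once; only the start point moves. */
	for (int i = 0; i < n; i++)
		printf("%u ", oids[i]);
	printf("(offset %d)\n", offset);
	return 0;
}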
Hi, I like your idea. It would be even better if tables could be weighted by
size so that larger tables come first.
On Fri, 25 Apr 2025 at 22:03, Junwang Zhao <zhjwpku@gmail.com> wrote:
> [...]
Hi!

I agree it is a good idea to shift the table list, although vacuuming larger
tables first is a questionable approach, because smaller ones could then wait
a long time to be vacuumed. The most obvious and simple rule seems to be that
the first table to be vacuumed should not be the first one from the previous
iteration.
On Fri, Apr 25, 2025 at 6:04 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
> Hi, I like your idea. It would be even better if tables could be weighted by
> size so that larger tables come first.

--
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/
Hi Nikita, wenhui,

On Fri, Apr 25, 2025 at 11:16 PM Nikita Malakhov <hukutoc@gmail.com> wrote:
> [...]

Thanks for your feedback.

I ended up adding a GUC that can support different vacuum strategies. I named
it `autovacuum_vacuum_strategy`, but you might have a better name. For now it
supports only two strategies:

1. Sequential: tables are vacuumed in the order they are collected.
2. Random: the list of tables is rotated around a randomly chosen pivot before
   vacuuming, so we avoid always starting with the same table, which prevents
   vacuuming starvation for some tables.

We can extend this with prioritization or whatever algorithms we like in the
future.

--
Regards
Junwang Zhao
Attachment
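Since the patch itself is only in the attachment, here is a hedged sketch of
what the GUC plumbing for such a setting might look like, following the usual
config_enum_entry pattern in guc_tables.c. The enum names, the variable, and
the rotation helper below are assumptions based on the description above, not
the actual patch.

/* Hypothetical declarations (e.g., in autovacuum.h); names are assumed. */
typedef enum
{
	AV_STRATEGY_SEQUENTIAL,		/* vacuum tables in collection order */
	AV_STRATEGY_RANDOM			/* rotate the list around a random pivot */
} AutoVacuumStrategy;

extern PGDLLIMPORT int autovacuum_vacuum_strategy;

/* In guc_tables.c, next to the other enum GUC option lists. */
static const struct config_enum_entry autovacuum_vacuum_strategy_options[] = {
	{"sequential", AV_STRATEGY_SEQUENTIAL, false},
	{"random", AV_STRATEGY_RANDOM, false},
	{NULL, 0, false}
};

/* Entry for the ConfigureNamesEnum[] array in guc_tables.c. */
{
	{"autovacuum_vacuum_strategy", PGC_SIGHUP, AUTOVACUUM,
		gettext_noop("Strategy autovacuum uses to order the tables it vacuums."),
		NULL
	},
	&autovacuum_vacuum_strategy,
	AV_STRATEGY_SEQUENTIAL, autovacuum_vacuum_strategy_options,
	NULL, NULL, NULL
},

/* In do_autovacuum(), before the per-table loop (rotation as proposed above). */
if (autovacuum_vacuum_strategy == AV_STRATEGY_RANDOM &&
	list_length(table_oids) > 1)
	table_oids = rotate_table_oids_randomly(table_oids);	/* hypothetical helper */

An enum GUC with PGC_SIGHUP context would let the strategy be changed with a
reload rather than a restart, and leaves room for additional strategies (e.g.,
a prioritized one) without breaking existing settings.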
On Fri, Apr 25, 2025 at 10:02:49PM +0800, Junwang Zhao wrote:
> [...]

First off, thank you for thinking about this problem and for sharing your
thoughts. Adding randomness to solve this is a creative idea.

That being said, I am -1 for this proposal. Autovacuum parameters and
scheduling are already quite complicated, and making it nondeterministic would
add an additional layer of complexity (and may introduce its own problems).
But more importantly, IMHO it masks the problems instead of solving them more
directly, and it could mask future problems, too. It'd probably behoove us to
think about the known problems more deeply and to craft more targeted
solutions.

--
nathan
> - Spinning. Running repeatedly on the same table but not accomplishing
> anything useful.

> But more importantly, IMHO it masks the problems instead of solving them
> more directly, and it could mask future problems, too.

To add to Nathan's comment about masking future problems: this will not solve
the "spinning" problem, because if the most common reason for it is a
long-running transaction or the like, all your tables will eventually end up
wasting vacuum cycles, since the xmin horizon is not advancing.

--
Sami Imseih
On Wed, Apr 30, 2025 at 10:07 AM Junwang Zhao <zhjwpku@gmail.com> wrote:
> I ended up adding a GUC that can support different vacuum strategies.
+1 to this: it's a good solution to a tricky problem. I would be a -1 if this were not a GUC.
Yes, it is masking the problem, but maybe a better way to think about it is
that it delays the performance impact, allowing more time for manual
intervention on the problematic table(s).
Cheers,
Greg
--
Crunchy Data - https://www.crunchydata.com
Enterprise Postgres Software Products & Tech Support
> Yes, it is masking the problem, but maybe a better way to think about it is
> that it delays the performance impact, allowing more time for manual
> intervention on the problematic table(s).

I question how the user will gauge the success of setting the strategy to
"random". They may make it random by default, fall into the same issues, and
revert to the default strategy.

But also, the key, as you mention, is "manual intervention", which requires
proper monitoring. I will argue that for the two cases this proposal seeks to
correct, we already have good solutions that a user could implement.

Let's take the "spinning" case again. If a table has some sort of problem
causing vacuum to error out, one can just disable autovacuum at the per-table
level and correct the issue. Also, the xmin horizon being held back (which is
claimed to be the most common cause, and I agree with that), well, that one is
just going to make all your autovacuums useless.

Also, I do think the starvation problem has a good answer now that
autovacuum_max_workers can be modified online. Maybe something can be done for
autovacuum to auto-tune this setting to give more workers at times when it's
needed. Not sure what that looks like, but it is more possible now that this
setting does not require a restart.

--
Sami Imseih
Amazon Web Services (AWS)
On Wed, Apr 30, 2025 at 1:56 PM Sami Imseih <samimseih@gmail.com> wrote:
> I question how the user will gauge the success of setting the strategy to
> "random". [...]
>
> But also, the key, as you mention, is "manual intervention", which requires
> proper monitoring. I will argue that for the two cases this proposal seeks
> to correct, we already have good solutions that a user could implement.

I would have a lot more faith in this discussion if there were any kind of
external solution that had gained popularity as a general solution, but this
doesn't seem to be the case (and trying to wedge something into the server
will likely hurt that kind of research). As an example, the first fallacy of
autovacuum management is the idea that a single strategy will always work.
Having implemented a number of crude vacuum management systems in user space
already, I know that I have run into multiple cases where I had to essentially
build two different "queues" of vacuums (one for xids, one for bloat) to be
fed into Postgres so as not to be blocked (in either direction) by conflicting
priorities, no matter how wonky things got. I can imagine a set of GUCs that
we could put in to try to mimic such types of behavior, but I imagine it would
take quite a few rounds before we got the behavior correct.

Robert Treat
https://xzilla.net
On Thu, 1 May 2025 at 03:29, Nathan Bossart <nathandbossart@gmail.com> wrote:
> That being said, I am -1 for this proposal. Autovacuum parameters and
> scheduling are already quite complicated, and making it nondeterministic
> would add an additional layer of complexity (and may introduce its own
> problems). But more importantly, IMHO it masks the problems instead of
> solving them more directly, and it could mask future problems, too. It'd
> probably behoove us to think about the known problems more deeply and to
> craft more targeted solutions.

-1 from me too.

It sounds like the aim is to fix the problem of autovacuum vacuuming the same
table over and over and being unable to remove enough dead tuples because
something is holding back the oldest xmin horizon. Why can't we just fix that
by remembering the VacuumCutoffs.OldestXmin value and only coming back to that
table once the horizon has moved forward some amount?

David
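A rough sketch of how such a check might look. The statistics field below,
last_vacuum_oldest_xmin, is hypothetical; no such field exists in
PgStat_StatTabEntry today.

/*
 * Hypothetical sketch of David's suggestion.  Assumes vacuum records the
 * VacuumCutoffs.OldestXmin it used into a new per-table statistics field,
 * last_vacuum_oldest_xmin; no such field exists today.
 */
static bool
horizon_has_advanced(PgStat_StatTabEntry *tabentry,
					 TransactionId current_oldest_xmin)
{
	/* Table never vacuumed (or horizon never recorded): don't hold it back. */
	if (!TransactionIdIsValid(tabentry->last_vacuum_oldest_xmin))
		return true;

	/*
	 * Skip the table while the oldest xmin horizon hasn't moved since the
	 * last vacuum: a repeat vacuum couldn't remove any more dead tuples.
	 * A real version might also require the horizon to have advanced by
	 * some minimum amount, per David's wording.
	 */
	return TransactionIdFollows(current_oldest_xmin,
								tabentry->last_vacuum_oldest_xmin);
}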
Hi Sami,

On Thu, May 1, 2025 at 1:56 AM Sami Imseih <samimseih@gmail.com> wrote:
> Let's take the "spinning" case again. If a table has some sort of problem
> causing vacuum to error out, one can just disable autovacuum at the
> per-table level and correct the issue. Also, the xmin horizon being held
> back (which is claimed to be the most common cause, and I agree with that),
> well, that one is just going to make all your autovacuums useless.

Yeah, I tend to agree with you that the xmin horizon being held back will make
autovacuum useless for all tables. But I have a question; let me quote Andres'
comment on Slack first:

```quote begin
It seems a bit silly to not just do some basic prioritization instead, but
perhaps we just need to reach for some basic stuff, given that we seem unable
to progress on prioritization.
```quote end

If randomness doesn't help here, ISTM that prioritization will not benefit the
"spinning" case either. Am I right?

> Also, I do think the starvation problem has a good answer now that
> autovacuum_max_workers can be modified online. Maybe something can be done
> for autovacuum to auto-tune this setting to give more workers at times when
> it's needed. Not sure what that looks like, but it is more possible now that
> this setting does not require a restart.

Good to know, thanks.

One case I didn't mention is that corruption in one table might starve other
tables too, if vacuum keeps failing on it; randomness gives the other tables
some chance of being vacuumed. I do admit that multiple vacuum workers
mitigate this somewhat when the corrupted table's vacuum lasts a while, but I
think randomness is better here.

--
Regards
Junwang Zhao
On Thu, May 1, 2025 at 8:12 AM David Rowley <dgrowleyml@gmail.com> wrote:
> It sounds like the aim is to fix the problem of autovacuum vacuuming the
> same table over and over and being unable to remove enough dead tuples
> because something is holding back the oldest xmin horizon. Why can't we just
> fix that by remembering the VacuumCutoffs.OldestXmin value and only coming
> back to that table once the horizon has moved forward some amount?

Users expect a table to be autovacuumed when:

    dead_tuples > vac_base_thresh + vac_scale_factor * reltuples

For example, with the defaults (autovacuum_vacuum_threshold = 50,
autovacuum_vacuum_scale_factor = 0.2), a one-million-row table becomes
eligible at 200,050 dead tuples. If we instead depend on the xmin horizon
moving forward before revisiting a table, aren't there cases where a bloated
table won't be vacuumed?

--
Regards
Junwang Zhao
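For reference, a standalone C sketch of that trigger arithmetic, mirroring the
comparison autovacuum's relation_needs_vacanalyze() makes; the constants are
the stock defaults of the two GUCs involved.

#include <stdbool.h>
#include <stdio.h>

/* Stock defaults for the two GUCs involved. */
static const float vac_base_thresh = 50.0f;		/* autovacuum_vacuum_threshold */
static const float vac_scale_factor = 0.2f;		/* autovacuum_vacuum_scale_factor */

/* True if a table with this many tuples is due for autovacuum. */
static bool
needs_vacuum(float reltuples, float dead_tuples)
{
	float		vacthresh = vac_base_thresh + vac_scale_factor * reltuples;

	return dead_tuples > vacthresh;
}

int
main(void)
{
	/* A one-million-row table becomes eligible just past 200,050 dead tuples. */
	printf("%d\n", needs_vacuum(1000000.0f, 200050.0f));	/* 0: not yet */
	printf("%d\n", needs_vacuum(1000000.0f, 200051.0f));	/* 1: due */
	return 0;
}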
On Thu, 1 May 2025 at 17:35, Junwang Zhao <zhjwpku@gmail.com> wrote:
> Users expect a table to be autovacuumed when:
>
>     dead_tuples > vac_base_thresh + vac_scale_factor * reltuples
>
> If we instead depend on the xmin horizon moving forward before revisiting a
> table, aren't there cases where a bloated table won't be vacuumed?

Can you explain why you think that? The idea is to start vacuuming other
tables that might have removable dead tuples, instead of repeating vacuums on
the same table over and over with no chance of removing any more dead tuples
than we could during the last vacuum.

David