Thread: Re: Disabling vacuum truncate for autovacuum
On Thu, 2025-01-23 at 22:33 -0800, Gurjeet Singh wrote: > > > I am also wondering if having an autovacuum setting to control it would be > > > a good idea for a feature. > > > > I'm all for that. > > Please see attached an initial patch to disable truncation behaviour in > autovacuum. This patch retains the default behavior of autovacuum truncating > relations. The user is allowed to change the behaviour and disable relation > truncations system-wide by setting autovacuum_disable_vacuum_truncate = true. > Better parameter names welcome :-) I hope it is possible to override the global setting with the "vacuum_truncate" option on an individual table. My suggestion for the parameter name is "autovacuum_disable_truncate". > One additional improvement I can think of is to emit a WARNING or NOTICE message > that truncate operation is being skipped, perhaps only if the truncation > would've freed up space over a certain threshold. Interesting idea, but I think it is independent from this patch. > Perhaps there's value in letting this parameter be specified at database level, > but I'm not able to think of a reason why someone would want to disable this > behaviour on just one database. So leaving the parameter context to be the same > as most other autovacuum parameters: SIGHUP. I can imagine setting that on only a certain database. Different databases typically have different applications, which have different needs. Eventually, the patch should have documentation and regression tests. Yours, Laurenz Albe
On Mon, Jan 27, 2025 at 4:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > My suggestion for the parameter name is "autovacuum_disable_truncate". Then it would have a different name than the reloption, and the opposite sense. In most cases, we've been able to keep those matching -- autovacuum vs. autovacuum_enabled being, I believe, the only current mismatch. Also, how sure are we that turning this off globally is a solid idea? Off-hand, it doesn't sound that bad: there are probably situations in which autovacuum never truncates anything anyway just because the tail blocks of the relations always contain at least 1 tuple. But we should be careful not to add a setting that is far more likely to get people into trouble than to get them out of it. It would be good to hear what other people think about the risk vs. reward tradeoff in this case. -- Robert Haas EDB: http://www.enterprisedb.com
On Mon, Jan 27, 2025 at 03:38:39PM -0500, Robert Haas wrote: > Also, how sure are we that turning this off globally is a solid idea? > Off-hand, it doesn't sound that bad: there are probably situations in > which autovacuum never truncates anything anyway just because the tail > blocks of the relations always contain at least 1 tuple. But we should > be careful not to add a setting that is far more likely to get people > into trouble than to get them out of it. It would be good to hear what > other people think about the risk vs. reward tradeoff in this case. My first reaction is that a global setting is probably fine most of the time. I'm sure it's possible to get into bad situations if you try hard enough, but that's not a unique characteristic. There are probably many situations where the truncation is wasted effort because we'll just end up extending the relation shortly afterwards, anyway. In any case, it's already possible to achieve $SUBJECT with a trivial script that sets storage parameters on all tables, so IMHO it would be silly to withhold a global setting that does the same thing just on principle. Of course, ideally we'd "fix" truncation on standbys, but that's at least v19 work at this point, and past discussion from many years ago [0] leads me to believe it's a difficult problem. That's not to say we should shy away from difficult problems... [0] https://postgr.es/m/flat/CAHGQGwE5UqFqSq1%3DkV3QtTUtXphTdyHA-8rAj4A%3DY%2Be4kyp3BQ%40mail.gmail.com -- nathan
On Tue, Feb 18, 2025 at 01:56:09PM -0600, Nathan Bossart wrote: > On Mon, Jan 27, 2025 at 03:38:39PM -0500, Robert Haas wrote: >> Also, how sure are we that turning this off globally is a solid idea? >> Off-hand, it doesn't sound that bad: there are probably situations in >> which autovacuum never truncates anything anyway just because the tail >> blocks of the relations always contain at least 1 tuple. But we should >> be careful not to add a setting that is far more likely to get people >> into trouble than to get them out of it. It would be good to hear what >> other people think about the risk vs. reward tradeoff in this case. > > My first reaction is that a global setting is probably fine most of the > time. I'm sure it's possible to get into bad situations if you try hard > enough, but that's not a unique characteristic. There are probably many > situations where the truncation is wasted effort because we'll just end up > extending the relation shortly afterwards, anyway. In any case, it's > already possible to achieve $SUBJECT with a trivial script that sets > storage parameters on all tables, so IMHO it would be silly to withhold a > global setting that does the same thing just on principle. I spent some time on this one today. A couple of notes: * Since the reloption is a Boolean, there isn't a good way to tell whether it is actually set for the table or if it just inherited the default value. This is important to know because we want the reloption to override the GUC. I considered a bunch of different ways to handle this, but everything felt like a cowboy hack. The cleanest cowboy hack I could come up with is an optional offset field in relopt_parse_elt that points to a variable that stores whether the option was explicitly set. * I didn't see a good GUC category for vacuum_truncate. I suppose we could create a new one, but for now I've just stashed it into the autovacuum one. Bikeshedding welcome. Thoughts? -- nathan
Attachment
On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Thu, 2025-01-23 at 22:33 -0800, Gurjeet Singh wrote: > > > > I am also wondering if having an autovacuum setting to control it would be > > > > a good idea for a feature. > > > > > > I'm all for that. > > > > Please see attached an initial patch to disable truncation behaviour in > > autovacuum. This patch retains the default behavior of autovacuum truncating > > relations. The user is allowed to change the behaviour and disable relation > > truncations system-wide by setting autovacuum_disable_vacuum_truncate = true. > > Better parameter names welcome :-) > > I hope it is possible to override the global setting with the "vacuum_truncate" > option on an individual table. Current patch behaviour is that if the autovacuum_vacuum_truncate is false, then autovacuum will _not_ truncate any relations. If the parameter's value is true (the default), then the relation's reloption will be honored. A table-owner, or the database-owner, may enable truncation of a table, as they may be trying to be nice and return the unused disk space back to the OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the health of the entire db instance, as well as of any replicas of the db instance), wants to disable truncation across all databases to ensure that the replication does not get stuck, then IMHO Postgres should empower the sysadmin to make that decision, and override the relation-level setting enabled by the table- owner or the database-owner. > > One additional improvement I can think of is to emit a WARNING or NOTICE message > > that truncate operation is being skipped, perhaps only if the truncation > > would've freed up space over a certain threshold. > > Interesting idea, but I think it is independent from this patch. Agreed. I'll consider writing a separate patch for this. > > Perhaps there's value in letting this parameter be specified at database level, > > but I'm not able to think of a reason why someone would want to disable this > > behaviour on just one database. So leaving the parameter context to be the same > > as most other autovacuum parameters: SIGHUP. > > I can imagine setting that on only a certain database. Different databases > typically have different applications, which have different needs. Makes sense. I don't think anything special needs to be done in the patch to address this. > Eventually, the patch should have documentation and regression tests. Documentation added. Pointers on if, where, and what kind of tests to add will be appreciated. On Mon, Jan 27, 2025 at 12:38 PM Robert Haas <robertmhaas@gmail.com> wrote: > > On Mon, Jan 27, 2025 at 4:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > My suggestion for the parameter name is "autovacuum_disable_truncate". > > Then it would have a different name than the reloption, and the > opposite sense. In most cases, we've been able to keep those matching > -- autovacuum vs. autovacuum_enabled being, I believe, the only > current mismatch. If we want to maintain the convention of autovacuum parameters names to be of the format "autovacuum_<vacuum-option's-name>" then I believe the name autovacuum_vacuum_truncate (boolean) would be better, as compared to my original proposal (autovacuum_disable_vacuum_truncate), or Laurenz's proposal above. The default value should be true, to match the current autovacuum behaviour. > Also, how sure are we that turning this off globally is a solid idea? > Off-hand, it doesn't sound that bad: there are probably situations in > which autovacuum never truncates anything anyway just because the tail > blocks of the relations always contain at least 1 tuple. But we should > be careful not to add a setting that is far more likely to get people > into trouble than to get them out of it. It would be good to hear what > other people think about the risk vs. reward tradeoff in this case. Taking silence from others to be a sign of no opposition, I'm moving forward with the patch. On Tue, Feb 18, 2025 at 11:56 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Mon, Jan 27, 2025 at 03:38:39 PM -0500, Robert Haas wrote: > > Also, how sure are we that turning this off globally is a solid idea? > In any case, it's > already possible to achieve $SUBJECT with a trivial script that sets > storage parameters on all tables, so IMHO it would be silly to withhold a > global setting that does the same thing just on principle. +1 For documentation of this GUC, I borrowed heavily from the relevant sections of CREATE TABLE and VACUUM docs. There are 3 ways I wrote one of the sentences in the docs. I picked the last one, as it is concise and clearer than the others. If others feel a different choice of words would be better, I'm all ears. If <literal>false</literal>, autovacuum will not perform the truncation, even if the <literal>vacuum_truncate</literal> option has been set to <literal>true</literal> for the table being processed. If <literal>false</literal>, autovacuum will not perform the truncation, and it ignores the <literal>vacuum_truncate</literal> option for the tables it processes. If <literal>false</literal>, autovacuum will not perform the truncation on any tables it vacuums. The <literal>vacuum_truncate</literal> option on the tables is ignored. PS: Nathan, your latest email arrived as I was preparing this email and patch, so this email and patch does not address concerns, if any, in your latest email. I will try to respond to it soon. Best regards, Gurjeet http://Gurje.et
Attachment
On Thu, 2025-02-27 at 21:35 -0600, Nathan Bossart wrote: > I spent some time on this one today. A couple of notes: > > * Since the reloption is a Boolean, there isn't a good way to tell whether > it is actually set for the table or if it just inherited the default > value. This is important to know because we want the reloption to > override the GUC. I agree with that, without being able to think of a better solution. > * I didn't see a good GUC category for vacuum_truncate. I suppose we could > create a new one, but for now I've just stashed it into the autovacuum > one. Bikeshedding welcome. Why not use "Client Connection Defaults / Statement Behavior", just like for "vacuum_freeze_min_age"? I don't think that "autovacuum" is appropriate, since it applies to manual VACUUM as well. Yours, Laurenz Albe
On Thu, Feb 27, 2025 at 08:29:16PM -0800, Gurjeet Singh wrote: > On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: >> I hope it is possible to override the global setting with the "vacuum_truncate" >> option on an individual table. > > Current patch behaviour is that if the autovacuum_vacuum_truncate is false, then > autovacuum will _not_ truncate any relations. If the parameter's value is true > (the default), then the relation's reloption will be honored. > > A table-owner, or the database-owner, may enable truncation of a table, as they > may be trying to be nice and return the unused disk space back to the > OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the > health of the entire db instance, as well as of any replicas of the db > instance), > wants to disable truncation across all databases to ensure that the replication > does not get stuck, then IMHO Postgres should empower the sysadmin to make > that decision, and override the relation-level setting enabled by the table- > owner or the database-owner. IIUC reloptions with corresponding GUCs typically override the GUC setting, although autovacuum_enabled is arguably an exception. If setting the GUC to false overrides the relation-specific settings, it also becomes more difficult to enable truncation for just a couple of tables, although that might not be a popular use-case. Furthermore, even if we do want the GUC to override the reloption, it won't override VACUUM (TRUNCATE). >> > One additional improvement I can think of is to emit a WARNING or NOTICE message >> > that truncate operation is being skipped, perhaps only if the truncation >> > would've freed up space over a certain threshold. >> >> Interesting idea, but I think it is independent from this patch. > > Agreed. I'll consider writing a separate patch for this. Perhaps it would be useful to say whether truncation was attempted in the output of VACUUM (VERBOSE) and the autovacuum logs. >> > Perhaps there's value in letting this parameter be specified at database level, >> > but I'm not able to think of a reason why someone would want to disable this >> > behaviour on just one database. So leaving the parameter context to be the same >> > as most other autovacuum parameters: SIGHUP. >> >> I can imagine setting that on only a certain database. Different databases >> typically have different applications, which have different needs. > > Makes sense. I don't think anything special needs to be done in the patch to > address this. Hm. I was thinking PGC_USERSET might make sense for this one, but that was only because I didn't see any technical reason to restrict it. I don't know whether limiting it accomplishes anything beyond making it more cumbersome for users to choose their desired default truncation setting. > PS: Nathan, your latest email arrived as I was preparing this email and patch, > so this email and patch does not address concerns, if any, in your latest email. > I will try to respond to it soon. Oops, sorry for the conflict. I'm happy to take a step back and be the reviewer/committer for this one. -- nathan
On 2025/03/01 3:21, Nathan Bossart wrote: > On Thu, Feb 27, 2025 at 08:29:16PM -0800, Gurjeet Singh wrote: >> On Mon, Jan 27, 2025 at 1:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: >>> I hope it is possible to override the global setting with the "vacuum_truncate" >>> option on an individual table. >> >> Current patch behaviour is that if the autovacuum_vacuum_truncate is false, then >> autovacuum will _not_ truncate any relations. If the parameter's value is true >> (the default), then the relation's reloption will be honored. >> >> A table-owner, or the database-owner, may enable truncation of a table, as they >> may be trying to be nice and return the unused disk space back to the >> OS/filesystem. But if the sysadmin/DBA (who is ultimately responsible for the >> health of the entire db instance, as well as of any replicas of the db >> instance), >> wants to disable truncation across all databases to ensure that the replication >> does not get stuck, then IMHO Postgres should empower the sysadmin to make >> that decision, and override the relation-level setting enabled by the table- >> owner or the database-owner. > > IIUC reloptions with corresponding GUCs typically override the GUC setting, > although autovacuum_enabled is arguably an exception. If setting the GUC > to false overrides the relation-specific settings, it also becomes more > difficult to enable truncation for just a couple of tables, although that > might not be a popular use-case. Furthermore, even if we do want the GUC > to override the reloption, it won't override VACUUM (TRUNCATE). +1 to having the reloption (if specified) override the GUC setting. That is, I think that autovacuum_vacuum_truncate as defining the default behavior for VACUUM truncation, and that the GUC should only apply when neither the TRUNCATE option in VACUUM nor the reloption is set. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, Mar 06, 2025 at 08:54:59AM +0900, Fujii Masao wrote: > +1 to having the reloption (if specified) override the GUC setting. > That is, I think that autovacuum_vacuum_truncate as defining > the default behavior for VACUUM truncation, and that the GUC should > only apply when neither the TRUNCATE option in VACUUM nor > the reloption is set. One other difference in my version of the patch [0] is to call this GUC vacuum_truncate and have it apply to both autovacuum and VACUUM. I did this for the following reasons: * There is no autovacuum-specific storage parameter. There is only vacuum_truncate and toast.vacuum_truncate, both of which apply to autovacuum and VACUUM. Unfortunately, adding autovacuum-specific storage parameters at this point would break things for folks who are already using vacuum_truncate to prevent autovacuum from truncating. In any case, I gather that we try to ordinarily keep storage parameters named the same as their corresponding GUCs. * I'm not sure whether there's a real need to control the autovacuum default but not the VACUUM one. I'd expect most users of this stuff to be worried about truncation in both cases, especially for the hot standby use-case mentioned upthread. I should also mention that we just have a few weeks left in the v18 development cycle. The code itself seems pretty straightforward, so if we can agree on behavior and nomenclature, I'll do my darndest to get this responsibly committed in time. [0] https://postgr.es/m/attachment/172951/v2-0001-Add-vacuum_truncate-GUC.patch -- nathan
On Fri, 2025-03-14 at 10:42 -0500, Nathan Bossart wrote: > One other difference in my version of the patch [0] is to call this GUC > vacuum_truncate and have it apply to both autovacuum and VACUUM. I agree that that is the way to go. It makes no sense to restrict the feature to autovacuum. Yours, Laurenz Albe
On 2025/03/15 0:42, Nathan Bossart wrote: > I should also mention that we just have a few weeks left in the v18 > development cycle. The code itself seems pretty straightforward, so if we > can agree on behavior and nomenclature, I'll do my darndest to get this > responsibly committed in time. +1 Here are two minor review comments from me. + <varlistentry id="guc-vacuum-truncate" xreflabel="autovacuum"> This xreflabel should be "vacuum_truncate", not "autovacuum". - lock on the table. The <literal>TRUNCATE</literal> parameter - of <link linkend="sql-vacuum"><command>VACUUM</command></link>, if specified, overrides the value - of this option. + Per-table value for <xref linkend="guc-vacuum-truncate"/> parameter. It was explicitly documented that the TRUNCATE option in the VACUUM command overrides the vacuum_truncate reloption, but this information has been removed in the patch. Shouldn't we keep it to clarify the priority of these settings? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
+Andres Freund, since an old email of his is quoted here. On Fri, Mar 14, 2025 at 8:45 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > On Thu, Mar 06, 2025 at 08:54:59AM +0900, Fujii Masao wrote: > > +1 to having the reloption (if specified) override the GUC setting. > > That is, I think that autovacuum_vacuum_truncate as defining > > the default behavior for VACUUM truncation, and that the GUC should > > only apply when neither the TRUNCATE option in VACUUM nor > > the reloption is set. > > One other difference in my version of the patch [0] is to call this GUC > vacuum_truncate and have it apply to both autovacuum and VACUUM. I did > this for the following reasons: +1 for the GUC name for the reasons you identified. But -1 for the behaviour where the reloption and vacuum command's options overrides GUC. I did not consider the VACUUM (TRUNCATE) command, and was focused on autovacuum, since that's what caused the replication stall and the consequent application outage in the anecdote I was relaying. I'd like to bring our attention back to how this thread started. Will started the discussion by asking for a way to disable autovacuum's truncate behaviour. Even though the production outage he faced was due to manual vacuum, he was worried about the same behaviour that autovacuum may cause, especially since the parameter old_snapshot_threshold is no longer available; old_snapshot_threshold allowed sysadmins like Will to disable the truncation behaviour globally. I provided an anecdote where autovacuum's truncation behaviour had in fact caused a replication outage as well as the consequent application outage. The behaviour that is being proposed here does not prevent that situation from arising again. A sysadmin who's trying to prevent replication outage and a consequent application outage won't benefit from tuning vacuum_truncate GUC, because a reloption or VACUUM (TRUNCATE) command will override its behaviour, and lead to an outage. We want this new GUC to give the sysadmin the power to override the per-relation and per-command settings. > IIUC reloptions with corresponding GUCs typically override the GUC setting, > although autovacuum_enabled is arguably an exception. If setting the GUC > to false overrides the relation-specific settings, it also becomes more > difficult to enable truncation for just a couple of tables, although that > might not be a popular use-case. Furthermore, even if we do want the GUC > to override the reloption, it won't override VACUUM (TRUNCATE). I guess what I'm looking for is a global switch that guarantees no relation truncation will take place on the instance, so that the relevant replication record is never emitted, and hence will never lead to a blocked replication on the replica and never cause a consequent outage of applications connected to the replica(s). That is, as a sysadmin, I need a global variable that overrides and disables any relation-level and command-level truncation operations. I guess that's why naming the GUC *disable*_vacuum_truncate made sense to me when I initially proposed the autovacuum patch, since it was preventing autovacuum's default truncation behaviour. The downside of disabling _all_ truncation operations is that database size can only grow and never shrink, and it may be frustrating for a regular user who's trying to shrink their table sizes. Even in those situations, a sysadmin may prefer that none of the tables ever be shrunk in normal operation, rather than running the risk of causing a replication outage and a consequent application outage. In Will's words: > I expect I would need to monitor > tables for problematic growth, but that might be better than a surprise > outage. I suppose the growth could cause an outage too, but I'm thinking it > would be more controllable. The sysadmin can schedule a periodic maintenance window where the GUC is changed for the duration of the maintenance, allowing truncation operations globally, and then running VACUUM (TRUNCATE), or relying on autovacuum to truncate relations within that maintenance period. With the proposed v2 patch behaviour, a regular user is still in control of truncation behaviour, and hence can cause a replication outage and application outage on the replicas. Essentially, this patch fails to help the sysadmin, and leaves them at the mercy of table owners and database owners. Perhaps the compromise is that that the sysadmin will run a regular script to check that none of the relations have the reloption set to truncate the relation, and also communicate to the application developers that they shouldn't run the VACUUM (TRUNCATE) command. But anyone who has run a moderately large fleet of Postgres instances would agree that both those things are not foolproof, and will tend to be ignored or forgotten in the long run. Especially the VACUUM (TRUNCATE) option is so enticing for the average user that it's near impossible to prevent them from using it. In the message linked by Will upthread, Andres says: > The production issue is that > autovacuums constantly cancel queries on the standbys despite > hot_standby_feedback if you have a workload that does frequent > truncations. If there's no way to configure it in a way that autovacuum > takes it into account, people will just disable autovacuum and rely > entirely on manual scripts. That's what already happens - leading to a > much bigger foot-gun than disabling truncation. FWIW, people already in > production use the workaround to configuring snapshot_too_old as that, > for undocumented reasons, disables trunctations. The snapshot_too_old error/feature is not available anymore, and add to that the fact that manual scripts are not effective for transient tables; tables with short-enough lifetime that they may be truncated before the next run of the scripts. I think this exercise is moot if we're not solving the problem of letting sysadmin disable truncation altogether, and eliminating the possibility of an outage. By moving forward with v2, we're not addressing the problem that Will started this conversation with. It's entirely possible that I might be misunderstanding your proposal, though, in which case please feel free to illuminate how the proposed v2 patch will help a sysadmin counter vacuum's and autovacuum's truncation behaviour, despite the average user unknowingly monkeying with the database availability. If the GUC is named vacuum_truncate, then as Fujii Masao reasoned, it can be taken to mean the default value of per-relation and VACUUM options. So I propose that GUC be named disable_vacuum_truncate (antonym of the name you proposed), since it _prevents_ the default truncation behaviour of vacuum and autovacuum. When false, vacuum and autovacuum perform truncation as they do normally. When true, no truncations are performed, neither by vacuum nor by autovacuum; they instead emit a NOTICE, notifying that relation truncation was prevented by the GUC disable_vacuum_truncate. Default value should be false, to match the current behaviour. The GUC context should be PGC_SIGHUP so that it can be changed only by the sysadmin, and can be changed without requiring a restart. Optional: disable_vacuum_truncate=true has no effect if wal_level==minimal, since this whole debate and feature is to avoid replication outage, and any downstream effects on replicas, hence irrelevant to systems where replication is not in use. > I should also mention that we just have a few weeks left in the v18 > development cycle. The code itself seems pretty straightforward, so if we > can agree on behavior and nomenclature, I'll do my darndest to get this > responsibly committed in time. Thank you for all your help and work on this patch! Unfortunately I haven't been able to work on it as much as I would've liked to. Best regards, Gurjeet http://Gurje.et
On Sat, 2025-03-15 at 17:14 -0700, Gurjeet Singh wrote: > > One other difference in my version of the patch [0] is to call this GUC > > vacuum_truncate and have it apply to both autovacuum and VACUUM. I did > > this for the following reasons: > > +1 for the GUC name for the reasons you identified. But -1 for the behaviour > where the reloption and vacuum command's options overrides GUC. > > I'd like to bring our attention back to how this thread started. Will started > the discussion by asking for a way to disable autovacuum's truncate behaviour. > Even though the production outage he faced was due to manual vacuum, he was > worried about the same behaviour that autovacuum may cause, especially since the > parameter old_snapshot_threshold is no longer available; old_snapshot_threshold > allowed sysadmins like Will to disable the truncation behaviour globally. I > provided an anecdote where autovacuum's truncation behaviour had in fact caused > a replication outage as well as the consequent application outage. > > The behaviour that is being proposed here does not prevent that situation from > arising again. A sysadmin who's trying to prevent replication outage and > a consequent application outage won't benefit from tuning vacuum_truncate GUC, > because a reloption or VACUUM (TRUNCATE) command will override its behaviour, > and lead to an outage. > > We want this new GUC to give the sysadmin the power to override the per-relation > and per-command settings. > > > I guess what I'm looking for is a global switch that guarantees no relation > truncation will take place on the instance, so that the relevant replication > record is never emitted, and hence will never lead to a blocked replication on > the replica and never cause a consequent outage of applications connected to the > replica(s). Essentially, you are looking for something that reinstates the unintended side effect of "old_snapshot_threshold" that some people relied on. I understand your reasoning. What I am worried about, and why I am against that, is the POLA violation this constitutes. I PostgreSQL, there usually are global settings that can be overridden by per-relation settings. Doing it differently here would surprise and confuse many users. This is not the only way a user can do damage to the system by overriding the administrator's settings. Users can override all autovacuum settings and even disable autovacuum on a table. I don't think these settings are less dangerous than VACUUM truncation. Yours, Laurenz Albe
On Sun, Mar 16, 2025 at 1:29 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Sat, 2025-03-15 at 17:14 -0700, Gurjeet Singh wrote: > > > One other difference in my version of the patch [0] is to call this GUC > > > vacuum_truncate and have it apply to both autovacuum and VACUUM. I did > > > this for the following reasons: > > > > +1 for the GUC name for the reasons you identified. But -1 for the behaviour > > where the reloption and vacuum command's options overrides GUC. > > > > I'd like to bring our attention back to how this thread started. Will started > > the discussion by asking for a way to disable autovacuum's truncate behaviour. > > Even though the production outage he faced was due to manual vacuum, he was > > worried about the same behaviour that autovacuum may cause, especially since the > > parameter old_snapshot_threshold is no longer available; old_snapshot_threshold > > allowed sysadmins like Will to disable the truncation behaviour globally. I > > provided an anecdote where autovacuum's truncation behaviour had in fact caused > > a replication outage as well as the consequent application outage. > > > > The behaviour that is being proposed here does not prevent that situation from > > arising again. A sysadmin who's trying to prevent replication outage and > > a consequent application outage won't benefit from tuning vacuum_truncate GUC, > > because a reloption or VACUUM (TRUNCATE) command will override its behaviour, > > and lead to an outage. > > > > We want this new GUC to give the sysadmin the power to override the per-relation > > and per-command settings. > > > > > > I guess what I'm looking for is a global switch that guarantees no relation > > truncation will take place on the instance, so that the relevant replication > > record is never emitted, and hence will never lead to a blocked replication on > > the replica and never cause a consequent outage of applications connected to the > > replica(s). > > Essentially, you are looking for something that reinstates the unintended side > effect of "old_snapshot_threshold" that some people relied on. > > I understand your reasoning. > What I am worried about, and why I am against that, is the POLA violation this > constitutes. I PostgreSQL, there usually are global settings that can be > overridden by per-relation settings. Doing it differently here would surprise > and confuse many users. > Agreed... I couldn't help when reading through this thread the same thought that the normal way we do this is by trying to pick the sensible default and then giving options to override it on a more granular level. > This is not the only way a user can do damage to the system by overriding the > administrator's settings. Users can override all autovacuum settings and even > disable autovacuum on a table. I don't think these settings are less dangerous > than VACUUM truncation. > Agreed. To the degree I am sympathetic to Gurjeet's concern, it sounds more like he is trying to solve a socio-technical issue, which I think is beyond something that we can guarantee help with; ie presuming we provide a convenient way to disable this generally, if people are going to go out of their way to do the thing they have been told not to... So if the general idea is a guc "vacuum_truncate" which sets a global default for whether vacuums and autovacuums should do truncation, and we then have the storage parameter which would override the global setting. And to be clear, there is also the decision on whether the VACUUM commands default should default to truncate=on (like the existing behavior) or truncate == vacuum_truncate guc, unless explicitly set. I think the latter is probably the right way to go. As an aside, thinking through a bunch of different scenarios, I think I would actually be in favor of changing the default behavior to false (I don't think it buys much for most workloads, and I'd love to see us move towards defaults that minimize risk), but I suspect that may be a bridge too far, at least in this release; but maybe down the line... for now though I'd take an easy way for users to make it the default. Robert Treat https://xzilla.net
On Sun, Mar 16, 2025 at 06:24:59PM -0400, Robert Treat wrote: > So if the general idea is a guc "vacuum_truncate" which sets a global > default for whether vacuums and autovacuums should do truncation, and > we then have the storage parameter which would override the global > setting. And to be clear, there is also the decision on whether the > VACUUM commands default should default to truncate=on (like the > existing behavior) or truncate == vacuum_truncate guc, unless > explicitly set. I think the latter is probably the right way to go. Thank you all for the discussion. I've attempted to address the outstanding feedback into the new version of the patch. -- nathan
Attachment
On Mon, Mar 17, 2025 at 10:14:51AM -0500, Nathan Bossart wrote: > Thank you all for the discussion. I've attempted to address the > outstanding feedback into the new version of the patch. Here is a new version of the patch with tests and some other light edits. I feel like this is committable, but I'll wait for a couple more days for any other feedback or objections. -- nathan
Attachment
On 2025/03/19 10:42, Nathan Bossart wrote: > On Mon, Mar 17, 2025 at 10:14:51AM -0500, Nathan Bossart wrote: >> Thank you all for the discussion. I've attempted to address the >> outstanding feedback into the new version of the patch. > > Here is a new version of the patch with tests and some other light edits. > I feel like this is committable, but I'll wait for a couple more days for > any other feedback or objections. +# - Default Behavior - + +#vacuum_truncate = on # enable truncation after vacuum Since there's no existing GUC category that fits to vacuum_truncate, I'm fine with adding a new category like "Vacuuming" and placing vacuum_truncate there. However, if we do this, ISTM that the new category should also be added to guc_tables.h, and vacuum_truncate should be assigned to it in guc_tables.c. Additionally, the documentation should be updated to include the new category, with vacuum_truncate placed under it. Thought? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, Mar 20, 2025 at 12:34:59AM +0900, Fujii Masao wrote: > +# - Default Behavior - > + > +#vacuum_truncate = on # enable truncation after vacuum > > Since there's no existing GUC category that fits to vacuum_truncate, > I'm fine with adding a new category like "Vacuuming" and placing > vacuum_truncate there. > > However, if we do this, ISTM that the new category should also be added to > guc_tables.h, and vacuum_truncate should be assigned to it in guc_tables.c. > Additionally, the documentation should be updated to include > the new category, with vacuum_truncate placed under it. Thought? Ah, you're right, I need to fix the category. Will post an updated patch soon. -- nathan
On Wed, Mar 19, 2025 at 10:40:31AM -0500, Nathan Bossart wrote: > On Thu, Mar 20, 2025 at 12:34:59AM +0900, Fujii Masao wrote: >> +# - Default Behavior - >> + >> +#vacuum_truncate = on # enable truncation after vacuum >> >> Since there's no existing GUC category that fits to vacuum_truncate, >> I'm fine with adding a new category like "Vacuuming" and placing >> vacuum_truncate there. >> >> However, if we do this, ISTM that the new category should also be added to >> guc_tables.h, and vacuum_truncate should be assigned to it in guc_tables.c. >> Additionally, the documentation should be updated to include >> the new category, with vacuum_truncate placed under it. Thought? > > Ah, you're right, I need to fix the category. Will post an updated patch > soon. Here's a new version of the patch with the GUC placed in a new "Default Behavior" category for vacuum-related parameters. -- nathan
Attachment
Committed. -- nathan
On Thu, Mar 20, 2025 at 8:18 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
Committed.
We added isset_offset to the public struct to introduce this new general behavior of tracking whether any table reloption has been set (not just vacuum_truncate) without any comments.
I get the need for this kind of logic, since we used a boolean for the table option, but as a self-proclaimed hack it seems worth more comments than provided here. Especially pertaining to whether this is indeed generic or vacuum_truncate specific. It's unclear since both isset and vacuum_truncate_set have been introduced. If it is now a general property applying to any setting then vacuum_truncate_set shouldn't be needed - we should just get the isset value of the existing vacuum_truncate reloption by name.
David J.
On Thu, Mar 20, 2025 at 09:59:45AM -0700, David G. Johnston wrote: > I get the need for this kind of logic, since we used a boolean for the > table option, but as a self-proclaimed hack it seems worth more comments > than provided here. Especially pertaining to whether this is indeed > generic or vacuum_truncate specific. It's unclear since both isset and > vacuum_truncate_set have been introduced. I'm happy to expand the comments, but... > If it is now a general property > applying to any setting then vacuum_truncate_set shouldn't be needed - we > should just get the isset value of the existing vacuum_truncate reloption > by name. the reason I added this field is because I couldn't find any existing way to get this information where it's needed. So, I followed the existing pattern of adding an offset to something we can access. This can be used for any reloption, but currently vacuum_truncate is the only one that does. How does something like this look for the comment? /* * isset_offset is an optional offset of a field in the result struct * that stores whether the value is explicitly set for the relation or * has just picked up the default value. In most cases, this can be * deduced by giving the reloption a special default value (e.g., -2 is * a common one for integer reloptions), but this isn't always * possible. One notable example is Boolean reloptions, where it's * difficult to discern the source of the value. This offset is only * used if given a value greater than zero. */ int isset_offset; -- nathan
On Thu, Mar 20, 2025 at 11:13 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Mar 20, 2025 at 09:59:45AM -0700, David G. Johnston wrote:
> I get the need for this kind of logic, since we used a boolean for the
> table option, but as a self-proclaimed hack it seems worth more comments
> than provided here. Especially pertaining to whether this is indeed
> generic or vacuum_truncate specific. It's unclear since both isset and
> vacuum_truncate_set have been introduced.
I'm happy to expand the comments, but...
> If it is now a general property
> applying to any setting then vacuum_truncate_set shouldn't be needed - we
> should just get the isset value of the existing vacuum_truncate reloption
> by name.
the reason I added this field is because I couldn't find any existing way
to get this information where it's needed. So, I followed the existing
pattern of adding an offset to something we can access. This can be used
for any reloption, but currently vacuum_truncate is the only one that does.
I'll come back to the comment if it's needed. I was concerned about dump/restore and apparently pg_dump has been perfectly capable of determining whether the current catalog state for a reloption, even a boolean, is unset, true, or false. Namely, the following outcomes:
create table vtt (x int not null, y int not null);
CREATE TABLE public.vtt (
x integer NOT NULL,
y integer NOT NULL
);
x integer NOT NULL,
y integer NOT NULL
);
alter table vtt set (vacuum_truncate = true);
CREATE TABLE public.vtt (
x integer NOT NULL,
y integer NOT NULL
)
WITH (vacuum_truncate='true');
x integer NOT NULL,
y integer NOT NULL
)
WITH (vacuum_truncate='true');
alter table vtt reset (vacuum_truncate);
CREATE TABLE public.vtt (
x integer NOT NULL,
y integer NOT NULL
);
x integer NOT NULL,
y integer NOT NULL
);
So my concern about dump/restore seems to be alleviated but then, why can we not just do whatever pg_dump is doing to decide whether the current value for vacuum_truncate is its default (and thus would not be dumped) or not (and would be dumped)?
David J.
On Thu, Mar 20, 2025 at 02:18:33PM -0700, David G. Johnston wrote: > So my concern about dump/restore seems to be alleviated but then, why can > we not just do whatever pg_dump is doing to decide whether the current > value for vacuum_truncate is its default (and thus would not be dumped) or > not (and would be dumped)? pg_dump looks at the pg_class.reloptions array directly. In the vacuum code, we look at the pre-parsed rd_options (see RelationParseRelOptions() in relcache.c), which will have already resolved vacuum_truncate to its default value if it was not explicitly set. We could probably look at pg_class.reloptions directly in the vacuum code if we _really_ wanted to, but I felt that putting this information into rd_options was much cleaner. -- nathan
On Thu, Mar 20, 2025 at 2:31 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
On Thu, Mar 20, 2025 at 02:18:33PM -0700, David G. Johnston wrote:
> So my concern about dump/restore seems to be alleviated but then, why can
> we not just do whatever pg_dump is doing to decide whether the current
> value for vacuum_truncate is its default (and thus would not be dumped) or
> not (and would be dumped)?
pg_dump looks at the pg_class.reloptions array directly. In the vacuum
code, we look at the pre-parsed rd_options (see RelationParseRelOptions()
in relcache.c), which will have already resolved vacuum_truncate to its
default value if it was not explicitly set. We could probably look at
pg_class.reloptions directly in the vacuum code if we _really_ wanted to,
but I felt that putting this information into rd_options was much cleaner.
I understand this now and suggest the following comment changes based upon that understanding.
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 645b5c0046..c795df6100 100644--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -1912,7 +1912,9 @@ default_reloptions(Datum reloptions, bool validate, relopt_kind kind)
{"vacuum_index_cleanup", RELOPT_TYPE_ENUM,
offsetof(StdRdOptions, vacuum_index_cleanup)},
{"vacuum_truncate", RELOPT_TYPE_BOOL,
- offsetof(StdRdOptions, vacuum_truncate), offsetof(StdRdOptions, vacuum_truncate_set)},
+ offsetof(StdRdOptions, vacuum_truncate),
+ /* vacuum_truncate uses the isset_offset member instead of a sentinel value */
+ offsetof(StdRdOptions, vacuum_truncate_set)},
{"vacuum_max_eager_freeze_failure_rate", RELOPT_TYPE_REAL,
offsetof(StdRdOptions, vacuum_max_eager_freeze_failure_rate)}
};
diff --git a/src/include/access/reloptions.h b/src/include/access/reloptions.h
index 146aed47c2..fddeee0d54 100644
--- a/src/include/access/reloptions.h
+++ b/src/include/access/reloptions.h
@@ -152,7 +152,17 @@ typedef struct
const char *optname; /* option's name */
relopt_type opttype; /* option's datatype */
int offset; /* offset of field in result struct */
- int isset_offset; /* if > 0, offset of "is set" field */
+ /*
+ * The reloptions system knows whether a reloption has been set by the
+ * DBA or not. Historically, this information was lost when parsing
+ * the reloptions so we coped by using sentinel values.
+ * This doesn't work for boolean reloptions. For those, we
+ * need a place for the reloption parser to store its knowledge of
+ * whether the reloption was set. Set this field to an offset
+ * in the StdRdOptions struct. The pointed-to location needs to
+ * be a bool. A value of 0 here is interpreted as "no need to store".
+ */
+ int isset_offset;
} relopt_parse_elt;
/* Local reloption definition */
diff --git a/src/include/utils/rel.h b/src/include/utils/rel.h
index d94fddd7ce..461648aac6 100644
--- a/src/include/utils/rel.h
+++ b/src/include/utils/rel.h
@@ -344,7 +344,7 @@ typedef struct StdRdOptions
int parallel_workers; /* max number of parallel workers */
StdRdOptIndexCleanup vacuum_index_cleanup; /* controls index vacuuming */
bool vacuum_truncate; /* enables vacuum to truncate a relation */
- bool vacuum_truncate_set; /* whether vacuum_truncate is set */
+ bool vacuum_truncate_set; /* reserve space for isset status of vacuum_truncate */
/*
* Fraction of pages in a relation that vacuum can eagerly scan and fail
David J.
On Thu, Mar 20, 2025 at 11:13 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
How does something like this look for the comment?
/*
* isset_offset is an optional offset of a field in the result struct
* that stores whether the value is explicitly set for the relation or
* has just picked up the default value. In most cases, this can be
* deduced by giving the reloption a special default value (e.g., -2 is
* a common one for integer reloptions), but this isn't always
* possible. One notable example is Boolean reloptions, where it's
* difficult to discern the source of the value. This offset is only
* used if given a value greater than zero.
*/
int isset_offset;
I didn't actually come back to this before writing my comment.
I'm glad they both say basically the same thing.
I'm still partial to mine but yours probably fits the overall style of the codebase better.
David J.
On Fri, Mar 21, 2025 at 08:54:55AM -0700, David G. Johnston wrote: > I'm still partial to mine but yours probably fits the overall style of the > codebase better. Committed with some light edits. -- nathan