Thread: Should we remove vacuum_defer_cleanup_age?
Hi, As evidenced by the bug fixed in be504a3e974, vacuum_defer_cleanup_age is not heavily used - the bug was trivial to hit as soon as vacuum_defer_cleanup_age is set to a non-toy value. It complicates thinking about visibility horizons substantially, as vacuum_defer_cleanup_age can make them go backward substantially. Obviously it's also severely undertested. I started writing a test for vacuum_defer_cleanup_age while working on the fix referenced above, but now I am wondering if said energy would be better spent removing vacuum_defer_cleanup_age alltogether. vacuum_defer_cleanup_age was added as part of hot standby. Back then we did not yet have hot_standby_feedback. Now that that exists, vacuum_defer_cleanup_age doesn't seem like a good idea anymore. It's pessimisistic, i.e. always retains rows, even if none of the standbys has an old enough snapshot. The only benefit of vacuum_defer_cleanup_age over hot_standby_feedback is that it provides a limit of some sort. But transactionids aren't producing dead rows in a uniform manner, so limiting via xid isn't particularly useful. And even if there are use cases, it seems those would be served better by introducing a cap on how much hot_standby_feedback can hold the horizon back. I don't think I have the cycles to push this through in the next weeks, but if we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a good idea to mark it as deprecated in 16? Greetings, Andres Freund
On Sat, Mar 18, 2023 at 12:09 AM Andres Freund <andres@anarazel.de> wrote:
Hi,
As evidenced by the bug fixed in be504a3e974, vacuum_defer_cleanup_age is not
heavily used - the bug was trivial to hit as soon as vacuum_defer_cleanup_age
is set to a non-toy value. It complicates thinking about visibility horizons
substantially, as vacuum_defer_cleanup_age can make them go backward
substantially. Obviously it's also severely undertested.
I started writing a test for vacuum_defer_cleanup_age while working on the fix
referenced above, but now I am wondering if said energy would be better spent
removing vacuum_defer_cleanup_age alltogether.
vacuum_defer_cleanup_age was added as part of hot standby. Back then we did
not yet have hot_standby_feedback. Now that that exists,
vacuum_defer_cleanup_age doesn't seem like a good idea anymore. It's
pessimisistic, i.e. always retains rows, even if none of the standbys has an
old enough snapshot.
The only benefit of vacuum_defer_cleanup_age over hot_standby_feedback is that
it provides a limit of some sort. But transactionids aren't producing dead
rows in a uniform manner, so limiting via xid isn't particularly useful. And
even if there are use cases, it seems those would be served better by
introducing a cap on how much hot_standby_feedback can hold the horizon back.
I don't think I have the cycles to push this through in the next weeks, but if
we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a
good idea to mark it as deprecated in 16?
+1. I haven't seen any (correct) use of this in many many years on my end at least.
And yes, having a cap on hot_standby_feedback would also be great.
On 2023-Mar-17, Andres Freund wrote: > I started writing a test for vacuum_defer_cleanup_age while working on the fix > referenced above, but now I am wondering if said energy would be better spent > removing vacuum_defer_cleanup_age alltogether. +1 I agree it's not useful anymore. > I don't think I have the cycles to push this through in the next weeks, but if > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a > good idea to mark it as deprecated in 16? Hmm, for the time being, can we just "disable" it by disallowing to set the GUC to a value different from 0? Then we can remove the code later in the cycle at leisure. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "La gente vulgar sólo piensa en pasar el tiempo; el que tiene talento, en aprovecharlo"
On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote: > On 2023-Mar-17, Andres Freund wrote: > > > I started writing a test for vacuum_defer_cleanup_age while working on the fix > > referenced above, but now I am wondering if said energy would be better spent > > removing vacuum_defer_cleanup_age alltogether. > > +1 I agree it's not useful anymore. > > > I don't think I have the cycles to push this through in the next weeks, but if > > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a > > good idea to mark it as deprecated in 16? > > Hmm, for the time being, can we just "disable" it by disallowing to set > the GUC to a value different from 0? Then we can remove the code later > in the cycle at leisure. It can be useful to do a "rolling transition", and it's something I do often. But I can't see why that would be useful here? It seems like something that could be done after the feature freeze. It's removing a feature, not adding one. -- Justin
Hi, On 2023-03-22 11:44:20 -0500, Justin Pryzby wrote: > On Sat, Mar 18, 2023 at 10:33:57AM +0100, Alvaro Herrera wrote: > > On 2023-Mar-17, Andres Freund wrote: > > > > > I started writing a test for vacuum_defer_cleanup_age while working on the fix > > > referenced above, but now I am wondering if said energy would be better spent > > > removing vacuum_defer_cleanup_age alltogether. > > > > +1 I agree it's not useful anymore. > > > > > I don't think I have the cycles to push this through in the next weeks, but if > > > we agree removing vacuum_defer_cleanup_age is a good idea, it seems like a > > > good idea to mark it as deprecated in 16? > > > > Hmm, for the time being, can we just "disable" it by disallowing to set > > the GUC to a value different from 0? Then we can remove the code later > > in the cycle at leisure. > > It can be useful to do a "rolling transition", and it's something I do > often. > > But I can't see why that would be useful here? It seems like something > that could be done after the feature freeze. It's removing a feature, > not adding one. It wasn't actually that much work to write a patch to remove vacuum_defer_cleanup_age, see the attached. I don't know whether others think we should apply it this release, given the "late submission", but I tend to think it's not worth caring the complication of vacuum_defer_cleanup_age forward. Greetings, Andres Freund
Attachment
> On 22 Mar 2023, at 18:00, Andres Freund <andres@anarazel.de> wrote: > It wasn't actually that much work to write a patch to remove > vacuum_defer_cleanup_age, see the attached. - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against + provides protection against relevant rows being removed by vacuum, but the former provides no protection during any time period when the standby is not connected, and the latter often needs to be set to a high value to provide adequate Isn't "the latter" in the kept part of the sentence referring to the guc we're removing here? - * It's possible that slots / vacuum_defer_cleanup_age backed up the - * horizons further than oldest_considered_running. Fix. + * It's possible that slots backed up the horizons further than + * oldest_considered_running. Fix. While not the fault of this patch, can't we use the opportunity to expand "Fix." to a short "Fix this by ..." sentence? Or remove "Fix." perhaps, either of those would improve the comment IMHO. > I don't know whether others think we should apply it this release, given the > "late submission", but I tend to think it's not worth caring the complication > of vacuum_defer_cleanup_age forward. It might be late in the cycle, but as it's not adding something that might break but rather removing something that's known for being problematic (and not useful) I think it's Ok. -- Daniel Gustafsson
Hi, On 2023-03-23 10:18:35 +0100, Daniel Gustafsson wrote: > > On 22 Mar 2023, at 18:00, Andres Freund <andres@anarazel.de> wrote: > > > It wasn't actually that much work to write a patch to remove > > vacuum_defer_cleanup_age, see the attached. > > - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against > + provides protection against > relevant rows being removed by vacuum, but the former provides no > protection during any time period when the standby is not connected, > and the latter often needs to be set to a high value to provide adequate > > Isn't "the latter" in the kept part of the sentence referring to the guc we're > removing here? You're right. That paragraph generally seems a bit off. In HEAD: <para> In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by storing the segments in an archive using <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>. However, these methods often result in retaining more WAL segments than required, whereas replication slots retain only the number of segments known to be needed. On the other hand, replication slots can retain so many WAL segments that they fill up the space allocated for <literal>pg_wal</literal>; <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files retained by replication slots. </para> <para> Similarly, <xref linkend="guc-hot-standby-feedback"/> and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against relevant rows being removed by vacuum, but the former provides no protection during any time period when the standby is not connected, and the latter often needs to be set to a high value to provide adequate protection. Replication slots overcome these disadvantages. </para> Replication slots alone don't prevent row removal, that requires hot_standby_feedback to be used as well. A minimal rephrasing would be: <para> Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without also using a replication slot, provides protection against relevant rows being removed by vacuum, but provides no protection during any time period when the standby is not connected. Replication slots overcome these disadvantages. </para> > - * It's possible that slots / vacuum_defer_cleanup_age backed up the > - * horizons further than oldest_considered_running. Fix. > + * It's possible that slots backed up the horizons further than > + * oldest_considered_running. Fix. > > While not the fault of this patch, can't we use the opportunity to expand > "Fix." to a short "Fix this by ..." sentence? Or remove "Fix." perhaps, either > of those would improve the comment IMHO. Perhaps unsurprisingly, given that I wrote that comment, I don't see a problem with the "Fix."... Greetings, Andres Freund
> On 24 Mar 2023, at 21:27, Andres Freund <andres@anarazel.de> wrote: > On 2023-03-23 10:18:35 +0100, Daniel Gustafsson wrote: >>> On 22 Mar 2023, at 18:00, Andres Freund <andres@anarazel.de> wrote: >> >>> It wasn't actually that much work to write a patch to remove >>> vacuum_defer_cleanup_age, see the attached. >> >> - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against >> + provides protection against >> relevant rows being removed by vacuum, but the former provides no >> protection during any time period when the standby is not connected, >> and the latter often needs to be set to a high value to provide adequate >> >> Isn't "the latter" in the kept part of the sentence referring to the guc we're >> removing here? > > You're right. That paragraph generally seems a bit off. In HEAD: > > ... > > Replication slots alone don't prevent row removal, that requires > hot_standby_feedback to be used as well. > > A minimal rephrasing would be: > <para> > Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without > also using a replication slot, provides protection against relevant rows > being removed by vacuum, but provides no protection during any time period > when the standby is not connected. Replication slots overcome these > disadvantages. > </para> +1, that's definitely an improvement. -- Daniel Gustafsson
On Sat, Mar 18, 2023 at 2:34 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2023-Mar-17, Andres Freund wrote: > > > I started writing a test for vacuum_defer_cleanup_age while working on the fix > > referenced above, but now I am wondering if said energy would be better spent > > removing vacuum_defer_cleanup_age alltogether. > > +1 I agree it's not useful anymore. +1. I am suspicious of most of the GUCs whose value is an XID age. It strikes me as something that is convenient to the implementation, but not to the user, since there are so many ways that XID age might be a poor proxy for whatever it is that you really care about in each case. A theoretical advantage of vacuum_defer_cleanup_age is that it allows the user to control things in terms of the impact on the primary -- whereas hot_standby_feedback is a mechanism that controls things in terms of the needs of the standby. In practice this is pretty useless, but it seems like it might be possible to come up with some other new mechanism that somehow does this in a way that's truly useful. Something that allows the user to constrain how far we hold back conflicts/vacuuming in terms of the *impact* on the primary. It might be helpful to permit opportunistic cleanup by pruning and index deletion at some point, but to throttle it when we know it would violate some soft limit related to hot_standby_feedback. Maybe the system could prevent the first few attempts at pruning when it violates the soft limit, or make pruning prune somewhat less aggressively where there is little advantage to it in terms of space/tuples freed -- decide on what to do at the very last minute, based on all available information at that late stage, with the full context available. The system could be taught to be very patient at first, when relatively few pruning operations have been attempted, when the cost is basically still acceptable. But as more pruning operations ran and clearly didn't free space that really should be freed, we'd quickly lose patience. The big idea here is to delay committing to any course of action for as long as possible, so we wouldn't kill queries on standbys for very little benefit on the primary, while at the same time avoiding ever really failing to kill queries on standbys when the cost proved too high on the primary. For this to have any chance of working it needs to focus on the actual costs on the primary, and not some extremely noisy proxy for that cost. The standby will have its query killed by just one prune record affecting just one heap page, and delaying that specific prune record is likely no big deal. It's preventing pruning of tens of thousands of heap pages that we need to worry about. -- Peter Geoghegan
On Wed, Mar 22, 2023 at 10:00:48AM -0700, Andres Freund wrote: > I don't know whether others think we should apply it this release, given the > "late submission", but I tend to think it's not worth caring the complication > of vacuum_defer_cleanup_age forward. I don't see any utility in waiting; it just makes the process of removing it take longer for no reason. As long as it's done before the betas, it seems completely reasonable to remove it for v16. -- Justin
Hi, On 2023-04-11 11:33:01 -0500, Justin Pryzby wrote: > On Wed, Mar 22, 2023 at 10:00:48AM -0700, Andres Freund wrote: > > I don't know whether others think we should apply it this release, given the > > "late submission", but I tend to think it's not worth caring the complication > > of vacuum_defer_cleanup_age forward. > > I don't see any utility in waiting; it just makes the process of > removing it take longer for no reason. > > As long as it's done before the betas, it seems completely reasonable to > remove it for v16. Added the RMT. We really should have a rmt@pg.o alias... Updated patch attached. I think we should either apply something like that patch, or at least add a <warning/> to the docs. Greetings, Andres Freund
Attachment
On Tue, Apr 11, 2023 at 11:50 PM Andres Freund <andres@anarazel.de> wrote: > > On 2023-04-11 11:33:01 -0500, Justin Pryzby wrote: > > On Wed, Mar 22, 2023 at 10:00:48AM -0700, Andres Freund wrote: > > > I don't know whether others think we should apply it this release, given the > > > "late submission", but I tend to think it's not worth caring the complication > > > of vacuum_defer_cleanup_age forward. > > > > I don't see any utility in waiting; it just makes the process of > > removing it take longer for no reason. > > > > As long as it's done before the betas, it seems completely reasonable to > > remove it for v16. > > Added the RMT. > > We really should have a rmt@pg.o alias... > > Updated patch attached. I think we should either apply something like that > patch, or at least add a <warning/> to the docs. > +1 to do one of the above. I think there is a good chance that somebody might be doing more harm by using it so removing this shouldn't be a problem. Personally, I have not heard of people using it but OTOH it is difficult to predict so giving some time is also not a bad idea. Do others have any opinion/suggestion on this matter? -- With Regards, Amit Kapila.
On 2023-Apr-11, Andres Freund wrote: > Updated patch attached. I think we should either apply something like that > patch, or at least add a <warning/> to the docs. I gave this patch a look. The only code change is that ComputeXidHorizons() and GetSnapshotData() no longer handle the case where vacuum_defer_cleanup_age is different from zero. It looks good. The TransactionIdRetreatSafely() code being removed is pretty weird (I spent a good dozen minutes writing a complaint that your rewrite was faulty, but it turns out I had misunderstood the function), so I'm glad it's being retired. > <para> > - Similarly, <xref linkend="guc-hot-standby-feedback"/> > - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against > - relevant rows being removed by vacuum, but the former provides no > - protection during any time period when the standby is not connected, > - and the latter often needs to be set to a high value to provide adequate > - protection. Replication slots overcome these disadvantages. > + Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without > + also using a replication slot, provides protection against relevant rows > + being removed by vacuum, but provides no protection during any time period > + when the standby is not connected. Replication slots overcome these > + disadvantages. I think it made sense to have this paragraph be separate from the previous one when it was talking about two separate variables, but now that it's just one, it looks a bit isolated. I would merge it with the one above, which is talking about pretty much the same thing, and reorder the whole thing approximately like this <para> In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by storing the segments in an archive using <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>. However, these methods often result in retaining more WAL segments than required. Similarly, <xref linkend="guc-hot-standby-feedback"/> without a replication slot provides protection against relevant rows being removed by vacuum, but provides no protection during any time period when the standby is not connected. </para> <para> Replication slots overcome these disadvantages by retaining only the number of segments known to be needed. On the other hand, replication slots can retain so many WAL segments that they fill up the space allocated for <literal>pg_wal</literal>; <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files retained by replication slots. </para> Though the "However," looks a poor fit; I would do this: <para> In lieu of using replication slots, it is possible to prevent the removal of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by storing the segments in an archive using <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>. A disadvantage of these methods is that they often result in retaining more WAL segments than required. Similarly, <xref linkend="guc-hot-standby-feedback"/> without a replication slot provides protection against relevant rows being removed by vacuum, but provides no protection during any time period when the standby is not connected. </para> <para> Replication slots overcome these disadvantages by retaining only the number of segments known to be needed. On the other hand, replication slots can retain so many WAL segments that they fill up the space allocated for <literal>pg_wal</literal>; <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files retained by replication slots. </para> -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ Tom: There seems to be something broken here. Teodor: I'm in sackcloth and ashes... Fixed. http://postgr.es/m/482D1632.8010507@sigaev.ru
On 4/12/23 11:34 PM, Amit Kapila wrote: > On Tue, Apr 11, 2023 at 11:50 PM Andres Freund <andres@anarazel.de> wrote: >> >> On 2023-04-11 11:33:01 -0500, Justin Pryzby wrote: >>> On Wed, Mar 22, 2023 at 10:00:48AM -0700, Andres Freund wrote: >>>> I don't know whether others think we should apply it this release, given the >>>> "late submission", but I tend to think it's not worth caring the complication >>>> of vacuum_defer_cleanup_age forward. >>> >>> I don't see any utility in waiting; it just makes the process of >>> removing it take longer for no reason. >>> >>> As long as it's done before the betas, it seems completely reasonable to >>> remove it for v16. >> >> Added the RMT. >> >> We really should have a rmt@pg.o alias... (I had thought something as much -- will reach out to pginfra about options) >> Updated patch attached. I think we should either apply something like that >> patch, or at least add a <warning/> to the docs. >> > +1 to do one of the above. I think there is a good chance that > somebody might be doing more harm by using it so removing this > shouldn't be a problem. Personally, I have not heard of people using > it but OTOH it is difficult to predict so giving some time is also not > a bad idea. > > Do others have any opinion/suggestion on this matter? I need a bit more time to study this before formulating an opinion on whether we should remove it for v16. In any case, I'm not against documentation. Jonathan
Attachment
On 4/13/23 11:32 AM, Jonathan S. Katz wrote: > On 4/12/23 11:34 PM, Amit Kapila wrote: >> On Tue, Apr 11, 2023 at 11:50 PM Andres Freund <andres@anarazel.de> >> +1 to do one of the above. I think there is a good chance that >> somebody might be doing more harm by using it so removing this >> shouldn't be a problem. Personally, I have not heard of people using >> it but OTOH it is difficult to predict so giving some time is also not >> a bad idea. >> >> Do others have any opinion/suggestion on this matter? > > I need a bit more time to study this before formulating an opinion on > whether we should remove it for v16. In any case, I'm not against > documentation. (didn't need too much more time). [RMT hat] +1 for removing. I looked at some data and it doesn't seem like vacuum_defer_cleanup_age is used in any significant way, whereas hot_standby_feedback is much more widely used. Given this, and all the problems + arguments made in the thread, we should just get rid of it for v16. There are cases where we should deprecate before removing, but I don't think this one based upon usage and having a better alternative. Per [1] it does sound like we can make some improvements to hot_standby_feedback, but those can wait to v17. We should probably set $DATE to finish this, too. I don't think it's a rush, but we should give enough time before Beta 1. Jonathan [1] https://www.postgresql.org/message-id/20230317230930.nhsgk3qfk7f4axls%40awork3.anarazel.de
Attachment
On Thu, 2023-04-13 at 12:16 -0400, Jonathan S. Katz wrote: > On 4/13/23 11:32 AM, Jonathan S. Katz wrote: > > On 4/12/23 11:34 PM, Amit Kapila wrote: > > > On Tue, Apr 11, 2023 at 11:50 PM Andres Freund <andres@anarazel.de> > > > > +1 to do one of the above. I think there is a good chance that > > > somebody might be doing more harm by using it so removing this > > > shouldn't be a problem. Personally, I have not heard of people using > > > it but OTOH it is difficult to predict so giving some time is also not > > > a bad idea. > > > > > > Do others have any opinion/suggestion on this matter? > > > > I need a bit more time to study this before formulating an opinion on > > whether we should remove it for v16. In any case, I'm not against > > documentation. > > [RMT hat] > > +1 for removing. I am not against this in principle, but I know that there are people using this parameter; see the discussion linked in https://postgr.es/m/E1jkzxE-0006Dw-Dg@gemulon.postgresql.org I can't say if they have a good use case for that parameter or not. Yours, Laurenz Albe
On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: > I am not against this in principle, but I know that there are people using > this parameter; see the discussion linked in > > https://postgr.es/m/E1jkzxE-0006Dw-Dg@gemulon.postgresql.org > > I can't say if they have a good use case for that parameter or not. Yeah, I feel similarly. Actually, personally I have no evidence, not even an anecdote, suggesting that this parameter is in use, but I'm a bit skeptical of any consensus of the form "no one is using X," because there sure are a lot of people running PostgreSQL and they sure do a lot of things. Some more justifiably than others, but often people have surprisingly good excuses for doing stuff that sounds bizarre when you first hear about it, and it doesn't seem totally impossible that somebody could have found a way to get value out of this. However, I suspect that there aren't many such people, and I think the setting is a kludge, so I support removing it. Maybe we'll find out that we ought to add something else instead, like a limited delimited in time rather than in XIDs. Or maybe the existing facilities are good enough. But as Peter rightly says, XID age is likely a poor proxy for whatever people really care about, so I don't think continuing to have a setting that works like that is a good plan. -- Robert Haas EDB: http://www.enterprisedb.com
> On 14 Apr 2023, at 14:30, Robert Haas <robertmhaas@gmail.com> wrote: > ..as Peter rightly says, XID age is likely a poor proxy for > whatever people really care about, so I don't think continuing to have > a setting that works like that is a good plan. Agreed, and removing it is likely to be a good vehicle for figuring out what we should have instead (if anything). -- Daniel Gustafsson
On 4/14/23 8:30 AM, Robert Haas wrote: > On Thu, Apr 13, 2023 at 11:06 PM Laurenz Albe <laurenz.albe@cybertec.at> wrote: >> I am not against this in principle, but I know that there are people using >> this parameter; see the discussion linked in >> >> https://postgr.es/m/E1jkzxE-0006Dw-Dg@gemulon.postgresql.org >> >> I can't say if they have a good use case for that parameter or not. > > Yeah, I feel similarly. Actually, personally I have no evidence, not > even an anecdote, suggesting that this parameter is in use, but I'm a > bit skeptical of any consensus of the form "no one is using X," > because there sure are a lot of people running PostgreSQL and they > sure do a lot of things. Some more justifiably than others, but often > people have surprisingly good excuses for doing stuff that sounds > bizarre when you first hear about it, and it doesn't seem totally > impossible that somebody could have found a way to get value out of > this. Let me restate [1] in a different way. Using a large enough dataset, I did qualitatively look at overall usage of both "vacuum_defer_cleanup_age" and compared to "hot_standby_feedback", given you can use both to accomplish similar outcomes. The usage of "vacuum_defer_cleanup_age" was really minimal, in fact approaching "0", whereas "hot_standby_feedback" had significant adoption. I'm not saying that "we should remove a setting just because it's not used" or that it may not have utility -- I'm saying that we can remove the setting given: 1. We know that using this setting incorrectly (which can be done fairly easily) can cause significant issues 2. There's another setting that can accomplish similar goals that's much safer 3. The setting itself is not widely used It's the combination of all 3 that led to my conclusion. If it were just (1), I'd lean more strongly towards trying to fix it first. > However, I suspect that there aren't many such people, and I think the > setting is a kludge, so I support removing it. Maybe we'll find out > that we ought to add something else instead, like a limited delimited > in time rather than in XIDs. Or maybe the existing facilities are good > enough. But as Peter rightly says, XID age is likely a poor proxy for > whatever people really care about, so I don't think continuing to have > a setting that works like that is a good plan. That seems like a good eventual outcome. Thanks, Jonathan [1] https://www.postgresql.org/message-id/bf42784f-4d57-0a3d-1a06-ffac1a09318c%40postgresql.org
Attachment
On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz <jkatz@postgresql.org> wrote: > > Let me restate [1] in a different way. > > Using a large enough dataset, I did qualitatively look at overall usage > of both "vacuum_defer_cleanup_age" and compared to > "hot_standby_feedback", given you can use both to accomplish similar > outcomes. I assume people would use hot_standby_feedback if they have streaming replication. The main use cases for vacuum_defer_cleanup_age would be if you're replaying WAL files. That may sound archaic but there are plenty of circumstances where your standby may not have network access to your primary at all or not want to be replaying continuously. I wonder whether your dataset is self-selecting sites that have streaming replication. That does seem like the more common usage pattern. Systems using wal files are more likely to be things like data warehouses, offline analytics systems, etc. They may not even be well known in the same organization that runs the online operations -- in my experience they're often run by marketing or sales organizations or in some cases infosec teams and consume data from lots of sources. The main reason to use wal archive replay is often to provide the isolation so that the operations team don't need to worry about the impact on production and that makes it easy to forget these even exist. -- greg
On 2023-Apr-14, Greg Stark wrote: > On Fri, 14 Apr 2023 at 09:47, Jonathan S. Katz <jkatz@postgresql.org> wrote: > > > > Let me restate [1] in a different way. > > > > Using a large enough dataset, I did qualitatively look at overall usage > > of both "vacuum_defer_cleanup_age" and compared to > > "hot_standby_feedback", given you can use both to accomplish similar > > outcomes. > > I assume people would use hot_standby_feedback if they have streaming > replication. Yes, either that or a replication slot. vacuum_defer_cleanup_age was added in commit efc16ea52067 (2009-12-19), for Postgres 9.0. hot_standby_feedback is a bit newer (bca8b7f16a3e, 2011-02-16), and replication slots are newer still (858ec11858a9, 2014-01-31). > The main use cases for vacuum_defer_cleanup_age would be if you're > replaying WAL files. That may sound archaic but there are plenty of > circumstances where your standby may not have network access to your > primary at all or not want to be replaying continuously. True, those cases exist. However, it sounds to me like in those cases vacuum_defer_cleanup_age doesn't really help you either; you'd just want to pause WAL replay depending on your queries or whatever. After all, you'd have to feed the WAL files "manually" to replay, so you're in control anyway without having to touch the primary. I find it very hard to believe that people are doing stuff with vacuum_defer_cleanup_age that cannot be done with either of the other newer mechanisms, which have also seen much wider usage and so bugs fixed, etc. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote: > On 2023-Apr-14, Greg Stark wrote: > > I assume people would use hot_standby_feedback if they have streaming > > replication. > > Yes, either that or a replication slot. A replication slot doesn't do anything against snapshot conflicts, which is what we are discussing here. Or are we not? > > I find it very hard to believe that people are doing stuff with > vacuum_defer_cleanup_age that cannot be done with either of the other > newer mechanisms, which have also seen much wider usage and so bugs > fixed, etc. vacuum_defer_cleanup_age offers a more fine-grained approach. With hot_standby_feedback you can only say "don't ever remove any dead tuples that the standby still needs". But perhaps you'd prefer "don't remove dead tuples unless they are quite old", so that you can get your shorter queries on the standby to complete, without delaying replay and without the danger that a long running query on the standby bloats your primary. How about this: Let's remove vacuum_defer_cleanup_age, and put a note in the release notes that recommends using statement_timeout and hot_standby_feedback = on on the standby instead. That should have pretty much the same effect, and it is measured in time and not in the number of transactions. Yours, Laurenz Albe
On Fri, 14 Apr 2023 at 13:15, Laurenz Albe <laurenz.albe@cybertec.at> wrote: > > On Fri, 2023-04-14 at 18:43 +0200, Alvaro Herrera wrote: > > On 2023-Apr-14, Greg Stark wrote: > > > I assume people would use hot_standby_feedback if they have streaming > > > replication. > > > > Yes, either that or a replication slot. > > A replication slot doesn't do anything against snapshot conflicts, > which is what we are discussing here. Or are we not? They're related -- the replication slot holds the feedback xmin so that if your standby disconnects it can reconnect later and not have lost data in the meantime. At least I think that's what I think it does -- I don't know if I'm just assuming that, but xmin is indeed in pg_replication_slots. -- greg
On 4/14/23 1:15 PM, Laurenz Albe wrote: > Let's remove vacuum_defer_cleanup_age, and put a note in the release notes > that recommends using statement_timeout and hot_standby_feedback = on > on the standby instead. > That should have pretty much the same effect, and it is measured in > time and not in the number of transactions. +1. Jonathan
Attachment
On Fri, Apr 14, 2023 at 03:07:37PM -0400, Jonathan S. Katz wrote: > +1. +1. I agree with the upthread discussion and support removing vacuum_defer_cleanup_age. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-04-13 13:18:38 +0200, Alvaro Herrera wrote: > On 2023-Apr-11, Andres Freund wrote: > > > Updated patch attached. I think we should either apply something like that > > patch, or at least add a <warning/> to the docs. > > I gave this patch a look. The only code change is that > ComputeXidHorizons() and GetSnapshotData() no longer handle the case > where vacuum_defer_cleanup_age is different from zero. It looks good. > The TransactionIdRetreatSafely() code being removed is pretty weird (I > spent a good dozen minutes writing a complaint that your rewrite was > faulty, but it turns out I had misunderstood the function), so I'm glad > it's being retired. My rewrite of what? The creation of TransactionIdRetreatSafely() in be504a3e974? I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more things to 64bit xids (lest they end up with the same bug as fixed by be504a3e974), so it's perhaps worth thinking about how to make it less confusing. > > <para> > > - Similarly, <xref linkend="guc-hot-standby-feedback"/> > > - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against > > - relevant rows being removed by vacuum, but the former provides no > > - protection during any time period when the standby is not connected, > > - and the latter often needs to be set to a high value to provide adequate > > - protection. Replication slots overcome these disadvantages. > > + Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without > > + also using a replication slot, provides protection against relevant rows > > + being removed by vacuum, but provides no protection during any time period > > + when the standby is not connected. Replication slots overcome these > > + disadvantages. > > I think it made sense to have this paragraph be separate from the > previous one when it was talking about two separate variables, but now > that it's just one, it looks a bit isolated. I would merge it with the > one above, which is talking about pretty much the same thing, and > reorder the whole thing approximately like this > > <para> > In lieu of using replication slots, it is possible to prevent the removal > of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by > storing the segments in an archive using > <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>. > However, these methods often result in retaining more WAL segments than > required. > Similarly, <xref linkend="guc-hot-standby-feedback"/> without > a replication slot provides protection against relevant rows > being removed by vacuum, but provides no protection during any time period > when the standby is not connected. > </para> > <para> > Replication slots overcome these disadvantages by retaining only the number > of segments known to be needed. > On the other hand, replication slots can retain so > many WAL segments that they fill up the space allocated > for <literal>pg_wal</literal>; > <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files > retained by replication slots. > </para> It seems a bit confusing now, because "by retaining only the number of segments ..." now also should cover hs_feedback (due to merging), but doesn't. > Though the "However," looks a poor fit; I would do this: I agree, I don't like the however. I think I'll push the version I had. Then we can separately word-smith the section? Unless somebody protests I'm gonna do that soon. Greetings, Andres Freund
On 2023-Apr-22, Andres Freund wrote: > On 2023-04-13 13:18:38 +0200, Alvaro Herrera wrote: > > > > > Updated patch attached. I think we should either apply something like that > > > patch, or at least add a <warning/> to the docs. > > > > I gave this patch a look. The only code change is that > > ComputeXidHorizons() and GetSnapshotData() no longer handle the case > > where vacuum_defer_cleanup_age is different from zero. It looks good. > > The TransactionIdRetreatSafely() code being removed is pretty weird (I > > spent a good dozen minutes writing a complaint that your rewrite was > > faulty, but it turns out I had misunderstood the function), so I'm glad > > it's being retired. > > My rewrite of what? The creation of TransactionIdRetreatSafely() in > be504a3e974? I meant the code that used to call TransactionIdRetreatSafely() and that you're changing in the proposed patch. > I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more > things to 64bit xids (lest they end up with the same bug as fixed by > be504a3e974), so it's perhaps worth thinking about how to make it less > confusing. The one thing that IMO makes it less confusing is to have it return the value rather than modifying it in place. > > <para> > > Replication slots overcome these disadvantages by retaining only the number > > of segments known to be needed. > > On the other hand, replication slots can retain so > > many WAL segments that they fill up the space allocated > > for <literal>pg_wal</literal>; > > <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files > > retained by replication slots. > > </para> > > It seems a bit confusing now, because "by retaining only the number of > segments ..." now also should cover hs_feedback (due to merging), but doesn't. Hah, ok. > I think I'll push the version I had. Then we can separately word-smith the > section? Unless somebody protests I'm gonna do that soon. No objection. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On Mon, Apr 24, 2023 at 8:36 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > The one thing that IMO makes it less confusing is to have it return the > value rather than modifying it in place. Yeah, I don't understand why we have these functions that modify the value in place. Those are probably convenient here and there, but overall they seem to make things more confusing. -- Robert Haas EDB: http://www.enterprisedb.com
Not very convenient but if autovacuum is enabled isn't vacuum_defer_cleanup_age the way to make extensions like pg_dirtyread more effective for temporal queries to quickly correct human DML mistakes without the need of a complete restore, even if no standby is in use ? vacuum_defer_cleanup_age+pg_dirtyread give PostgreSQL something like "flashback query" in Oracle.
Best regards,
Phil
De : Andres Freund <andres@anarazel.de>
Envoyé : dimanche 23 avril 2023 00:47
À : Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc : Justin Pryzby <pryzby@telsasoft.com>; pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>; Amit Kapila <amit.kapila16@gmail.com>
Objet : Re: Should we remove vacuum_defer_cleanup_age?
Envoyé : dimanche 23 avril 2023 00:47
À : Alvaro Herrera <alvherre@alvh.no-ip.org>
Cc : Justin Pryzby <pryzby@telsasoft.com>; pgsql-hackers@postgresql.org <pgsql-hackers@postgresql.org>; Amit Kapila <amit.kapila16@gmail.com>
Objet : Re: Should we remove vacuum_defer_cleanup_age?
Hi,
On 2023-04-13 13:18:38 +0200, Alvaro Herrera wrote:
> On 2023-Apr-11, Andres Freund wrote:
>
> > Updated patch attached. I think we should either apply something like that
> > patch, or at least add a <warning/> to the docs.
>
> I gave this patch a look. The only code change is that
> ComputeXidHorizons() and GetSnapshotData() no longer handle the case
> where vacuum_defer_cleanup_age is different from zero. It looks good.
> The TransactionIdRetreatSafely() code being removed is pretty weird (I
> spent a good dozen minutes writing a complaint that your rewrite was
> faulty, but it turns out I had misunderstood the function), so I'm glad
> it's being retired.
My rewrite of what? The creation of TransactionIdRetreatSafely() in
be504a3e974?
I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more
things to 64bit xids (lest they end up with the same bug as fixed by
be504a3e974), so it's perhaps worth thinking about how to make it less
confusing.
> > <para>
> > - Similarly, <xref linkend="guc-hot-standby-feedback"/>
> > - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against
> > - relevant rows being removed by vacuum, but the former provides no
> > - protection during any time period when the standby is not connected,
> > - and the latter often needs to be set to a high value to provide adequate
> > - protection. Replication slots overcome these disadvantages.
> > + Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without
> > + also using a replication slot, provides protection against relevant rows
> > + being removed by vacuum, but provides no protection during any time period
> > + when the standby is not connected. Replication slots overcome these
> > + disadvantages.
>
> I think it made sense to have this paragraph be separate from the
> previous one when it was talking about two separate variables, but now
> that it's just one, it looks a bit isolated. I would merge it with the
> one above, which is talking about pretty much the same thing, and
> reorder the whole thing approximately like this
>
> <para>
> In lieu of using replication slots, it is possible to prevent the removal
> of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
> storing the segments in an archive using
> <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>.
> However, these methods often result in retaining more WAL segments than
> required.
> Similarly, <xref linkend="guc-hot-standby-feedback"/> without
> a replication slot provides protection against relevant rows
> being removed by vacuum, but provides no protection during any time period
> when the standby is not connected.
> </para>
> <para>
> Replication slots overcome these disadvantages by retaining only the number
> of segments known to be needed.
> On the other hand, replication slots can retain so
> many WAL segments that they fill up the space allocated
> for <literal>pg_wal</literal>;
> <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files
> retained by replication slots.
> </para>
It seems a bit confusing now, because "by retaining only the number of
segments ..." now also should cover hs_feedback (due to merging), but doesn't.
> Though the "However," looks a poor fit; I would do this:
I agree, I don't like the however.
I think I'll push the version I had. Then we can separately word-smith the
section? Unless somebody protests I'm gonna do that soon.
Greetings,
Andres Freund
On 2023-04-13 13:18:38 +0200, Alvaro Herrera wrote:
> On 2023-Apr-11, Andres Freund wrote:
>
> > Updated patch attached. I think we should either apply something like that
> > patch, or at least add a <warning/> to the docs.
>
> I gave this patch a look. The only code change is that
> ComputeXidHorizons() and GetSnapshotData() no longer handle the case
> where vacuum_defer_cleanup_age is different from zero. It looks good.
> The TransactionIdRetreatSafely() code being removed is pretty weird (I
> spent a good dozen minutes writing a complaint that your rewrite was
> faulty, but it turns out I had misunderstood the function), so I'm glad
> it's being retired.
My rewrite of what? The creation of TransactionIdRetreatSafely() in
be504a3e974?
I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more
things to 64bit xids (lest they end up with the same bug as fixed by
be504a3e974), so it's perhaps worth thinking about how to make it less
confusing.
> > <para>
> > - Similarly, <xref linkend="guc-hot-standby-feedback"/>
> > - and <xref linkend="guc-vacuum-defer-cleanup-age"/> provide protection against
> > - relevant rows being removed by vacuum, but the former provides no
> > - protection during any time period when the standby is not connected,
> > - and the latter often needs to be set to a high value to provide adequate
> > - protection. Replication slots overcome these disadvantages.
> > + Similarly, <xref linkend="guc-hot-standby-feedback"/> on its own, without
> > + also using a replication slot, provides protection against relevant rows
> > + being removed by vacuum, but provides no protection during any time period
> > + when the standby is not connected. Replication slots overcome these
> > + disadvantages.
>
> I think it made sense to have this paragraph be separate from the
> previous one when it was talking about two separate variables, but now
> that it's just one, it looks a bit isolated. I would merge it with the
> one above, which is talking about pretty much the same thing, and
> reorder the whole thing approximately like this
>
> <para>
> In lieu of using replication slots, it is possible to prevent the removal
> of old WAL segments using <xref linkend="guc-wal-keep-size"/>, or by
> storing the segments in an archive using
> <xref linkend="guc-archive-command"/> or <xref linkend="guc-archive-library"/>.
> However, these methods often result in retaining more WAL segments than
> required.
> Similarly, <xref linkend="guc-hot-standby-feedback"/> without
> a replication slot provides protection against relevant rows
> being removed by vacuum, but provides no protection during any time period
> when the standby is not connected.
> </para>
> <para>
> Replication slots overcome these disadvantages by retaining only the number
> of segments known to be needed.
> On the other hand, replication slots can retain so
> many WAL segments that they fill up the space allocated
> for <literal>pg_wal</literal>;
> <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files
> retained by replication slots.
> </para>
It seems a bit confusing now, because "by retaining only the number of
segments ..." now also should cover hs_feedback (due to merging), but doesn't.
> Though the "However," looks a poor fit; I would do this:
I agree, I don't like the however.
I think I'll push the version I had. Then we can separately word-smith the
section? Unless somebody protests I'm gonna do that soon.
Greetings,
Andres Freund
Hi, On 2023-04-24 14:36:36 +0200, Alvaro Herrera wrote: > On 2023-Apr-22, Andres Freund wrote: > > I'm afraid we'll need TransactionIdRetreatSafely() again, when we convert more > > things to 64bit xids (lest they end up with the same bug as fixed by > > be504a3e974), so it's perhaps worth thinking about how to make it less > > confusing. > > The one thing that IMO makes it less confusing is to have it return the > value rather than modifying it in place. Partially I made it that way because you otherwise end up repeating long variable names multiple times within one statement, yielding long repetitive lines... Not sure that's a good enough reason, but ... > > > <para> > > > Replication slots overcome these disadvantages by retaining only the number > > > of segments known to be needed. > > > On the other hand, replication slots can retain so > > > many WAL segments that they fill up the space allocated > > > for <literal>pg_wal</literal>; > > > <xref linkend="guc-max-slot-wal-keep-size"/> limits the size of WAL files > > > retained by replication slots. > > > </para> > > > > It seems a bit confusing now, because "by retaining only the number of > > segments ..." now also should cover hs_feedback (due to merging), but doesn't. > > Hah, ok. > > > I think I'll push the version I had. Then we can separately word-smith the > > section? Unless somebody protests I'm gonna do that soon. > > No objection. Cool. Pushed now.