Thread: freezing tuples (was: Why is vacuum_freeze_min_age 100m?)
[ moving to -hackers ]

If this topic has been discussed previously, please point me to the earlier threads.

Why aren't we more opportunistic about freezing tuples? For instance, if we already have a dirty buffer in cache, we should be more aggressive about freezing those tuples than freezing tuples on disk.

I looked at the code, and it looks like if we freeze one tuple on the page during VACUUM, we mark the page dirty. Wouldn't that be a good opportunity to freeze all the other tuples on the page that we can?

Or, perhaps when the bgwriter is flushing dirty buffers, it can look for opportunities to set hint bits or freeze tuples.

Regards,
Jeff Davis
Jeff Davis wrote:
> Why aren't we more opportunistic about freezing tuples? For instance, if
> we already have a dirty buffer in cache, we should be more aggressive
> about freezing those tuples than freezing tuples on disk.

The most widely cited reason is that you lose forensics data. Although they are increasingly rare, there are still situations in which the heap tuple machinery messes up and the xmin/xmax/etc fields of the tuple are the best/only way to find out what happened and thus fix the bug. If you freeze early, there's just no way to know.

--
Alvaro Herrera    http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
[PERFORM] Re: freezing tuples (was: Why is vacuum_freeze_min_age 100m?)
From: "Kevin Grittner"
Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Jeff Davis wrote:
>> Why aren't we more opportunistic about freezing tuples? For
>> instance, if we already have a dirty buffer in cache, we should be
>> more aggressive about freezing those tuples than freezing tuples on
>> disk.
>
> The most widely cited reason is that you lose forensics data.
> Although they are increasingly rare, there are still situations in
> which the heap tuple machinery messes up and the xmin/xmax/etc
> fields of the tuple are the best/only way to find out what happened
> and thus fix the bug. If you freeze early, there's just no way to
> know.

Although I find it hard to believe that this is a compelling argument in the case where an entire table or database is loaded in a single database transaction. In the more general case, I'm not sure why this argument applies here but not to cassert and other diagnostic options. It wouldn't surprise me to find workloads where writing data three times (once for the data, once for hint bits, and once to freeze the xid) affects performance more than cassert.

-Kevin
On Thu, 2009-08-13 at 17:58 -0400, Alvaro Herrera wrote:
> The most widely cited reason is that you lose forensics data. Although
> they are increasingly rare, there are still situations in which the heap
> tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
> the best/only way to find out what happened and thus fix the bug. If
> you freeze early, there's just no way to know.

As it stands, it looks like it's not just one extra write for each buffer, but potentially many (theoretically, as many as there are tuples on a page). I suppose the reasoning is that tuples on the same page have approximately the same xmin, and are likely to be frozen at the same time. But it seems entirely reasonable that the xmins on one page span several VACUUM runs, and that seems more likely with the FSM. That means that a few tuples on the page are older than 100M and get frozen, and the rest are only about 95M transactions old, so we have to come back and freeze them again, later.

Let's say that we had a range like 50-100M, where if it's older than 100M, we freeze it, and if it's older than 50M we freeze it only if it's on a dirty page. We would still have forensic evidence, but we could make a range such that we avoid writing multiple times. And people who don't care about forensic evidence can set it to 0-100M.

Regards,
Jeff Davis
On Thu, 2009-08-13 at 17:17 -0500, Kevin Grittner wrote:
> It wouldn't surprise me to find workloads where writing data three
> times (once for the data, once for hint bits, and once to freeze the
> xid)

I'm not sure that we're limited to 3 times, here. I could be missing something, but if you have tuples with different xmins on the same page, some might be older than 100M, which you freeze, and then you will have to come back later to freeze the rest. As far as I can tell, the maximum number of writes is the number of tuples that fit on the page.

Regards,
Jeff Davis
On Thu, Aug 13, 2009 at 5:33 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> Or, perhaps when the bgwriter is flushing dirty buffers, it can look for
> opportunities to set hint bits or freeze tuples.

One of the tricky things here is that the time you are most likely to want to do this is when you are loading a lot of data. But in that case shared buffers are likely to be written back to disk before transaction commit, so it'll be too early to do anything.

...Robert
Re: [PERFORM] Re: freezing tuples (was: Why is vacuum_freeze_min_age 100m?)
From: Josh Berkus
>> Why aren't we more opportunistic about freezing tuples? For instance, if
>> we already have a dirty buffer in cache, we should be more aggressive
>> about freezing those tuples than freezing tuples on disk.
>
> The most widely cited reason is that you lose forensics data. Although
> they are increasingly rare, there are still situations in which the heap
> tuple machinery messes up and the xmin/xmax/etc fields of the tuple are
> the best/only way to find out what happened and thus fix the bug. If
> you freeze early, there's just no way to know.

That argument doesn't apply. If the page is in memory and is being written anyway, and some of the rows are past vacuum_freeze_min_age, then why not freeze them rather than waiting for a vacuum process to read them off disk and rewrite them?

We're not talking about freezing every tuple as soon as it's out of scope. Just the ones which are more than 100m (or whatever the setting is) old. I seriously doubt that anyone is doing useful forensics using xids which are 100m old.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
Jeff Davis <pgsql@j-davis.com> writes:
> Let's say that we had a range like 50-100M, where if it's older than
> 100M, we freeze it, and if it's older than 50M we freeze it only if it's
> on a dirty page. We would still have forensic evidence, but we could
> make a range such that we avoid writing multiple times.

Yeah, making the limit "slushy" would doubtless save some writes, with not a lot of downside.

> And people who don't care about forensic evidence can set it to 0-100M.

Everybody *thinks* they don't care about forensic evidence. Until they need it.

			regards, tom lane
On Thu, 2009-08-13 at 18:46 -0400, Tom Lane wrote:
> Yeah, making the limit "slushy" would doubtless save some writes, with
> not a lot of downside.

OK, then should we make this a TODO? I'll make an attempt at this.

>> And people who don't care about forensic evidence can set it to 0-100M.
>
> Everybody *thinks* they don't care about forensic evidence. Until they
> need it.

We already allow setting vacuum_freeze_min_age to zero, so I don't see a solution here other than documentation.

Regards,
Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes:
> On Thu, 2009-08-13 at 18:46 -0400, Tom Lane wrote:
>> Everybody *thinks* they don't care about forensic evidence. Until they
>> need it.
>
> We already allow setting vacuum_freeze_min_age to zero, so I don't see a
> solution here other than documentation.

Yeah, we allow it. I just don't want to encourage it ... and definitely not make it default.

What are you envisioning exactly? If vacuum finds any reason to dirty a page (or it's already dirty), then freeze everything on the page that's got age > some lower threshold?

			regards, tom lane
Jeff, Tom,

>> Let's say that we had a range like 50-100M, where if it's older than
>> 100M, we freeze it, and if it's older than 50M we freeze it only if it's
>> on a dirty page. We would still have forensic evidence, but we could
>> make a range such that we avoid writing multiple times.
>
> Yeah, making the limit "slushy" would doubtless save some writes, with
> not a lot of downside.

This would mean two settings: vacuum_freeze_min_age and vacuum_freeze_dirty_age. And we'd need to add those to the autovacuum settings for each table as well.

While we could just make one setting 1/2 of the other, that prevents me from saying: "freeze this table aggressively if it's in memory, but wait a long time to vacuum if it's on disk". I can completely imagine a table which has a vacuum_freeze_dirty_age of 10000 and a vacuum_freeze_min_age of 1m.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
> What are you envisioning exactly? If vacuum finds any reason to dirty
> a page (or it's already dirty), then freeze everything on the page that's
> got age > some lower threshold?

I was envisioning: if the page is already dirty and in memory *for any reason*, then freeze rows below some threshold.

--
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com
On Thu, 2009-08-13 at 19:05 -0400, Tom Lane wrote:
> What are you envisioning exactly? If vacuum finds any reason to dirty
> a page (or it's already dirty), then freeze everything on the page that's
> got age > some lower threshold?

Yes. There are two ways to do the threshold:

1. Constant fraction of vacuum_freeze_min_age
2. Extra GUC

I lean toward #1, because it avoids an extra GUC*, and it avoids the awkwardness when the "lower" setting is higher than the "higher" setting. However, #2 might be nice for people who want to live on the edge or experiment with new values. But I suspect most of the advantage would be had just by saying that we opportunistically freeze tuples older than 50% of vacuum_freeze_min_age.

Regards,
Jeff Davis

*: As an aside, these GUCs already have incredibly confusing names, and an extra variable would increase the confusion. For instance, they seem to use "min" and "max" interchangeably.
On Thu, 2009-08-13 at 18:25 -0400, Robert Haas wrote:
> On Thu, Aug 13, 2009 at 5:33 PM, Jeff Davis <pgsql@j-davis.com> wrote:
>> Or, perhaps when the bgwriter is flushing dirty buffers, it can look for
>> opportunities to set hint bits or freeze tuples.
>
> One of the tricky things here is that the time you are most likely
> to want to do this is when you are loading a lot of data. But in that
> case shared buffers are likely to be written back to disk before
> transaction commit, so it'll be too early to do anything.

I think it would be useful in other cases, like avoiding repeated freezing of different tuples on the same page.

Regards,
Jeff Davis
Josh Berkus <josh@agliodbs.com> writes:
>> What are you envisioning exactly? If vacuum finds any reason to dirty
>> a page (or it's already dirty), then freeze everything on the page that's
>> got age > some lower threshold?
>
> I was envisioning: if the page is already dirty and in memory *for any
> reason*, then freeze rows below some threshold.

I believe we've had this discussion before. I do *NOT* want freezing operations pushed into any random page access, and in particular will do my best to veto any attempt to put them into the bgwriter. Freezing requires accessing the clog and emitting a WAL record, and neither is appropriate for low-level code like bgwriter. The deadlock potential alone is sufficient reason why not.

			regards, tom lane
On Fri, Aug 14, 2009 at 12:07 AM, Josh Berkus <josh@agliodbs.com> wrote:
> "freeze this table aggressively if it's in memory, but wait a long time
> to vacuum if it's on disk"

Waitasec, "in memory"? There are two projects here:

1) Make vacuum, when it's freezing tuples, freeze every tuple > lesser age if it finds any tuples which are > max_age (or, I suppose, if the page is already dirty due to vacuum or something else). Vacuum still has to read in all the pages before it finds out that they don't need freezing, so it doesn't mean distinguishing "in memory" from "needs to be read in".

2) Have something like bgwriter check if the page is dirty, and vacuum and freeze things based on the lesser threshold. This would effectively only be vacuuming things that are "in memory".

However the latter is a more complex and fraught project. We looked at this a while back in EDB and we found that the benefits were less than we expected and the complexities more than we expected. I would recommend sticking with (1) for now and only looking at (2) if we have a more detailed plan and solid testable use cases.

--
greg
http://mit.edu/~gsstark/resume.pdf
On Fri, Aug 14, 2009 at 12:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I was envisioning: if the page is already dirty and in memory *for any
>> reason*, then freeze rows below some threshold.
>
> I believe we've had this discussion before. I do *NOT* want freezing
> operations pushed into any random page access, and in particular will
> do my best to veto any attempt to put them into the bgwriter.

It's possible Josh accidentally waved this red flag and really meant just to make it conditional on whether the page is dirty rather than on whether vacuum dirtied it. However he did give me a thought....

With the visibility map, vacuum currently only covers pages that are known to have in-doubt tuples. That's why we have the anti-wraparound vacuums. However it could also check if the pages it's skipping are in memory, and process them if they are, even if they don't have in-doubt tuples.

Or it could first go through ram and process any pages that are in cache before going to the visibility map and starting from page 0, which would hopefully avoid having to read them in later when we get to them and find they've been flushed out.

I'm just brainstorming here. I'm not sure if either of these are actually worth the complexity and danger of finding new bottlenecks in special case optimization codepaths.

--
greg
http://mit.edu/~gsstark/resume.pdf
Jeff Davis <pgsql@j-davis.com> writes:
> Yes. There are two ways to do the threshold:
> 1. Constant fraction of vacuum_freeze_min_age
> 2. Extra GUC
> I lean toward #1, because it avoids an extra GUC*, and it avoids the
> awkwardness when the "lower" setting is higher than the "higher"
> setting.

I tend to agree with Josh that you do need to offer two knobs. But expressing the second knob as a fraction (with range 0 to 1) might be better than an independent "min" parameter. As you say, that'd be useful to prevent people from setting them inconsistently.

> *: As an aside, these GUCs already have incredibly confusing names, and
> an extra variable would increase the confusion. For instance, they seem
> to use "min" and "max" interchangeably.

Some of them are in fact max's, I believe. They are complicated :-(. It might be worth somebody taking two steps back and seeing if we need quite so many knobs. I think we got here partly by not wanting to predetermine vacuuming strategies, but it doesn't help to offer flexibility if people can't figure out how to use it.

			regards, tom lane
On Fri, 2009-08-14 at 14:37 -0400, Tom Lane wrote:
> I tend to agree with Josh that you do need to offer two knobs. But
> expressing the second knob as a fraction (with range 0 to 1) might be
> better than an independent "min" parameter. As you say, that'd be
> useful to prevent people from setting them inconsistently.

Ok. Any ideas for a name? Josh suggests "vacuum_freeze_dirty_age" (or perhaps he was using it as a placeholder). I don't particularly like that name, but I can't think of anything better without renaming vacuum_freeze_min_age.

>> *: As an aside, these GUCs already have incredibly confusing names, and
>> an extra variable would increase the confusion. For instance, they seem
>> to use "min" and "max" interchangeably.
>
> Some of them are in fact max's, I believe.

Looking at the definitions of vacuum_freeze_min_age and autovacuum_freeze_max_age, there seems to be almost no distinction between "min" and "max" in those two names. I've complained about this before:

http://archives.postgresql.org/pgsql-hackers/2008-12/msg01731.php

I think both are essentially thresholds, so giving them two names with opposite meaning is misleading.

Regards,
Jeff Davis
On Fri, 2009-08-14 at 13:57 -0700, Jeff Davis wrote:
> Looking at the definitions of vacuum_freeze_min_age and
> autovacuum_freeze_max_age there seems to be almost no distinction
> between "min" and "max" in those two names.

For min, the action happens at or above the min value. For max, the action happens at or below the max value. With those two particular parameters, the freezing happens exactly between the min and the max value.
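[ To make that rule concrete, here are the relevant settings with their 8.4 default values, per the PostgreSQL documentation: ]

```ini
# postgresql.conf (8.4 defaults shown)
vacuum_freeze_min_age     = 50000000    # "min": a vacuum freezes tuples
                                        # whose XID age is at or above this
vacuum_freeze_table_age   = 150000000   # a vacuum scans the whole table
                                        # once relfrozenxid is this old
autovacuum_freeze_max_age = 200000000   # "max": autovacuum forces a vacuum
                                        # before relfrozenxid exceeds this
```

So with the defaults, a tuple's xmin gets frozen somewhere between age 50M (the earliest a vacuum will freeze it) and 200M (the latest, when an anti-wraparound autovacuum is forced).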
On Sun, 2009-08-16 at 02:02 +0300, Peter Eisentraut wrote:
> For min, the action happens at or above the min value. For max, the
> action happens at or below the max value.

From the docs, 23.1.4: "autovacuum is invoked on any table that might contain XIDs older than the age specified by the configuration parameter autovacuum_freeze_max_age"

I interpret that to mean that the forced autovacuum run happens above the value. You could reasonably call it the "minimum age of relfrozenxid that will cause autovacuum to forcibly run a vacuum". Similarly, you could call vacuum_freeze_min_age "the maximum age a tuple can be before a vacuum will freeze it".

I'm not trying to be argumentative, I'm just trying to show that it can be confusing if you interpret it the wrong way. The first time I saw those configuration names, I was confused, and ever since, I have to think about it: "is that variable called min or max?". My general feeling is that both of these are thresholds. The only real maximum happens near wraparound.

> With those two particular parameters, the freezing happens exactly
> between the min and the max value.

Thanks, that's a helpful way to remember it. It may be a little obsolete because now the freezing will normally happen between vacuum_freeze_min_age and vacuum_freeze_table_age; but at least I should be able to remember which of the other parameters is "min" and which one is "max".

Regards,
Jeff Davis
On Sat, 2009-08-15 at 16:55 -0700, Jeff Davis wrote:
> Similarly, you could call vacuum_freeze_min_age "the maximum age a tuple
> can be before a vacuum will freeze it".

Heh, you could also call max_connections the "minimum number of connections before the server will refuse new connection attempts". It's not easy ... ;-)
Jeff Davis <pgsql@j-davis.com> wrote:
> There are two ways to do the threshold:
> 1. Constant fraction of vacuum_freeze_min_age
> 2. Extra GUC

I appreciate that there may be room to improve this while protecting the forensic values; but there are already strategies for managing the day-to-day performance issues, as long as you have adequate backup to not need to rely on old XID information for recovery. What we don't have covered is loading a database from pg_dump without having to rewrite all pages at least once afterward -- and likely two more times, with most maintenance strategies.

I seem to remember that someone took a shot at making a special case of WAL-bypassed inserts, but there was a problem or two that were hard to overcome. Does anyone recall the details? Was that about pre-setting the hint bits for a successful commit (based on the fact that the entire table will be empty if rolled back and no data will be visible to any other transaction until commit), or was it about setting the frozen XID in the inserted tuples (based on the fact that this is no less useful for forensic purposes than having all rows set to any other value)?

Should we have a TODO item for this special case, or is it "not wanted" or viewed as having intractable problems?

-Kevin