Thread: getting rid of freezing
Hi,

after having discussed $subject briefly over dinner yesterday, while I
should have been preparing the slides for my talk, I noticed that there
might be a rather easy way to get rid of freezing.

I think that the existence of hint bits and the crash-safe visibility
map should provide sufficient tooling to make freezing unnecessary
without losing much information for debugging, if we modify the way
vacuum works a bit.

Currently, aside from recovery, we only set all-visible in vacuum.
vacuumlazy.c's lazy_scan_heap currently works like:

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        /* skip pages the visibility map marks all-visible */
        if (!scan_all && marked_all_visible)
            continue;

        /* cannot lock buffer immediately */
        if (!ConditionalLockBufferForCleanup(buf))
        {
            if (!scan_all)
                continue;

            /* don't block if we don't need freezing */
            if (!lazy_check_needs_freeze(buf))
                continue;

            /* now wait for cleanup lock */
            LockBufferForCleanup(buf);
        }

        for (tuple in all_tuples)
        {
            cleanup_tuple();
        }

        if (nfrozen > 0)
            log_heap_freeze();

        if (all_visible)
        {
            PageSetAllVisible(page);
            visibilitymap_set(page);
        }
    }

In other words, if we don't need to make sure there aren't any old
tuples, we only scan the parts of the relation that are not marked
all-visible. If we are doing a freeze vacuum we scan the whole
relation, waiting for a cleanup lock on the buffer if necessary.

We currently need to make sure we scanned the whole relation and have
frozen everything to have a sensible relfrozenxid for a relation.

So, what I propose instead is basically:
1) only vacuum non-all-visible pages, even when doing it for
   anti-wraparound
2) When we can set all-visible, guarantee that all tuples on the page
   are fully hinted. During recovery do the same, so we don't need to
   log all hint bits.
   We can do this with only an exclusive lock on the buffer, we don't
   need a cleanup lock.
3) When we cannot mark a page all-visible or we cannot get the cleanup
   lock, remember the oldest xmin on that page. We could set all
   visible in the former case, but we want the page to be cleaned up
   sometime soonish.
4) If we can get the cleanup lock, purge dead tuples from the page and
   the indexes, just as today. Set the page as all-visible.

That way we know that any page that is all-visible doesn't ever need to
look at xmin/xmax since we are sure to have set all relevant hint
bits.

We don't even necessarily need to log the hint bits for all items since
the redo for all_visible could make sure all items are hinted. The only
problem is knowing up to where we can truncate pg_clog...

Makes sense?

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
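The four proposed steps can be sketched as a toy model. This is not PostgreSQL code: the `Tup` and `Page` classes, the `visible_to_all` and `cleanup_lock_available` fields, and the function name are all hypothetical, and XIDs are compared with a plain `min`, which ignores wraparound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Tup:
    xmin: int
    dead: bool = False            # removable by vacuum
    visible_to_all: bool = True   # inserter committed before every live snapshot
    hinted: bool = False


@dataclass
class Page:
    tuples: List[Tup]
    all_visible: bool = False
    cleanup_lock_available: bool = True  # stand-in for ConditionalLockBufferForCleanup


def proposed_vacuum_pass(pages: List[Page]) -> Optional[int]:
    """Return the oldest xmin left on pages that could not be marked all-visible."""
    oldest_xmin = None
    for page in pages:
        # Step 1: never revisit all-visible pages, even for anti-wraparound.
        if page.all_visible:
            continue
        # Step 4: with the cleanup lock, purge dead tuples as today.
        if page.cleanup_lock_available:
            page.tuples = [t for t in page.tuples if not t.dead]
        can_mark = page.cleanup_lock_available and all(
            t.visible_to_all for t in page.tuples
        )
        if can_mark:
            # Step 2: all-visible implies every tuple is fully hinted.
            for t in page.tuples:
                t.hinted = True
            page.all_visible = True
        else:
            # Step 3: remember the oldest xmin so relfrozenxid stays sensible.
            for t in page.tuples:
                if oldest_xmin is None or t.xmin < oldest_xmin:
                    oldest_xmin = t.xmin
    return oldest_xmin
```

The returned value plays the role that a full-table freeze scan plays today: it bounds how far relfrozenxid could be advanced without having looked at the all-visible pages.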
On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
> I think that the existence of hint bits and the crash safe visibility
> maps should provide sufficient tooling to make freezing unnecessary
> without losing much information for debugging if we modify the way
> vacuum works a bit.

> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.

One case that would make this problematic is row-level locks on
tuples. We would need to unset all-visible for them, otherwise we might
do the wrong thing when looking at xmax...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
> We currently need to make sure we scanned the whole relation and have
> frozen everything to have a sensible relfrozenxid for a relation.
>
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
>    anti-wraparound
> 2) When we can set all-visible guarantee that all tuples on the page are
>    fully hinted. During recovery do the same, so we don't need to log
>    all hint bits.
>    We can do this with only an exclusive lock on the buffer, we don't
>    need a cleanup lock.
> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>    lock, remember the oldest xmin on that page. We could set all visible
>    in the former case, but we want the page to be cleaned up sometime
>    soonish.
> 4) If we can get the cleanup lock, purge dead tuples from the page and
>    the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.

Heikki noticed that I made quite the omission here, which is that you
would need to mark tuples as all-visible as well. I was thinking about
using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
>    anti-wraparound

Check. We might want an option to force a scan of the whole relation.

> 2) When we can set all-visible guarantee that all tuples on the page are
>    fully hinted. During recovery do the same, so we don't need to log
>    all hint bits.
>    We can do this with only an exclusive lock on the buffer, we don't
>    need a cleanup lock.

I don't think this works. Emitting XLOG_HEAP_VISIBLE for a heap page
does not emit an FPI for the heap page, only (if needed) for the
visibility map page. So a subsequent crash that tears the page could
keep XLOG_HEAP_VISIBLE but lose other changes on the page - i.e. the
hint bits.

> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>    lock, remember the oldest xmin on that page. We could set all visible
>    in the former case, but we want the page to be cleaned up sometime
>    soonish.

I think you mean "in the latter case" not "in the former case". If
not, then I'm confused.

> 4) If we can get the cleanup lock, purge dead tuples from the page and
>    the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.
>
> We don't even necessarily need to log the hint bits for all items since
> the redo for all_visible could make sure all items are hinted. The only
> problem is knowing up to where we can truncate pg_clog...

The redo for all_visible cannot make sure all items are hinted.
Again, there's no FPI on the heap page. The heap page could in fact
contain dead tuples at the time we mark it all-visible. Consider, for
example:

0. Checkpoint.
1. The buffer becomes all visible.
2. A tuple is inserted, making the buffer not-all-visible.
3. The page is written by the OS.
4. Crash.

Now, recovery will first find the record marking the buffer
all-visible, and will mark it all-visible. Now the all-visible bit on
the page is flat-out wrong, but it doesn't matter because we haven't
reached consistency. Next we'll find the heap-insert record, which
will have an FPI, since it's the first WAL-logged change to the buffer
since the last checkpoint. Now the FPI fixes everything and we're
back in a sane state.

Now in this particular case it wouldn't hurt anything if the redo
routine that set the all-visible bit also hinted all the tuples,
because the FPI is going to overwrite it anyway. But suppose in lieu
of steps (3) and (4) we write half of the page and then crash, leaving
behind a torn page. Now it's pretty crazy to think about trying to
hint tuples; the page may be in a completely insane state.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
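The non-torn scenario above (steps 0 to 4) can be replayed in a toy model. The record layout, the dictionary-based page, and the `replay` function are all hypothetical, and the full-page image is assumed to capture the page as it looked after the insert. The sketch only shows the ordering effect: the page is briefly wrong after the all-visible redo, and the later FPI overwrites it.

```python
import copy


def replay(disk_page, wal):
    """Apply WAL records in order and return the page state after each one."""
    page = copy.deepcopy(disk_page)
    states = []
    for rec in wal:
        if rec["type"] == "set_all_visible":
            # No full-page image of the heap page here: redo only flips the flag.
            page["all_visible"] = True
        elif rec["type"] == "insert":
            if rec.get("fpi") is not None:
                # First change after the checkpoint: restore the whole page.
                page = copy.deepcopy(rec["fpi"])
            else:
                page["tuples"].append(rec["tuple"])
                page["all_visible"] = False
        states.append(copy.deepcopy(page))
    return states


# Step 3: the OS wrote the page after the insert, so the on-disk copy
# already contains the new tuple and is not all-visible.
on_disk = {"tuples": ["old", "new"], "all_visible": False}
wal = [
    {"type": "set_all_visible"},
    {"type": "insert", "tuple": "new",
     "fpi": {"tuples": ["old", "new"], "all_visible": False}},
]
after_visible, after_insert = replay(on_disk, wal)
```

In this model `after_visible` claims the page is all-visible although it holds the newly inserted tuple, and `after_insert` is sane again. A torn page is exactly the case this model cannot represent, because there is no trustworthy `on_disk` state to apply hinting to.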
On 05/23/2013 10:03 PM, Andres Freund wrote:
> On 2013-05-23 19:51:48 +0200, Andres Freund wrote:
>> We currently need to make sure we scanned the whole relation and have
>> frozen everything to have a sensible relfrozenxid for a relation.
>>
>> So, what I propose instead is basically:
>> 1) only vacuum non-all-visible pages, even when doing it for
>>    anti-wraparound
>> 2) When we can set all-visible guarantee that all tuples on the page are
>>    fully hinted. During recovery do the same, so we don't need to log
>>    all hint bits.
>>    We can do this with only an exclusive lock on the buffer, we don't
>>    need a cleanup lock.
>> 3) When we cannot mark a page all-visible or we cannot get the cleanup
>>    lock, remember the oldest xmin on that page. We could set all visible
>>    in the former case, but we want the page to be cleaned up sometime
>>    soonish.
>> 4) If we can get the cleanup lock, purge dead tuples from the page and
>>    the indexes, just as today. Set the page as all-visible.
>>
>> That way we know that any page that is all-visible doesn't ever need to
>> look at xmin/xmax since we are sure to have set all relevant hint
>> bits.
> Heikki noticed that I made quite the omission here which is that you
> would need to mark tuples as all visible as well. I was thinking about
> using HEAP_MOVED_OFF | HEAP_MOVED_IN as a hint for that.

We could have a "vacuum_less=true" mode, where instead of marking
tuples all-visible here you actually freeze them, that is, set the xid
to frozen.

You will get less forensic capability in exchange for less vacuuming.

Maybe also add an "early_freeze" hint bit to mark this situation.

Or maybe freeze the tuples' xids when un-marking the page as
all-visible, to delay the effects a little?

Hannu

>
> Greetings,
>
> Andres Freund
>
On 2013-05-23 22:09:02 -0400, Robert Haas wrote:
> On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> > So, what I propose instead is basically:
> > 1) only vacuum non-all-visible pages, even when doing it for
> >    anti-wraparound
>
> Check. We might want an option to force a scan of the whole relation.

Yea, thought of that as well. VACUUM (DEEP) ;).

> > 3) When we cannot mark a page all-visible or we cannot get the cleanup
> >    lock, remember the oldest xmin on that page. We could set all visible
> >    in the former case, but we want the page to be cleaned up sometime
> >    soonish.

> I think you mean "in the latter case" not "in the former case". If
> not, then I'm confused.

Uh. Yes.

> > We don't even necessarily need to log the hint bits for all items since
> > the redo for all_visible could make sure all items are hinted. The only
> > problem is knowing up to where we can truncate pg_clog...

> [all-visible cannot restore hint bits without FPI because of torn pages]

I haven't thought about this sufficiently yet. I think we might have
a chance of working around this, let me ponder a bit.

But even if that means needing a full page write via the usual
mechanism for all-visible if any hint bits needed to be set, we are
still far ahead of the current state imo.
* cleanup would quite possibly do an FPI shortly after in vacuum
  anyway. If we do it for all-visible, it possibly does not need to be
  done for it.
* freezing would FPI almost guaranteedly since we do it so much
  later.
* Not having to rescan the whole heap will be a bigger cost saving...

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> [all-visible cannot restore hint bits without FPI because of torn pages]
>
> I haven't thought about this sufficiently yet. I think we might have
> a chance of working around this, let me ponder a bit.

Yeah. I too feel like there might be a solution. But I don't have
something specific in mind yet, anyway.

> But even if that means needing a full page write via the usual mechanism
> for all visible if any hint bits needed to be set we are still out far
> ahead of the current state imo.
> * cleanup would quite possibly do an FPI shortly after in vacuum
>   anyway. If we do it for all visible, it possibly does not need to be
>   done for it.
> * freezing would FPI almost guaranteedly since we do it so much
>   later.
> * Not having to rescan the whole heap will be a bigger cost saving...

The basic problem is that if the data is going to be removed before it
would have gotten frozen, then the extra FPIs are just overhead. In
effect, we're just deciding to freeze a lot sooner. And while that
might well be beneficial in some use cases (e.g. the data's already in
cache) it might also not be so beneficial (the table is larger than
cache and would have been dropped before freezing kicked in).

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-24 11:29:10 -0400, Robert Haas wrote:
> > But even if that means needing a full page write via the usual mechanism
> > for all visible if any hint bits needed to be set we are still out far
> > ahead of the current state imo.
> > * cleanup would quite possibly do an FPI shortly after in vacuum
> >   anyway. If we do it for all visible, it possibly does not need to be
> >   done for it.
> > * freezing would FPI almost guaranteedly since we do it so much
> >   later.
> > * Not having to rescan the whole heap will be a bigger cost saving...
>
> The basic problem is that if the data is going to be removed before it
> would have gotten frozen, then the extra FPIs are just overhead. In
> effect, we're just deciding to freeze a lot sooner.

Well, freezing without removing information for debugging.

> And while that
> might well be beneficial in some use cases (e.g. the data's already in
> cache) it might also not be so beneficial (the table is larger than
> cache and would have been dropped before freezing kicked in).

Not sure how caching comes into play here? At this point we know the
page to be in cache already since vacuum is looking at it anyway?

I think it's not really comparable since in those situations we
a) already do an XLogInsert(),
b) already dirty the page,
so the only change is that we possibly write an additional full page
image. If there actually is near-future DML write activity that would
make the all-visible superfluous, that would likely have to FPI anyway.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>
>> I haven't thought about this sufficiently yet. I think we might have
>> a chance of working around this, let me ponder a bit.
>
> Yeah. I too feel like there might be a solution. But I don't
> have something specific in mind yet, anyway.

One thought I had is that it might be beneficial to freeze when a page
ceases to be all-visible, rather than when it becomes all-visible.
Any operation that makes the page not-all-visible is going to emit an
FPI anyway, so we don't have to worry about torn pages in that case.
Under such a scheme, we'd have to enforce the rule that xmin and xmax
are ignored for any page that is all-visible; and when a page ceases
to be all-visible, we have to go back and really freeze the
pre-existing tuples. I think we might be able to use the existing
all_visible_cleared/new_all_visible_cleared flags to trigger this
behavior, without adding anything new to WAL at all.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
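The freeze-on-clear idea above can be sketched as a toy model. The `Page` class and both function names are hypothetical, hint bits and xmax are left out, and only `FROZEN_XID = 2` corresponds to a real PostgreSQL value (FrozenTransactionId).

```python
from dataclasses import dataclass
from typing import List

FROZEN_XID = 2  # PostgreSQL's FrozenTransactionId


@dataclass
class Page:
    xmins: List[int]
    all_visible: bool


def insert_tuple(page: Page, xid: int) -> None:
    """Add a tuple; if this clears all-visible, freeze what was already there."""
    if page.all_visible:
        # The page stops being all-visible, so the "ignore xmin" rule no
        # longer protects the old tuples: really freeze them now.
        page.xmins = [FROZEN_XID] * len(page.xmins)
        page.all_visible = False
    page.xmins.append(xid)


def xmin_needs_clog_lookup(page: Page, index: int) -> bool:
    """Under the proposed rule, xmin is ignored on all-visible pages."""
    if page.all_visible:
        return False
    return page.xmins[index] != FROZEN_XID
```

In this model the original xmins survive for as long as the page stays all-visible, which is the forensic benefit discussed in the thread, and no tuple ever needs a clog lookup for an xid older than the point at which its page last became all-visible.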
On Fri, May 24, 2013 at 11:52 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>> The basic problem is that if the data is going to be removed before it
>> would have gotten frozen, then the extra FPIs are just overhead. In
>> effect, we're just deciding to freeze a lot sooner.
>
> Well, freezing without removing information for debugging.

Sure, but what I'm trying to avoid is incurring the WAL cost of
freezing. If we didn't mind paying that sooner, we could just drop
vacuum_freeze_min/table_age. But we do mind that.

>> And while that
>> might well be beneficial in some use cases (e.g. the data's already in
>> cache) it might also not be so beneficial (the table is larger than
>> cache and would have been dropped before freezing kicked in).
>
> Not sure how caching comes into play here? At this point we know the
> page to be in cache already since vacuum is looking at it anyway?

OK, true.

> I think it's not really comparable since in those situations we a)
> already do an XLogInsert(). b) already dirty the page. so the only
> change is that we possibly write an additional full page image. If
> there is actually near future DML write activity that would make the
> all-visible superfluous that would have to FPI likely anyway.

Well, if there's near-future write activity, then freezing is pretty
worthless anyway. What I'm trying to avoid is adding WAL overhead in
the case where there *isn't* any near-future write activity, like
inserting 100MB of data into an existing table.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
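To put a rough number on the 100MB example, here is a back-of-the-envelope sketch. It assumes the default 8kB block size and exactly one extra full-page image per heap page touched, and it ignores WAL record headers and any omission of a page's free space from the image, so it should be read as an upper bound on the added volume, not a measurement. The function name is hypothetical.

```python
BLOCK_SIZE = 8192  # PostgreSQL's default page size (BLCKSZ)


def extra_fpi_wal_bytes(data_bytes: int, block_size: int = BLOCK_SIZE):
    """Return (pages, bytes) if every page holding the data gets one extra FPI."""
    pages = -(-data_bytes // block_size)  # ceiling division
    return pages, pages * block_size


pages, extra = extra_fpi_wal_bytes(100 * 1024 * 1024)
print(pages, extra)  # 12800 104857600
```

Under those assumptions the extra WAL is about as large as the inserted data itself, which is the overhead being weighed here against the savings from not rescanning the heap later.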
On 5/24/13 9:53 AM, Andres Freund wrote:
>>> We don't even necessarily need to log the hint bits for all items since
>>> the redo for all_visible could make sure all items are hinted. The only
>>> problem is knowing up to where we can truncate pg_clog...
>> [all-visible cannot restore hint bits without FPI because of torn pages]
> I haven't thought about this sufficiently yet. I think we might have
> a chance of working around this, let me ponder a bit.
>
> But even if that means needing a full page write via the usual mechanism
> for all visible if any hint bits needed to be set we are still out far
> ahead of the current state imo.
> * cleanup would quite possibly do an FPI shortly after in vacuum
>   anyway. If we do it for all visible, it possibly does not need to be
>   done for it.
> * freezing would FPI almost guaranteedly since we do it so much
>   later.
> * Not having to rescan the whole heap will be a bigger cost saving...

Would we only set all the hint bits within vacuum? If so I don't think
the WAL hit matters at all, because vacuum is almost always a
background, throttled process.

--
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net
Andres,

If I understand your solution correctly, though, this doesn't really
help the pathological case for freezing, which is the time-oriented
append-only table. For data which isn't being used, allvisible won't be
set either because it won't have been read, no? Is it still cheaper to
set allvisible than vacuum freeze even in that case?

Don't get me wrong, I'm in favor of this if it fixes the other (more
common) cases. I just want to be clear on the limitations.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-05-24 15:49:31 -0400, Josh Berkus wrote:
> If I understand your solution correctly, though, this doesn't really
> help the pathological case for freezing, which is the time-oriented
> append-only table. For data which isn't being used, allvisible won't be
> set either because it won't have been read, no? Is it still cheaper to
> set allvisible than vacuum freeze even in that case?

All-visible is only set in vacuum, and it determines which parts of a
table will be scanned in a non-full-table vacuum. So, since we won't
regularly start vacuum in the insert-only case, there will still be a
batch of work at once. But nearly all of that work is *already*
performed. We would just move the details of that around a
bit. *But* since we now would only need to vacuum the non-all-visible
part, that would get noticeably cheaper as well.

I think for that case we should run vacuum more regularly for
insert-only tables, since we currently don't do it regularly enough,
which a) increases the amount of work needed at once and b) prevents
index-only scans from working there.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 05/24/2013 07:00 PM, Robert Haas wrote:
> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>> I haven't thought about this sufficiently yet. I think we might have
>>> a chance of working around this, let me ponder a bit.
>> Yeah. I too feel like there might be a solution. But I don't
>> have something specific in mind yet, anyway.
> One thought I had is that it might be beneficial to freeze when a page
> ceases to be all-visible, rather than when it becomes all-visible.

That's what I aimed to describe in my mail earlier, but your
description is much clearer :)

> Any operation that makes the page not-all-visible is going to emit an
> FPI anyway, so we don't have to worry about torn pages in that case.
> Under such a scheme, we'd have to enforce the rule that xmin and xmax
> are ignored for any page that is all-visible;

Agreed. We already rely on all-visible pages enough that we can trust
it to be correct. Making that a universal rule should not add any
risks.

The rule "page all-visible ==> assume all tuples frozen" would also
enable VACUUM FREEZE to work only on the non-all-visible pages.

> and when a page ceases
> to be all-visible, we have to go back and really freeze the
> pre-existing tuples.

We can do this unconditionally, or in the milder case use
vacuum_freeze_min_age if we want to retain xids for forensic purposes.

> I think we might be able to use the existing
> all_visible_cleared/new_all_visible_cleared flags to trigger this
> behavior, without adding anything new to WAL at all.

This seems to be the easiest.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 24 May 2013 17:00, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>>
>>> I haven't thought about this sufficiently yet. I think we might have
>>> a chance of working around this, let me ponder a bit.
>>
>> Yeah. I too feel like there might be a solution. But I don't
>> have something specific in mind yet, anyway.
>
> One thought I had is that it might be beneficial to freeze when a page
> ceases to be all-visible, rather than when it becomes all-visible.
> Any operation that makes the page not-all-visible is going to emit an
> FPI anyway, so we don't have to worry about torn pages in that case.
> Under such a scheme, we'd have to enforce the rule that xmin and xmax
> are ignored for any page that is all-visible; and when a page ceases
> to be all-visible, we have to go back and really freeze the
> pre-existing tuples. I think we might be able to use the existing
> all_visible_cleared/new_all_visible_cleared flags to trigger this
> behavior, without adding anything new to WAL at all.

I like the idea but it would mean we'd have to freeze in the
foreground path rather than in a background path.

Have we given up on the double buffering idea to remove FPIs
completely? If we did that, then this wouldn't work.

Anyway, I take it the direction of this idea is that "we don't need a
separate freezemap, just use the vismap". That seems to be forcing
ideas down a particular route we may regret. I'd rather just keep
those things separate, even if we manage to merge the WAL actions for
most of the time.

Some other related thoughts:

ISTM that if we really care about keeping xids for debug purposes,
it could be a parameter. For the mainline, we just freeze blocks at
the same time we do page pruning.

I think the right way is actually to rethink and simplify all this
complexity of Freezing/Pruning/Hinting/Visibility.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Andres,

> all visible is only set in vacuum and it determines which parts of a
> table will be scanned in a non full table vacuum. So, since we won't
> regularly start vacuum in the insert only case there will still be a
> batch of work at once. But nearly all of that work is *already*
> performed. We would just move the details of that around a
> bit. *But* since we now would only need to vacuum the non all-visible
> part that would get noticeably cheaper as well.

Yeah, I can see that. Seems worthwhile, then.

> I think for that case we should run vacuum more regularly for insert
> only tables since we currently don't do regularly enough which a) increases
> the amount of work needed at once and b) prevents index only scans from
> working there.

Yes. I'm not sure how we would set this though; I think it's another
example of how autovacuum's parameters for when to vacuum etc. are too
simple-minded for the real world. Doing an all-visible scan on an
insert-only table, for example, should be based on XID age and not on %
inserted, no?

Speaking of which, I need to get on revamping the math for autoanalyze.

Mind you, in the real-world insert-only table case, this does create
extra IO -- real insert-only tables often have a few rows ( < 5% ) which
are updated/deleted. Vacuum would see these and want to clean the pages
up, which would create much more substantial IO. It might still be a
good tradeoff, but we should be aware of it.

Unless we want a special VACUUM ALL VISIBLE mode. I vote no, unless we
demonstrate some really convincing case for it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 05/25/2013 01:14 PM, Simon Riggs wrote:
> On 24 May 2013 17:00, Robert Haas <robertmhaas@gmail.com> wrote:
>> On Fri, May 24, 2013 at 11:29 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> On Fri, May 24, 2013 at 10:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
>>>>> [all-visible cannot restore hint bits without FPI because of torn pages]
>>>> I haven't thought about this sufficiently yet. I think we might have
>>>> a chance of working around this, let me ponder a bit.
>>> Yeah. I too feel like there might be a solution. But I don't
>>> have something specific in mind yet, anyway.
>> One thought I had is that it might be beneficial to freeze when a page
>> ceases to be all-visible, rather than when it becomes all-visible.
>> Any operation that makes the page not-all-visible is going to emit an
>> FPI anyway, so we don't have to worry about torn pages in that case.
>> Under such a scheme, we'd have to enforce the rule that xmin and xmax
>> are ignored for any page that is all-visible; and when a page ceases
>> to be all-visible, we have to go back and really freeze the
>> pre-existing tuples. I think we might be able to use the existing
>> all_visible_cleared/new_all_visible_cleared flags to trigger this
>> behavior, without adding anything new to WAL at all.
> I like the idea but it would mean we'd have to freeze in the
> foreground path rather than in a background path.
>
> Have we given up on the double buffering idea to remove FPIs
> completely? If we did that, then this wouldn't work.
>
> Anyway, I take it the direction of this idea is that "we don't need a
> separate freezemap, just use the vismap". That seems to be forcing
> ideas down a particular route we may regret. I'd rather just keep
> those things separate, even if we manage to merge the WAL actions for
> most of the time.
>
>
> Some other related thoughts:
>
> ISTM that if we really care about keeping xids for debug purposes that
> it could be a parameter. For the mainline, we just freeze blocks at
> the same time we do page pruning.
>
> I think the right way is actually to rethink and simplify all this
> complexity of Freezing/Pruning/Hinting/Visibility

I think that this xmin, xmax business is mainly a leftover from the
time when PostgreSQL was a full-history database. If we are happy to
decide that we do not want to resurrect this feature, at least not in
the same way, then freezing at the earliest or most convenient
opportunity seems the way to go.

The "forensic" part has always been just a nice side effect of this
design and not the main design consideration.

--
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
Andres,

I was talking this over with Jeff on the plane, and we wanted to be
clear on your goals here: are you looking to eliminate the *write* cost
of freezing, or just the *read* cost of re-reading already frozen pages?

If just the latter, what about just adding a bit to the visibility map
to indicate that the page is frozen? That seems simpler than what
you're proposing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Sat, May 25, 2013 at 6:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> One thought I had is that it might be beneficial to freeze when a page
>> ceases to be all-visible, rather than when it becomes all-visible.
>> Any operation that makes the page not-all-visible is going to emit an
>> FPI anyway, so we don't have to worry about torn pages in that case.
>> Under such a scheme, we'd have to enforce the rule that xmin and xmax
>> are ignored for any page that is all-visible; and when a page ceases
>> to be all-visible, we have to go back and really freeze the
>> pre-existing tuples. I think we might be able to use the existing
>> all_visible_cleared/new_all_visible_cleared flags to trigger this
>> behavior, without adding anything new to WAL at all.
>
> I like the idea but it would mean we'd have to freeze in the
> foreground path rather than in a background path.

That's true, but I think with this approach it would be really cheap.
The overhead of setting a few bits in a page is very small compared to
the overhead of emitting a WAL record. We'd have to test it, but I
wouldn't be surprised to find the cost is too small to measure.

> Have we given up on the double buffering idea to remove FPIs
> completely? If we did that, then this wouldn't work.

I don't see why those things are mutually exclusive. What is the
relationship?

> Anyway, I take it the direction of this idea is that "we don't need a
> separate freezemap, just use the vismap". That seems to be forcing
> ideas down a particular route we may regret. I'd rather just keep
> those things separate, even if we manage to merge the WAL actions for
> most of the time.

Hmm. To me it seems highly desirable to merge those things, because
they're basically the same thing. The earliest time at which we can
freeze a tuple is when it's all-visible, and the only argument I've
ever heard for waiting longer is to preserve the original xmin for
forensic purposes, which I think we can do anyway. I have posted a
patch for that on another thread. I don't like having two separate
concepts where one will do; I think the fact that it is structured
that way today is mostly an artifact of one setting being page-level
and the other tuple-level, which is a thin excuse for so much
complexity.

> I think the right way is actually to rethink and simplify all this
> complexity of Freezing/Pruning/Hinting/Visibility

I agree, but I think that's likely to have to wait until we get a
pluggable storage API, and then a few years beyond that for someone to
develop the technology to enable the new and better way. In the
meantime, if we can eliminate or even reduce the impact of freezing in
the near term, I think that's worth doing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> I was talking this over with Jeff on the plane, and we wanted to be
> clear on your goals here: are you looking to eliminate the *write* cost
> of freezing, or just the *read* cost of re-reading already frozen pages?

Both. The latter is what I have seen causing more hurt, but the former
alone is painful enough.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
On 05/28/2013 07:17 AM, Andres Freund wrote:
> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>> I was talking this over with Jeff on the plane, and we wanted to be
>> clear on your goals here: are you looking to eliminate the *write* cost
>> of freezing, or just the *read* cost of re-reading already frozen pages?
>
> Both. The latter is what I have seen causing more hurt, but the former
> alone is painful enough.

I guess I don't see how your proposal is reducing the write cost for
most users then?

- for users with frequently, randomly updated data, pdallvisible would
  not be ever set, so they still need to be rewritten to freeze

- for users with append-only tables, allvisible would never be set since
  those pages don't get vacuumed

- it would prevent us from getting rid of allvisible, which has a
  documented and known write overhead

This means that your optimization would benefit only users whose pages
get updated occasionally (enough to trigger vacuum) but not too
frequently (which would unset allvisible). While we lack statistics,
intuition suggests that this is a minority of databases.

If we just wanted to reduce read cost, why not just take a simpler
approach and give the visibility map a "isfrozen" bit? Then we'd know
which pages didn't need rescanning without nearly as much complexity.
That would also make it more effective to do precautionary vacuum
freezing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 2013-05-28 09:29:26 -0700, Josh Berkus wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
> > On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
> >> I was talking this over with Jeff on the plane, and we wanted to be
> >> clear on your goals here: are you looking to eliminate the *write* cost
> >> of freezing, or just the *read* cost of re-reading already frozen pages?
> >
> > Both. The latter is what I have seen causing more hurt, but the former
> > alone is painful enough.
>
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
>
> - for users with frequently, randomly updated data, pdallvisible would
> not be ever set, so they still need to be rewritten to freeze

If they update all data they simply never need to get frozen since they
are not old enough.

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

They do get vacuumed at least every autovacuum_freeze_max_age even
now. And we should vacuum them more often to make index only scan work
without manual intervention.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Aha.

> This means that your optimization would benefit only users whose pages
> get updated occasionally (enough to trigger vacuum) but not too
> frequently (which would unset allvisible). While we lack statistics,
> intuition suggests that this is a minority of databases.

I don't think that follows.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit? Then we'd know
> which pages didn't need rescanning without nearly as much complexity.
> That would also make it more effective to do precautionary vacuum freezing.

Because we would still write/dirty/xlog the changes three times?

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
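[Editor's sketch] The point that append-only tables still get vacuumed at least every autovacuum_freeze_max_age comes down to an age check on the table's relfrozenxid. The sketch below is a simplification under stated assumptions: the function names are hypothetical, special XIDs are ignored, and the age is plain modulo-2^32 subtraction. The 200 million figure is the documented default for autovacuum_freeze_max_age.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Documented default of autovacuum_freeze_max_age, in transactions. */
#define SKETCH_FREEZE_MAX_AGE 200000000u

/* Age of relfrozenxid in transactions. Unsigned subtraction wraps
 * modulo 2^32, so the result stays correct across XID wraparound as
 * long as relfrozenxid really is in the past. */
static uint32_t sketch_xid_age(uint32_t next_xid, uint32_t relfrozenxid)
{
    return next_xid - relfrozenxid;
}

/* True when the table is old enough that a vacuum is forced, even if
 * it has never been updated or deleted from. */
static bool sketch_needs_wraparound_vacuum(uint32_t next_xid,
                                           uint32_t relfrozenxid,
                                           uint32_t freeze_max_age)
{
    assert(freeze_max_age > 0);
    return sketch_xid_age(next_xid, relfrozenxid) > freeze_max_age;
}
```

For example, with next_xid at 300 million and relfrozenxid at 50 million the age is 250 million, so the check fires with the default setting regardless of whether the table ever saw an UPDATE or DELETE.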
On Tue, May 28, 2013 at 12:29 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/28/2013 07:17 AM, Andres Freund wrote:
>> On 2013-05-26 16:58:58 -0700, Josh Berkus wrote:
>>> I was talking this over with Jeff on the plane, and we wanted to be
>>> clear on your goals here: are you looking to eliminate the *write* cost
>>> of freezing, or just the *read* cost of re-reading already frozen pages?
>>
>> Both. The latter is what I have seen causing more hurt, but the former
>> alone is painful enough.
>
> I guess I don't see how your proposal is reducing the write cost for
> most users then?
>
> - for users with frequently, randomly updated data, pdallvisible would
> not be ever set, so they still need to be rewritten to freeze

Do these users never run vacuum? As of 9.3, vacuum phase 2 will
typically set PD_ALL_VISIBLE on each relevant page. The only time that
this WON'T happen is if an insert, update, or delete hits the page
after phase 1 of vacuum and before phase 2 of vacuum. I don't think
that's going to be the common case.

> - for users with append-only tables, allvisible would never be set since
> those pages don't get vacuumed

There's no good solution for append-only tables. Eventually, they will
get vacuumed, and when that happens, PD_ALL_VISIBLE will be set, and
freezing will also happen. I don't think anything that is being
proposed here is going to make that a whole lot better, but it
shouldn't make it any worse than it is now, either. Since it's
probably not solvable without a rewrite of the heap AM, I'm not going
to feel too bad about that.

> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

Again, I think this is going to be much less of an issue with 9.3, for
the reason explained above. In 9.2 and prior, we'd scan a page with
dead tuples, prune them to line pointers, vacuum the indexes, and then
mark the dead pointers as unused.
Then, the NEXT vacuum would revisit the same page and dirty it again
ONLY to mark it all-visible. But in 9.3, the first vacuum will mark
the page all-visible at the same time it marks the dead line pointers
unused. So the write overhead of PD_ALL_VISIBLE should basically be
gone. If it's not, it would be good to know why.

> If we just wanted to reduce read cost, why not just take a simpler
> approach and give the visibility map a "isfrozen" bit? Then we'd know
> which pages didn't need rescanning without nearly as much complexity.

That would break pg_upgrade, which would have to remove visibility map
forks when upgrading. More importantly, it would require another
round of complex changes to the write-ahead logging in this area.
It's not obvious to me that we'd end up ahead of where we are today,
although perhaps I am a pessimist.

> That would also make it more effective to do precautionary vacuum freezing.

But wouldn't it be a whole lot nicer if we just didn't have to do
vacuum freezing AT ALL? The point here is to absorb freezing into
some other operation that we already have to do.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
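[Editor's sketch] The 9.2 versus 9.3 difference described in the message above can be shown with a toy model that counts how many vacuum runs dirty a given page. SketchHeapPage and sketch_vacuum are hypothetical names; the model assumes no concurrent inserts, updates, or deletes, that all surviving tuples are visible to everyone, and it treats index vacuuming as a no-op between the two phases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy page state; not PostgreSQL's real page header. */
typedef struct SketchHeapPage {
    int  dead_tuples;         /* dead tuples not yet pruned */
    int  dead_line_pointers;  /* pruned, awaiting index cleanup */
    bool all_visible;
    int  dirtying_vacuums;    /* vacuum runs that dirtied this page */
} SketchHeapPage;

/* One vacuum run over one page. Pass true for the 9.3 behaviour
 * (phase 2 also sets all-visible), false for 9.2 and prior. */
static void sketch_vacuum(SketchHeapPage *page, bool set_all_visible_in_phase2)
{
    bool dirtied = false;

    assert(page != NULL);
    if (page->all_visible)
        return;  /* skipped thanks to the visibility map */

    /* Phase 1: prune dead tuples down to dead line pointers, or set
     * all-visible if there is nothing dead on the page. */
    if (page->dead_tuples > 0) {
        page->dead_line_pointers += page->dead_tuples;
        page->dead_tuples = 0;
        dirtied = true;
    } else if (page->dead_line_pointers == 0) {
        page->all_visible = true;
        dirtied = true;
    }

    /* (index vacuuming would happen here) */

    /* Phase 2: mark dead line pointers unused. */
    if (page->dead_line_pointers > 0) {
        page->dead_line_pointers = 0;
        dirtied = true;
        if (set_all_visible_in_phase2)
            page->all_visible = true;
    }

    if (dirtied)
        page->dirtying_vacuums++;
}
```

Running two vacuums over a page that starts with dead tuples dirties it in both runs under the pre-9.3 behaviour, and only in the first run under the 9.3 behaviour, which is the saving described above.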
On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
> > If we just wanted to reduce read cost, why not just take a simpler
> > approach and give the visibility map a "isfrozen" bit? Then we'd know
> > which pages didn't need rescanning without nearly as much complexity.
>
> That would break pg_upgrade, which would have to remove visibility map
> forks when upgrading. More importantly, it would require another
> round of complex changes to the write-ahead logging in this area.
> It's not obvious to me that we'd end up ahead of where we are today,
> although perhaps I am a pessimist.

If we removed PD_ALL_VISIBLE, then this would be very simple, right? We
would just follow normal logging rules for setting the visible or
frozen bit.

Regards,
	Jeff Davis
On Tue, 2013-05-28 at 09:29 -0700, Josh Berkus wrote:
> - it would prevent us from getting rid of allvisible, which has a
> documented and known write overhead

It would? I don't think these proposals are necessarily in conflict.
It's not entirely clear to me how they fit together in detail, but it
seems like it may be possible -- it may even simplify things.

Regards,
	Jeff Davis
On 28 May 2013 15:15, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sat, May 25, 2013 at 6:14 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> I think the right way is actually to rethink and simplify all this
>> complexity of Freezing/Pruning/Hinting/Visibility
>
> I agree, but I think that's likely to have to wait until we get a
> pluggable storage API, and then a few years beyond that for someone to
> develop the technology to enable the new and better way. In the
> meantime, if we can eliminate or even reduce the impact of freezing in
> the near term, I think that's worth doing.

I think we can do better more quickly than that.

Andres' basic idea of skipping freeze completely was a valuable one and
is the right way forwards. And it looks like the epoch based approach
that Heikki and I have come up with seems likely to end up somewhere
workable.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services