Thread: Re: removing PD_ALL_VISIBLE
On Wed, May 29, 2013 at 1:11 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Tue, 2013-05-28 at 19:51 -0400, Robert Haas wrote:
>> > If we just wanted to reduce read cost, why not just take a simpler approach and give the visibility map an "isfrozen" bit? Then we'd know which pages didn't need rescanning without nearly as much complexity.
>>
>> That would break pg_upgrade, which would have to remove visibility map forks when upgrading. More importantly, it would require another round of complex changes to the write-ahead logging in this area. It's not obvious to me that we'd end up ahead of where we are today, although perhaps I am a pessimist.
>
> If we removed PD_ALL_VISIBLE, then this would be very simple, right? We would just follow normal logging rules for setting the visible or frozen bit.

I don't see how that makes Josh's proposal any simpler. His proposal was to change, in a backward-incompatible fashion, the contents of the visibility map. Getting rid of PD_ALL_VISIBLE will not eliminate that backward-incompatibility. Neither will it eliminate the need to keep the visibility/freeze map in sync with the heap itself. Whether we get rid of PD_ALL_VISIBLE or not, we'll still have to go look at every type of WAL record that clears the visibility map bit and make it clear both the visibility and freeze bits. We'll still need a WAL record to set the visibility map bit, just as we do today, and we'll also need a new WAL record type (or a change to the existing WAL record type) to set the all-frozen bit, when applicable.

Now, independently of Josh's proposal, we could change the operation that sets PD_ALL_VISIBLE to emit FPIs for every heap page it touches. For pages that have been hit by updates or deletes, this would be pretty much free in 9.3, since the PD_ALL_VISIBLE bit will probably be set at the same time we're setting dead line pointers to unused, which is a WAL-logged operation anyway. However, for pages that have been hit only by inserts, this would emit many extra FPIs.

Again independently of Josh's proposal, we could eliminate PD_ALL_VISIBLE. This would require either surrendering the optimization whereby sequential scans can skip visibility checks on individual tuples within the page, or referring to the visibility map to get the bit that way. I know you tested this and couldn't measure an impact, but with all respect I find that result hard to accept. Contention around buffer locks and pins is very real; why should it matter on other workloads and not matter on this one? It would also require page modifications prior to consistency to clear the visibility map bit unconditionally, instead of only when PD_ALL_VISIBLE is set on the page (though I think it'd be OK to pay that price if it ended there).

AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
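To make the bookkeeping concrete, here is a minimal, self-contained toy model (plain C, not PostgreSQL code; the two-bit-per-heap-page layout and the flag names are assumptions for illustration) of the invariant described above: whatever clears the all-visible bit must clear the all-frozen bit along with it, so the two can never get out of sync after replay.

/*
 * Toy model of a visibility map carrying two bits per heap page:
 * all-visible and all-frozen.  Any operation that clears all-visible
 * (e.g. replay of an update/delete record) clears all-frozen too.
 */
#include <stdio.h>
#include <stdint.h>

#define BITS_PER_HEAPBLOCK 2
#define ALL_VISIBLE 0x01
#define ALL_FROZEN  0x02

static uint8_t vm[8192];                /* toy map: 4 heap pages per byte */

static void
vm_set(uint32_t blk, uint8_t flags)
{
    vm[blk / 4] |= (uint8_t) (flags << ((blk % 4) * BITS_PER_HEAPBLOCK));
}

static void
vm_clear(uint32_t blk)
{
    /* clearing visibility must also clear the frozen bit */
    vm[blk / 4] &= (uint8_t) ~((ALL_VISIBLE | ALL_FROZEN)
                               << ((blk % 4) * BITS_PER_HEAPBLOCK));
}

static unsigned
vm_get(uint32_t blk)
{
    return (vm[blk / 4] >> ((blk % 4) * BITS_PER_HEAPBLOCK)) & 0x03;
}

int
main(void)
{
    vm_set(7, ALL_VISIBLE | ALL_FROZEN);
    vm_clear(7);                        /* e.g. replaying an UPDATE record */
    printf("block 7 flags: %u\n", vm_get(7));
    return 0;
}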
On Wed, 2013-05-29 at 22:46 -0400, Robert Haas wrote:
> Again independently of Josh's proposal, we could eliminate PD_ALL_VISIBLE. This would require either surrendering the optimization whereby sequential scans can skip visibility checks on individual tuples within the page, or referring to the visibility map to get the bit that way. I know you tested this and couldn't measure an impact, but with all respect I find that result hard to accept. Contention around buffer locks and pins is very real; why should it matter on other workloads and not matter on this one?

The number of pins required during a sequential scan without my patch is:

    PAGES_IN_HEAP

The number of pins required during a sequential scan with my patch is:

    PAGES_IN_HEAP + ceil(PAGES_IN_HEAP / HEAP_PAGES_PER_VM_PAGE)

Because HEAP_PAGES_PER_VM_PAGE is huge, the second term only matters when PAGES_IN_HEAP is very small and the "ceil" is significant. So I simply elected not to use the VM if the table is less than 32 pages in size. For such small tables, the benefit of using a page-at-a-time visibility check was not apparent in my tests anyway.

> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.

Agreed.

Regards,
    Jeff Davis
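For illustration, a small self-contained sketch of that arithmetic and the 32-page cutoff (plain C, not the patch itself; the 8 KB page size and the one-visibility-bit-per-heap-page figure are assumptions, and the real constant is slightly smaller because of the VM page header). Compiled with something like "cc pins.c -lm", it shows the extra pins never exceed a handful even for a 10 GB heap.

/* Toy illustration of the pin-count arithmetic and the small-table cutoff. */
#include <stdio.h>
#include <math.h>

#define HEAP_PAGES_PER_VM_PAGE (8192 * 8)   /* ~64K heap pages per VM page (approx.) */
#define SMALL_TABLE_THRESHOLD  32           /* don't bother with the VM below this */

static double
pins_with_patch(double pages_in_heap)
{
    if (pages_in_heap < SMALL_TABLE_THRESHOLD)
        return pages_in_heap;               /* skip the VM entirely */
    return pages_in_heap + ceil(pages_in_heap / HEAP_PAGES_PER_VM_PAGE);
}

int
main(void)
{
    double sizes[] = {8, 100, 131072, 1310720};     /* 64 KB .. 10 GB heaps */

    for (int i = 0; i < 4; i++)
        printf("%8.0f heap pages: %8.0f pins without, %8.0f with\n",
               sizes[i], sizes[i], pins_with_patch(sizes[i]));
    return 0;
}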
On Thu, May 30, 2013 at 12:06 AM, Jeff Davis <pgsql@j-davis.com> wrote:
>> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.
>
> Agreed.

Just to quantify that a bit more, I ran this command a couple of times:

dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n; sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum'

And also this one:

dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n; sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum freeze'

In the first one, the vacuum at the end takes about 25 seconds. In the second one, it takes about 15 minutes, during which time there's one CPU core running at about 10%; the remainder of the time is spent waiting for disk I/O. A little follow-up testing shows that the vacuum emits 88MB of WAL, while the vacuum freeze emits 13GB of WAL.

This is on the 16-core, 64-thread IBM POWER box with the following non-default configuration settings:

shared_buffers = 8GB
maintenance_work_mem = 1GB
synchronous_commit = off
checkpoint_segments = 300
checkpoint_timeout = 15min
checkpoint_completion_target = 0.9
log_line_prefix = '%t [%p] '

Andres' proposal for freezing at the same time we mark pages all-visible relies on emitting FPIs when we mark pages all-visible, but I hope that the test above is convincing evidence that it would be *really* expensive for some users. My proposal to consider all-visible pages as frozen avoids that cost, but as far as I can see, it also requires PD_ALL_VISIBLE to stick around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
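As a rough cross-check of the 13GB figure (illustrative arithmetic only; the row count follows from pgbench's scale factor, while the rows-per-page estimate is an assumption): if vacuum freeze emits one 8 KB full-page image for essentially every heap page of pgbench_accounts, the expected WAL volume lands very close to what was measured.

/* Back-of-the-envelope arithmetic; the rows-per-page figure is approximate. */
#include <stdio.h>

int
main(void)
{
    double rows = 100e6;        /* pgbench -i -s 1000 loads 100M accounts rows */
    double rows_per_page = 60;  /* approx., given ~130-byte rows in 8 KB pages */
    double heap_pages = rows / rows_per_page;
    double fpi_gb = heap_pages * 8192 / (1024.0 * 1024 * 1024);

    printf("~%.2fM heap pages -> ~%.1f GB of WAL at one 8 KB FPI per page\n",
           heap_pages / 1e6, fpi_gb);
    return 0;
}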
On 2013-05-30 07:54:38 -0400, Robert Haas wrote:
> On Thu, May 30, 2013 at 12:06 AM, Jeff Davis <pgsql@j-davis.com> wrote:
> >> AFAICS, the main benefit of eliminating PD_ALL_VISIBLE is that we eliminate one write cycle; that is, we won't dirty the page once to hint it and then again to mark it all-visible. But as of 9.3, that should really only be a problem in the insert-only case. And in that case, my proposal to consider all-visible pages as frozen would be a huge win, because you'd only need to emit XLOG_HEAP_VISIBLE for every page in the heap, rather than XLOG_HEAP_FREEZE.
> >
> > Agreed.
>
> Just to quantify that a bit more, I ran this command a couple of times:
>
> dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n; sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum'
>
> And also this one:
>
> dropdb rhaas ; sleep 5 ; createdb ; sleep 5 ; pgbench -i -s 1000 -n; sleep 5 ; time psql -c checkpoint ; time psql -c 'vacuum freeze'
>
> In the first one, the vacuum at the end takes about 25 seconds. In the second one, it takes about 15 minutes, during which time there's one CPU core running at about 10%; the remainder of the time is spent waiting for disk I/O. A little follow-up testing shows that the vacuum emits 88MB of WAL, while the vacuum freeze emits 13GB of WAL.
>
> This is on the 16-core, 64-thread IBM POWER box with the following non-default configuration settings:
>
> shared_buffers = 8GB
> maintenance_work_mem = 1GB
> synchronous_commit = off
> checkpoint_segments = 300
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.9
> log_line_prefix = '%t [%p] '
>
> Andres' proposal for freezing at the same time we mark pages all-visible relies on emitting FPIs when we mark pages all-visible, but I hope that the test above is convincing evidence that it would be *really* expensive for some users. My proposal to consider all-visible pages as frozen avoids that cost

I think I basically suggested treating all-visible as frozen, didn't I? If not, I had lost sync between my fingers and my thoughts, which happens too often ;). You had noticed that my proposal was lacking a bit around when we omit FPIs for the page while setting all-visible, but we both thought that we might find a workaround for that - which looking at the page-level flag first basically is.

As far as I understand, the trick basically is that we can rely on an FPI being logged when an action unsetting ALL_VISIBLE is performed. That FPI would then make sure the hint bits marking individual tuples as frozen hit disk. For that we need to add some more work though; consider:

1) write tuples on a page
2) "freeze" the page by setting ALL_VISIBLE and setting hint bits. Setting ALL_VISIBLE is WAL-logged.
3) crash
4) replay ALL_VISIBLE, set it on the page level. The individual tuples are *not* guaranteed to be marked frozen.
5) update a tuple on the page, unsetting all-visible. This emits an FPI which does *not* have the tuples marked as frozen.

Easy enough and fairly cheap to fix by having a function that updates the hint bits on a page when unsetting all-visible, since we can just set them for all pre-existing tuples.

> but as far as I can see, it also requires PD_ALL_VISIBLE to stick around.

Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE bit to be set while the corresponding visibility map bit isn't, there's not much fundamentally prohibiting us from looking in the VM when we need to know whether the page is all visible, is there? To the contrary, this actually seems to be a pretty good case for Jeff's proposed behaviour, since it would allow freezing while only writing the VM?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
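For concreteness, a sketch of the kind of fix-up function Andres describes above: before clearing all-visible, re-mark every pre-existing tuple on the page as frozen, so that the FPI emitted by the clearing operation carries that state to disk. The function name is hypothetical and this is only a sketch against 9.3-era backend headers; it uses the old-style representation of frozen-ness (xmin replaced with FrozenTransactionId), whereas the exact representation in the proposal under discussion is still open.

#include "postgres.h"

#include "access/htup_details.h"
#include "access/transam.h"
#include "storage/bufpage.h"

/*
 * Hypothetical helper: re-freeze all pre-existing tuples on a page whose
 * frozen-ness was previously implied only by its all-visible status.
 * Intended to run just before the all-visible bit is cleared, while the
 * caller holds an exclusive lock on the buffer.
 */
static void
heap_page_refreeze_all(Page page)
{
    OffsetNumber offnum;
    OffsetNumber maxoff = PageGetMaxOffsetNumber(page);

    for (offnum = FirstOffsetNumber; offnum <= maxoff;
         offnum = OffsetNumberNext(offnum))
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        HeapTupleHeader tuple;

        if (!ItemIdIsNormal(itemid))
            continue;           /* skip unused/dead/redirect line pointers */

        tuple = (HeapTupleHeader) PageGetItem(page, itemid);

        /* Every tuple on an all-visible page is live and visible to all. */
        HeapTupleHeaderSetXmin(tuple, FrozenTransactionId);
        tuple->t_infomask |= HEAP_XMIN_COMMITTED;
    }
}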
On 30.05.2013 15:12, Andres Freund wrote: > Now, I am far from being convinced its a good idea to get rid of > PD_ALL_VISIBLE, but I don't think it does. Except that it currently is > legal for the page level ALL_VISIBLE being set while the corresponding > visibilitymap one isn't there's not much prohibiting us fundamentally > from looking in the vm when we need to know whether the page is all > visible, is there? Hmm, so you're suggesting that the visibility map would be *required* to interpret the pages correctly. Ie. if you just zapped the visibility map, you'd lose critical information and the heap would appear to be corrupt. I guess that's possible, but it makes me quite uneasy. At the moment, it's relieving to know that it's always safe to just truncate the visibility map in case of emergency. - Heikki
On 2013-05-30 15:34:04 +0300, Heikki Linnakangas wrote:
> On 30.05.2013 15:12, Andres Freund wrote:
> > Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE bit to be set while the corresponding visibility map bit isn't, there's not much fundamentally prohibiting us from looking in the VM when we need to know whether the page is all visible, is there?
>
> Hmm, so you're suggesting that the visibility map would be *required* to interpret the pages correctly. I.e. if you just zapped the visibility map, you'd lose critical information and the heap would appear to be corrupt. I guess that's possible, but it makes me quite uneasy. At the moment, it's a relief to know that it's always safe to just truncate the visibility map in case of emergency.

I didn't say it's a good idea, just that I don't think Robert's conclusion is necessarily valid. But requiring only the few kilobytes of the VM to be written instead of all of the heap during freezing (or whatever we would call it) has quite some allure, I admit that. But I think that should be a separate project from reengineering how freezing works.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Thu, May 30, 2013 at 8:12 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> As far as I understand, the trick basically is that we can rely on an FPI being logged when an action unsetting ALL_VISIBLE is performed. That FPI would then make sure the hint bits marking individual tuples as frozen hit disk. For that we need to add some more work though; consider:
>
> 1) write tuples on a page
> 2) "freeze" the page by setting ALL_VISIBLE and setting hint bits. Setting ALL_VISIBLE is WAL-logged.
> 3) crash
> 4) replay ALL_VISIBLE, set it on the page level. The individual tuples are *not* guaranteed to be marked frozen.
> 5) update a tuple on the page, unsetting all-visible. This emits an FPI which does *not* have the tuples marked as frozen.
>
> Easy enough and fairly cheap to fix by having a function that updates the hint bits on a page when unsetting all-visible, since we can just set them for all pre-existing tuples.

Basically, yes, though I would say "infomask bits" rather than "hint bits", since not all of them are only hints, and this case would not be merely a hint.

>> but as far as I can see, it also requires PD_ALL_VISIBLE to stick around.
>
> Now, I am far from being convinced it's a good idea to get rid of PD_ALL_VISIBLE, but I don't think it does. Except that it currently is legal for the page-level ALL_VISIBLE bit to be set while the corresponding visibility map bit isn't, there's not much fundamentally prohibiting us from looking in the VM when we need to know whether the page is all visible, is there?
> To the contrary, this actually seems to be a pretty good case for Jeff's proposed behaviour, since it would allow freezing while only writing the VM?

Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.

Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.

There's another thing that's bothering me about this whole discussion, too. If looking at another page for the information we need to make visibility decisions is so cheap that we need not be concerned with it, then why do we need hint bits? I realize that it's not quite the same thing, because CLOG doesn't have as much locality of access as the visibility map; you're guaranteed to find all the information you need for a single heap page on a single VM page. Also, CLOG is per-tuple, not per-page, and we get a decent speed-up from checking once for the whole page rather than for each tuple. Nonetheless, I think the contrast between Jeff's tests, which aren't showing much impact from the increased visibility map traffic, and previous hint-bit removal tests, which have crashed and burned, may be caused in part by the fact that our algorithms and locking regimen for shared_buffers are much more sophisticated than for SLRU. I'm not eager to have our design decisions driven by that gap.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
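The half-a-gigabyte figure follows directly from the map's density; here is the arithmetic, assuming 8 KB pages and one visibility bit per heap page (and ignoring the small VM page header, which makes the real number slightly lower):

/* Back-of-the-envelope coverage of a single visibility map page. */
#include <stdio.h>

int
main(void)
{
    const double blcksz = 8192;                     /* 8 KB pages */
    double heap_pages_per_vm_page = blcksz * 8;     /* one bit per heap page */
    double heap_mb_covered = heap_pages_per_vm_page * blcksz / (1024 * 1024);

    printf("one VM page covers ~%.0f heap pages = ~%.0f MB of heap\n",
           heap_pages_per_vm_page, heap_mb_covered);
    return 0;
}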
On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.
>
> Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.

Isn't the visibility map already required for proper return results, since we use it for index-only scans? I think the optimization-only ship has sailed.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ It's impossible for everything to be true. +
On 2013-05-31 13:14:13 -0400, Bruce Momjian wrote:
> On Thu, May 30, 2013 at 09:47:22AM -0400, Robert Haas wrote:
> > Well, as Heikki points out, I think that's unacceptably dangerous. Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.
> >
> > Also, if we go that route, looking at the visibility map is no longer an optimization; it's essential for correctness. We can't decide to skip it when it seems expensive, for example, as Jeff was proposing.
>
> Isn't the visibility map already required for proper return results, since we use it for index-only scans? I think the optimization-only ship has sailed.

At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
>> Isn't the visibility map already required for proper return results, since we use it for index-only scans? I think the optimization-only ship has sailed.
>
> At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.

Depends on your definition of corruption, really.

But yes, right now, the vismap can lose bits without causing any corruption, and making all-frozen depend on it would eliminate that.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, May 31, 2013 at 10:28:12AM -0700, Josh Berkus wrote:
> >> Isn't the visibility map already required for proper return results, since we use it for index-only scans? I think the optimization-only ship has sailed.
> >
> > At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.
>
> Depends on your definition of corruption, really.
>
> But yes, right now, the vismap can lose bits without causing any corruption, and making all-frozen depend on it would eliminate that.

Robert's statement was:

> Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.

Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ It's impossible for everything to be true. +
Bruce,

> Robert's statement was:
>
>> Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.

I fail to be alarmed at this; currently, losing a single page of the clog causes just as widespread corruption (worse, actually, since it's not confined to one table). It does point to the eventual need to checksum these things, though.

> Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

Yes, that's already true. I'm pointing out that if we depend on the vismap for all-frozen, then losing bits *also* causes corruption, so that's something we need to test for. Right now, there is no possible corruption from losing bits; we simply end up scanning more pages than we have to.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Fri, May 31, 2013 at 11:00:19AM -0700, Josh Berkus wrote:
> Bruce,
>
> > Robert's statement was:
> >
> >> Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.
>
> I fail to be alarmed at this; currently, losing a single page of the clog causes just as widespread corruption (worse, actually, since it's not confined to one table). It does point to the eventual need to checksum these things, though.
>
> > Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.
>
> Yes, that's already true. I'm pointing out that if we depend on the vismap for all-frozen, then losing bits *also* causes corruption, so that's something we need to test for. Right now, there is no possible corruption from losing bits; we simply end up scanning more pages than we have to.

Right, and it is hard to see that losing and adding are somehow more/less likely, so it seems we already rely on the visibility map being correct.

--
Bruce Momjian  <bruce@momjian.us>  http://momjian.us
EnterpriseDB                       http://enterprisedb.com

+ It's impossible for everything to be true. +
* Bruce Momjian (bruce@momjian.us) wrote:
> Right, and it is hard to see that losing and adding are somehow more/less likely, so it seems we already rely on the visibility map being correct.

Yes, but there's also a way to get *back* to a valid state if things go south with the visibility map. It's similar to an index today: sure, you might get corruption, and that might give you wrong results, but if you detect it, it's easy to correct - rebuild the index, drop the visibility map, etc. With CRCs, it'll be even easier to detect and discover when you need to do such a thing.

Corruption in the heap is certainly bad too - you may lose some records, but you could zero those pages out and move on with only losing 8k or what-have-you. Clog corruption is certainly a bad thing, but if nearly everything in your DB is already frozen, it's not as bad as it *could* be. I wonder if you could rebuild a portion of clog from the WAL...

There are a lot of different tradeoffs here, certainly. It's certainly nice that single-bit or single-page corruption can often be recovered from by rebuilding an index or rebuilding the visibility map or just nuking a few records from a table. CRCs help with identifying when corruption has happened, but they do nothing for what to *do* when it happens, and "restore completely from backup" isn't a great answer when you've got terabytes of data.

Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times - some might call that paranoid too. ;)

Thanks,

Stephen
On 05/31/2013 12:22 PM, Stephen Frost wrote:
> Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times - some might call that paranoid too. ;)

On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
* Josh Berkus (josh@agliodbs.com) wrote:
> On 05/31/2013 12:22 PM, Stephen Frost wrote:
> > Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times - some might call that paranoid too. ;)
>
> On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

That was actually my thinking as well.

Thanks,

Stephen
On Fri, May 31, 2013 at 1:44 PM, Bruce Momjian <bruce@momjian.us> wrote:
> On Fri, May 31, 2013 at 10:28:12AM -0700, Josh Berkus wrote:
>> >> Isn't the visibility map already required for proper return results, since we use it for index-only scans? I think the optimization-only ship has sailed.
>> >
>> > At the moment we can remove it without causing corruption. If we were to use it for freezing we couldn't anymore. So there's a difference - how big it is I am not sure.
>>
>> Depends on your definition of corruption, really.
>>
>> But yes, right now, the vismap can lose bits without causing any corruption, and making all-frozen depend on it would eliminate that.
>
> Robert's statement was:
>
>> Loss or corruption of a single visibility map page means possible loss of half a gigabyte of data.
>
> Certainly unidentified corruption of a visibility map page could easily cause incorrect results. So, technically, _adding_ bits would cause corruption.

Adding bits could cause tuples that ought to be invisible to be treated as visible. Currently, removing bits is harmless (except to performance), but if we used the VM bit to indicate whether the page was frozen in lieu of actually freezing it, a cleared bit would potentially cause vacuum to nuke everything on that page.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Fri, May 31, 2013 at 3:45 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 05/31/2013 12:22 PM, Stephen Frost wrote:
>> Where I'm going with this whole thing is simply that I do worry a bit about using a bitmap for freeze, or similar, information and not being able to reconstruct that bitmap from the heap. Perhaps that's overly paranoid, but, well, we also write the same data out to disk in multiple places multiple times - some might call that paranoid too. ;)
>
> On the other hand, we could combine Heikki's proposal (epoch numbers in the page header) with using the visibility map for pages we don't need to vacuum freeze, and get vastly better behavior without increasing corruption risk that I can see.

Yeah, I was thinking about that as well. In fact, under either Heikki's proposal or Andres's proposal, or my variants of either one of them, anti-wraparound vacuums no longer need to scan all-visible pages. Under Andres's proposal (and variants), all-visible pages are ipso facto frozen and therefore need not be scanned for freezing. But under Heikki's proposal (and variants), anti-wraparound vacuums only need to remove dead tuples; freezing live ones is a no-op. And all-visible pages don't contain any dead tuples, so we're right back in the same place.[1]

Where things diverge a little is what you do when an anti-wraparound scan encounters a page that isn't all-visible and can't be marked all-visible. Under the "XID epoch" family of proposals, we need to truncate any dead tuples on the page to line pointers, and that's it. Under the "treat all-visible as frozen" family of proposals, we *also* need to do old-style freezing on any aged but live tuples on the page. So the "XID epoch" approach saves write I/O in this case, because we don't dirty pages or write WAL just because we see a tuple with a high XID age. Only pages that contain actual dead tuples get dirtied.

So on further reflection, I'm not seeing any advantage to combining the two proposals. The "XID epoch" approach has the complications of requiring page format changes and consuming space on every page, although I think I may have worked out a variant approach that will avoid that (which no one has commented on, so maybe I'm hallucinating). But in other respects it seems better all around.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] I said upthread that Heikki's idea would require a separate freeze map, but now I think I was wrong.