Thread: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
sriggs@postgresql.org (Simon Riggs) writes: > Log Message: > ----------- > Augment WAL records for btree delete with GetOldestXmin() to reduce > false positives during Hot Standby conflict processing. Simple > patch to enhance conflict processing, following previous discussions. > Controlled by parameter minimize_standby_conflicts = on | off, with > default off allows measurement of performance impact to see whether > it should be set on all the time. WTF? Simon, this seems to be getting way way beyond anything the community has agreed to. Particularly, inventing GUCs is not something to be doing without consensus. regards, tom lane
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Stefan Kaltenbrunner
Date:
Tom Lane wrote: > sriggs@postgresql.org (Simon Riggs) writes: >> Log Message: >> ----------- >> Augment WAL records for btree delete with GetOldestXmin() to reduce >> false positives during Hot Standby conflict processing. Simple >> patch to enhance conflict processing, following previous discussions. >> Controlled by parameter minimize_standby_conflicts = on | off, with >> default off allows measurement of performance impact to see whether >> it should be set on all the time. > > WTF? Simon, this seems to be getting way way beyond anything the > community has agreed to. Particularly, inventing GUCs is not something > to be doing without consensus. yeah actually both of the last commits seem rather strange given the current discussion on this list I must say... Stefan
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Fri, 2010-01-29 at 14:04 -0500, Tom Lane wrote: > sriggs@postgresql.org (Simon Riggs) writes: > > Log Message: > > ----------- > > Augment WAL records for btree delete with GetOldestXmin() to reduce > > false positives during Hot Standby conflict processing. Simple > > patch to enhance conflict processing, following previous discussions. > > Controlled by parameter minimize_standby_conflicts = on | off, with > > default off allows measurement of performance impact to see whether > > it should be set on all the time. > > WTF? Simon, this seems to be getting way way beyond anything the > community has agreed to. Particularly, inventing GUCs is not something > to be doing without consensus. If you or anybody else would like me to revoke it then I am happy to do that, with no problem or argument. I agree it hasn't had discussion, though am happy to have such a discussion. The commit is a one line change, with parameter to control it, discussed by Heikki and myself in December 2008. I stand by the accuracy of the change; the parameter is really to ensure we can test during beta. I imagined such a minor addition would pass without comment. This is not the same patch not-being-discussed on the other thread, which is too complex for a commit without review. It is a separate change altogether, although it does relate to conflict processing. In any case I would never commit a patch while discussion on it continues. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > On Fri, 2010-01-29 at 14:04 -0500, Tom Lane wrote: >> WTF? Simon, this seems to be getting way way beyond anything the >> community has agreed to. Particularly, inventing GUCs is not something >> to be doing without consensus. > If you or anybody else would like me to revoke it then I am happy to do > that, with no problem or argument. I agree it hasn't had discussion, > though am happy to have such a discussion. > The commit is a one line change, with parameter to control it, discussed > by Heikki and myself in December 2008. I stand by the accuracy of the > change; the parameter is really to ensure we can test during beta. Well, I was waiting to see if anyone else had an opinion, but: my opinion is that a GUC is not appropriate here. Either test it yourself enough to be sure it's a win, or don't put it in. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: > > The commit is a one line change, with parameter to control it, discussed > > by Heikki and myself in December 2008. I stand by the accuracy of the > > change; the parameter is really to ensure we can test during beta. > > Well, I was waiting to see if anyone else had an opinion, but: my > opinion is that a GUC is not appropriate here. Either test it yourself > enough to be sure it's a win, or don't put it in. I will remove the parameter then, keeping the augmentation. That OK? -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote: > On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: > >>> The commit is a one line change, with parameter to control it, discussed >>> by Heikki and myself in December 2008. I stand by the accuracy of the >>> change; the parameter is really to ensure we can test during beta. >> Well, I was waiting to see if anyone else had an opinion, but: my >> opinion is that a GUC is not appropriate here. Either test it yourself >> enough to be sure it's a win, or don't put it in. > > I will remove the parameter then, keeping the augmentation. That OK? Well how much is the actual hit with this on the master for different workloads do we have realistic numbers on that? Also how much of an actual win is it in the other direction - as in under what circumstances and workloads does it help in avoiding superflous cancelations on the standby? Stefan
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote: > Simon Riggs wrote: > > On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: > > > >>> The commit is a one line change, with parameter to control it, discussed > >>> by Heikki and myself in December 2008. I stand by the accuracy of the > >>> change; the parameter is really to ensure we can test during beta. > >> Well, I was waiting to see if anyone else had an opinion, but: my > >> opinion is that a GUC is not appropriate here. Either test it yourself > >> enough to be sure it's a win, or don't put it in. > > > > I will remove the parameter then, keeping the augmentation. That OK? > > Well how much is the actual hit with this on the master for different > workloads do we have realistic numbers on that? Also how much of an > actual win is it in the other direction - as in under what circumstances > and workloads does it help in avoiding superflous cancelations on the > standby? At the moment a btree delete record will cancel every request 1. no matter how long they have been running 2. no matter if they haven't accessed the index being cleaned (they might later, is the thinking...) This change improves case (1) in that it will only remove queries that are older than the oldest snapshot on the primary, should max_standby_delay be exceeded. Case (2) would have been improved by my other proposed patch should max_standby_delay be exceeded. The cost of improving case (1) is that we do the equivalent of taking a snapshot of the procarray while holding locks on the btree block being cleaned. That will increase index contention somewhat in applications that do btree deletes, i.e. non-hot index updates or deletes. Greg Stark's comments have been that he wants to see max_standby_delay = -1 put back, which would effecively prevent both (1) and (2) from occurring. I am working on that now. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > At the moment a btree delete record will cancel every request > 1. no matter how long they have been running > 2. no matter if they haven't accessed the index being cleaned (they > might later, is the thinking...) That seems seriously horrid. What is the rationale for #2 in particular? I would hope that at worst this would affect sessions that are actively competing for the index being cleaned. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 15:41 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > At the moment a btree delete record will cancel every request > > 1. no matter how long they have been running > > 2. no matter if they haven't accessed the index being cleaned (they > > might later, is the thinking...) > > That seems seriously horrid. What is the rationale for #2 in > particular? I would hope that at worst this would affect sessions > that are actively competing for the index being cleaned. That is exactly the feedback I received from many other people and why I prioritised the relation-specific conflict patch. It's worse that that because point 2 effects WAL cleanup records for the heap also. The rationale is that a session *might* in the future access a table, and if it did so it would receive the wrong answer *potentially*. This is the issue I have been discussing for a long time now, in various forms, starting on-list in Aug 2008. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Stefan Kaltenbrunner
Date:
Simon Riggs wrote: > On Sun, 2010-01-31 at 20:48 +0100, Stefan Kaltenbrunner wrote: >> Simon Riggs wrote: >>> On Sun, 2010-01-31 at 14:07 -0500, Tom Lane wrote: >>> >>>>> The commit is a one line change, with parameter to control it, discussed >>>>> by Heikki and myself in December 2008. I stand by the accuracy of the >>>>> change; the parameter is really to ensure we can test during beta. >>>> Well, I was waiting to see if anyone else had an opinion, but: my >>>> opinion is that a GUC is not appropriate here. Either test it yourself >>>> enough to be sure it's a win, or don't put it in. >>> I will remove the parameter then, keeping the augmentation. That OK? >> Well how much is the actual hit with this on the master for different >> workloads do we have realistic numbers on that? Also how much of an >> actual win is it in the other direction - as in under what circumstances >> and workloads does it help in avoiding superflous cancelations on the >> standby? > > At the moment a btree delete record will cancel every request > 1. no matter how long they have been running > 2. no matter if they haven't accessed the index being cleaned (they > might later, is the thinking...) > > This change improves case (1) in that it will only remove queries that > are older than the oldest snapshot on the primary, should > max_standby_delay be exceeded. Case (2) would have been improved by my > other proposed patch should max_standby_delay be exceeded. > > The cost of improving case (1) is that we do the equivalent of taking a > snapshot of the procarray while holding locks on the btree block being > cleaned. That will increase index contention somewhat in applications > that do btree deletes, i.e. non-hot index updates or deletes. hmm makes sense from the HS side - but I think to make a case for a GUC it would help a lot to see actual numbers(as in benchmarks) showing how much of a hit it is to the master. Stefan
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote: > hmm makes sense from the HS side - but I think to make a case for a GUC > it would help a lot to see actual numbers(as in benchmarks) showing how > much of a hit it is to the master. Agreed, though my approach to benchmarking was to provide the mechanism by which it was possible to benchmark. I didn't presume to be able to cover wide area with benchmarking tests just for this. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Heikki Linnakangas
Date:
Simon Riggs wrote: > On Sun, 2010-01-31 at 22:05 +0100, Stefan Kaltenbrunner wrote: > >> hmm makes sense from the HS side - but I think to make a case for a GUC >> it would help a lot to see actual numbers(as in benchmarks) showing how >> much of a hit it is to the master. > > Agreed, though my approach to benchmarking was to provide the mechanism > by which it was possible to benchmark. I didn't presume to be able to > cover wide area with benchmarking tests just for this. I don't think this patch helps much though. It's setting lastRemovedXid to GetOldestXmin(), which is still a very pessimistic estimate. In fact, if there's only one transaction running in the master, it's not any better than just setting it to InvalidTransactionId and killing all active queries in the standby. What we'd want to set it to is the xmin/xmax of the oldest heap tuple whose pointer was removed from the index page. That could be much much older than GetOldestXmin(), allowing many more read-only queries to live in the standby. IIRC it was Greg Stark who suggested last time this was discussed that we could calculate the exact value for latestRemovedXid in the standby. When replaying the deletion record, the standby could look at all the heap tuples whose index pointers are being removed, to see which one was newest. That can be pretty expensive, involving random I/O, but it gives an exact value, and doesn't stress the master at all. And we could skip it if there's no potentially conflicting read-only queries running. That seems like the most user-friendly approach to me. Even though random I/O is expensive, surely it's better than killing queries. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > IIRC it was Greg Stark who suggested last time this was discussed that > we could calculate the exact value for latestRemovedXid in the standby. > When replaying the deletion record, the standby could look at all the > heap tuples whose index pointers are being removed, to see which one was > newest. This assumes that the standby can tell which heap tuples are being removed, which I'm not sure is true --- consider the case where the deletion record is carrying a page image instead of a list of deleted tuples. But maybe we could rejigger the representation to make sure that the info is always available. In general it sounds like a much nicer answer. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote: > IIRC it was Greg Stark who suggested last time this was discussed that > we could calculate the exact value for latestRemovedXid in the > standby. When replaying the deletion record, the standby could look at > all the heap tuples whose index pointers are being removed, to see > which one was newest. That can be pretty expensive, involving random > I/O, but it gives an exact value, and doesn't stress the master at > all. And we could skip it if there's no potentially conflicting > read-only queries running. > > That seems like the most user-friendly approach to me. Even though > random I/O is expensive, surely it's better than killing queries. Best solution, no time to do it. Should I revoke this change? -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > Should I revoke this change? Please. We can always put it back later if nothing better gets implemented. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > Should I revoke this change? > > Please. Will do, in morning. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 16:53 -0500, Tom Lane wrote: > Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > > IIRC it was Greg Stark who suggested last time this was discussed that > > we could calculate the exact value for latestRemovedXid in the standby. > > When replaying the deletion record, the standby could look at all the > > heap tuples whose index pointers are being removed, to see which one was > > newest. > > This assumes that the standby can tell which heap tuples are being > removed, which I'm not sure is true --- consider the case where the > deletion record is carrying a page image instead of a list of deleted > tuples. But maybe we could rejigger the representation to make sure > that the info is always available. In general it sounds like a much > nicer answer. I'm looking at this now since we definitely need either this or the previous patch. XLOG_BTREE_DELETE records would not carry latestRemovedXid, this would be calculated by inspection of heap tuples. XLOG_BTREE_DELETE would have a heap relid, to allow direct access to the heap without a Relation. We would check xmin, xmax and xvac. Code will look a little like heap_fetch() though disimilar enough not to use directly. We would only look at the tuple header and info. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Heikki Linnakangas
Date:
Simon Riggs wrote: > XLOG_BTREE_DELETE records would not carry latestRemovedXid, this would > be calculated by inspection of heap tuples. You might still want to keep the conservative latestRemovedXid value in the WAL record to avoid the extra work when latestRemovedXid alone allows. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Tue, 2010-03-09 at 11:20 +0200, Heikki Linnakangas wrote: > Simon Riggs wrote: > > XLOG_BTREE_DELETE records would not carry latestRemovedXid, this would > > be calculated by inspection of heap tuples. > > You might still want to keep the conservative latestRemovedXid value in > the WAL record to avoid the extra work when latestRemovedXid alone allows. OK will do. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote: > When replaying the deletion record, the standby could look at all the > heap tuples whose index pointers are being removed, to see which one > was newest. Long after coding this, I now realise this really is a dumb-ass idea. There is no spoon. The index tuples did once point at valid heap tuples. 1. heap tuples are deleted 2. heap tuples become dead 3. index tuples can now be marked killed 4. index tuples removed Heap tuples can be removed at step 2, index tuples can't be removed until step 4. so the dead index tuples can't be followed reliably to read the xids. They might be the correct ones, might not, and no way to tell. So that puts this back to exactly the place we were on 31 Jan: On Sun, 2010-01-31 at 17:10 -0500, Tom Lane wrote: > We can always put it back later if nothing better gets > implemented. So, barring huge injections of insight, I'll be putting back the generation of OldestXmin() into the btree delete path. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote: >> When replaying the deletion record, the standby could look at all the >> heap tuples whose index pointers are being removed, to see which one >> was newest. > Long after coding this, I now realise this really is a dumb-ass idea. > There is no spoon. The index tuples did once point at valid heap tuples. > 1. heap tuples are deleted > 2. heap tuples become dead > 3. index tuples can now be marked killed > 4. index tuples removed > Heap tuples can be removed at step 2, index tuples can't be removed > until step 4. Uh, no, heap tuples can't be removed until after all index entries that are pointing at them have been removed. Please tell me you have not broken this. > So, barring huge injections of insight, I'll be putting back the > generation of OldestXmin() into the btree delete path. I stand by my previous comments about where this is going to end up. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Fri, 2010-03-26 at 16:16 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Sun, 2010-01-31 at 23:43 +0200, Heikki Linnakangas wrote: > >> When replaying the deletion record, the standby could look at all the > >> heap tuples whose index pointers are being removed, to see which one > >> was newest. > > > Long after coding this, I now realise this really is a dumb-ass idea. > > > There is no spoon. The index tuples did once point at valid heap tuples. > > 1. heap tuples are deleted > > 2. heap tuples become dead > > 3. index tuples can now be marked killed > > 4. index tuples removed > > Heap tuples can be removed at step 2, index tuples can't be removed > > until step 4. > > Uh, no, heap tuples can't be removed until after all index entries that > are pointing at them have been removed. Please tell me you have not > broken this. Nothing broken. It appears that in practice many of the index items point to heap items that are LP_DEAD. So for the purposes of accessing a heap tuple's xmin, then we're both right. To the current purpose the tuple has been removed, though you are also right: its stub remains. So how do I calculate xmin and xmax for an LP_DEAD tuple? Clearly nothing can be done directly. Is there another way? A conjecture: if the index items point to a heap tuple that is LP_NORMAL then we can get the xmin/xmax from there. The xmin/xmax of LP_DEAD items will always be *earlier* than the latest LP_NORMAL tuple that is being removed. So as long as I have at least 1 LP_NORMAL heap tuple, then I can use the latestRemovedXid from that and simply discard the LP_DEAD items (for the purposes of this calculation). The idea is that whatever marked those heap tuples LP_DEAD would also have marked the others, if they were the same or earlier than the LP_DEAD ones. Do you agree with this conjecture? If you do, then attached patch is complete. -- Simon Riggs www.2ndQuadrant.com
Attachment
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Greg Stark
Date:
On Sat, Mar 27, 2010 at 10:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > It appears that in practice many of the index items point to heap items > that are LP_DEAD. So for the purposes of accessing a heap tuple's xmin, > then we're both right. To the current purpose the tuple has been > removed, though you are also right: its stub remains. If we're pruning an index entry to a heap tuple that has been HOT pruned wouldn't the HOT pruning record have already conflicted with any queries that could see it? -- greg
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sat, 2010-03-27 at 19:15 +0000, Greg Stark wrote: > On Sat, Mar 27, 2010 at 10:10 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > It appears that in practice many of the index items point to heap items > > that are LP_DEAD. So for the purposes of accessing a heap tuple's xmin, > > then we're both right. To the current purpose the tuple has been > > removed, though you are also right: its stub remains. > > If we're pruning an index entry to a heap tuple that has been HOT > pruned wouldn't the HOT pruning record have already conflicted with > any queries that could see it? Quite probably, but a query that started after that record arrived might slip through. We have to treat each WAL record separately. Do you agree with the conjecture? That LP_DEAD items can be ignored because their xid would have been earlier than the latest LP_NORMAL tuple we find? (on any page). Or is a slightly less strong condition true: we can ignore LP_DEAD items on a page that is also referenced by an LP_NORMAL item. -- Simon Riggs www.2ndQuadrant.com
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Greg Stark
Date:
On Sat, Mar 27, 2010 at 7:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2010-03-27 at 19:15 +0000, Greg Stark wrote: > > If we're pruning an index entry to a heap tuple that has been HOT > > pruned wouldn't the HOT pruning record have already conflicted with > > any queries that could see it? > > Quite probably, but a query that started after that record arrived might > slip through. We have to treat each WAL record separately. Slip through? I'm not following you. > Do you agree with the conjecture? That LP_DEAD items can be ignored > because their xid would have been earlier than the latest LP_NORMAL > tuple we find? (on any page). > > Or is a slightly less strong condition true: we can ignore LP_DEAD items > on a page that is also referenced by an LP_NORMAL item. I don't like having dependencies on the precise logic in vacuum rather than only on the guarantees that vacuum provides. We want to improve the logic in vacuum and hot pruning to cover more cases and that will be harder if there's code elsewhere depending on its simple-minded xid <= globalxmin test. -- greg
Re: [COMMITTERS] pgsql: Augment WAL records for btree delete with GetOldestXmin() to
From
Simon Riggs
Date:
On Sat, 2010-03-27 at 22:39 +0000, Greg Stark wrote: > On Sat, Mar 27, 2010 at 7:36 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > > On Sat, 2010-03-27 at 19:15 +0000, Greg Stark wrote: > > > If we're pruning an index entry to a heap tuple that has been HOT > > > pruned wouldn't the HOT pruning record have already conflicted with > > > any queries that could see it? > > > > Quite probably, but a query that started after that record arrived might > > slip through. We have to treat each WAL record separately. > > Slip through? I'm not following you. No, there is no possibility for it to slip through, you're right. (After much thinking). > > Do you agree with the conjecture? That LP_DEAD items can be ignored > > because their xid would have been earlier than the latest LP_NORMAL > > tuple we find? (on any page). > > > > Or is a slightly less strong condition true: we can ignore LP_DEAD items > > on a page that is also referenced by an LP_NORMAL item. > > I don't like having dependencies on the precise logic in vacuum rather > than only on the guarantees that vacuum provides. We want to improve > the logic in vacuum and hot pruning to cover more cases and that will > be harder if there's code elsewhere depending on its simple-minded xid > <= globalxmin test. Agreed -- Simon Riggs www.2ndQuadrant.com