Thread: HOT WIP Patch - version 1

HOT WIP Patch - version 1

From
"Pavan Deolasee"
Date:

This is a WIP patch based on the recent posting by Simon and discussions
thereafter. We are trying to do one piece at a time and intention is to post
the work ASAP so that we could get early and continuous feedback from
the community. We could then incorporate those suggestions in the next
WIP patch.

To start with, this patch implements  HOT-update for a simple case
when there is enough free space in the same block so that it can
accommodate the new version of the tuple. A necessary condition for
doing HOT-update is that none of the index columns is changed.
The old version is marked as HEAP_UPDATE_ROOT and the new
version is marked as HEAP_ONLY_TUPLE. If a tuple is HOT-updated,
no new index entry is added.

When fetching a tuple using an index, if the root tuple is not visible to
the given snapshot, the ctid chain is followed until a visible tuple is found or
end of HOT-update chain is reached. The prior_xmax/next_xmin chain
is validated while following the ctid chain.

This patch is generated on the current CVS head. It passes all the regression
tests, but I haven't measured any performance impact since thats not the
goal for posting this early version. There are several things that are not
yet implemented and there are few unresolved issues for which I am looking
for community help and feedback.

Open Issues:
------------------

- CREATE INDEX needs more work in the HOT context. The existing HOT
tuples may require chilling for the CREATE INDEX to work correctly. There
are concerns about the crash-safety on chilling operation. Few suggestions
were posted in this regard. We need to conclude that and post a working
design/patch.

- We need to find a way to handle DEAD root tuples, either convert them into
stubs or overwrite them with a new version. We can also perform pointer
swinging from the index. Again there are concerns about crash-safety and
concurrent index-scans working properly. We don't have a community
consensus on any of the suggestions in this regard. But hopefully we
would converge on some design soon.

- Retail VACUUM. We need to implement the block-level vacuum for
UPDATEs to find enough free space in the block to do HOT-update.
Though we are still discussing how to handle the dead root tuples, we
should be able to remove any intermediate dead tuples in the HOT-update
chain safely. If we do so without fixing the root tuple, the
prior_xmax/next_xmin chain would be broken. A similar problem exists
with freezing HOT tuples.

Whats Next:
-----------------

In the current implementation, an HOT-updated tuple can not be vacuumed
because it might be in the middle of the access path to the heap-only visible tuple.
This can cause the table to grow rapidly even if autovacuum is turned on. The
HOT-update chain also keeps growing if there is enough free space in the block.
I am thinking of implementing some sort of HOT-update chain squeezing logic
so that intermediate dead tuples can be retired and vacuumed away. This would
also help us keep the HOT-update chain small enough so that the chain following
does not become unduly costly.

I am thinking of squeezing the HOT-update chain while following it in the index fetch.
If the root tuple is dead, we follow the chain until the first LIVE or
RECENTLY_DEAD tuple is found. The ctid pointer in the root tuple is made
point to the first LIVE or RECENTLY_DEAD tuple. All the intermediate
DEAD tuples are marked ~HEAP_UPDATE_ROOT so that they can be vacuumed
in the next cycle. We hold an exclusive lock on the page while doing so. That should
avoid any race conditions. This infrastructure should also help us retail vacuum the
block later.

Please let me know your comments.

Thanks,
Pavan

--

EnterpriseDB     http://www.enterprisedb.com
Attachment

Re: HOT WIP Patch - version 1

From
Heikki Linnakangas
Date:
Pavan Deolasee wrote:
> - We need to find a way to handle DEAD root tuples, either convert them 
> into
> stubs or overwrite them with a new version. We can also perform pointer
> swinging from the index. Again there are concerns about crash-safety and
> concurrent index-scans working properly. We don't have a community
> consensus on any of the suggestions in this regard. But hopefully we
> would converge on some design soon.

This seems to be the most fundamental problem we have at the moment. If 
we stick to the basic rule we have now that a live tuple's ctid doesn't 
change, the only way to get rid of a dead tuple at the root of the 
update chain is by changing the index pointer. The backward-pointers 
Hannu suggested or the "scan the whole page to find the previous tuple" 
would allow reuse of those dead tuples for new tuples in the chain, but 
even those methods wouldn't completely eliminate the problem.

We could say that in some scenarios we just leave behind some dead 
tuples/stubs that can never be reclaimed. What do you guys think, if we 
can bring it down to just an extra line pointer, would that be 
acceptable? We could also do that for now and implement the 
pointer-swinging later if it turns out to be a problem in practice.

What's the verdict on relaxing the "live tuple's ctid doesn't change 
rule"? If we did allow that within a page, what would we need to change?   Inside the backend we'd have to make sure
thatwhenever a ctid is 
 
used, the page is kept pinned. How likely is it that it would brake any 
external projects, and how difficult would it be to detect the broken 
usage pattern? It would mean that the ctid system column could change 
within a transaction, unless we change it so that it returns the ctid of 
the root tuple + xmin + cmin, but it'd be a user-visible change anyhow.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: HOT WIP Patch - version 1

From
Tom Lane
Date:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> What's the verdict on relaxing the "live tuple's ctid doesn't change 
> rule"?

I think that's unacceptable; it is known that that will break the ODBC
and JDBC drivers, as well as any other programs that make use of the
ctid for re-finding a tuple they read earlier in the same transaction.
We have not only never deprecated client-side use of ctid for this, but
actively encouraged it, for instance by going out of our way to support
fast access for queries "WHERE ctid = 'constant'".

What's more, your proposal would break plain old UPDATE and DELETE,
as well as SELECT FOR UPDATE, none of which promise to hold a pin
continuously on every page containing a tuple they might decide to
revisit (by ctid) later.  Are you prepared to disallow hash join and
sort/merge join in all such queries?
        regards, tom lane


Re: HOT WIP Patch - version 1

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki@enterprisedb.com> writes:
>> What's the verdict on relaxing the "live tuple's ctid doesn't change 
>> rule"?
> 
> I think that's unacceptable; it is known that that will break the ODBC
> and JDBC drivers, as well as any other programs that make use of the
> ctid for re-finding a tuple they read earlier in the same transaction.

AFAIK the JDBC driver doesn't use ctid. But ODBC and other programs do.

> We have not only never deprecated client-side use of ctid for this, but
> actively encouraged it, for instance by going out of our way to support
> fast access for queries "WHERE ctid = 'constant'".

The idea I had was to change what the ctid system column returns to 
root ctid + xmin + cmin. As long as programs treat the ctid as an opaque 
string, it should work. Tid scan would use that to locate the original 
tuple.

> What's more, your proposal would break plain old UPDATE and DELETE,
> as well as SELECT FOR UPDATE, none of which promise to hold a pin
> continuously on every page containing a tuple they might decide to
> revisit (by ctid) later.  Are you prepared to disallow hash join and
> sort/merge join in all such queries?

No, of course not. We'd have to do the same thing here; use root tid + 
xmin + cmin instead of just ctid.

But now that I think of it, how do we get the root tid of a tuple? I 
suppose we'd be back to having backpointers or scanning the whole 
page... I guess pointer-swinging it is, then.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: HOT WIP Patch - version 1

From
"Pavan Deolasee"
Date:

On 2/14/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki Linnakangas <heikki@enterprisedb.com> writes:
> What's the verdict on relaxing the "live tuple's ctid doesn't change
> rule"?

I think that's unacceptable; it is known that that will break the ODBC
and JDBC drivers, as well as any other programs that make use of the
ctid for re-finding a tuple they read earlier in the same transaction.
We have not only never deprecated client-side use of ctid for this, but
actively encouraged it, for instance by going out of our way to support
fast access for queries "WHERE ctid = 'constant'".

What's more, your proposal would break plain old UPDATE and DELETE,
as well as SELECT FOR UPDATE, none of which promise to hold a pin
continuously on every page containing a tuple they might decide to
revisit (by ctid) later.  Are you prepared to disallow hash join and
sort/merge join in all such queries?


Not that I am suggesting we do this, but I believe we had some solution
to this problem in the earlier version of HOT. The live tuple when copied-back
to the root tuple, the tuple is marked with a HEAP_COPIED_BACK flag.

HeapTupleSatisfiesUpdate() checks for this flag and if set returns a new
return code, HeapTupleCopiedBack. heap_update() returns the same
to ExecUpdate along with the ctid of the root tuple. The UPDATE/DELETE
operation then retried on the root tuple, very similar to read-committed
update/delete. The xmax of the copied-back tuple is set so that its
not vacuumed away until all the current transactions are completed.

Though I have tested this patch several times and it seems to work fine,
I probably don''t have insight into the code as much others on this list
has. So if someone wants to take a look and see if it would work fine,
I would be more than happy to post the latest HOT patch (older design).

Thanks,
Pavan



--

EnterpriseDB     http://www.enterprisedb.com

Re: HOT WIP Patch - version 1

From
"Zeugswetter Andreas ADI SD"
Date:
> What's the verdict on relaxing the "live tuple's ctid doesn't
> change rule"? If we did allow that within a page, what would
> we need to change?

I already said this, but why would this need to be visible from the
outside ?

A few assumptions:no back pointersindexes only point at slots marked as roots (and non hot tuples)

During vacuum, you swap the tuples and keep a stub at the slot that the
user's ctid might be pointing at. You mark the stub to detect this
situation.
When a select/update by ctid comes along it needs to do one step to the
root
and use that tuple instead.

It needs a second vacuum (or a per page vacuum during update) to remove
the
extra stub when it is dead and not recently dead.

I fail to see the hole.

Andreas


Re: HOT WIP Patch - version 1

From
"Zeugswetter Andreas ADI SD"
Date:
> But now that I think of it, how do we get the root tid of a
> tuple? I suppose we'd be back to having backpointers or
> scanning the whole page... I guess pointer-swinging it is, then.

During vacuum you see a root [stub] not recently dead. You follow
the chain to detect if you find a live tuple that can replace
the root. You replace the root. You replace the original with a stub
that points at the root and mark it recently dead (and HEAP_COPIED_BACK
aka Pavan). ... (see prev post)
No need for anyone but vacuum to find a root.

Andreas


Re: HOT WIP Patch - version 1

From
Bruce Momjian
Date:
Just to summarize:
o  Every tuple gets a heap ctido  Only the root tuple gets an index entryo  We can easily remove dead tuples that
aren'tthe root because   by definition, nothing points to them, including backends and   indexes
 

The problem is that a dead root tuple has to stay around because while
no backends can see it, the index does.  We could move a live tuple into
root ctid slot, but if we do that, the live tuple changes its ctid while
it is visible.

Could we insert index tuples for the live tuple and then remove the root
tuple, perhaps later?  So basically we break the chain at that time. 
The problem there is that we basically have nothing better than what we
have now --- we are just delaying the index insert, and I don't see what
that buys us.

Could a _new_ tuple take over the root tuple slot?  It is new, so it
doesn't have a ctid yet to change.  I think that means the index walking
could go forward or backward, but on the same page.   To illustrate,
with ctid slot numbers:[1] root INSERT[2][3][1] root INSERT[2] UPDATE[3][1] root INSERT (dead)[2] UPDATE 1[3][1] root
UPDATE2[2] UPDATE 1[3]
 

---------------------------------------------------------------------------

Heikki Linnakangas wrote:
> Pavan Deolasee wrote:
> > - We need to find a way to handle DEAD root tuples, either convert them 
> > into
> > stubs or overwrite them with a new version. We can also perform pointer
> > swinging from the index. Again there are concerns about crash-safety and
> > concurrent index-scans working properly. We don't have a community
> > consensus on any of the suggestions in this regard. But hopefully we
> > would converge on some design soon.
> 
> This seems to be the most fundamental problem we have at the moment. If 
> we stick to the basic rule we have now that a live tuple's ctid doesn't 
> change, the only way to get rid of a dead tuple at the root of the 
> update chain is by changing the index pointer. The backward-pointers 
> Hannu suggested or the "scan the whole page to find the previous tuple" 
> would allow reuse of those dead tuples for new tuples in the chain, but 
> even those methods wouldn't completely eliminate the problem.
> 
> We could say that in some scenarios we just leave behind some dead 
> tuples/stubs that can never be reclaimed. What do you guys think, if we 
> can bring it down to just an extra line pointer, would that be 
> acceptable? We could also do that for now and implement the 
> pointer-swinging later if it turns out to be a problem in practice.
> 
> What's the verdict on relaxing the "live tuple's ctid doesn't change 
> rule"? If we did allow that within a page, what would we need to change? 
>    Inside the backend we'd have to make sure that whenever a ctid is 
> used, the page is kept pinned. How likely is it that it would brake any 
> external projects, and how difficult would it be to detect the broken 
> usage pattern? It would mean that the ctid system column could change 
> within a transaction, unless we change it so that it returns the ctid of 
> the root tuple + xmin + cmin, but it'd be a user-visible change anyhow.
> 
> -- 
>    Heikki Linnakangas
>    EnterpriseDB   http://www.enterprisedb.com
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: HOT WIP Patch - version 1

From
mark@mark.mielke.cc
Date:
On Wed, Feb 14, 2007 at 01:56:03PM -0500, Bruce Momjian wrote:
> Could we insert index tuples for the live tuple and then remove the root
> tuple, perhaps later?  So basically we break the chain at that time. 
> The problem there is that we basically have nothing better than what we
> have now --- we are just delaying the index insert, and I don't see what
> that buys us.

At some point - inserting into the block would not be possible, as
there is no free space. Would that be a good time to do the index
insert? Then, a later vacuum would eventually prune out the whole old
chain. As long as vacuuming the intermediate entries in the chain
keeps the block with free space, there is no need to remove the root
tuple. If space ever runs out (vacuum not running frequently enough -
many updates performed in the same interval) - fall back to the
mechanism that is being used today.

I see it buying increased performance for rows that are frequently
updated.  If it can delay modifying the indices to only once every 10
or more updates, it seems to me that the improvement should be
significant. Perhaps PostgreSQL could be used for page hit counters
again... :-)

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



Re: HOT WIP Patch - version 1

From
Heikki Linnakangas
Date:
Zeugswetter Andreas ADI SD wrote:
> A few assumptions:
>     no back pointers
>     indexes only point at slots marked as roots (and non hot tuples)
> 
> During vacuum, you swap the tuples and keep a stub at the slot that the
> user's ctid might be pointing at. You mark the stub to detect this
> situation.
> When a select/update by ctid comes along it needs to do one step to the
> root
> and use that tuple instead.

As Pavan pointed out, that's more or less what he ended up doing 
originally. You need to mark the stub with the current most recent xid, 
and wait until that's no longer running. Only after that you can remove 
the stub.

> It needs a second vacuum (or a per page vacuum during update) to remove
> the 
> extra stub when it is dead and not recently dead. 

Requiring two vacuums to remove the tuple sounds bad at first, but it's 
actually not so bad since both steps could by done by retail vacuum, or 
even normal scans while.

> I fail to see the hole.

The only potential problem I can see is how to make sure that a heap 
scan or a bitmap heap scan doesn't visit the tuple twice. If we make 
sure that the page is scanned in one go while keeping the buffer pinned, 
we're good. We already do that except for system catalogs, so I believe 
we'd have to forbid hot updates on system tables, like we forbid bitmap 
scans.

To me this sounds like the best idea this far.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: HOT WIP Patch - version 1

From
"Zeugswetter Andreas ADI SD"
Date:
Bruce Momjian wrote:
> Just to summarize:
>
>     o  Every tuple gets a heap ctid
>     o  Only the root tuple gets an index entry

>     o  We can easily remove dead tuples that aren't the root because
>        by definition, nothing points to them, including backends and
>        indexes

I am still wondering about the "easily" here. Basically this
needs some kind of wal entry to be crash safe.

Else some later tx might reuse the slot:- some update on page produces page image in wal- slot removed- slot reused and
comitted-page not written- crash- wal fullpage restores the page to the version before slot 
removed(- would need a wal replay for slot removed from hot chain here)- wal restores slot reuse, but the slot is now
partof a wrong 
hot chain  and the chain is broken (unless we have the above step)

Do we have this wal entry ?
> The problem is that a dead root tuple has to stay around
> because while no backends can see it, the index does.  We
> could move a live tuple into root ctid slot, but if we do
> that, the live tuple changes its ctid while it is visible.
>
> Could we insert index tuples for the live tuple and then
> remove the root tuple, perhaps later?  So basically we break
> the chain at that time.
> The problem there is that we basically have nothing better
> than what we have now --- we are just delaying the index
> insert, and I don't see what that buys us.

yes, not really promising.

> Could a _new_ tuple take over the root tuple slot?  It is
> new, so it doesn't have a ctid yet to change.  I think that
> means the index walking
> could go forward or backward, but on the same page.   To illustrate,
> with ctid slot numbers:
>
>     [1] root INSERT
>     [2]
>     [3]
>
>     [1] root INSERT
>     [2] UPDATE
>     [3]
>
>     [1] root INSERT (dead)
>     [2] UPDATE 1
>     [3]
>
>     [1] root UPDATE 2
>     [2] UPDATE 1
>     [3]

Imho no, because that would need a back pointer in [1] to [2] so that
readers arriving at [1] (e.g. through index) can see [2] until [1] is
visible.
I think we don't want backpointers. (rather go the tuple swapping route)

Andreas


Re: HOT WIP Patch - version 1

From
Heikki Linnakangas
Date:
Zeugswetter Andreas ADI SD wrote:
> I am still wondering about the "easily" here. Basically this
> needs some kind of wal entry to be crash safe. 
> 
> Else some later tx might reuse the slot:
>     - some update on page produces page image in wal
>     - slot removed
>     - slot reused and comitted
>     - page not written
>     - crash
>     - wal fullpage restores the page to the version before slot
> removed
>     (- would need a wal replay for slot removed from hot chain here)
>     - wal restores slot reuse, but the slot is now part of a wrong
> hot chain
>       and the chain is broken (unless we have the above step)
> 
> Do we have this wal entry ?

We already log tuple removals by normal vacuums. We can't use that wal 
entry as it is: if a dead tuple is in the middle of an update chain, it 
needs to be unlinked from the chain. But I don't see any particular 
problem with that, it just needs to be wal logged like every other data 
changing operation.

Do we actually ever want to remove dead tuples from the middle of the 
chain? If a tuple in the middle of the chain is dead, surely every tuple 
before it in the chain is dead as well, and we want to remove them as 
well. I'm thinking, removing tuples from the middle of the chain can be 
problematic, because we'd need to fiddle with the xmin/xmax of the other 
tuples to make them match. Or change the tuple-following logic to not do 
the xmin=xmax check, but it's a nice robustness feature.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: HOT WIP Patch - version 1

From
"Pavan Deolasee"
Date:

On 2/15/07, Heikki Linnakangas <heikki@enterprisedb.com> wrote:

Do we actually ever want to remove dead tuples from the middle of the
chain? If a tuple in the middle of the chain is dead, surely every tuple
before it in the chain is dead as well, and we want to remove them as
well. I'm thinking, removing tuples from the middle of the chain can be
problematic, because we'd need to fiddle with the xmin/xmax of the other
tuples to make them match. Or change the tuple-following logic to not do
the xmin=xmax check, but it's a nice robustness feature.

I am precisely working on this right now. In the next patch version that I
intend to send shortly, I am thinking of removing the dead tuples in the
middle of the chain. We don't have agreement on how to deal with the
root tuple, but we can safely remove the intermediate dead tuples and
vacuum them. Also when all the tuples in the chain are dead because the
last tuple is either deleted or COLD updated, the entire chain along with
the root tuple and the index entry can be vacuumed.

The operation must be WAL logged and you caught the xmin/xmax
problem very rightly. One option is to change the xmax of root tuple
to the xmin of the first live/recently-dead tuple, if we remove a set of
intermediate dead tuples. This xmin of the first live/recently-dead tuple
is also the xmax of the last dead tuple we removed and hence must
be older than the oldtestXmin. So assigning that to the root tuple
should not break any visibility rules for the root tuple (it would still
be dead).

Do we see any problem with this ?

Thanks,
Pavan


--

EnterpriseDB     http://www.enterprisedb.com

Re: HOT WIP Patch - version 1

From
Hannu Krosing
Date:
Ühel kenal päeval, N, 2007-02-15 kell 10:49, kirjutas Heikki
Linnakangas:

> We already log tuple removals by normal vacuums. We can't use that wal 
> entry as it is: if a dead tuple is in the middle of an update chain, it 
> needs to be unlinked from the chain. But I don't see any particular 
> problem with that, it just needs to be wal logged like every other data 
> changing operation.
> 
> Do we actually ever want to remove dead tuples from the middle of the 
> chain? If a tuple in the middle of the chain is dead, surely every tuple 
> before it in the chain is dead as well, and we want to remove them as 
> well. I'm thinking, removing tuples from the middle of the chain can be 
> problematic, because we'd need to fiddle with the xmin/xmax of the other 
> tuples to make them match. Or change the tuple-following logic to not do 
> the xmin=xmax check, but it's a nice robustness feature.

What kind of robustness does it provide ? In other words - what failure
scenario does this guard against ?

I can't see the case where the xmin=xmax check can not succeed, at least
not for same page tuples.

-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com




Re: HOT WIP Patch - version 1

From
"Simon Riggs"
Date:
On Fri, 2007-02-16 at 09:36 +0200, Hannu Krosing wrote:
> Ühel kenal päeval, N, 2007-02-15 kell 10:49, kirjutas Heikki
> Linnakangas:
>
> > We already log tuple removals by normal vacuums. We can't use that wal
> > entry as it is: if a dead tuple is in the middle of an update chain, it
> > needs to be unlinked from the chain. But I don't see any particular
> > problem with that, it just needs to be wal logged like every other data
> > changing operation.
> >
> > Do we actually ever want to remove dead tuples from the middle of the
> > chain? If a tuple in the middle of the chain is dead, surely every tuple
> > before it in the chain is dead as well, and we want to remove them as
> > well. I'm thinking, removing tuples from the middle of the chain can be
> > problematic, because we'd need to fiddle with the xmin/xmax of the other
> > tuples to make them match. Or change the tuple-following logic to not do
> > the xmin=xmax check, but it's a nice robustness feature.
>
> What kind of robustness does it provide ? In other words - what failure
> scenario does this guard against ?
>
> I can't see the case where the xmin=xmax check can not succeed, at least
> not for same page tuples.

The xmin == xmax case was required to support a complex VACUUM FULL
failure case discovered some time ago. I'll post the details.

With HOT, ISTM that testing this becomes even more important in some
form because of the extra complexity we're putting into that part of the
code. We need some way of detecting a corruption introduced by some
weird use case we haven't thought of.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




Re: HOT WIP Patch - version 1

From
"Simon Riggs"
Date:
On Thu, 2007-02-15 at 10:49 +0000, Heikki Linnakangas wrote:

> Do we actually ever want to remove dead tuples from the middle of the 
> chain? If a tuple in the middle of the chain is dead, surely every tuple 
> before it in the chain is dead as well, and we want to remove them as 
> well. I'm thinking, removing tuples from the middle of the chain can be 
> problematic, because we'd need to fiddle with the xmin/xmax of the other 
> tuples to make them match. Or change the tuple-following logic to not do 
> the xmin=xmax check, but it's a nice robustness feature.

I think the phrase "middle of the chain" is being used in two ways here.

If a tuple in the chain is dead, then all prior tuples are dead also.
That gives us a live portion of the chain and a dead portion of the
chain.

We will only ever need to follow the chain along the live portion, so
breaking the live portion of the chain is not allowable. Breaking the
chain in the dead portion *is* allowable and that is what is being
proposed here. We break the chain rather than simply remove all of it
because there may still be index pointers to some of the dead rows. 

Pavan's code does this now.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com