Re: vacuum, performance, and MVCC - Mailing list pgsql-hackers

From Jan Wieck
Subject Re: vacuum, performance, and MVCC
Date
Msg-id 449F41B2.2000906@Yahoo.com
In response to Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
Responses Re: vacuum, performance, and MVCC  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On 6/25/2006 5:18 PM, Bruce Momjian wrote:

> Jan Wieck wrote:
>> >> An update that results in the same values for every indexed column of 
>> >> a known deleted, invisible tuple. This reused tuple cannot, by definition, 
>> >> be the one currently updated. So unless it is a table without a primary 
>> >> key, this assumes that at least 3 versions of the same row exist within 
>> >> the same block. How likely is that to happen?
>> > 
>> > Good question.  You take the current tuple, and make another one on the
>> > same page.  Later, an update can reuse the original tuple if it is no
>> > longer visible to anyone (by changing the item id), so you only need two
> > tuples, not three.  My hope is that a repeated update would eventually
> > move to a page that has enough free space for two (or more) versions.
>> > 
>> > Does that help explain it?
>> > 
>> 
>> That's exactly what I meant. You need space for 3 or more tuple versions 
>> within one page and the luck that one of them is invisible at the time 
>> of the update. I don't know how likely or unlikely this is in reality, 
>> but it doesn't sound very promising to me so far.
> 
> Why three?  I explained using only two heap tuples:

For some reason I counted the new tuple as well ... sorry about that. Yes, 
it can work with two tuples.

> 
>     [item1]...[tuple1]
> 
> becomes on UPDATE:
>            ---------->
>     [item1]...[tuple1][tuple2]
>                       ----->
> 
> on another UPDATE, if tuple1 is no longer visible:
> 
>            ------------------>
>     [item1]...[tuple1][tuple2]
>                       <------
> 
>> Another problem with this is that even if you find such a row, it doesn't 
>> spare you the index traversal. The dead row whose item id you're reusing 
>> might have resulted from an insert that aborted or crashed before it 
>> finished creating all index entries. Or some of its index entries might 
>> already be flagged known dead, and you had better reset those flags.
> 
> You can only reuse heap rows that were created and expired by committed
> transactions.  In fact, you can only UPDATE a row that was created by a
> committed transaction.  You cannot _reuse_ just any row, only a row that
> is being UPDATEd.  Also, it cannot be known dead because we are in the
> process of updating it.

Now you lost me. What do you mean "a row that is being UPDATEd"? The row 
(version) being UPDATEd right now cannot be expired, or why would you 
update that one? And if your transaction rolls back later, the row you 
update right now must be the one surviving.

Any row that was created by a committed transaction does indeed have all 
its index entries created. But if it is deleted and expired, that means 
that the transaction that stamped xmax has committed and is outside of 
every existing snapshot. You can only reuse a slot that is occupied by a 
tuple that satisfies the vacuum snapshot. And a tuple that satisfies 
that snapshot may well have index entries flagged known dead.
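To spell out the condition I mean: in rough C, a slot is only up for reuse 
when the deleter committed and its xmax is older than the oldest xmin of any 
live snapshot. (Simplified, hypothetical types and names; the real heap tuple 
header tracks commit status in infomask hint bits, not boolean fields.)

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t TransactionId;

/* Hypothetical, flattened tuple header for illustration only;
 * PostgreSQL's HeapTupleHeaderData looks nothing like this. */
typedef struct
{
    TransactionId xmin;           /* inserting transaction */
    TransactionId xmax;           /* deleting transaction, 0 if live */
    bool          xmin_committed; /* inserter known committed? */
    bool          xmax_committed; /* deleter known committed? */
} SimpleTuple;

/* A dead tuple's slot may be reused only when the deleting
 * transaction committed AND no open snapshot can still see the
 * tuple, i.e. xmax precedes the oldest xmin of any snapshot. */
static bool
slot_is_reusable(const SimpleTuple *tup, TransactionId oldest_snapshot_xmin)
{
    return tup->xmin_committed
        && tup->xmax != 0
        && tup->xmax_committed
        && tup->xmax < oldest_snapshot_xmin;
}
```

And exactly such a tuple, being past every snapshot, is the one whose index 
entries may already carry the known-dead flag.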

> I am thinking my idea was not fully understood.  Hopefully this email
> helps.

I must be missing something because I still don't see how it can work.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

