On 11/10/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Pavan Deolasee" <pavan.deolasee@gmail.com> writes:
>> On 11/10/06, Josh Berkus <josh@agliodbs.com> wrote:
>>> I believe that's the "unsolved technical issue" in the prototype, unless
>>> Pavan has solved it in the last two weeks. Pavan?
>>
>> When an overflow tuple is copied back to the main heap, the overflow tuple
>> is marked with a special flag (HEAP_OVERFLOW_MOVEDBACK). Subsequently,
>> when a backend tries to lock the overflow version of the tuple, it checks
>> the flag and jumps to the main heap if the flag is set.
>
> (1) How does it "jump to the main heap"? The links go the other
> direction.

The overflow tuple has a special header that stores a back pointer to the
main heap. This increases the tuple header size by 6 bytes, but the
overhead is limited to overflow tuples.
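
To make the back pointer concrete, here is a rough sketch, not the prototype
code. It assumes the back pointer has the same 6-byte layout as an
ItemPointerData (block number in two 16-bit halves plus a 16-bit offset).
All Sketch* names, the flag value, and the header layout are made up for
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as a heap TID: 4-byte block number + 2-byte offset = 6 bytes. */
typedef struct SketchTid {
    uint16_t bi_hi;     /* high 16 bits of the block number */
    uint16_t bi_lo;     /* low 16 bits of the block number */
    uint16_t ip_posid;  /* line pointer offset within the block */
} SketchTid;

_Static_assert(sizeof(SketchTid) == 6, "back pointer should cost 6 bytes");

/* Hypothetical flag value; the real bit assignment is not given here. */
#define SKETCH_HEAP_OVERFLOW_MOVEDBACK 0x0001

/* Hypothetical overflow tuple header: flag bits plus the back pointer. */
typedef struct SketchOverflowTuple {
    uint16_t  infomask;      /* flag bits */
    SketchTid back_pointer;  /* TID of the tuple in the main heap */
} SketchOverflowTuple;

typedef struct SketchLockTarget {
    bool      in_main_heap;  /* true if the lock must be taken in the main heap */
    SketchTid tid;           /* TID of the tuple to lock */
} SketchLockTarget;

/*
 * Decide which tuple a backend should lock: if the overflow tuple was
 * copied back, follow the back pointer; otherwise lock it where it is.
 */
SketchLockTarget
sketch_resolve_lock_target(const SketchOverflowTuple *tup, SketchTid self_tid)
{
    SketchLockTarget target;

    assert(tup != NULL);
    if (tup->infomask & SKETCH_HEAP_OVERFLOW_MOVEDBACK) {
        target.in_main_heap = true;
        target.tid = tup->back_pointer;
    } else {
        target.in_main_heap = false;
        target.tid = self_tid;
    }
    return target;
}
```

The point is only that the jump needs nothing beyond the flag check and the
stored TID; tuples in the main heap carry no extra header.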
> (2) Isn't this full of race conditions?

I agree, there could be race conditions, but IMO we can handle them. When
we follow the tuple chain, we hold a SHARE lock on the main heap buffer.
Also, when the root tuple is vacuumable and needs to be overwritten, we
acquire and hold an EXCLUSIVE lock on the main heap buffer. This reduces
the race conditions to a great extent.
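
The locking rule can be illustrated with a toy, single-threaded model of one
buffer lock. This is only a sketch of SHARE/EXCLUSIVE compatibility and is
not thread-safe. In the backend this would presumably go through
LockBuffer() with BUFFER_LOCK_SHARE or BUFFER_LOCK_EXCLUSIVE. The Sketch*
names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SKETCH_SHARE, SKETCH_EXCLUSIVE } SketchLockMode;

/* Toy state of the main heap buffer lock (no real synchronization). */
typedef struct SketchBufferLock {
    int  sharers;    /* number of chain followers holding SHARE */
    bool exclusive;  /* true while the root tuple is being overwritten */
} SketchBufferLock;

/* Returns true if the lock was granted, false if the caller would wait. */
bool
sketch_try_lock(SketchBufferLock *lock, SketchLockMode mode)
{
    assert(lock != NULL);
    if (lock->exclusive)
        return false;               /* nothing coexists with EXCLUSIVE */
    if (mode == SKETCH_SHARE) {
        lock->sharers++;            /* any number of chain followers */
        return true;
    }
    if (lock->sharers > 0)
        return false;               /* overwriter waits for followers */
    lock->exclusive = true;
    return true;
}

/* Release a lock previously granted in the given mode. */
void
sketch_unlock(SketchBufferLock *lock, SketchLockMode mode)
{
    assert(lock != NULL);
    if (mode == SKETCH_SHARE) {
        assert(lock->sharers > 0);
        lock->sharers--;
    } else {
        assert(lock->exclusive);
        lock->exclusive = false;
    }
}
```

In this model a backend following the chain can never see the root tuple
half-overwritten, because the overwriter cannot get EXCLUSIVE while any
follower holds SHARE, and vice versa.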
> (3) I thought you already used up the one remaining t_infomask bit.

Yes. The last bit in t_infomask is used up to mark the presence of the
overflow tuple header. But I believe there are a few more bits that can be
reused. There are three bits available in the t_ctid field as well (since
ip_posid needs at most 13 bits). One bit is used to identify whether a
given tid points to the main heap or the overflow heap. This helps when
tids are passed around in the code.

Since the back pointer from the overflow tuple always points to the main
heap, the same bit can be used to mark copied-back tuples (we are doing it
in a slightly different way in the current prototype though).
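
For illustration, the spare bits in the 16-bit ip_posid could be used roughly
like this. It is a sketch only: which of the three high bits marks the
overflow heap is an assumption, and the SKETCH_* and sketch_* names are made
up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The low 13 bits hold the real offset; the top 3 bits are spare. */
#define SKETCH_POSID_BITS       13
#define SKETCH_POSID_MASK       ((uint16_t) ((1u << SKETCH_POSID_BITS) - 1))
/* Assumed position of the "tid points into the overflow heap" bit. */
#define SKETCH_TID_IN_OVERFLOW  ((uint16_t) (1u << 13))

/* Combine an offset and the heap marker into one 16-bit ip_posid value. */
uint16_t
sketch_pack_posid(uint16_t offset, bool in_overflow)
{
    assert(offset <= SKETCH_POSID_MASK);   /* offset must fit in 13 bits */
    return (uint16_t) (offset | (in_overflow ? SKETCH_TID_IN_OVERFLOW : 0));
}

/* Recover the plain offset, ignoring the spare bits. */
uint16_t
sketch_posid_offset(uint16_t packed)
{
    return (uint16_t) (packed & SKETCH_POSID_MASK);
}

/* True if the tid refers to the overflow heap rather than the main heap. */
bool
sketch_posid_in_overflow(uint16_t packed)
{
    return (packed & SKETCH_TID_IN_OVERFLOW) != 0;
}
```

Any code that uses ip_posid as an offset would have to mask the spare bits
first, which is the main cost of carrying the marker inside the tid.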
Regards,
Pavan