Re: online debloatification (was: extending relations more efficiently) - Mailing list pgsql-hackers

From Robert Haas
Subject Re: online debloatification (was: extending relations more efficiently)
Date
Msg-id CA+TgmoZoNOvDqeuugZqiZF+jrhJMLubwfcDpzLThODA9A+01dQ@mail.gmail.com
Whole thread Raw
In response to Re: online debloatification (was: extending relations more efficiently)  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Wed, May 2, 2012 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Brainstorming wildly, how about something like this:
>
>> 1. Insert a new copy of the tuple onto some other heap page.  The new
>> tuple's xmin will be that of the process doing the tuple move, and
>> we'll also set a flag indicating that a move is in progress.
>> 2. Set a flag on the old tuple, indicating that a tuple move is in
>> progress.  Set its TID to the new location of the tuple.  Set xmax to
>> the tuple mover's XID.  Optionally, truncate away the old tuple data,
>> leaving just the tuple header.
>> 3. Scan all indexes and replace any references to the old tuple's TID
>> with references to the new tuple's TID.
>> 4. Commit.
>
> What happens when you crash partway through that?

Well, there are probably a few loose ends here, but the idea is that
if we crash after step 2 is complete, the next vacuum is responsible
for performing steps 3 and 4.  As written, there's probably a problem
if we crash between (1) and (2); I think those would need to be done
atomically, or at least we need to make sure that the moving-in flag
is set on the new tuple if and only if there is actually a redirect
pointing to it.

> Also, what happens if
> somebody wishes to update the tuple before the last step is complete?

Then we let them.  The idea is that they see the redirect tuple at the
old TID, follow it to the new copy of the tuple, and update that
instead.

> In any case, this doesn't address the fundamental problem with unlocked
> tuple movement, which is that you can't just arbitrarily change a
> tuple's TID when there might be other operations relying on the TID
> to hold still.

Well, that's why I invented the redirect tuple, so that anyone who was
relying on the TID to hold still would see the redirect and say, oh, I
need to go look at this other TID instead.  It's entirely possible
there's some reason why that can't work, but at the moment I'm not
seeing it.  I see that there's a problem if the old TID gets freed
while someone's relying on it, but replacing it with a pointer to some
other TID seems like it ought to be workable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: online debloatification (was: extending relations more efficiently)
Next
From: Tom Lane
Date:
Subject: Re: proposal: additional error fields