Thread: Re: online debloatification (was: extending relations more efficiently)
On Wed, May 2, 2012 at 1:06 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote:
> Excerpts from Robert Haas's message of Wed May 02 12:55:17 -0400 2012:
>> On Wed, May 2, 2012 at 12:46 PM, Alvaro Herrera
>> <alvherre@commandprompt.com> wrote:
>> > Agreed.  Perhaps to solve this issue what we need is a way to migrate
>> > tuples from later pages into earlier ones.  (This was one of the points,
>> > never resolved, that we discussed during the VACUUM FULL rework.)
>>
>> Yeah, I agree.  And frankly, we need to find a way to make it work
>> without taking AccessExclusiveLock on the relation.  Having to run
>> VACUUM FULL is killing actual users and scaring off potential ones.
>
> And ideally without bloating the indexes while at it.

Yeah.  Brainstorming wildly, how about something like this:

1. Insert a new copy of the tuple onto some other heap page.  The new tuple's xmin will be that of the process doing the tuple move, and we'll also set a flag indicating that a move is in progress.

2. Set a flag on the old tuple, indicating that a tuple move is in progress.  Set its TID to the new location of the tuple.  Set xmax to the tuple mover's XID.  Optionally, truncate away the old tuple data, leaving just the tuple header.

3. Scan all indexes and replace any references to the old tuple's TID with references to the new tuple's TID.

4. Commit.

5. Once the XID of the tuple mover is all-visible, nuke the old TID and clear the flag on the new tuple indicating a move-in-progress (these two operations must be done together, atomically, with a single WAL record covering both).

Any scan that encounters the old tuple will decide whether or not it can see the tuple based on the xmin & xmax in the old tuple's header.  If it decides it can see it, it follows the TID pointer and does its work using the new tuple instead.  Scans that encounter the new tuple need no special handling; the existing visibility rules are fine for that case.
Prune operations must not truncate away tuples that are being moved into or out of the page, and vacuum must not mark pages containing such tuples as all-visible.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
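[Editor's note: the five steps above can be sketched as a toy simulation in C. Every type and name here (TupleSketch, TidSketch, move_in/move_out flags, etc.) is invented for illustration; none of it is real PostgreSQL code, and WAL, locking, and the index rewrite of step 3 are omitted entirely.]

```c
/* Illustrative simulation of the proposed tuple-move protocol.
 * All identifiers are hypothetical stand-ins, not PostgreSQL internals. */
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t block; uint16_t offset; } TidSketch;

typedef struct {
    uint32_t  xmin;       /* inserting XID */
    uint32_t  xmax;       /* deleting XID, 0 if none */
    bool      move_out;   /* step 2: this slot is now a redirect */
    bool      move_in;    /* step 1: this tuple is a move target */
    TidSketch ctid;       /* for a redirect, the tuple's new location */
    char      data[32];   /* payload (truncated away on the old tuple) */
} TupleSketch;

/* Steps 1-2: insert the new copy, then turn the old tuple into a redirect. */
static void
move_tuple(TupleSketch *old_tup, TupleSketch *new_tup,
           TidSketch new_tid, uint32_t mover_xid)
{
    /* Step 1: new copy, flagged as a move target, xmin = mover's XID. */
    *new_tup = *old_tup;
    new_tup->xmin = mover_xid;
    new_tup->move_in = true;
    new_tup->move_out = false;

    /* Step 2: old tuple becomes a redirect; xmax = mover's XID. */
    old_tup->move_out = true;
    old_tup->xmax = mover_xid;
    old_tup->ctid = new_tid;
    memset(old_tup->data, 0, sizeof old_tup->data); /* optional truncation */
}

/* A scan landing on a redirect follows ctid to the new copy.
 * (Single-block "heap" here, so only ctid.offset is used.) */
static const TupleSketch *
resolve(const TupleSketch *tup, const TupleSketch *heap, bool mover_committed)
{
    if (tup->move_out && mover_committed)
        return &heap[tup->ctid.offset];
    return tup;
}

/* Step 5, post-commit, once the mover's XID is all-visible:
 * free the old TID and clear the move-in flag together
 * (in the real proposal, atomically under one WAL record). */
static void
finish_move(TupleSketch *old_tup, TupleSketch *new_tup)
{
    new_tup->move_in = false;
    memset(old_tup, 0, sizeof *old_tup);
}
```

The sketch only shows the heap-side state transitions; the hard parts the thread argues about (crash recovery between steps, concurrent updaters, index retargeting) are exactly what it leaves out.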
Robert Haas <robertmhaas@gmail.com> writes:
> Brainstorming wildly, how about something like this:

> 1. Insert a new copy of the tuple onto some other heap page.  The new
> tuple's xmin will be that of the process doing the tuple move, and
> we'll also set a flag indicating that a move is in progress.
> 2. Set a flag on the old tuple, indicating that a tuple move is in
> progress.  Set its TID to the new location of the tuple.  Set xmax to
> the tuple mover's XID.  Optionally, truncate away the old tuple data,
> leaving just the tuple header.
> 3. Scan all indexes and replace any references to the old tuple's TID
> with references to the new tuple's TID.
> 4. Commit.

What happens when you crash partway through that?  Also, what happens if somebody wishes to update the tuple before the last step is complete?

In any case, this doesn't address the fundamental problem with unlocked tuple movement, which is that you can't just arbitrarily change a tuple's TID when there might be other operations relying on the TID to hold still.

			regards, tom lane
On Wed, May 2, 2012 at 4:39 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> Brainstorming wildly, how about something like this:
>>
>> 1. Insert a new copy of the tuple onto some other heap page.  The new
>> tuple's xmin will be that of the process doing the tuple move, and
>> we'll also set a flag indicating that a move is in progress.
>> 2. Set a flag on the old tuple, indicating that a tuple move is in
>> progress.  Set its TID to the new location of the tuple.  Set xmax to
>> the tuple mover's XID.  Optionally, truncate away the old tuple data,
>> leaving just the tuple header.
>> 3. Scan all indexes and replace any references to the old tuple's TID
>> with references to the new tuple's TID.
>> 4. Commit.
>
> What happens when you crash partway through that?

Well, there are probably a few loose ends here, but the idea is that if we crash after step 2 is complete, the next vacuum is responsible for performing steps 3 and 4.  As written, there's probably a problem if we crash between (1) and (2); I think those would need to be done atomically, or at least we need to make sure that the moving-in flag is set on the new tuple if and only if there is actually a redirect pointing to it.

> Also, what happens if
> somebody wishes to update the tuple before the last step is complete?

Then we let them.  The idea is that they see the redirect tuple at the old TID, follow it to the new copy of the tuple, and update that instead.

> In any case, this doesn't address the fundamental problem with unlocked
> tuple movement, which is that you can't just arbitrarily change a
> tuple's TID when there might be other operations relying on the TID
> to hold still.

Well, that's why I invented the redirect tuple, so that anyone who was relying on the TID to hold still would see the redirect and say, oh, I need to go look at this other TID instead.  It's entirely possible there's some reason why that can't work, but at the moment I'm not seeing it.
I see that there's a problem if the old TID gets freed while someone's relying on it, but replacing it with a pointer to some other TID seems like it ought to be workable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
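[Editor's note: the "follow the redirect and update that instead" behavior Robert describes can be shown with a minimal C sketch. The `Tup` struct and `follow_redirect` helper are hypothetical illustrations; a real implementation would chase the on-disk ctid under buffer locks, not an in-memory pointer.]

```c
/* Hypothetical sketch: an updater that lands on the old TID sees the
 * moved-out flag and applies its change to the new copy instead, so the
 * old TID keeps "working" even though the tuple has physically moved. */
#include <stdbool.h>

typedef struct Tup {
    bool        moved_out;  /* this slot is now just a redirect */
    struct Tup *redirect;   /* new location (stands in for the ctid link) */
    int         value;      /* the user-visible payload */
} Tup;

/* Follow at most one hop of redirection, as in the proposal. */
static Tup *
follow_redirect(Tup *t)
{
    return t->moved_out ? t->redirect : t;
}

/* An UPDATE arriving at the old TID transparently hits the new copy. */
static void
update_value(Tup *t, int new_value)
{
    follow_redirect(t)->value = new_value;
}
```

Tom's objection still applies at the endpoint: the scheme is only safe as long as the redirect at the old TID survives until no one can be relying on it, which is what step 5's "all-visible" condition is for.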