Re: HOT for PostgreSQL 8.3 - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: HOT for PostgreSQL 8.3
Date
Msg-id 1170979630.3776.383.camel@silverbirch.site
Whole thread Raw
In response to Re: HOT for PostgreSQL 8.3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: HOT for PostgreSQL 8.3  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
List pgsql-hackers
On Wed, 2007-02-07 at 14:17 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > The basic idea is that when a tuple is UPDATEd we can, in certain
> > circumstances, avoid inserting index tuples for a tuple. Such tuples are
> > marked HEAP_ONLY_TUPLE, but their storage is otherwise identical to
> > other tuples.
> 
> What is VACUUM FULL going to do when it wants to move one of these things?

In addition to others suggested, one option is to rework VACUUM FULL:

Use case for VACUUM FULL is very low these days. It has an appallingly
long execution time. This can be speeded up by dropping and re-creating
indexes, says the manual, but it is still lengthy. It is even faster to
drop the indexes, do a CREATE TABLE AS SELECT * FROM table, drop the old
table and then rebuild the indexes. When moving into the new relation
the space requirement is not double, since we only copy useful
data/space and we also do this using the space the indexes occupied, so
the actual space overhead isn't that high. VACUUM FULL also generates
lots of WAL and ties up lots of memory while it operates - and it uses
memory without any constraint. The CTAS technique doesn't carry across
all of the visible tuple chains, but then to be brutally frank, by the
time VACUUM FULL has actually finished executing, it is very frequently
the oldest transaction anyway, so we needn't really have gone to all the
trouble of moving the tuple chains.

So the main use case for VACUUM FULL is when the space to be freed
inside the table is low enough to make defraging the table quicker than
a CTAS, yet still high enough that we were worried enough to do a VACUUM
FULL. Thats a very narrow use case, and if it exists at all there's a
narrow time window associated with it - only a heavily updated/deleted
table needs vacuuming anyway - that means most often be performing the
VF when we're already into the zone where the CTAS approach is quicker.
VACUUM FULL also forces us to handle various failure cases that leave
half-moved tuples scattered across tables. (Incomplete VACUUM FULLs are
actually fairly common because of its incredibly long run time).

So the radical suggestion is to continue to support the VACUUM FULL
command, but using a newly coded technique that is both higher
performance and more robust. We can either make VACUUM FULL wait until
it actually is the oldest transaction, or we can mark the table in some
way so that a lock cannot be obtained upon it by an earlier Xid. We
should be able to compact the files of a large table one physical file
at a time, so the space overhead is only ever MAX_PHYSICAL_FILESIZE and
the space overhead may become a net space gain as the VF continues. This
is of course (almost) identical to the approach already in use for
CLUSTER, so it seems like that should be acceptable. As a result, its
really not that much code and can still be accomplished on time.

Note also that we would not have to drop and re-add Foreign Keys, since
nothing can have changed while we have the table locked.

Doing this also frees up two heap info bits and simplifies many of the
HeapTupleSatisfies code, which could probably use the helping hand.
Tuples moved to the new files would retain their info bit settings as
they are copied across.

So overall, it seems a lot easier to completely replace VF than to fight
through its complexities and failure cases. If we do the above, then
we'll speed up VACUUM FULL and we'll be able to handle HOT tuples
easily.

> > CREATE INDEX requires some careful work to allow it to identify and
> > correctly index HEAP_ONLY_TUPLEs that need to become ~HEAP_ONLY_TUPLE as
> > a result of the new index.
> 
> I think you've glossed over the CREATE INDEX problem much too easily.
> The difficulty with that is going to be that de-HOT-ifying a tuple
> is going to require multiple updates that can't possibly be put into
> a single WAL record, and I don't think that WAL replay can clean up
> after an incomplete update (since it can't run user-defined functions
> and hence cannot be expected to compute index entries for itself).
> So I don't think you can do that while preserving crash safety.

No intention to gloss, just wanted to get past first post and onto the
really complex stuff. You're absolutely right that this is where much of
the thinking/work needs to take place.

> > Removing the root tuple will require a VACUUM *FULL*.
> 
> That seems unacceptable ... it won't take too long for your table to
> fill up with stubs, and we don't want to return to the bad old days
> when periodic VACUUM FULL was unavoidable.

Completely agree. I wanted to start right at the very beginning, so
everybody would understand the issues, rather than jump straight in
again with additional complexity.

> ISTM we could fix that by extending the index VACUUM interface to
> include two concepts: aside from "remove these TIDs when you find them",
> there could be "replace these TIDs with those TIDs when you find them".
> This would allow pointer-swinging to one of the child tuples, after
> which the old root could be removed.  This has got the same atomicity
> problem as for CREATE INDEX, because it's the same thing: you're
> de-HOT-ifying the child.  So if you can solve the former, I think you
> can make this work too.

This is looking like the best option out of the many, since it doesn't
have any serious restrictions or penalties. Let's see what Pavan thinks,
since he's been working on this aspect.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: [PATCHES] plpgsql, return can contains any expression
Next
From: Jan Wieck
Date:
Subject: Re: Proposal: Commit timestamp