Re: 'TID index' - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: 'TID index'
Date
Msg-id NOEFLCFHBPDAFHEIPGBOKEBFCFAA.simon@2ndquadrant.com
Whole thread Raw
In response to 'TID index'  ("Jim C. Nasby" <decibel@decibel.org>)
Responses Re: 'TID index'  ("Ross J. Reedstrom" <reedstrm@rice.edu>)
List pgsql-hackers
>Jim C. Nasby
> On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
> > There are many good ideas out there, yet it is almost impossible to find
> > somebody else to implement yours!
> >
> > The acid test is to try and write it...
> >
> > Overall, I agree VACUUM could do with some tuning - and 8.0 has
> just that.
> > It needs very careful thought to make sure both concurrency and
> > recoverability considerations are fully met in any solution you come up
> > with.
>
> Before I try and tackle this I wanted to investigate why PostgreSQL's
> MVCC depends on vacuum instead of going with an undo log ala Oracle or
> InnoDB. The only thing I could find is this thread
> (http://tinyurl.com/42opl) which doesn't say all that much.
>
> Has anyone looked at the performance impact of having to vacuum vs a
> method that doesn't recquire vacuuming? Obviously rollback with an undo
> log is slow, but it seems it's better to optimize for the normal
> situation of committing.
>
> I'm not looking for a holy war here but I'd like to know if any analysis
> has happened to determine how efficient vacuum can be made and if it's
> as efficient as just deleting an undo log once you no longer need it.

Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole
of its history. There is much written on this and you should search some
more - references are in the manual.
http://citeseer.ist.psu.edu/cache/papers/cs/4130/http:zSzzSzwuarchive.wustl.
eduzSzpackageszSzpostgreszSzpaperszSzERL-M87-06.pdf/stonebraker87design.pdf

Put simply, MVCC requires vacuum - they are two halves of the overall
mechanism. An Undo log would be an alternative to MVCC and therefore offer
an alternative to vacuum. But there are other things to consider - no other
RDBMS requires a "vacuum", but all of them have SOME requirement for
off-line scanning of the database, whether its ANALYZE, or REORG, or dbcc
(or more than of those...). Removing vacuum won't remove the other
requirements, so there is less to be gained than you might think.

IMHO vacuum could be optimised further in terms of what it does and how it
does it, but many of those optimizations wouldn't apply generally - so you'd
then either be introducing further tweaking requirements for DBAs (which are
frowned upon) or giving yourself the additional task of auto-optimizing
their deployment.

In PostgreSQL 8.0, vacuum has been enhanced so that it operates in the
background, avoids locks, doesn't spoil the dbpage cache - on top of being
automated. That approach is simple and pragmatic and that's hard to beat.

I think the effort of altering the current storage system is not something
you'd have people invest their time in (round here) and would probably
regard even the debate as wasted effort (and remember that is a most closely
guarded resource on the project). PG does allow you to write your own
storage manager...

If you're keen to code, there are many TODO items that are "pre-agreed" as
being required (from earlier discussions). That's probably the best place to
start. Having said that, research into why the code is the way it is can be
interesting and very educational - overall, the code really is written
fairly optimally from a robustness and performance perspective - you need to
be a fair way in to spot opps for improvement. Don't be put off - get stuck
in.

Best Regards, Simon Riggs



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: 7.4.5 losing committed transactions
Next
From: Mahmoud Taghizadeh
Date:
Subject: Re: How to add locale support for each column?