Re: 'TID index' - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: 'TID index' |
Date | |
Msg-id | NOEFLCFHBPDAFHEIPGBOKEBFCFAA.simon@2ndquadrant.com Whole thread Raw |
In response to | 'TID index' ("Jim C. Nasby" <decibel@decibel.org>) |
Responses |
Re: 'TID index'
|
List | pgsql-hackers |
>Jim C. Nasby > On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote: > > There are many good ideas out there, yet it is almost impossible to find > > somebody else to implement yours! > > > > The acid test is to try and write it... > > > > Overall, I agree VACUUM could do with some tuning - and 8.0 has > just that. > > It needs very careful thought to make sure both concurrency and > > recoverability considerations are fully met in any solution you come up > > with. > > Before I try and tackle this I wanted to investigate why PostgreSQL's > MVCC depends on vacuum instead of going with an undo log ala Oracle or > InnoDB. The only thing I could find is this thread > (http://tinyurl.com/42opl) which doesn't say all that much. > > Has anyone looked at the performance impact of having to vacuum vs a > method that doesn't recquire vacuuming? Obviously rollback with an undo > log is slow, but it seems it's better to optimize for the normal > situation of committing. > > I'm not looking for a holy war here but I'd like to know if any analysis > has happened to determine how efficient vacuum can be made and if it's > as efficient as just deleting an undo log once you no longer need it. Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole of its history. There is much written on this and you should search some more - references are in the manual. http://citeseer.ist.psu.edu/cache/papers/cs/4130/http:zSzzSzwuarchive.wustl. eduzSzpackageszSzpostgreszSzpaperszSzERL-M87-06.pdf/stonebraker87design.pdf Put simply, MVCC requires vacuum - they are two halves of the overall mechanism. An Undo log would be an alternative to MVCC and therefore offer an alternative to vacuum. But there are other things to consider - no other RDBMS requires a "vacuum", but all of them have SOME requirement for off-line scanning of the database, whether its ANALYZE, or REORG, or dbcc (or more than of those...). Removing vacuum won't remove the other requirements, so there is less to be gained than you might think. IMHO vacuum could be optimised further in terms of what it does and how it does it, but many of those optimizations wouldn't apply generally - so you'd then either be introducing further tweaking requirements for DBAs (which are frowned upon) or giving yourself the additional task of auto-optimizing their deployment. In PostgreSQL 8.0, vacuum has been enhanced so that it operates in the background, avoids locks, doesn't spoil the dbpage cache - on top of being automated. That approach is simple and pragmatic and that's hard to beat. I think the effort of altering the current storage system is not something you'd have people invest their time in (round here) and would probably regard even the debate as wasted effort (and remember that is a most closely guarded resource on the project). PG does allow you to write your own storage manager... If you're keen to code, there are many TODO items that are "pre-agreed" as being required (from earlier discussions). That's probably the best place to start. Having said that, research into why the code is the way it is can be interesting and very educational - overall, the code really is written fairly optimally from a robustness and performance perspective - you need to be a fair way in to spot opps for improvement. Don't be put off - get stuck in. Best Regards, Simon Riggs
pgsql-hackers by date: