Thread: 'TID index'
I just had a thought that could potentially greatly improve vacuum performance. What about some kind of TID (or does vacuum use CID?) index? This would allow vacuum to visit only the pages it needs to visit. Actually, I guess TID/CID wouldn't even be involved; the only information needed would be whether any tuples on a page have been marked deleted. Something as simple as a bitmap could work. Storing TID info might provide added vacuum efficiency, but my guess is it's probably not worth the extra effort.

This might not help much for tables that just see a lot of random update activity, but I think it would be very useful for large tables where pages with dead tuples are likely to be a small percentage of the total number of pages.

Maintaining this information on a per-transaction basis might prove difficult to do without causing concurrency issues. Luckily, I think this could probably be done in the background without much difficulty. One possibility is to check for dead tuples as pages are written to disk (actually, by definition, there would have to be dead tuples at that point, I would think). If memory serves, writing these pages is now a background process, so this shouldn't cause contention issues.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
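To make the bitmap idea concrete, here is a minimal, standalone C sketch of the sort of structure being proposed: one bit per heap page, set when a tuple on that page is marked deleted, cleared once vacuum has cleaned the page. Every name here (DeadPageMap, dpm_set, and so on) is invented for illustration; this is not PostgreSQL code. At one bit per 8KB page, such a map costs about 16KB per gigabyte of heap.

/*
 * Minimal sketch of a per-relation "pages with dead tuples" bitmap.
 * All names (DeadPageMap, dpm_*) are hypothetical, for illustration only.
 */
#include <stdio.h>
#include <stdlib.h>

typedef struct DeadPageMap
{
    size_t         npages;  /* number of heap pages tracked */
    unsigned char *bits;    /* one bit per heap page */
} DeadPageMap;

static DeadPageMap *
dpm_create(size_t npages)
{
    DeadPageMap *map = malloc(sizeof(DeadPageMap));

    map->npages = npages;
    map->bits = calloc((npages + 7) / 8, 1);
    return map;
}

/* Conceptually called when a tuple on "pageno" is marked deleted. */
static void
dpm_set(DeadPageMap *map, size_t pageno)
{
    map->bits[pageno / 8] |= (unsigned char) (1 << (pageno % 8));
}

/* Conceptually called by vacuum once it has cleaned "pageno". */
static void
dpm_clear(DeadPageMap *map, size_t pageno)
{
    map->bits[pageno / 8] &= (unsigned char) ~(1 << (pageno % 8));
}

static int
dpm_is_set(const DeadPageMap *map, size_t pageno)
{
    return (map->bits[pageno / 8] >> (pageno % 8)) & 1;
}

int
main(void)
{
    /* A large table: a million pages, only two of which have dead tuples. */
    DeadPageMap *map = dpm_create(1000000);

    dpm_set(map, 42);
    dpm_set(map, 987654);

    /*
     * A targeted "vacuum" scans the in-memory bitmap (cheap) and reads
     * only the heap pages whose bit is set, instead of the whole table.
     */
    for (size_t p = 0; p < map->npages; p++)
    {
        if (dpm_is_set(map, p))
        {
            printf("vacuum heap page %zu\n", p);
            dpm_clear(map, p);
        }
    }

    free(map->bits);
    free(map);
    return 0;
}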
Jim C. Nasby wrote:
> I just had a thought that could potentially greatly improve vacuum
> performance. What about some kind of TID (or does vacuum use CID?)
> index? This would allow vacuum to visit only the pages it needs to
> visit. Actually, I guess TID/CID wouldn't even be involved; the only
> information needed would be whether any tuples on a page have been marked
> deleted. Something as simple as a bitmap could work. Storing TID info
> might provide added vacuum efficiency, but my guess is it's probably not
> worth the extra effort.
>
> This might not help much for tables that just see a lot of random update
> activity, but I think it would be very useful for large tables where
> pages with dead tuples are likely to be a small percentage of the total
> number of pages.
>
> Maintaining this information on a per-transaction basis might prove
> difficult to do without causing concurrency issues. Luckily, I think
> this could probably be done in the background without much difficulty.
> One possibility is to check for dead tuples as pages are written to disk
> (actually, by definition, there would have to be dead tuples at that
> point, I would think). If memory serves, writing these pages is now a
> background process, so this shouldn't cause contention issues.

There are many good ideas out there, yet it is almost impossible to find somebody else to implement yours!

The acid test is to try and write it...

Overall, I agree VACUUM could do with some tuning - and 8.0 has just that. It needs very careful thought to make sure both concurrency and recoverability considerations are fully met in any solution you come up with.

Best regards,

Simon Riggs
On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
> There are many good ideas out there, yet it is almost impossible to find
> somebody else to implement yours!
>
> The acid test is to try and write it...
>
> Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
> It needs very careful thought to make sure both concurrency and
> recoverability considerations are fully met in any solution you come up
> with.

Heh, I wasn't even thinking of implementation yet. :) I fully understand the lack of developers. Unfortunately, I have very little idea of the internals of PGSQL, and I'm decidedly not a C coder. I *might* be able to get something hacked up that stores info in a table (since that would mean all the space management stuff would be handled for me).

If this is a worthwhile idea, can we at least get a TODO? Would it be useful to come up with a high-level design (something I could probably do)?

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
Jim C. Nasby wrote:
> On Wed, Sep 15, 2004 at 10:56:28PM +0100, Simon Riggs wrote:
> > There are many good ideas out there, yet it is almost impossible to find
> > somebody else to implement yours!
> >
> > The acid test is to try and write it...
> >
> > Overall, I agree VACUUM could do with some tuning - and 8.0 has just that.
> > It needs very careful thought to make sure both concurrency and
> > recoverability considerations are fully met in any solution you come up
> > with.
>
> Before I try and tackle this I wanted to investigate why PostgreSQL's
> MVCC depends on vacuum instead of going with an undo log a la Oracle or
> InnoDB. The only thing I could find is this thread
> (http://tinyurl.com/42opl), which doesn't say all that much.
>
> Has anyone looked at the performance impact of having to vacuum vs a
> method that doesn't require vacuuming? Obviously rollback with an undo
> log is slow, but it seems it's better to optimize for the normal
> situation of committing.
>
> I'm not looking for a holy war here, but I'd like to know if any analysis
> has happened to determine how efficient vacuum can be made and if it's
> as efficient as just deleting an undo log once you no longer need it.

Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole of its history. There is much written on this and you should search some more - references are in the manual.
http://citeseer.ist.psu.edu/cache/papers/cs/4130/http:zSzzSzwuarchive.wustl.eduzSzpackageszSzpostgreszSzpaperszSzERL-M87-06.pdf/stonebraker87design.pdf

Put simply, MVCC requires vacuum - they are two halves of the overall mechanism. An undo log would be an alternative to MVCC and therefore offer an alternative to vacuum. But there are other things to consider - no other RDBMS requires a "vacuum", but all of them have SOME requirement for off-line scanning of the database, whether it's ANALYZE, or REORG, or dbcc (or more than one of those...). Removing vacuum won't remove the other requirements, so there is less to be gained than you might think.

IMHO vacuum could be optimised further in terms of what it does and how it does it, but many of those optimizations wouldn't apply generally - so you'd then either be introducing further tweaking requirements for DBAs (which are frowned upon) or giving yourself the additional task of auto-optimizing their deployment.

In PostgreSQL 8.0, vacuum has been enhanced so that it operates in the background, avoids locks, and doesn't spoil the dbpage cache - on top of being automated. That approach is simple and pragmatic, and that's hard to beat. I think the effort of altering the current storage system is not something you'd get people (round here) to invest their time in - many would probably regard even the debate as wasted effort (and remember that developer time is a most closely guarded resource on the project). PG does allow you to write your own storage manager...

If you're keen to code, there are many TODO items that are "pre-agreed" as being required (from earlier discussions). That's probably the best place to start. Having said that, research into why the code is the way it is can be interesting and very educational - overall, the code really is written fairly optimally from a robustness and performance perspective - you need to be a fair way in to spot opportunities for improvement. Don't be put off - get stuck in.

Best Regards,

Simon Riggs
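For readers new to the internals, the "two halves of the overall mechanism" point can be illustrated with a heavily simplified sketch of MVCC visibility: every tuple version carries the id of the transaction that created it (xmin) and, once deleted or updated, the id of the transaction that deleted it (xmax); readers decide per snapshot which version they see, and superseded versions stay in the heap until vacuum removes them. The C below is only an illustration of that rule - it is not the real HeapTupleSatisfies* code, and commit tracking is faked with a simple horizon.

/*
 * Simplified illustration of MVCC visibility.  Not PostgreSQL code:
 * committed-transaction tracking is faked with a "commit horizon".
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TxId;

typedef struct Tuple
{
    TxId xmin;   /* transaction that created this version */
    TxId xmax;   /* transaction that deleted it, or 0 if live */
    int  value;
} Tuple;

/* Assume every transaction id below the horizon has committed. */
static bool
committed(TxId xid, TxId commit_horizon)
{
    return xid != 0 && xid < commit_horizon;
}

/*
 * A version is visible to a snapshot taken at "snap_xid" if it was
 * created by a committed transaction before the snapshot and has not
 * been deleted by one.
 */
static bool
visible(const Tuple *t, TxId snap_xid, TxId commit_horizon)
{
    if (!committed(t->xmin, commit_horizon) || t->xmin >= snap_xid)
        return false;
    if (committed(t->xmax, commit_horizon) && t->xmax < snap_xid)
        return false;
    return true;
}

/*
 * A version is dead (vacuumable) once its deleter committed before the
 * oldest snapshot anyone still holds.  Until vacuum runs, it still
 * occupies space in the heap.
 */
static bool
dead(const Tuple *t, TxId oldest_snapshot, TxId commit_horizon)
{
    return committed(t->xmax, commit_horizon) && t->xmax < oldest_snapshot;
}

int
main(void)
{
    /* An UPDATE leaves both versions in the heap: */
    Tuple old_version = {.xmin = 100, .xmax = 200, .value = 1};
    Tuple new_version = {.xmin = 200, .xmax = 0,   .value = 2};
    TxId  commit_horizon = 300;

    printf("old visible to snapshot 150? %d\n",
           visible(&old_version, 150, commit_horizon));   /* yes */
    printf("old visible to snapshot 250? %d\n",
           visible(&old_version, 250, commit_horizon));   /* no  */
    printf("new visible to snapshot 250? %d\n",
           visible(&new_version, 250, commit_horizon));   /* yes */
    printf("old vacuumable once oldest snapshot is 250? %d\n",
           dead(&old_version, 250, commit_horizon));      /* yes */
    return 0;
}

An undo-log design instead overwrites the row in place and keeps the old image in a separate undo structure, which is discarded once no reader needs it - so the cleanup cost moves from a background vacuum to rollback and undo retention.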
On Sat, Sep 25, 2004 at 11:14:53AM +0100, Simon Riggs wrote:
> > Jim C. Nasby
>
> Fair questions. MVCC has been tightly locked into Postgres/SQL for the whole
> of its history. There is much written on this and you should search some
> more - references are in the manual.

Well, not quite its whole history: MVCC showed up in 6.5. Vacuum's been there since before SQL.

Actually, it's a bit of a historical accident. My understanding of the sequence of events is that Hewlett-Packard donated an early WORM optical drive to the Stonebraker lab. Since it's write once, it had the behavior that you could only append to files. Someone thought it might be useful for auditing, etc., so they wrote the first storage manager for postgres to accommodate that drive. The other storage manager at the time was for battery-backed, persistent RAM.

So, all this append-only writing leads to files with lots of dead tuples, so the vacuum command was added to reclaim space. I think on the WORM drive, this was supposed to mark blocks 'invisible' in some sense. I don't know if the WORM drive ever actually got used with postgres.

Ross

-- 
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Research Scientist                                  phone: 713-348-6166
The Connexions Project      http://cnx.rice.edu       fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE

"Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> ... So, all this append-only writing leads to files with lots of dead
> tuples, so the vacuum command was added to reclaim space.

Actually, I believe the original Postgres design envisioned that no tuple would ever be permanently deleted: the idea was that you would always be able to query the database state as of past times as well as the present instant. Stonebraker intended to use the WORM drive as the repository for dead tuples that might be needed to answer such historical queries. The "vacuum cleaner" was originally a background process that pushed dead tuples from magnetic disk to WORM storage. Its current manifestation is dumbed-down from that, since we only delete rows and make no attempt to save them somewhere else --- but it was always an integral part of the system design.

It's quite entertaining to read "The design of the POSTGRES storage system" (ERL-M87-06, available at http://db.cs.berkeley.edu//papers/) and compare it to where we are now. There is just enough similarity that it's obviously the ancestor of our present code ... but there is also a lot in that paper that has left *no* trace in our present code. I would love to know just how much of the paper actually got implemented and then discarded, and how much never got beyond the arm-waving stage.

			regards, tom lane
On Sun, 2004-09-26 at 09:17, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > ... So, all this append-only writing leads to files with lots of dead
> > tuples, so the vacuum command was added to reclaim space.
>
> Actually, I believe the original Postgres design envisioned that no
> tuple would ever be permanently deleted: the idea was that you would
> always be able to query the database state as of past times as well
> as the present instant. Stonebraker intended to use the WORM drive as
> the repository for dead tuples that might be needed to answer such
> historical queries. The "vacuum cleaner" was originally a background
> process that pushed dead tuples from magnetic disk to WORM storage.

We are now getting back to the point where the "background process" part is true again - how hard would it be to modify vacuum to write reclaimed tuples to somewhere (network pipe, WORM drive, other DB)?

It seems that in addition to writing deleted tuples out to a history DB and making create and delete transaction ids explicitly visible (and doing something(tm) about the transaction id wraparound while at it), the only thing left to do is some kind of transaction time log - and voila! we have the original Time Travel feature back - a great feature for resolving both "audit trail" and "clumsy dba" problems.

The modern WORM drive equivalent is an IDE(-RAID) disk with its very tape-like access profile (3+ hours to write a full 300GB disk, random access order(s) of magnitude slower than sequential).

So if someone is looking for a project, this seems to be something that is both theoretically possible and also theoretically useful ;)

----------------
Hannu
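A purely hypothetical sketch of the shape of that idea (there is no such hook in PostgreSQL, and every name below is invented): when vacuum finds a dead version, it would append the tuple together with its creating and deleting transaction ids to an append-only history store - a file here, but a pipe, WORM device, or another database would fit the same interface. A real design would also need the commit timestamps of those transactions so that "as of" queries could be answered later.

/*
 * Hypothetical illustration only: PostgreSQL has no such hook.  The idea
 * is that vacuum, on finding a dead tuple version, could append it to an
 * append-only archive together with its creating/deleting transaction
 * ids, instead of simply discarding it.
 */
#include <stdio.h>
#include <string.h>

typedef unsigned int TxId;

typedef struct ArchivedVersion
{
    TxId xmin;          /* transaction that created the version */
    TxId xmax;          /* transaction that deleted it */
    char payload[64];   /* stand-in for the tuple contents */
} ArchivedVersion;

/*
 * Append one reclaimed version to the history store.  A real design
 * would also record commit timestamps for the two transactions.
 */
static int
archive_dead_version(FILE *archive, const ArchivedVersion *v)
{
    return fwrite(v, sizeof(*v), 1, archive) == 1 ? 0 : -1;
}

int
main(void)
{
    FILE *archive = fopen("history.dat", "ab");   /* append-only, WORM-like */

    if (archive == NULL)
        return 1;

    ArchivedVersion v = {.xmin = 100, .xmax = 200};
    strncpy(v.payload, "old row contents", sizeof(v.payload) - 1);

    /*
     * In the sketch, vacuum would call this for every dead tuple it
     * removes, before reusing the space in the heap page.
     */
    archive_dead_version(archive, &v);

    fclose(archive);
    return 0;
}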
Hannu Krosing [mailto:hannu@tm.ee] wrote:
> On Sun, 2004-09-26 at 09:17, Tom Lane wrote:
> > "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > > ... So, all this append-only writing leads to files with lots of dead
> > > tuples, so the vacuum command was added to reclaim space.
> >
> > Actually, I believe the original Postgres design envisioned that no
> > tuple would ever be permanently deleted: the idea was that you would
> > always be able to query the database state as of past times as well
> > as the present instant. Stonebraker intended to use the WORM drive as
> > the repository for dead tuples that might be needed to answer such
> > historical queries. The "vacuum cleaner" was originally a background
> > process that pushed dead tuples from magnetic disk to WORM storage.
>
> We are now getting back to the point where the "background process" part
> is true again - how hard would it be to modify vacuum to write reclaimed
> tuples to somewhere (network pipe, WORM drive, other DB)?
>
> It seems that in addition to writing deleted tuples out to a history DB
> and making create and delete transaction ids explicitly visible (and
> doing something(tm) about the transaction id wraparound while at it), the
> only thing left to do is some kind of transaction time log - and voila!
> we have the original Time Travel feature back - a great feature for
> resolving both "audit trail" and "clumsy dba" problems.
>
> The modern WORM drive equivalent is an IDE(-RAID) disk with its very
> tape-like access profile (3+ hours to write a full 300GB disk, random
> access order(s) of magnitude slower than sequential).
>
> So if someone is looking for a project, this seems to be something that
> is both theoretically possible and also theoretically useful ;)

Yes, I thought that too - but not for me, not now.

Look here: http://research.microsoft.com/db/ImmortalDB/ for something similar.

Best Regards,

Simon Riggs
To answer Simon's earlier comment about whether I was looking to start hacking on PostgreSQL... I'm not. :) I might take a look at the TODO again, but I seem to do a great job of finding things to put on my plate as it is. I am interested in minimizing the impact of vacuum, which is why I brought my idea up originally. So, if anything, I'll be much more interested in working on improving vacuum than on something else.

Heh, I was about to ask for the list of pages with dead tuples to be added to the TODO, but it seems it's already there ('Maintain a map of recently-expired rows'). One thing that isn't there which I remember being discussed was having the page-write daemon do a vacuum of a page before it's written; has this been done already?

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
On Thu, Sep 30, 2004 at 04:57:37PM -0500, Jim C. Nasby wrote:
> Heh, I was about to ask for the list of pages with dead tuples to be added
> to the TODO, but it seems it's already there ('Maintain a map of
> recently-expired rows'). One thing that isn't there which I remember
> being discussed was having the page-write daemon do a vacuum of a page
> before it's written; has this been done already?

Not done, but discussed. One problem is that to vacuum a page, you may need to bring the related index pages into memory, which is more work than you are saving.

One idea would be to vacuum the page if it can be determined that the relation doesn't have indexes. This information is generally not known, because the index list isn't constructed until/unless someone explicitly asks for it. But anyway, do you have a lot of tables with no index? And how many of them are big enough to warrant doing this?

-- 
Alvaro Herrera (<alvherre[a]dcc.uchile.cl>)
"Having your biases confirmed independently is how scientific progress is
made, and hence made our great society what it is today" (Mary Gardiner)
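To illustrate the trade-off Alvaro describes rather than propose real code (the names below are invented; they are not bgwriter or relcache APIs): pruning a heap page at write time is only self-contained when the relation has no indexes, because removing a heap tuple otherwise leaves index entries pointing at nothing, and cleaning those up means reading index pages you were not going to touch.

/*
 * Invented names throughout -- a sketch of the write-time-prune decision,
 * not bgwriter or relcache code.
 */
#include <stdbool.h>
#include <stdio.h>

typedef struct PageInfo
{
    int  dead_tuples;      /* dead tuples currently on the page */
    bool rel_has_indexes;  /* does the owning relation have any indexes? */
    bool index_list_known; /* has the index list even been built yet? */
} PageInfo;

/*
 * Decide whether it is safe and cheap to prune dead tuples from a heap
 * page just before it is written out.  If the relation has indexes,
 * removing heap tuples here would orphan index entries, so the index
 * pages would have to be read and fixed as well -- usually more work
 * than the write-time prune saves.
 */
static bool
prune_before_write(const PageInfo *page)
{
    if (page->dead_tuples == 0)
        return false;           /* nothing to do */
    if (!page->index_list_known)
        return false;           /* can't tell; play it safe */
    if (page->rel_has_indexes)
        return false;           /* would orphan index entries */
    return true;                /* heap-only relation: prune stays local */
}

int
main(void)
{
    PageInfo log_table  = {.dead_tuples = 5, .rel_has_indexes = false,
                           .index_list_known = true};
    PageInfo user_table = {.dead_tuples = 5, .rel_has_indexes = true,
                           .index_list_known = true};

    printf("prune log table page?  %d\n", prune_before_write(&log_table));
    printf("prune user table page? %d\n", prune_before_write(&user_table));
    return 0;
}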
Alvaro Herrera <alvherre@dcc.uchile.cl> writes:
> One idea would be to vacuum the page if it can be determined that the
> relation doesn't have indexes. This information is generally not known,

... especially not by the page writer. You can't assume that you have access to the relation descriptor --- for instance, it's entirely possible that the bgwriter (or another backend) will need to push out a dirty page that belongs to a newly created relation for which the catalog data remains uncommitted. There are related scenarios involving uncommitted drops.

More generally I think that invoking VACUUM processing from the bgwriter would be a serious violation of the module hierarchy, and would inflict more pain in the form of bugs and maintenance headaches than it could possibly be worth. We just this version managed to get smgr decoupled from the relcache, like it should have been all along. (bufmgr should be too, but I haven't tackled that yet...) This was actually a necessary step to make the separate bgwriter feasible. Let's not reverse that cleanup in pursuit of dubious optimizations.

			regards, tom lane
On Thu, Sep 30, 2004 at 07:11:29PM -0400, Tom Lane wrote:
> More generally I think that invoking VACUUM processing from the bgwriter
> would be a serious violation of the module hierarchy, and would inflict
> more pain in the form of bugs and maintenance headaches than it could
> possibly be worth. We just this version managed to get smgr decoupled
> from the relcache, like it should have been all along. (bufmgr should
> be too, but I haven't tackled that yet...) This was actually a
> necessary step to make the separate bgwriter feasible. Let's not
> reverse that cleanup in pursuit of dubious optimizations.

Yeah, I thought about the same thing. It would certainly be more modular to have a vacuum daemon that runs ahead of the page writer.

As for the indexes, would it be reasonable to see if the required index pages were already in memory? Ultimately, it's going to depend on the table and access patterns as to whether this provides a speed improvement. A table with a lot of indexes and a lot of updates per page might not see much benefit. A table where updates are spread across the table would more likely benefit even if it does have to bring index pages into memory. The alternative is the vacuum running later, having to bring in the base data page, and then all the indexes anyway. But once there is a list of pages with dead tuples, only one read and one write would be saved, which probably isn't worth the extra code.

-- 
Jim C. Nasby, Database Consultant               decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"
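A rough back-of-envelope version of that last trade-off may make it clearer; the figures are illustrative assumptions, not measurements of any real workload. Once a map of pages with dead tuples exists, a later targeted vacuum touches the same index pages a write-time prune would, so pruning at write time saves roughly one heap-page read and one heap-page write per dirty page.

/* Illustrative page-I/O arithmetic only; all numbers are assumptions. */
#include <stdio.h>

int
main(void)
{
    long table_pages        = 1000000;  /* heap pages in the table */
    long dirty_pages        = 1000;     /* pages with dead tuples */
    long index_pages_per_pg = 3;        /* index pages touched per dirty heap page */

    /* (a) Plain vacuum: scan every heap page, plus index cleanup. */
    long plain_vacuum = table_pages + dirty_pages * index_pages_per_pg;

    /* (b) Targeted vacuum driven by a dead-page map: read and rewrite
     *     only the dirty heap pages, plus the same index cleanup. */
    long targeted_vacuum = 2 * dirty_pages + dirty_pages * index_pages_per_pg;

    /* (c) Prune at write time: the heap page is already in memory and
     *     being written anyway, so only the index pages are extra. */
    long write_time_prune = dirty_pages * index_pages_per_pg;

    printf("plain vacuum     : %ld page I/Os\n", plain_vacuum);
    printf("targeted vacuum  : %ld page I/Os\n", targeted_vacuum);
    printf("write-time prune : %ld page I/Os\n", write_time_prune);
    printf("saving of (c) over (b): %ld (one read + one write per dirty page)\n",
           targeted_vacuum - write_time_prune);
    return 0;
}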