Thread: Long-running performance (MVCC, Vacuum, etc.) - Any fix?
I'm building an appliance where I don't want my customers having to tune postgresql in any way from the configuration when I install it. I don't even want them to know it is there. Yet in my study of posgresql, it seems that even conscientious use of a vacuum daemon or cron job in vacuum's various forms leaves databases after a while much larger and lower performing than the actual space and performance after a dump/restore. It seems to me that with MVCC, an UPDATE (for example) is really a DELETE+INSERT and it takes VACUUM to mark the old DELETED row as free space again. So changing one boolean or increasing one counter in a row of 100 bytes will relocate this row in a different location. While this in itself isn't terrible as some DB page is dirty no matter what, unless I'm lucky enough to insert into the same page that changed, my index(es) for this table will have to be changed and this will dirty yet another page that wouldn't be dirty without MVCC. And this is not to mention the resources required by VACUUM. And the disk bloat over time would imply that postgresql doesn't always efficiently use free-space with records/rows of similar size and so performance degrades as there are fewer rows per page over time. My application is multi-threaded and I could care less about the concurrency afforded by MVCC. I'd rather just update the things in place and get less postgresql concurrency but more consistent long-running performance and disk space utilization. Is my interpretation correct? Is there a way to turn off MVCC? Do fixed sized rows help any? Is there anybody using this thing in an appliance type application? I'm quite far along with stored-procedures and whatnot, but if postgresql really isn't the right solution due to these reasons, I'm curious if anybody has alternate OpenSource suggestions? (I'm actually migrating from sleepycat bsddb at the moment because I didn't realize the licensing costs involved there. My needs really aren't that extensive really - a few associative indexes, cascading delete, etc.) My large data sets will have the following characteristics: 5,000,000 Rows x 50 bytes/row (could be fixed) w/1 multi-column index, 1 single-column timestamp index AT 1,000,000 index searches, 300,000 reads, 150,000 updates per day 720,000 Rows x 32 bytes/row fixed w/ 1 multi-column index, 1 single-column timestamp index AT 150,000 index search, 150,000 updates, 100s of reads per day 2,000,000 Rows x 4000 bytes avg/row AT 100,000 inserts per day, 150,000 reads per day Thanks, Eric Eric Brown 408-571-6341 www.propel.com
On Tue, Dec 28, 2004 at 02:40:52AM -0800, Eric Brown wrote: > I'm building an appliance where I don't want my customers having to > tune postgresql in any way from the configuration when I install it. I > don't even want them to know it is there. Yet in my study of posgresql, > it seems that even conscientious use of a vacuum daemon or cron job in > vacuum's various forms leaves databases after a while much larger and > lower performing than the actual space and performance after a > dump/restore. <snip> Firstly, you havn't said what version of PostgreSQL you are using. Recent versions behave better. Also, the autovacuum daemon will probably do most of what you want... > My application is multi-threaded and I could care less about the > concurrency afforded by MVCC. I'd rather just update the things in > place and get less postgresql concurrency but more consistent > long-running performance and disk space utilization. > > Is my interpretation correct? Yes. Although, for a simple counter you might be able to use sequences, depending on the exact semantics. They don't take any additional space when you increment them. > Is there a way to turn off MVCC? Nope. This is a Relational Database with everything that entails. If you don't want that, maybe you should look into a database designed to be embedded (SQLDB iirc). > Do fixed sized rows help any? I doubt it. > Is there anybody using this thing in an appliance type application? Probably, but I don't know where... > I'm quite far along with stored-procedures and whatnot, but if > postgresql really isn't the right solution due to these reasons, I'm > curious if anybody has alternate OpenSource suggestions? (I'm actually > migrating from sleepycat bsddb at the moment because I didn't realize > the licensing costs involved there. My needs really aren't that > extensive really - a few associative indexes, cascading delete, etc.) I'd suggest, use the autovacuum daemon, tune your FSM settings to what you expect the load to be and maybe schedule a database-wide REINDEX/VACUUM FULL regularly. We found doing it once a month was enough to keep it tip-top over the long term... Hope this helps, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
>>>>> "MvO" == Martijn van Oosterhout <kleptog@svana.org> writes: MvO> I'd suggest, use the autovacuum daemon, tune your FSM settings to what MvO> you expect the load to be and maybe schedule a database-wide MvO> REINDEX/VACUUM FULL regularly. We found doing it once a month was MvO> enough to keep it tip-top over the long term... If you know your application usage patterns well, scheduling vacuum via cron probably would be better than autovacuum. autovacuum runs whenever it pleases, and doesn't deal well with extremely large tables (ie, it never runs vacuum on them since the % change may be small even though millions of rows are "expired"). You can even vacuum differnt tables on different schedules. Just make sure the VACUUMs don't overlap! I haven't run reindex or vacuum full since I upgraded to 7.4.x. The indexes are not bloating, and sufficient calls to vacuum keep the "stable set size" of the db pretty constant. ie, it doesn't grow without bounds, as the OP fears. -- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: khera@kciLink.com Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Eric Brown <eric.brown@propel.com> writes: > My application is multi-threaded and I could care less about the concurrency > afforded by MVCC. I'd rather just update the things in place and get less > postgresql concurrency but more consistent long-running performance and disk > space utilization. Postgres uses MVCC for guaranteeing that it can roll back transactions too... > Is my interpretation correct? > Is there a way to turn off MVCC? > Do fixed sized rows help any? Postgres doesn't do anything special for fixed size rows. I would suggest you look at pg_autovacuum, but in general while a maintenance-free system is everyone's goal it's pretty far off. -- greg