Re: What popular, large commercial websites run - Mailing list pgsql-general
From | Scott Marlowe |
---|---|
Subject | Re: What popular, large commercial websites run |
Date | |
Msg-id | Pine.LNX.4.33.0205021452250.5658-100000@css120.ihs.com Whole thread Raw |
In response to | Re: What popular, large commercial websites run (Shaun Thomas <sthomas@townnews.com>) |
List | pgsql-general |
I just tested this, and it's true. On my 7.2.1 machine, with booles, updating 50% of the 100,000 test rows I made, the data usage climbs about 1 Meg, and most of it does not get recovered by vacuum. I'll compile 7.2.1 (I'm running 7.2) and test it for other datatypes as well. On Thu, 2 May 2002, Shaun Thomas wrote: > On Wed, 1 May 2002 postgres@vrane.com wrote: > > > I'm very curious to know why you have problem with growing > > database. Does the performance suffer significantly > > if you don't do the FULL vacuum? Surely if you can > > afford the oracle you can afford relatively much > > cheaper storage. You must have other reasons > > than just not liking large database > > Well, it's not the fact that it's growing that's the problem. It's the > fact that 100 actual MB of frequently changed data becomes 2gigs if not > frequently vacuumed. Even with hourly full vacuums, it still slowly > bloats to 200mb in two weeks, with the same amount of data. The worst > part about this is that the more it bloats, the longer vacuum takes, > and the speed of the bloating increases almost exponentially. Given > two months, it's back up to 2 gigs. > > Full dump and restore? Back down to 100MB. I'm sorry, but no amount of > disk storage should have to compensate for a database growing to 60x > larger than the actual data stored (the data dump is 30 megs, but you > can expect a certain amount of bloat due to column sizes). When the > data files start hitting 2GB each, the Linux file-size limit comes into > play, and you have no choice but to dump and restore. > > Even worse? Since there is only about 100 MB of real data in that 2GB > morass, there's 1.9GB of old or invalid rows that Vacuum didn't clean > up. That makes all subsequent vacuums slower, which makes their locks > last longer, which means all selects on the tables being vacuumed are > stalled until the vacuum is done. What happens when the vacuum takes > half an hour, and it's a web application? Oh darn, you stop serving > pages that use the database. > > Only full vacuum on non-peak times, you say? We tried that. The > datafiles exploded to 2GB within days. *DAYS* The hourly vacuum > brought it under control, but our insert script which runs every hour > and replaces about 10% of the data per run, really cries bloody murder > while the vacuum is running. As a result, this machine commonly has a > load approaching 20 almost all the time. Turn off postgres? Less than > 1, even with a concurrent Mysql DB that has a *valid* 2GB database that > contains 2GB of actual data. > > I'm not passing blame. I'm not even angry. I'm just frustrated and > tired of babying Postgres so it doesn't cause the server to burst into > flames, crash, or otherwise fail. I actually had less admin overhead > with an Oracle database. Anyone who has used Oracle knows just how hard > it is to administer, but I sure as hell never had to completely dump and > restore the data every month to keep it from eating my drives. I mean, > what good is MVCC to avoid locking, when frequent full vacuums are even > worse? I'd rather have a little locking contention, than have a > completely useless database for ten to twenty minutes, every hour. > > Heck, maybe it's just our special case, that such a high rate of data > turnover just bewilders postgres. But as of now, it's just plain > unusable. Until vacuum goes away completely, which it never will as > long as MVCC is in place, postgres is not an option for us. I just > can't see any way around it. I hate mysql when I want things like > foreign keys or subselects, so we still need a full DBMS. Hell, we're > even considering giving a look to freaking Interbase, for the love of > God. We're desperate, here. ^_^ > > As a side note, Sybase used to be our DBMS of choice, but it didn't play > nice with PHP (segfaults on connects, sometimes), and only old versions > fall under the free license, so we ditched it too. It also had weird > syntax (go, for crying out loud!?), so our developers hated it. > > If I could only take all the good things from the databases I liked, and > make them one database... Ease of administration of Mysql + Features of > Oracle, for instance. But that database is only available in the > wonderful, magical world of 'you must be kidding' land. Oh well. > >
pgsql-general by date: