Re: What popular, large commercial websites run - Mailing list pgsql-general

From Scott Marlowe
Subject Re: What popular, large commercial websites run
Date
Msg-id Pine.LNX.4.33.0205021452250.5658-100000@css120.ihs.com
In response to Re: What popular, large commercial websites run  (Shaun Thomas <sthomas@townnews.com>)
List pgsql-general
I just tested this, and it's true.  On my 7.2 machine, with booleans,
updating 50% of the 100,000 test rows I made, the on-disk usage climbs about
1 MB, and most of it is not recovered by vacuum.  I'll compile 7.2.1
and test it with other datatypes as well.
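
For anyone who wants to reproduce it, this is roughly the kind of session I
ran (the table name is just a placeholder; relpages in pg_class is the table
size in 8 kB pages, refreshed by vacuum):

  -- build a ~100,000 row table of booleans by repeated doubling
  CREATE TABLE bool_test (flag boolean);
  INSERT INTO bool_test VALUES (true);
  INSERT INTO bool_test SELECT flag FROM bool_test;  -- repeat 17 times (2^17 = 131,072 rows)

  -- baseline size
  VACUUM ANALYZE bool_test;
  SELECT relpages FROM pg_class WHERE relname = 'bool_test';

  -- update roughly half the rows, vacuum, and check again
  UPDATE bool_test SET flag = NOT flag WHERE random() < 0.5;
  VACUUM bool_test;
  SELECT relpages FROM pg_class WHERE relname = 'bool_test';  -- the extra pages stay allocated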

On Thu, 2 May 2002, Shaun Thomas wrote:

> On Wed, 1 May 2002 postgres@vrane.com wrote:
>
> > I'm very curious to know why you have a problem with a growing
> > database.  Does performance suffer significantly
> > if you don't do the FULL vacuum?  Surely if you can
> > afford Oracle you can afford relatively much
> > cheaper storage.  You must have other reasons
> > than just not liking a large database.
>
> Well, it's not the fact that it's growing that's the problem.  It's the
> fact that 100 actual MB of frequently changed data becomes 2 GB if not
> frequently vacuumed.  Even with hourly full vacuums, it still slowly
> bloats to 200 MB in two weeks, with the same amount of data.  The worst
> part about this is that the more it bloats, the longer vacuum takes,
> and the speed of the bloating increases almost exponentially.  Given
> two months, it's back up to 2 GB.
>
> Full dump and restore?  Back down to 100 MB.  I'm sorry, but no amount of
> disk storage should have to compensate for a database growing to 60x
> larger than the actual data stored (the data dump is 30 MB, but you
> can expect a certain amount of bloat due to column sizes).  When the
> data files start hitting 2 GB each, the Linux file-size limit comes into
> play, and you have no choice but to dump and restore.
>
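For what it's worth, the dump-and-reload cycle being described boils down to
something like this, with the application kept off the database for the
duration (mydb is just an example name):

  pg_dump mydb > mydb.sql        # plain SQL dump of the bloated database
  dropdb mydb                    # throw away the bloated files
  createdb mydb
  psql mydb < mydb.sql           # reload; the new files are only as big as the live data
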
> Even worse?  Since there is only about 100 MB of real data in that 2 GB
> morass, there's 1.9 GB of old or invalid rows that vacuum didn't clean
> up.  That makes all subsequent vacuums slower, which makes their locks
> last longer, which means all selects on the tables being vacuumed are
> stalled until the vacuum is done.  What happens when the vacuum takes
> half an hour, and it's a web application?  Oh darn, you stop serving
> pages that use the database.
>
> Only full vacuum at non-peak times, you say?  We tried that.  The
> data files exploded to 2 GB within days.  *DAYS*  The hourly vacuum
> brought it under control, but our insert script, which runs every hour
> and replaces about 10% of the data per run, really cries bloody murder
> while the vacuum is running.  As a result, this machine commonly has a
> load approaching 20 almost all the time.  Turn off Postgres?  Less than
> 1, even with a concurrent MySQL DB that has a *valid* 2 GB database that
> contains 2 GB of actual data.
>
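For reference, the hourly vacuum described above doesn't need to be anything
fancier than a couple of cron entries along these lines (database name and
schedule are just examples; the --full form is the one that takes the
exclusive lock being complained about):

  # /etc/crontab: plain vacuum with stats every hour, full vacuum once a night
  0  *  * * *   postgres   vacuumdb --analyze mydb
  30 3  * * *   postgres   vacuumdb --full --analyze mydb
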
> I'm not passing blame.  I'm not even angry.  I'm just frustrated and
> tired of babying Postgres so it doesn't cause the server to burst into
> flames, crash, or otherwise fail.  I actually had less admin overhead
> with an Oracle database.  Anyone who has used Oracle knows just how hard
> it is to administer, but I sure as hell never had to completely dump and
> restore the data every month to keep it from eating my drives.  I mean,
> what good is MVCC to avoid locking when frequent full vacuums are even
> worse?  I'd rather have a little locking contention than have a
> completely useless database for ten to twenty minutes, every hour.
>
> Heck, maybe it's just our special case, that such a high rate of data
> turnover just bewilders Postgres.  But as of now, it's just plain
> unusable.  Until vacuum goes away completely, which it never will as
> long as MVCC is in place, Postgres is not an option for us.  I just
> can't see any way around it.  I hate MySQL when I want things like
> foreign keys or subselects, so we still need a full DBMS.  Hell, we're
> even considering taking a look at freaking InterBase, for the love of
> God.  We're desperate here.  ^_^
>
> As a side note, Sybase used to be our DBMS of choice, but it didn't play
> nice with PHP (occasional segfaults on connect), and only old versions
> fall under the free license, so we ditched it too.  It also had weird
> syntax (go, for crying out loud!?), so our developers hated it.
>
> If I could only take all the good things from the databases I liked and
> make them one database...  The ease of administration of MySQL + the
> features of Oracle, for instance.  But that database is only available in
> the wonderful, magical world of 'you must be kidding' land.  Oh well.
>
>

