Thread: Re: What popular, large commercial websites run

Re: What popular, large commercial websites run

From

pgsql-gen Newsgroup (@Basebeans.com)

Date:

30 April 2002, 00:10:02

Subject: Re: [GENERAL] What popular, large commercial websites run
From: Vic Cekvenich <vic@basebeans.com>
 ===
Would profit margin or cost of operation impress managment? Low cost =
profit.
The way I see it, some managers will buy Oracle. They will have low
profit margines. Some programers will use PostgreSQL. They will have
high margins.
One day the profitable company buys the non-profitable company and then
the managers of the company that got taken over are all fired.
Problem is that managers don't know IT, they will... fade away.

Vic

Steve Lane wrote:
> On 4/29/02 9:58 AM, "Fran Fabrizio" <ffabrizio@mmrd.com> wrote:
>
>
>>>Hi, I think this may also be interesting for the mailing list then :
>>>this is a copy of a message found in this list a few days or weeks
>>
>>ago. >I kept it because I thought it might be useful some day :-)
>>
>>>Arnaud
>>
>>Yes, this message was originally posted to this very same mailing list. =)
>>
>>It's good information, but again not the type for presenting to
>>management.  Andrew Sullivan does weigh in (he's the .info guy) but
>>other than that it's a lot of "we're not in production yet", "we use it
>>for our smaller clients", or things like the "regional foundation for
>>contemporary art" in France, which although interesting and good to
>>know, does not excite management.
>>
>>When I say it would be nice to have some large commercial corporations
>>to reference, I mean like Fortune 1000 companies or at least companies
>>business types have heard about.  They want to be able to say "If it's
>>good enough for IBM and Ford Motor Co., it's good enough for us."  It's
>>frustrating and stupid and the wrong way to think about things, but they
>>are the ones who make the decisions and so we have to speak their
>>language.  They want name recognition.
>>
>>I'm sure Pg is being used in some of these companies, but we just don't
>>know about it.  That info that was just posted about US Federal Govt
>>agencies using it was great, by the way.  Thanks!
>
>
> We are using Pg as the back end for two quite significant educational
> applications. One is in production, one still in development. That in
> production is a system to track the special education population of a large
> Midwestern state -- not a huge application in terms of data, but huge in
> terms of criticality. It is in pilot now, and is slowly being rolled out to
> the rest of the state. Ultimately it will track dozens of pages of
> documentation for each of the state's 40K special ed students.
>
> -- sgl
>
>
> =======================================================
> Steve Lane
>
> Vice President
> Chris Moyer Consulting, Inc.
> 833 West Chicago Ave Suite 203
>
> Voice: (312) 433-2421       Email: slane@fmpro.com
> Fax:   (312) 850-3930       Web:   http://www.fmpro.com
> =======================================================
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

01 May 2002, 16:20:42

On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:

> The way I see it, some managers will buy Oracle. They will have low
> profit margines. Some programers will use PostgreSQL. They will have
> high margins.

That's all well and good provided postgres and Oracle were 100% feature
compatible.  They're not.  Want inter-database queries?  Too bad.
Replication?  Nope.  Parallel queries?  Scratch that.  Packages?  Big
goose-egg.  ADA-style error catching?  Zero.  In/Out variables?  Only in
your dreams, buddy.  Views that take parameters?  Zilch.  Wanna actually
drop your foreign keys, or change their status or triggering order for
loading purposes?  Not here.

Are these issues being addressed?  Sure they are.  But I think I've
pretty much proven that Oracle has things Postgres doesn't, and can do
things Postgres can't.  If you're a large corporation that needs
replication, parallel database queries, use of rollback segments instead
of MVCC (to avoid tainting the datafiles with invalid/old data), you
don't have a choice.  It's either Oracle or DB2, really.

The fact is, we've used postgres for about 5 years now.  I'm
recommending migrating off of it at this very moment.  Why?  MVCC.  When
I finally got sick of doing a full database dump and restore every
month, and a full vacuum every two hours to avoid rampant datafile
growth, I made the official decision to ditch Postgres.

Why are our databases bloating, even after hourly full vacuums?  Because
we have a database with a 50-100% data turnover rate at about 100,000
rows, and postgres just can't handle it.  I've watched our 100mb
database grow to 500mb, then 2gigs.  Full dump and restore?  70mb
again.  Oh, and the spiking load, and table locks that occur during
full vacuums?  Just take the hit, web-surfers be damned.

For us, Oracle keeps live statistics on the data, realtime.  No analyze.
It also uses rollback segments to serve old versions of data when locks
are present, instead of MVCC.  MVCC leaves the old version of the row
*in* the table data, right next to unchanged rows, making vacuum
necessary to clean up, and point to newest row versions without a
sequence scan.  Rollback segments just put the old versions in the
segment, if things change, they reapply the data, and no harm done.  No
datafile growth.  No old versions.  No table scans to find valid rows,
no vacuums.

Does it cost more?  Sure.  But until Postgres can solve these important
problems, we have no other choice; regardless of how much we want to go
the cheaper route.  It's not always about money.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

postgres@vrane.com

Date:

01 May 2002, 18:26:07

On Wed, May 01, 2002 at 02:52:21PM -0500, Shaun Thomas wrote:
> On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:
>
> > The way I see it, some managers will buy Oracle. They will have low
> > profit margines. Some programers will use PostgreSQL. They will have
> > high margins.
>
>
> Why are our databases bloating, even after hourly full vacuums?  Because
> we have a database with a 50-100% data turnover rate at about 100,000
> rows, and postgres just can't handle it.  I've watched our 100mb
> database grow to 500mb, then 2gigs.  Full dump and restore?  70mb
> again.  Oh, and the spiking load, and table locks that occur during
> full vacuums?  Just take the hit, web-surfers be damned.
>

I'm very curious to know why you have problem with growing
database.  Does the performance suffer significantly
if you don't do the FULL vacuum?  Surely if you can
afford the oracle you can afford relatively much
cheaper storage.  You must have other reasons
than just not liking large database

Thanks

Re: What popular, large commercial websites run

From

"Nigel J. Andrews"

Date:

01 May 2002, 19:02:26

On Wed, 1 May 2002 postgres@vrane.com wrote:

> On Wed, May 01, 2002 at 02:52:21PM -0500, Shaun Thomas wrote:
> > On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:
> >
> > > The way I see it, some managers will buy Oracle. They will have low
> > > profit margines. Some programers will use PostgreSQL. They will have
> > > high margins.
> >
> >
> > Why are our databases bloating, even after hourly full vacuums?  Because
> > we have a database with a 50-100% data turnover rate at about 100,000
> > rows, and postgres just can't handle it.  I've watched our 100mb
> > database grow to 500mb, then 2gigs.  Full dump and restore?  70mb
> > again.  Oh, and the spiking load, and table locks that occur during
> > full vacuums?  Just take the hit, web-surfers be damned.
> >
>
> I'm very curious to know why you have problem with growing
> database.  Does the performance suffer significantly
> if you don't do the FULL vacuum?  Surely if you can
> afford the oracle you can afford relatively much
> cheaper storage.  You must have other reasons
> than just not liking large database


What you want to do to see an issue is:

Create a table with a boolean column
Then fill it with 50000 rows with the boolean field set to true and 50000 rows
with the boolean field set to false;
Analyse the table

Then run
   EXPLAIN ANALYZE SELECT * FROM TABLE <mytable> WHERE <booleanfield> = true;

Then run
   UPDATE <mytable> SET <booleanfield> = false WHERE <booleanfield> = false;

(several times)

Then retry the EXPLAIN ANALYZE SELECT ... from above again, after analysing
the table again obviously.

What you should see is a seqscan in both plans with the final number of rows
the same but the second will have taken longer.

Now, I haven't tried that exact scenario myself so don't quote me :) and I'm
also assuming there's no caching of the results from the first explain through
to the second.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: What popular, large commercial websites run

From

postgres@vrane.com

Date:

01 May 2002, 19:21:31

On Wed, May 01, 2002 at 11:54:37PM +0100, Nigel J. Andrews wrote:
>
> On Wed, 1 May 2002 postgres@vrane.com wrote:
>
> Then run
>    EXPLAIN ANALYZE SELECT * FROM TABLE <mytable> WHERE <booleanfield> = true;
>
> Then run
>    UPDATE <mytable> SET <booleanfield> = false WHERE <booleanfield> = false;
>
> (several times)
>
> What you should see is a seqscan in both plans with the final number of rows
> the same but the second will have taken longer.

I haven't tried it yet but isn't creating an index here
the logical thing to do here.  Whenever I have "where"
clause in my queries I always create an index.

Re: What popular, large commercial websites run

From

"Nigel J. Andrews"

Date:

01 May 2002, 19:32:43

On Wed, 1 May 2002 postgres@vrane.com wrote:

> On Wed, May 01, 2002 at 11:54:37PM +0100, Nigel J. Andrews wrote:
> >
> > On Wed, 1 May 2002 postgres@vrane.com wrote:
> >
> > Then run
> >    EXPLAIN ANALYZE SELECT * FROM TABLE <mytable> WHERE <booleanfield> = true;
> >
> > Then run
> >    UPDATE <mytable> SET <booleanfield> = false WHERE <booleanfield> = false;
> >
> > (several times)
> >
> > What you should see is a seqscan in both plans with the final number of rows
> > the same but the second will have taken longer.
>
> I haven't tried it yet but isn't creating an index here
> the logical thing to do here.  Whenever I have "where"
> clause in my queries I always create an index.

Damn, knew I'd miss something out. Yes, it would seem a good idea to create an
index on that boolean column. I'd still expect the second query (in the explain
analyze) to take longer to complete. I think I might give this a go myself
sometime just to check.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: What popular, large commercial websites run

From

Tom Lane

Date:

01 May 2002, 22:41:50

On Wed, May 01, 2002 at 02:52:21PM -0500, Shaun Thomas wrote:
> Why are our databases bloating, even after hourly full vacuums?  Because
> we have a database with a 50-100% data turnover rate at about 100,000
> rows, and postgres just can't handle it.  I've watched our 100mb
> database grow to 500mb, then 2gigs.  Full dump and restore?  70mb
> again.  Oh, and the spiking load, and table locks that occur during
> full vacuums?  Just take the hit, web-surfers be damned.

Have you tried running frequent (more than hourly) non-full vacuums?

            regards, tom lane

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

02 May 2002, 11:46:33

On Wed, 1 May 2002 postgres@vrane.com wrote:

> I'm very curious to know why you have problem with growing
> database.  Does the performance suffer significantly
> if you don't do the FULL vacuum?  Surely if you can
> afford the oracle you can afford relatively much
> cheaper storage.  You must have other reasons
> than just not liking large database

Well, it's not the fact that it's growing that's the problem.  It's the
fact that 100 actual MB of frequently changed data becomes 2gigs if not
frequently vacuumed.  Even with hourly full vacuums, it still slowly
bloats to 200mb in two weeks, with the same amount of data.  The worst
part about this is that the more it bloats, the longer vacuum takes,
and the speed of the bloating increases almost exponentially.  Given
two months, it's back up to 2 gigs.

Full dump and restore?  Back down to 100MB.  I'm sorry, but no amount of
disk storage should have to compensate for a database growing to 60x
larger than the actual data stored (the data dump is 30 megs, but you
can expect a certain amount of bloat due to column sizes).  When the
data files start hitting 2GB each, the Linux file-size limit comes into
play, and you have no choice but to dump and restore.

Even worse?  Since there is only about 100 MB of real data in that 2GB
morass, there's 1.9GB of old or invalid rows that Vacuum didn't clean
up.  That makes all subsequent vacuums slower, which makes their locks
last longer, which means all selects on the tables being vacuumed are
stalled until the vacuum is done.  What happens when the vacuum takes
half an hour, and it's a web application?  Oh darn, you stop serving
pages that use the database.

Only full vacuum on non-peak times, you say?  We tried that.  The
datafiles exploded to 2GB within days.  *DAYS*  The hourly vacuum
brought it under control, but our insert script which runs every hour
and replaces about 10% of the data per run, really cries bloody murder
while the vacuum is running.  As a result, this machine commonly has a
load approaching 20 almost all the time.  Turn off postgres?  Less than
1, even with a concurrent Mysql DB that has a *valid* 2GB database that
contains 2GB of actual data.

I'm not passing blame.  I'm not even angry.  I'm just frustrated and
tired of babying Postgres so it doesn't cause the server to burst into
flames, crash, or otherwise fail.  I actually had less admin overhead
with an Oracle database.  Anyone who has used Oracle knows just how hard
it is to administer, but I sure as hell never had to completely dump and
restore the data every month to keep it from eating my drives.  I mean,
what good is MVCC to avoid locking, when frequent full vacuums are even
worse?  I'd rather have a little locking contention, than have a
completely useless database for ten to twenty minutes, every hour.

Heck, maybe it's just our special case, that such a high rate of data
turnover just bewilders postgres.  But as of now, it's just plain
unusable.  Until vacuum goes away completely, which it never will as
long as MVCC is in place, postgres is not an option for us.  I just
can't see any way around it.  I hate mysql when I want things like
foreign keys or subselects, so we still need a full DBMS.  Hell, we're
even considering giving a look to freaking Interbase, for the love of
God.  We're desperate, here.  ^_^

As a side note, Sybase used to be our DBMS of choice, but it didn't play
nice with PHP (segfaults on connects, sometimes), and only old versions
fall under the free license, so we ditched it too.  It also had weird
syntax (go, for crying out loud!?), so our developers hated it.

If I could only take all the good things from the databases I liked, and
make them one database...  Ease of administration of Mysql + Features of
Oracle, for instance.  But that database is only available in the
wonderful, magical world of 'you must be kidding' land.  Oh well.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

02 May 2002, 11:47:23

On Wed, 1 May 2002 postgres@vrane.com wrote:

> > What you should see is a seqscan in both plans with the final number of rows
> > the same but the second will have taken longer.
>
> I haven't tried it yet but isn't creating an index here
> the logical thing to do here.  Whenever I have "where"
> clause in my queries I always create an index.

An index... on a boolean?  My head hurts, stop it.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

02 May 2002, 11:47:35

On Wed, 1 May 2002, Tom Lane wrote:

> Have you tried running frequent (more than hourly) non-full vacuums?

I'd love to.  But one of our customers is keeping us from upgrading to
7.2. (the bastards.)  Full vacuums are my only choice for the time
being.  But knowing how vacuum works, how would this help?  It would
keep our queries nice and speedy, but unless Postgres has a new, magical
way of knowing outdated versions of a row are actually outdated, the
overall file bloating will be the same, and we'd still need the full
vacuums every hour.

If vacuum in 7.2 marks the row as outdated, and new data can be
placed *over* it, or a full vacuum completely obliterates *all* rows
marked as invalid, sure.  But if it keeps some kind of list of all
recent row versions to avoid a sequence scan through all the invalid
rows, should that list grow large, I could see it losing track with
our large turnover rate.

What to do...

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Tom Lane

Date:

02 May 2002, 13:37:43

Shaun Thomas <sthomas@townnews.com> writes:
>> Have you tried running frequent (more than hourly) non-full vacuums?

> I'd love to.  But one of our customers is keeping us from upgrading to
> 7.2. (the bastards.)  Full vacuums are my only choice for the time
> being.  But knowing how vacuum works, how would this help?  It would
> keep our queries nice and speedy, but unless Postgres has a new, magical
> way of knowing outdated versions of a row are actually outdated, the
> overall file bloating will be the same, and we'd still need the full
> vacuums every hour.

It's not any more or less magical than the old vacuum's way of knowing
that it can remove rows.  It is a different way of looking at things
though: you try to keep a steady-state percentage of free space in a
table.  7.1 and before could only append new rows to the end of a table,
so you had to full-vacuum down a table to minimum size and then let it
grow again with updates.  With the new style vacuum, we don't try to
compress the table, we just record where there's free space due to
removal of dead tuples.  Then insertions reuse that space.  So if you
run a new-style vacuum after updating say 10% or 25% of the rows, you
can maintain a steady state table size that's say 10% or 25% larger
than the theoretical minimum.

In an installation with a lot of update traffic, you may need to
increase the default size of the free space map to ensure that you
can keep track of all the free space in your heavily-updated tables.
We're still learning about the best way to tune those configuration
parameters.

            regards, tom lane

Re: What popular, large commercial websites run

From

postgres@vrane.com

Date:

02 May 2002, 14:31:13

On Thu, May 02, 2002 at 09:54:12AM -0500, Shaun Thomas wrote:
> On Wed, 1 May 2002 postgres@vrane.com wrote:
>
> > I'm very curious to know why you have problem with growing
> > database.  Does the performance suffer significantly
> > if you don't do the FULL vacuum?  Surely if you can
> > afford the oracle you can afford relatively much
> > cheaper storage.  You must have other reasons
> > than just not liking large database
>
> Well, it's not the fact that it's growing that's the problem.  It's the
> fact that 100 actual MB of frequently changed data becomes 2gigs if not
> frequently vacuumed.  Even with hourly full vacuums, it still slowly

Maybe I did not make myself clear. I don't care
whether it's growing or bloating or whatever.  If you don't
do the FULL vacuum does your performance suffer?   Considering
that 7.2 gives you non-full vacuum I want to know
whether there are cases out there only FULL
vacuum hourly will solve/not solve.

The only problem I can see your very long essay is linux file
size limit.  Well that is not postgres fault.  You surely
know how to get around that problem by upgrading the system
or even switching to a different system without that limit.
Have you compared the work involved in migrating from one db to another
with one os to another?  I would think they are both
equally involved.

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

02 May 2002, 14:32:13

On Thu, 2 May 2002, Tom Lane wrote:

> Shaun Thomas <sthomas@townnews.com> writes:
> >> Have you tried running frequent (more than hourly) non-full vacuums?
>
> > I'd love to.  But one of our customers is keeping us from upgrading to
> > 7.2. (the bastards.)  Full vacuums are my only choice for the time
> > being.  But knowing how vacuum works, how would this help?  It would
> > keep our queries nice and speedy, but unless Postgres has a new, magical
> > way of knowing outdated versions of a row are actually outdated, the
> > overall file bloating will be the same, and we'd still need the full
> > vacuums every hour.
>
> With the new style vacuum, we don't try to
> compress the table, we just record where there's free space due to
> removal of dead tuples.  Then insertions reuse that space.  So if you
> run a new-style vacuum after updating say 10% or 25% of the rows, you
> can maintain a steady state table size that's say 10% or 25% larger
> than the theoretical minimum.

See, that's what I figured.  I just had it backwards.  I thought you
were tracking valid rows, not invalid ones.  But, I was being stupid,
since the invalid ones aren't likely to outnumber the valid ones.  Duhh.

Either way, having such a lookup before every insert/update doesn't seem
terribly efficient.  It seems like this would displace the slowdown
caused by vacuum to inserts and updates.  For a system with a high
volume of inserts and updates, I'm not sure this would be much of an
improvement.  It would however, get rid of the 10-20 minutes of locked
tables during a full vacuum.

You have to keep a list to avoid a sequence scan for every insert or
update though... so I understand.  But It's also why I consider MVCC
fundamentally flawed.  If Postgres used rollback segments like Oracle,
DB2 or other mainstream RDBMS's, this wouldn't be an issue; only locked
rows are put in the rollback segment, so the versioning is still
available, and it would clean up after itself on commit or rollback.

I almost want to say MVCC was a nice idea for a research project, but it
doesn't really work in practice.  A database with a CVS like model
*seems* like a good idea until you throw vacuum into the mixture.

It all depends on what you need.  This of course wouldn't be an issue
for a data-stor, or data that doesn't change frequently.  For something
like a classified-ad system used by over 500 newspapers, it fails quite
spectacularly in our case.  In truth, this is the *one* thing we hate
about postgres.  That's all.  Just that one thing.  If there were some
way to resolve it, I'd be an enthusiastic supporter.

> In an installation with a lot of update traffic, you may need to
> increase the default size of the free space map to ensure that you
> can keep track of all the free space in your heavily-updated tables.
> We're still learning about the best way to tune those configuration
> parameters.

I'll do that.  I'm just trying to save my poor server.  I'll do almost
anything to reduce its suffering.

You've been a great help.  Thanks.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Tom Lane

Date:

02 May 2002, 16:47:00

Shaun Thomas <sthomas@townnews.com> writes:
> Either way, having such a lookup before every insert/update doesn't seem
> terribly efficient.  It seems like this would displace the slowdown
> caused by vacuum to inserts and updates.

Actually, I believe not.  With the old scheme, the last page of an
active table was a hot-spot for contention, because every insert/update
on the table had to get an exclusive lock on that page.  With the new
scheme, different backends can be inserting tuples into different pages
concurrently.  (And yes, the FSM is designed to favor that happening;
if multiple pages are recorded as having free space, different backends
will get pointed to different pages to do insertions in.)

The in-memory FSM could itself become a hot-spot for contention, but
the profiles that I've been able to take do not show any indication
that that's a problem in practice.  Any given backend will touch the
FSM only when it's exhausted the free space in its current target
insert page, so the traffic to the FSM is significantly less than the
total insertion traffic.

Also, to the extent that this approach keeps the total table size
constant, we avoid a lot of OS-level overhead for repeated extensions
of the disk file.

So I have no reason to think that insert/update is measurably slower
in this scheme than before, and some reasons to think it may often be
faster.  I have not tried to benchmark the speed of insert/update taken
in isolation, though.

> But It's also why I consider MVCC
> fundamentally flawed.  If Postgres used rollback segments like Oracle,
> DB2 or other mainstream RDBMS's, this wouldn't be an issue; only locked
> rows are put in the rollback segment, so the versioning is still
> available, and it would clean up after itself on commit or rollback.

I do not agree with this reasoning, and I do *not* see that rollback
segments have any advantages over our approach.

            regards, tom lane

Re: What popular, large commercial websites run

From

Scott Marlowe

Date:

02 May 2002, 17:54:58

I just tested this, and it's true.  On my 7.2.1 machine, with booles,
updating 50% of the 100,000 test rows I made, the data usage climbs about
1 Meg, and most of it does not get recovered by vacuum.  I'll compile
7.2.1 (I'm running 7.2) and test it for other datatypes as well.

On Thu, 2 May 2002, Shaun Thomas wrote:

> On Wed, 1 May 2002 postgres@vrane.com wrote:
>
> > I'm very curious to know why you have problem with growing
> > database.  Does the performance suffer significantly
> > if you don't do the FULL vacuum?  Surely if you can
> > afford the oracle you can afford relatively much
> > cheaper storage.  You must have other reasons
> > than just not liking large database
>
> Well, it's not the fact that it's growing that's the problem.  It's the
> fact that 100 actual MB of frequently changed data becomes 2gigs if not
> frequently vacuumed.  Even with hourly full vacuums, it still slowly
> bloats to 200mb in two weeks, with the same amount of data.  The worst
> part about this is that the more it bloats, the longer vacuum takes,
> and the speed of the bloating increases almost exponentially.  Given
> two months, it's back up to 2 gigs.
>
> Full dump and restore?  Back down to 100MB.  I'm sorry, but no amount of
> disk storage should have to compensate for a database growing to 60x
> larger than the actual data stored (the data dump is 30 megs, but you
> can expect a certain amount of bloat due to column sizes).  When the
> data files start hitting 2GB each, the Linux file-size limit comes into
> play, and you have no choice but to dump and restore.
>
> Even worse?  Since there is only about 100 MB of real data in that 2GB
> morass, there's 1.9GB of old or invalid rows that Vacuum didn't clean
> up.  That makes all subsequent vacuums slower, which makes their locks
> last longer, which means all selects on the tables being vacuumed are
> stalled until the vacuum is done.  What happens when the vacuum takes
> half an hour, and it's a web application?  Oh darn, you stop serving
> pages that use the database.
>
> Only full vacuum on non-peak times, you say?  We tried that.  The
> datafiles exploded to 2GB within days.  *DAYS*  The hourly vacuum
> brought it under control, but our insert script which runs every hour
> and replaces about 10% of the data per run, really cries bloody murder
> while the vacuum is running.  As a result, this machine commonly has a
> load approaching 20 almost all the time.  Turn off postgres?  Less than
> 1, even with a concurrent Mysql DB that has a *valid* 2GB database that
> contains 2GB of actual data.
>
> I'm not passing blame.  I'm not even angry.  I'm just frustrated and
> tired of babying Postgres so it doesn't cause the server to burst into
> flames, crash, or otherwise fail.  I actually had less admin overhead
> with an Oracle database.  Anyone who has used Oracle knows just how hard
> it is to administer, but I sure as hell never had to completely dump and
> restore the data every month to keep it from eating my drives.  I mean,
> what good is MVCC to avoid locking, when frequent full vacuums are even
> worse?  I'd rather have a little locking contention, than have a
> completely useless database for ten to twenty minutes, every hour.
>
> Heck, maybe it's just our special case, that such a high rate of data
> turnover just bewilders postgres.  But as of now, it's just plain
> unusable.  Until vacuum goes away completely, which it never will as
> long as MVCC is in place, postgres is not an option for us.  I just
> can't see any way around it.  I hate mysql when I want things like
> foreign keys or subselects, so we still need a full DBMS.  Hell, we're
> even considering giving a look to freaking Interbase, for the love of
> God.  We're desperate, here.  ^_^
>
> As a side note, Sybase used to be our DBMS of choice, but it didn't play
> nice with PHP (segfaults on connects, sometimes), and only old versions
> fall under the free license, so we ditched it too.  It also had weird
> syntax (go, for crying out loud!?), so our developers hated it.
>
> If I could only take all the good things from the databases I liked, and
> make them one database...  Ease of administration of Mysql + Features of
> Oracle, for instance.  But that database is only available in the
> wonderful, magical world of 'you must be kidding' land.  Oh well.
>
>

bool / vacuum full bug.

From

Scott Marlowe

Date:

02 May 2002, 18:07:09

There HAS to be a bug here.  I'm running 7.2, and haven't tried 7.2.1, but
I'm guessing it won't fix this.

with a table with a text and int field, I get the standard behaviour.
Update a lot, file grows.  vacuum (full) it and it shrinks back to it's
original, or nearly so, the same size.  With a bool in the table, things
get ugly fast.

on a 100,000 row table:

table size in blocks: 48500
update test set yn=true where yn=true;
table size in blocks: 49716
vacuum;
table size in blocks: 49688
vacuum full;
table size in blocks: 50832

and it just keeps going and going and going and going...

Something is broken in vacuum and bools.  I haven't tested any types but
text, int, and bool so far.  I'll test varchar, float and numeric next and
let ya know.

Re: What popular, large commercial websites run

From

Grant Johnson

Date:

02 May 2002, 18:25:43

>
>
>
>So I have no reason to think that insert/update is measurably slower
>in this scheme than before, and some reasons to think it may often be
>faster.  I have not tried to benchmark the speed of insert/update taken
>in isolation, though.
>
It has been my experience that inserts/updates are FASTER when there is
some free space in the db from deleted records.

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

03 May 2002, 09:55:08

On Thu, 2 May 2002 postgres@vrane.com wrote:

> The only problem I can see your very long essay is linux file
> size limit.  Well that is not postgres fault.

Dude, wasting 1.9GB is certainly *not* the OS's fault.  Next time your
basement starts flooding, blame your basement for not holding all the
water you didn't ask for, and don't fix the pipe flooding it.  What I
currently have is a drain that drains slower than the pipe is flooding
my basement.  So, you would still think it was the OS's fault if I had
a 5TB cluster that was completely full, but once dumped and restored,
would only consume 100MB?

So yeah.  Throw space at it, that'll fix everything.  I'm off to go buy
the house next to mine so I can keep more of that water...

> I would think they are both equally involved.

Nope.  Only have to switch once if we pick the right replacement.  I've
upgraded postgres (the customer halting us has since relented) and we'll
see.  But I'm not convinced it'll make any difference.  MVCC is a losing
proposition in my opinion.  Period.

Besides, this all started by some twit saying Postgres and Oracle are
interchangable.  They're not.  That's all I'm saying.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Tom Jenkins

Date:

03 May 2002, 10:47:57

Hi Shaun,

On Fri, 2002-05-03 at 09:54, Shaun Thomas wrote:
>
> Besides, this all started by some twit saying Postgres and Oracle are
> interchangable.  They're not.  That's all I'm saying.
>

Oh, I hope I'm not the "twit" that said Postgres and Oracle are
interchangable.  My point in an earlier post was that for _our_ clients'
applications there is no reason to use/require Oracle; that Postgres
fits.  Its a hammer/wrench type of thing - pick the tool that best
satisfies the job requirements.

--

Tom Jenkins
Development InfoStructure
http://www.devis.com

pgaccess

From

david blood

Date:

03 May 2002, 11:43:27

Is there a newer version of PgAccess? I get this error when I go to design an
existing table a figure that it has to do with the fact that postgres is
serverall version ahead of the last pgaccess version I have??
Thanks for any help
David Blood

__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com

Re: pgaccess

From

tony

Date:

04 May 2002, 02:17:35

On Fri, 2002-05-03 at 17:39, david blood wrote:
> Is there a newer version of PgAccess? I get this error when I go to design an
> existing table a figure that it has to do with the fact that postgres is
> serverall version ahead of the last pgaccess version I have??

pgaccess works just fine for me on 7.2. What is your problem?

Cheers

Tony Grant
--
RedHat Linux on Sony Vaio C1XD/S
http://www.animaproductions.com/linux2.html
Macromedia UltraDev with PostgreSQL
http://www.animaproductions.com/ultra.html

Re: pgaccess

From

david blood

Date:

06 May 2002, 10:30:17

I should have posted the error.

I get this when I click design on a table.  I also do not see the indexes that
are on a table.

David Blood

ERROR:  Attribute 'oid' not found

ERROR:  Attribute 'oid' not found

    while executing
"pg_select pgsql184 {select oid,indexrelid from pg_index where
(pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
        lappend Pg..."
    ("uplevel" body line 1)
    invoked from within
"uplevel pg_select $args"
    (procedure "wpg_select" line 3)
    invoked from within
"wpg_select $CurrentDB "select oid,indexrelid from pg_index where
(pg_class.relname='$PgAcVar(tblinfo,tablename)') and
(pg_class.oid=pg_index.indrelid)..."
    (procedure "refreshTableInformation" line 48)
    invoked from within
"refreshTableInformation"
    (procedure "Tables::design" line 5)
    invoked from within
"Tables::design $objname"
    ("Tables" arm line 2)
    invoked from within
"switch $PgAcVar(activetab) {
    Tables  {
        Tables::design $objname
    }
    Schema  {
        Schema::open $objname
    }
    Queries {
        Queries::design $objname
    }
    V..."
    (procedure "Mainlib::cmd_Design" line 7)
    invoked from within
"Mainlib::cmd_Design"
    invoked from within
".pgaw:Main.btndesign invoke"
    ("uplevel" body line 1)
    invoked from within
"uplevel #0 [list $w invoke]"
    (procedure "tkButtonUp" line 9)
    invoked from within
"tkButtonUp .pgaw:Main.btndesign
"
    (command bound to event)




--- tony <tony@animaproductions.com> wrote:
> On Fri, 2002-05-03 at 17:39, david blood wrote:
> > Is there a newer version of PgAccess? I get this error when I go to design
> an
> > existing table a figure that it has to do with the fact that postgres is
> > serverall version ahead of the last pgaccess version I have??
>
> pgaccess works just fine for me on 7.2. What is your problem?
>
> Cheers
>
> Tony Grant
> --
> RedHat Linux on Sony Vaio C1XD/S
> http://www.animaproductions.com/linux2.html
> Macromedia UltraDev with PostgreSQL
> http://www.animaproductions.com/ultra.html
>


__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com

Re: pgaccess

From

Tom Lane

Date:

06 May 2002, 12:40:01

david blood <davidjblood@yahoo.com> writes:
> I get this when I click design on a table.  I also do not see the indexes that
> are on a table.

> ERROR:  Attribute 'oid' not found

>     while executing
> "pg_select pgsql184 {select oid,indexrelid from pg_index where
> (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
>         lappend Pg..."

Try using the 7.2 version of pgaccess.  That looks to be 7.1 code...

            regards, tom lane

Re: pgaccess

From

david blood

Date:

06 May 2002, 12:45:14

I am not sure where to get that.  I have gone to the pgaccess site and am using
the latest version downloadable there.

David Blood
--- Tom Lane <tgl@sss.pgh.pa.us> wrote:
> david blood <davidjblood@yahoo.com> writes:
> > I get this when I click design on a table.  I also do not see the indexes
> that
> > are on a table.
>
> > ERROR:  Attribute 'oid' not found
>
> >     while executing
> > "pg_select pgsql184 {select oid,indexrelid from pg_index where
> > (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
> >         lappend Pg..."
>
> Try using the 7.2 version of pgaccess.  That looks to be 7.1 code...
>
>             regards, tom lane

__________________________________________________
Do You Yahoo!?
Yahoo! Health - your guide to health and wellness
http://health.yahoo.com

Re: pgaccess

From

"Nigel J. Andrews"

Date:

06 May 2002, 13:01:44

On Mon, 6 May 2002, david blood wrote:

> I should have posted the error.
>
> I get this when I click design on a table.  I also do not see the indexes that
> are on a table.
>
> David Blood
>
> ERROR:  Attribute 'oid' not found
>
> ERROR:  Attribute 'oid' not found
>
>     while executing
> "pg_select pgsql184 {select oid,indexrelid from pg_index where
> (pg_class.relname='tblbadword') and (pg_class.oid=pg_index.indrelid)} rec {
>         lappend Pg..."
>     ("uplevel" body line 1)
>     invoked from within
> "uplevel pg_select $args"
>     (procedure "wpg_select" line 3)
>     invoked from within
> "wpg_select $CurrentDB "select oid,indexrelid from pg_index where
> (pg_class.relname='$PgAcVar(tblinfo,tablename)') and
> (pg_class.oid=pg_index.indrelid)..."
>     (procedure "refreshTableInformation" line 48)
>     invoked from within
> "refreshTableInformation"
>
> ... [sniped]B
>
> --- tony <tony@animaproductions.com> wrote:
> > On Fri, 2002-05-03 at 17:39, david blood wrote:
> > > Is there a newer version of PgAccess? I get this error when I go to design
> > an
> > > existing table a figure that it has to do with the fact that postgres is
> > > serverall version ahead of the last pgaccess version I have??
> >
> > pgaccess works just fine for me on 7.2. What is your problem?
> >


I think the answer to this problem is that yes there is a newer version of
PgAccess. There must be since the version with 7.2.1 doesn't try to retrieve
the oid field, which indeed does not exist in pg_index as reported in the error
message.


--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: pgaccess

From

"Nigel J. Andrews"

Date:

06 May 2002, 14:26:46

On Mon, 6 May 2002, david blood wrote:

> I am not sure where to get that.  I have gone to the pgaccess site and am using
> the latest version downloadable there.
>
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > david blood <davidjblood@yahoo.com> writes:
> > > I get this when I click design on a table.  I also do not see the indexes
> > that
> > > are on a table.
> >  [sniped]
> >
> > Try using the 7.2 version of pgaccess.  That looks to be 7.1 code...

I see Tom got there ahead of me.

If you've got a problem you must be running a more recent backend, which means
you must have upgraded it, can't you just upgrade the client tools and things
from the same source you upgraded the backend from?

LOL, my first answer to this that Tom beat me to has just arrived in my mail
box.

--
Nigel J. Andrews
Director

---
Logictree Systems Limited
Computer Consultants

Re: What popular, large commercial websites run

From

Bruce Momjian

Date:

01 June 2002, 15:52:42

Shaun Thomas wrote:
> On Mon, 29 Apr 2002, pgsql-gen Newsgroup wrote:
>
> > The way I see it, some managers will buy Oracle. They will have low
> > profit margines. Some programers will use PostgreSQL. They will have
> > high margins.
>
> That's all well and good provided postgres and Oracle were 100% feature
> compatible.  They're not.  Want inter-database queries?  Too bad.
> Replication?  Nope.  Parallel queries?  Scratch that.  Packages?  Big
> goose-egg.  ADA-style error catching?  Zero.  In/Out variables?  Only in
> your dreams, buddy.  Views that take parameters?  Zilch.  Wanna actually
> drop your foreign keys, or change their status or triggering order for
> loading purposes?  Not here.

Agreed.  We are not 100% feature compatible with Oracle.  We do have
some things Oracle doesn't, and they have some things we don't.  I will
admit they have more things we don't have, and some of the things they
have are really useful, as you pointed out.  However, some of they stuff
they have is just bloat, which can be a negative.

> Why are our databases bloating, even after hourly full vacuums?  Because
> we have a database with a 50-100% data turnover rate at about 100,000
> rows, and postgres just can't handle it.  I've watched our 100mb
> database grow to 500mb, then 2gigs.  Full dump and restore?  70mb
> again.  Oh, and the spiking load, and table locks that occur during
> full vacuums?  Just take the hit, web-surfers be damned.

I have read the thread an no one seems to have addressed your problem
--- that a full vacuum doesn't reduce the db sizes back to 100mb.  There
are two possible causes, one is an old transaction that is keeping those
expired tuples visible (unlikely), and index growth, particularly for
ever-increasing keys, like primary keys.

We have this TODO item:

    * Certain indexes will not shrink, e.g. oid indexes with many inserts

I believe this has not been done because there are some problems with
doing this in a crash-safe manner (?).

Would you see if recreating some of the larger indexes helps reduce your
disk usage after vacuum?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: What popular, large commercial websites run

From

Shaun Thomas

Date:

03 June 2002, 11:49:35

On Sat, 1 Jun 2002, Bruce Momjian wrote:

> I have read the thread an no one seems to have addressed your problem
> --- that a full vacuum doesn't reduce the db sizes back to 100mb.

Wow, this is an old thread, too.  ^_^  Actually, I copied the vacuumdb
script and changed the parameters/code to be a reindex script.  I
submitted it five times in various forms until it did basically
everything everyone was requesting.  I think I was hinting around that
it should be added to the official codebase because it's a necessary
utility.

Why?  Because that made my problem go away.  The problem was that vacuum
doesn't reindex, so indexes just kept growing.  Either way, there's no
longer a problem.  ^_^

Take care.

--
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+
| Shaun M. Thomas                INN Database Administrator           |
| Phone: (309) 743-0812          Fax  : (309) 743-0830                |
| Email: sthomas@townnews.com    AIM  : trifthen                      |
| Web  : www.townnews.com                                             |
|                                                                     |
|     "Most of our lives are about proving something, either to       |
|      ourselves or to someone else."                                 |
|                                           -- Anonymous              |
+-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-+

Re: What popular, large commercial websites run

From

Jan Wieck

Date:

03 June 2002, 12:43:52

Shaun Thomas wrote:
> On Sat, 1 Jun 2002, Bruce Momjian wrote:
>
> > I have read the thread an no one seems to have addressed your problem
> > --- that a full vacuum doesn't reduce the db sizes back to 100mb.
>
> Wow, this is an old thread, too. ...

Yeah,

    Bruce,  after  2  month  what  about  reading your mailbox in
    reverse order?

    ;-)

Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #

Re: What popular, large commercial websites run

From

Bruce Momjian

Date:

07 June 2002, 22:04:17

Jan Wieck wrote:
> Shaun Thomas wrote:
> > On Sat, 1 Jun 2002, Bruce Momjian wrote:
> >
> > > I have read the thread an no one seems to have addressed your problem
> > > --- that a full vacuum doesn't reduce the db sizes back to 100mb.
> >
> > Wow, this is an old thread, too. ...
>
> Yeah,
>
>     Bruce,  after  2  month  what  about  reading your mailbox in
>     reverse order?

The problem is that the threads are often resurected as new email
messages with no linkage to the old, and subject lines that are very
different, so I can't find if the issue was addressed later or not.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026