Re: 15,000 tables - Mailing list pgsql-performance

From Michael Riess
Subject Re: 15,000 tables
Date
Msg-id dmnj8g$1hrt$1@news.hub.org
In response to Re: 15,000 tables  (Chris Browne <cbbrowne@acm.org>)
Responses Re: 15,000 tables
List pgsql-performance
> Michael Riess <mlriess@gmx.de> writes:
>>> On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
>>>> we are currently running a postgres server (upgraded to 8.1) which
>>>> has one large database with approx. 15,000 tables. Unfortunately
>>>> performance suffers from that, because the internal tables
>>>> (especially that which holds the attribute info) get too large.
>>>>
>>>> (We NEED that many tables, please don't recommend to reduce them)
>>>>
>>> Have you ANALYZEd your database? VACUUMing?
>> Of course ... before 8.1 we routinely did a vacuum full analyze each
>> night. As of 8.1 we use autovacuum.
>
> VACUUM FULL was probably always overkill, unless "always" includes
> versions prior to 7.3...

Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
but the database got considerably slower near the end of the week.
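
For what it's worth, the nightly job before 8.1 was simply a VACUUM FULL
ANALYZE against the database. On 8.1 we rely on autovacuum instead; a quick
sanity check from psql is all we do now, since in 8.1 autovacuum silently
does nothing unless the row-level statistics collector is enabled:

  SHOW autovacuum;             -- must be 'on'
  SHOW stats_start_collector;  -- must be 'on'
  SHOW stats_row_level;        -- must be 'on'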

>
>>> BTW, are you using some kind of weird ERP? I have one that treats
>>> Informix poorly and doesn't let me get at its full potential ...
>>> maybe the same applies in your case ...
>> No. Our database contains tables for web content management
>> systems. The server hosts approx. 500 CMS applications, and each of
>> them has approx. 30 tables.
>>
>> That's why I'm asking if it was better to have 500 databases with 30
>> tables each. In previous Postgres versions this led to even worse
>> performance ...
>
> This has the feeling of fitting with Alan Perlis' dictum below...
>
> Supposing you have 500 databases, each with 30 tables, each with 4
> indices, then you'll find you have, on disk...
>
> # of files = 500 x 30 x 5 = 75000 files
>
> If each is regularly being accessed, that's bits of 75000 files
> getting shoved through OS and shared memory caches.  Oh, yes, and
> you'll also have regular participation of some of the pg_catalog
> files, with ~500 instances of THOSE, multiplied some number of ways...
>
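
Just to put a number on that: the actual per-database relation count
(tables, indexes, toast tables) can be pulled straight from the catalog,
so the estimate is easy to check:

  SELECT relkind, count(*)
  FROM pg_class
  GROUP BY relkind;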

Not all of the tables are frequently accessed. In fact I would estimate
that only 20% are actually used ... but there is no way to determine if
or when a table will be used. I thought about a way to "swap out" tables
which have not been used for a couple of days ... maybe I'll do just
that, but it would be cumbersome. I had hoped that unused tables would
not hurt performance, but of course the internal catalog tables which
hold the metadata grow too large.
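
Two things I can at least measure, assuming row-level stats are on:
roughly which tables have never been touched since the counters were
reset, and how big the metadata tables themselves have become; with this
many tables, pg_attribute is the usual suspect:

  -- tables with no recorded sequential or index scans ("unused" is only
  -- a rough proxy; 8.1 keeps no last-used timestamp)
  SELECT schemaname, relname
  FROM pg_stat_user_tables
  WHERE seq_scan = 0 AND idx_scan = 0;

  -- size of the system catalogs that grow with the number of tables/columns
  SELECT relname, relpages, reltuples
  FROM pg_class
  WHERE relname IN ('pg_class', 'pg_attribute', 'pg_attrdef',
                    'pg_index', 'pg_statistic')
  ORDER BY relpages DESC;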

> An application with 15000 frequently accessed tables doesn't strike me
> as being something that can possibly turn out well.  You have, in
> effect, more tables than (arguably) bloated ERP systems like SAP R/3;
> it only has a few thousand tables, and since many are module-specific,
> and nobody ever implements *all* the modules, it is likely only a few
> hundred that are "hot spots."  No 15000 there..

I think that my system conforms to the 80/20 rule ...
