Thread: 15,000 tables
Hi,

we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially the one which holds the attribute info) get too large.

(We NEED that many tables, please don't recommend reducing them.)

Logically these tables could be grouped into 500 databases. My question is: would performance be better if I had 500 databases (on one postgres server instance) which each contain 30 tables, or is it better to have one large database with 15,000 tables? In the old days of postgres 6.5 we tried that, but performance was horrible with many databases ...

BTW: I searched the mailing list but found nothing on the subject - and there also isn't any information in the documentation about the effects of the number of databases, tables or attributes on performance.

Now, what do you say? Thanks in advance for any comment!

Mike
On Thu, 1 Dec 2005, Michael Riess wrote:

> Hi,
>
> we are currently running a postgres server (upgraded to 8.1) which has one
> large database with approx. 15,000 tables. Unfortunately performance suffers
> from that, because the internal tables (especially the one which holds the
> attribute info) get too large.

Is it because the internal tables get large, or is it a problem with disk I/O?

With 15,000 tables you are talking about a LOT of files to hold these (30,000 files with one index each and each table being small enough to not need more than one file to hold it). On linux ext2/3 this many files in one directory will slow you down horribly.

Try different filesystems (from my testing and from other posts it looks like XFS is a leading contender), and also play around with the tablespaces feature in 8.1 to move things out of the main data directory into multiple directories.

If you do an ls -l on the parent directory you will see that the size of the directory is large if it's ever had lots of files in it; the only way to shrink it is to mv the old directory to a new name, create a new directory and move the files from the old directory to the new one.

David Lang
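A minimal sketch of the tablespace idea, assuming made-up names and paths (ts_disk2, /mnt/disk2/pgdata and the table names are invented; the syntax is the 8.0/8.1 one):

    -- create a tablespace in a directory on another disk (path is hypothetical)
    CREATE TABLESPACE ts_disk2 LOCATION '/mnt/disk2/pgdata';

    -- move an existing table's storage there (indexes are moved separately
    -- with ALTER INDEX ... SET TABLESPACE)
    ALTER TABLE webshop_products SET TABLESPACE ts_disk2;

    -- or create new tables there directly
    CREATE TABLE news_items (
        id     integer,
        title  text,
        body   text
    ) TABLESPACE ts_disk2;

Spreading tables and indexes over several tablespaces keeps the per-directory file count down even if the total number of relations stays the same.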
Hi David,

> with 15,000 tables you are talking about a LOT of files to hold these
> (30,000 files with one index each and each table being small enough
> to not need more than one file to hold it), on linux ext2/3 this many
> files in one directory will slow you down horribly.

We use ReiserFS, and I don't think that this is causing the problem ... although it would probably help to split the directory up using tablespaces.

But thanks for the suggestion!
Hi David,

incidentally: The directory which holds our database currently contains 73883 files ... do I get a prize or something? ;-)

Regards,
Mike
On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
> Hi,
>
> we are currently running a postgres server (upgraded to 8.1) which has
> one large database with approx. 15,000 tables. Unfortunately performance
> suffers from that, because the internal tables (especially the one which
> holds the attribute info) get too large.
>
> (We NEED that many tables, please don't recommend reducing them.)
>

Have you ANALYZEd your database? VACUUMing?

BTW, are you using some kind of weird ERP? I have one that treats Informix like a fool and doesn't let me get at Informix's full potential ... maybe the same is true in your case ...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
Michael Riess <mlriess@gmx.de> writes:
> (We NEED that many tables, please don't recommend reducing them.)

No, you don't. Add an additional key column to fold together different tables of the same structure. This will be much more efficient than managing that key at the filesystem level, which is what you're effectively doing now.

(If you really have 15000 distinct rowtypes, I'd like to know what your database design is...)

regards, tom lane
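A sketch of the folding Tom describes, with invented table and column names (news_items, app_id) rather than Michael's actual schema: one shared table carries an application key instead of 500 near-identical per-application copies.

    -- one shared table with an application key
    CREATE TABLE news_items (
        app_id  integer NOT NULL,   -- which of the ~500 applications owns the row
        id      serial  NOT NULL,
        title   text,
        body    text,
        PRIMARY KEY (app_id, id)
    );

    -- each application only ever touches its own slice
    SELECT title
    FROM   news_items
    WHERE  app_id = 42
    ORDER  BY id DESC
    LIMIT  10;

The composite primary key keeps one application's rows adjacent in the index, so per-application queries remain cheap index scans.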
On 12/1/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Riess <mlriess@gmx.de> writes:
> > (We NEED that many tables, please don't recommend reducing them.)
>
> No, you don't. Add an additional key column to fold together different
> tables of the same structure. This will be much more efficient than
> managing that key at the filesystem level, which is what you're
> effectively doing now.
>
> (If you really have 15000 distinct rowtypes, I'd like to know what
> your database design is...)
>
> regards, tom lane

Maybe he is using some kind of weird ERP ... take the case of BaaN (sadly I use it in my work): BaaN creates about 1200 tables per company and I have no control over it ... we have about 12000 tables right now ...

--
regards,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
Hi,

> On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
>> Hi,
>>
>> we are currently running a postgres server (upgraded to 8.1) which has
>> one large database with approx. 15,000 tables. Unfortunately performance
>> suffers from that, because the internal tables (especially the one which
>> holds the attribute info) get too large.
>>
>> (We NEED that many tables, please don't recommend reducing them.)
>>
>
> Have you ANALYZEd your database? VACUUMing?

Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum.

> BTW, are you using some kind of weird ERP? I have one that treats
> Informix like a fool and doesn't let me get at Informix's full
> potential ... maybe the same is true in your case ...

No. Our database contains tables for web content management systems. The server hosts approx. 500 CMS applications, and each of them has approx. 30 tables.

That's why I'm asking if it was better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ...

Mike
On 01.12.2005, at 17:04 Uhr, Michael Riess wrote:

> No. Our database contains tables for web content management systems.
> The server hosts approx. 500 CMS applications, and each of them has
> approx. 30 tables.

Just for my curiosity: Do the "about 30 tables" have similar schemas, or do they differ much?

We have a small CMS system running here, where I have all information for all clients in tables with relationships to a client table. But I assume you are running a pre-built CMS which is not designed for "multi-client ability", right?

cug

--
PharmaLine, Essen, GERMANY
Software and Database Development
Hi Tom,

> Michael Riess <mlriess@gmx.de> writes:
>> (We NEED that many tables, please don't recommend reducing them.)
>
> No, you don't. Add an additional key column to fold together different
> tables of the same structure. This will be much more efficient than
> managing that key at the filesystem level, which is what you're
> effectively doing now.

Been there, done that. (see below)

> (If you really have 15000 distinct rowtypes, I'd like to know what
> your database design is...)

Sorry, I should have included that info in the initial post. You're right in that most of these tables have a similar structure. But they are independent and can be customized by the users.

Think of it this way: On the server there are 500 applications, and each has 30 tables. One of these might be a table which contains the products of a webshop, another contains news items which are displayed on the website, etc.

The problem is that the customers can freely change the tables ... add columns, remove columns, change column types etc. So I cannot use system-wide tables with a key column.

Mike
Hi Michael,

>> Have you ANALYZEd your database? VACUUMing?
>
> Of course ... before 8.1 we routinely did a vacuum full analyze each
> night. As of 8.1 we use autovacuum.

What I noticed is that autovacuum is not working as it should. I had 8.1 running with autovacuum for just 2 days or so and got warnings in pgAdmin that my tables would need a vacuum. I posted this behaviour some weeks ago to the novice list requesting more info on how to "tweak" autovacuum properly - unfortunately without any responses.

That's when I switched the nightly analyze job back on - everything has run smoothly since then. Maybe it helps in your case as well?

Cheers,
Thomas
Michael Riess <mlriess@gmx.de> writes:
>> On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
>>> we are currently running a postgres server (upgraded to 8.1) which
>>> has one large database with approx. 15,000 tables. Unfortunately
>>> performance suffers from that, because the internal tables
>>> (especially the one which holds the attribute info) get too large.
>>>
>>> (We NEED that many tables, please don't recommend reducing them.)
>>>
>> Have you ANALYZEd your database? VACUUMing?
>
> Of course ... before 8.1 we routinely did a vacuum full analyze each
> night. As of 8.1 we use autovacuum.

VACUUM FULL was probably always overkill, unless "always" includes versions prior to 7.3...

>> BTW, are you using some kind of weird ERP? I have one that treats
>> Informix like a fool and doesn't let me get at Informix's full
>> potential ... maybe the same is true in your case ...
>
> No. Our database contains tables for web content management
> systems. The server hosts approx. 500 CMS applications, and each of
> them has approx. 30 tables.
>
> That's why I'm asking if it was better to have 500 databases with 30
> tables each. In previous Postgres versions this led to even worse
> performance ...

This has the feeling of fitting with Alan Perlis' dictum below...

Supposing you have 500 databases, each with 30 tables, each with 4 indices, then you'll find you have, on disk...

# of files = 500 x 30 x 5 = 75000 files

If each is regularly being accessed, that's bits of 75000 files getting shoved through OS and shared memory caches. Oh, yes, and you'll also have regular participation of some of the pg_catalog files, with ~500 instances of THOSE, multiplied some number of ways...

An application with 15000 frequently accessed tables doesn't strike me as being something that can possibly turn out well. You have, in effect, more tables than (arguably) bloated ERP systems like SAP R/3; it only has a few thousand tables, and since many are module-specific, and nobody ever implements *all* the modules, it is likely only a few hundred that are "hot spots". No 15000 there...

--
(format nil "~S@~S" "cbbrowne" "cbbrowne.com")
http://www3.sympatico.ca/cbbrowne/languages.html
It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures. -- Alan J. Perlis
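For what it's worth, the relation count behind that arithmetic can be checked per database with a small query; the relkind filter below is an assumption about which relation kinds actually get their own files on disk:

    -- tables, indexes and toast tables each get at least one file on disk
    SELECT relkind, count(*)
    FROM   pg_class
    WHERE  relkind IN ('r', 'i', 't')
    GROUP  BY relkind;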
Hi Michael,

I'm a fan of ReiserFS, and I could be wrong, but I believe using a journaling filesystem for the PgSQL database could be slowing things down.

Gavin

On Dec 1, 2005, at 6:51 AM, Michael Riess wrote:

> Hi David,
>
>> with 15,000 tables you are talking about a LOT of files to hold
>> these (30,000 files with one index each and each table being
>> small enough to not need more than one file to hold it), on linux
>> ext2/3 this many files in one directory will slow you down horribly.
>
> We use ReiserFS, and I don't think that this is causing the
> problem ... although it would probably help to split the directory
> up using tablespaces.
>
> But thanks for the suggestion!

Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net
On Thursday, 01.12.2005, at 10:07 -0800, Gavin M. Roy wrote:

> Hi Michael,
>
> I'm a fan of ReiserFS, and I could be wrong, but I believe using a
> journaling filesystem for the PgSQL database could be slowing things
> down.

Have a 200G+ database, someone pulling the power plug, or a regular reboot after a year or so.

Wait for the fsck to finish.

Now think again :-)

++Tino
Agreed. Also the odds of fs corruption or data loss are higher in a non-journaling fs. Best practice seems to be to use a journaling fs but to put the fs log on dedicated spindles separate from the actual fs or pg_xlog.

Ron

At 01:40 PM 12/1/2005, Tino Wildenhain wrote:
> On Thursday, 01.12.2005, at 10:07 -0800, Gavin M. Roy wrote:
>> Hi Michael,
>>
>> I'm a fan of ReiserFS, and I could be wrong, but I believe using a
>> journaling filesystem for the PgSQL database could be slowing things
>> down.
>
> Have a 200G+ database, someone pulling the power plug,
> or a regular reboot after a year or so.
>
> Wait for the fsck to finish.
>
> Now think again :-)
>
> ++Tino
Here's a fairly recent post on reiserfs (and performance):

http://archives.postgresql.org/pgsql-novice/2005-09/msg00007.php

I'm still digging on the performance of ext2 vs. journaled filesystems, as I know I've seen it before.

Gavin

My point was not in doing an fsck, but rather in

On Dec 1, 2005, at 10:40 AM, Tino Wildenhain wrote:

> On Thursday, 01.12.2005, at 10:07 -0800, Gavin M. Roy wrote:
>> Hi Michael,
>>
>> I'm a fan of ReiserFS, and I could be wrong, but I believe using a
>> journaling filesystem for the PgSQL database could be slowing things
>> down.
>
> Have a 200G+ database, someone pulling the power plug,
> or a regular reboot after a year or so.
>
> Wait for the fsck to finish.
>
> Now think again :-)
>
> ++Tino

Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net
Ron <rjpeace@earthlink.net> writes:
> Agreed. Also the odds of fs corruption or data loss are higher in a
> non-journaling fs. Best practice seems to be to use a journaling fs
> but to put the fs log on dedicated spindles separate from the actual
> fs or pg_xlog.

I think we've determined that best practice is to journal metadata only (not file contents) on PG data filesystems. PG does expect the filesystem to remember where the files are, so you need metadata protection, but journalling file content updates is redundant with PG's own WAL logging.

On a filesystem dedicated to WAL, you probably do not need any filesystem journalling at all --- we manage the WAL files in a way that avoids changing metadata for a WAL file that's in active use. A conservative approach would be to journal metadata here too, though.

regards, tom lane
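On ext3 this split roughly corresponds to mount options like the following illustrative /etc/fstab sketch - the devices and mount points are invented, and data=ordered would be an equally defensible, slightly more conservative choice for the data area:

    # data area: ext3 journals metadata only in writeback/ordered mode;
    # PG's WAL protects the file contents
    /dev/sdb1   /var/lib/pgsql/data   ext3   noatime,data=writeback   0 2
    # WAL area: ext2 (no journal) is enough per Tom's note; ext3 with
    # metadata journalling is the conservative alternative
    /dev/sdc1   /var/lib/pgsql/wal    ext2   noatime                  0 2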
Heh, looks like I left a trailing thought...

My post wasn't saying don't use journaled filesystems, but rather that they can be slower than non-journaled filesystems, and I don't consider recovery time from a crash to be a factor in determining the speed of reads and writes on the data. That being said, I think Tom's reply on what to journal and not to journal should really put an end to this side of the conversation.

Gavin

On Dec 1, 2005, at 10:49 AM, Gavin M. Roy wrote:

> Here's a fairly recent post on reiserfs (and performance):
>
> http://archives.postgresql.org/pgsql-novice/2005-09/msg00007.php
>
> I'm still digging on the performance of ext2 vs. journaled filesystems,
> as I know I've seen it before.
>
> Gavin
>
> My point was not in doing an fsck, but rather in

Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net
> Michael Riess <mlriess@gmx.de> writes:
>>> On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
>>>> we are currently running a postgres server (upgraded to 8.1) which
>>>> has one large database with approx. 15,000 tables. Unfortunately
>>>> performance suffers from that, because the internal tables
>>>> (especially the one which holds the attribute info) get too large.
>>>>
>>>> (We NEED that many tables, please don't recommend reducing them.)
>>>>
>>> Have you ANALYZEd your database? VACUUMing?
>> Of course ... before 8.1 we routinely did a vacuum full analyze each
>> night. As of 8.1 we use autovacuum.
>
> VACUUM FULL was probably always overkill, unless "always" includes
> versions prior to 7.3...

Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL, but the database got considerably slower near the end of the week.

>>> BTW, are you using some kind of weird ERP? I have one that treats
>>> Informix like a fool and doesn't let me get at Informix's full
>>> potential ... maybe the same is true in your case ...
>> No. Our database contains tables for web content management
>> systems. The server hosts approx. 500 CMS applications, and each of
>> them has approx. 30 tables.
>>
>> That's why I'm asking if it was better to have 500 databases with 30
>> tables each. In previous Postgres versions this led to even worse
>> performance ...
>
> This has the feeling of fitting with Alan Perlis' dictum below...
>
> Supposing you have 500 databases, each with 30 tables, each with 4
> indices, then you'll find you have, on disk...
>
> # of files = 500 x 30 x 5 = 75000 files
>
> If each is regularly being accessed, that's bits of 75000 files
> getting shoved through OS and shared memory caches. Oh, yes, and
> you'll also have regular participation of some of the pg_catalog
> files, with ~500 instances of THOSE, multiplied some number of ways...

Not all of the tables are frequently accessed. In fact I would estimate that only 20% are actually used ... but there is no way to determine if or when a table will be used. I thought about a way to "swap out" tables which have not been used for a couple of days ... maybe I'll do just that. But it would be cumbersome ... I had hoped that an unused table does not hurt performance. But of course the internal tables which contain the meta info get too large.

> An application with 15000 frequently accessed tables doesn't strike me
> as being something that can possibly turn out well. You have, in
> effect, more tables than (arguably) bloated ERP systems like SAP R/3;
> it only has a few thousand tables, and since many are module-specific,
> and nobody ever implements *all* the modules, it is likely only a few
> hundred that are "hot spots". No 15000 there...

I think that my system conforms to the 80/20 rule ...
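If the statistics collector is enabled, a rough sketch like the following (not from the thread) can at least list tables that have recorded no scans since the counters were last reset, which would support the "swap out unused tables" idea:

    -- tables with no sequential or index scans recorded by the stats collector
    SELECT schemaname, relname
    FROM   pg_stat_user_tables
    WHERE  coalesce(seq_scan, 0) = 0
      AND  coalesce(idx_scan, 0) = 0
    ORDER  BY schemaname, relname;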
On 12/1/05, Michael Riess <mlriess@gmx.de> wrote:
> Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
> but the database got considerably slower near the end of the week.
>
> Not all of the tables are frequently accessed. In fact I would estimate
> that only 20% are actually used ... but there is no way to determine if
> or when a table will be used. I thought about a way to "swap out" tables
> which have not been used for a couple of days ... maybe I'll do just
> that. But it would be cumbersome ... I had hoped that an unused table
> does not hurt performance. But of course the internal tables which
> contain the meta info get too large.
>
> I think that my system conforms to the 80/20 rule ...

How many disks do you have? I imagine you can put the tables forming one logical database in a tablespace and have the tables spread across various disks...

--
Atentamente,
Jaime Casanova
(DBA: DataBase Aniquilator ;)
> we are currently running a postgres server (upgraded to 8.1) which has
> one large database with approx. 15,000 tables. Unfortunately performance
> suffers from that, because the internal tables (especially the one which
> holds the attribute info) get too large.
>
> (We NEED that many tables, please don't recommend reducing them.)
>
> Logically these tables could be grouped into 500 databases. My question
> is:
>
> Would performance be better if I had 500 databases (on one postgres
> server instance) which each contain 30 tables, or is it better to have
> one large database with 15,000 tables? In the old days of postgres 6.5
> we tried that, but performance was horrible with many databases ...
>
> BTW: I searched the mailing list but found nothing on the subject - and
> there also isn't any information in the documentation about the effects
> of the number of databases, tables or attributes on performance.
>
> Now, what do you say? Thanks in advance for any comment!

I've never run near that many databases on one box, so I can't comment on the performance. But let's assume for the moment that pg runs fine with 500 databases. The most important advantage of the multi-schema approach is cross-schema querying. I think, as you are defining your problem, that this is a better way to do things.

Merlin
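A small sketch of the schema-per-application layout Merlin is alluding to; the schema and table names are invented, and the total number of tables stays the same - only the grouping changes:

    CREATE SCHEMA app_001;
    CREATE SCHEMA app_002;

    CREATE TABLE app_001.news_items (id serial PRIMARY KEY, title text, body text);
    CREATE TABLE app_002.news_items (id serial PRIMARY KEY, title text, body text);

    -- a cross-application query, which 500 separate databases would not allow:
    SELECT 'app_001' AS app, count(*) FROM app_001.news_items
    UNION ALL
    SELECT 'app_002' AS app, count(*) FROM app_002.news_items;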
On Thu, 2005-12-01 at 13:34, Michael Riess wrote:
> >>> Have you ANALYZEd your database? VACUUMing?
> >> Of course ... before 8.1 we routinely did a vacuum full analyze each
> >> night. As of 8.1 we use autovacuum.
> >
> > VACUUM FULL was probably always overkill, unless "always" includes
> > versions prior to 7.3...
>
> Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
> but the database got considerably slower near the end of the week.

Generally, this means either your vacuums are too infrequent, or your fsm settings are too small.

Note that vacuum and analyze aren't "married" any more, like in the old days. You can issue either separately, depending on your usage conditions.

Note that with the newest versions of PostgreSQL you can change the settings for vacuum priority so that while it takes longer to vacuum, it doesn't stomp on the other processes' toes so much anymore, so more frequent plain vacuums may be the answer.
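The knobs Scott refers to live in postgresql.conf; an illustrative 8.1 sketch with placeholder values rather than tuned recommendations (the max_fsm_* settings require a restart):

    # free space map: needs at least one slot per table and index, and enough
    # page slots to cover the pages that gain free space between vacuums
    max_fsm_relations = 80000
    max_fsm_pages     = 2000000

    # cost-based vacuum delay: plain VACUUM takes longer but stomps less
    # on concurrent queries (0 disables the delay)
    vacuum_cost_delay = 10          # milliseconds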
me@alternize.com wrote:

> What I noticed is that autovacuum is not working as it should. I had 8.1
> running with autovacuum for just 2 days or so and got warnings in pgAdmin
> that my tables would need a vacuum.

Hum, so how is autovacuum's documentation lacking? Please read it critically and let us know so we can improve it.

http://www.postgresql.org/docs/8.1/static/maintenance.html#AUTOVACUUM

Maybe what you need is to lower the "vacuum base threshold" for tables that are small.

--
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
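In 8.1 the per-table thresholds Alvaro mentions are, if memory serves, stored in the pg_autovacuum system catalog; a hedged sketch with an invented table name (-1 means "use the global default"):

    -- lower the vacuum/analyze base thresholds for one small, busy table
    INSERT INTO pg_autovacuum
           (vacrelid, enabled,
            vac_base_thresh, vac_scale_factor,
            anl_base_thresh, anl_scale_factor,
            vac_cost_delay,  vac_cost_limit)
    VALUES ('public.news_items'::regclass, true,
            50, 0.2,
            25, 0.1,
            -1, -1);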
Agreed, and I apologize for the imprecision of my post below. I should have written: "Best practice seems to be to use a journaling fs and log metadata only and put it on separate dedicated spindles."

I've seen enough HD failures that I tend to be paranoid and log the metadata of fs dedicated to WAL as well, but that may very well be overkill.

Ron

At 01:57 PM 12/1/2005, Tom Lane wrote:
> Ron <rjpeace@earthlink.net> writes:
>> Agreed. Also the odds of fs corruption or data loss are higher in a
>> non-journaling fs. Best practice seems to be to use a journaling fs
>> but to put the fs log on dedicated spindles separate from the actual
>> fs or pg_xlog.
>
> I think we've determined that best practice is to journal metadata only
> (not file contents) on PG data filesystems. PG does expect the filesystem
> to remember where the files are, so you need metadata protection, but
> journalling file content updates is redundant with PG's own WAL logging.
>
> On a filesystem dedicated to WAL, you probably do not need any
> filesystem journalling at all --- we manage the WAL files in a way
> that avoids changing metadata for a WAL file that's in active use.
> A conservative approach would be to journal metadata here too, though.
>
> regards, tom lane
On Fri, Dec 02, 2005 at 03:15:00AM -0500, Ron wrote:
> I've seen enough HD failures that I tend to be paranoid and log the
> metadata of fs dedicated to WAL as well, but that may very well be overkill.

Especially since it wouldn't gain anything. Journalling doesn't give you any advantage whatsoever in the face of a HD failure.

Mike Stone
On 1 Dec 2005, at 16:03, Tom Lane wrote:

> Michael Riess <mlriess@gmx.de> writes:
>> (We NEED that many tables, please don't recommend reducing them.)
>
> No, you don't. Add an additional key column to fold together different
> tables of the same structure. This will be much more efficient than
> managing that key at the filesystem level, which is what you're
> effectively doing now.
>
> (If you really have 15000 distinct rowtypes, I'd like to know what
> your database design is...)
>

Won't you end up with awful seek times if you just want data which was previously stored in a single table? E.g. whereas before you wanted 1000 contiguous rows from the table, now you want 1000 rows which have 1000 rows you don't care about in between each one you do want.
On 2 Dec 2005, at 14:16, Alex Stapleton wrote:

> On 1 Dec 2005, at 16:03, Tom Lane wrote:
>
>> Michael Riess <mlriess@gmx.de> writes:
>>> (We NEED that many tables, please don't recommend reducing them.)
>>
>> No, you don't. Add an additional key column to fold together different
>> tables of the same structure. This will be much more efficient than
>> managing that key at the filesystem level, which is what you're
>> effectively doing now.
>>
>> (If you really have 15000 distinct rowtypes, I'd like to know what
>> your database design is...)
>>
>
> Won't you end up with awful seek times if you just want data which
> was previously stored in a single table? E.g. whereas before you
> wanted 1000 contiguous rows from the table, now you want 1000 rows
> which have 1000 rows you don't care about in between each one you
> do want.
>

I must have had a total and utter failure of intellect for a moment there. Please ignore that :P
On 12/1/2005 2:34 PM, Michael Riess wrote:
>> VACUUM FULL was probably always overkill, unless "always" includes
>> versions prior to 7.3...
>
> Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
> but the database got considerably slower near the end of the week.

This indicates that you have FSM settings that are inadequate for that many tables and, eventually, the overall size of your database. Try setting those to

max_fsm_relations = 80000
max_fsm_pages = (select sum(relpages) / 2 from pg_class)

Another thing you might be suffering from (depending on the rest of your architecture) is file descriptor limits. Especially if you use some sort of connection pooling or persistent connections like PHP, you will have all the backends serving multiple of your logical applications (sets of 30 tables). If on average one backend is called for 50 different apps, then we are talking about 50*30*4=6000 files accessed by that backend. The 80/20 rule leaves 1200 files in access per backend, thus 100 active backends lead to 120,000 open (virtual) file descriptors.

Now add to that any files that a backend would have to open in order to evict an arbitrary dirty block. With a large shared buffer pool and slightly more aggressive background writer settings, you can mostly avoid regular backends having to evict dirty blocks at all. If the kernel settings allow Postgres to keep that many file descriptors open, you avoid directory lookups.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
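Jan's rule of thumb written out as a runnable query (illustrative starting values only; the real requirement depends on how much free space actually turns over between vacuums):

    -- rough starting points for the two FSM settings
    SELECT (SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't'))
               AS relations_with_storage,
           sum(relpages) / 2
               AS suggested_max_fsm_pages
    FROM pg_class;

On 8.1, the tail of a database-wide VACUUM VERBOSE also reports how much of the current free space map is in use, which is a better guide than any formula.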
On Thu, Dec 01, 2005 at 08:34:43PM +0100, Michael Riess wrote:
> Well, we tried switching to daily VACUUM ANALYZE and weekly VACUUM FULL,
> but the database got considerably slower near the end of the week.

If you have your FSM configured correctly and you are vacuuming tables often enough for your turnover, then in regular operation you should _never_ need VACUUM FULL. So it sounds like your first problem is that. With the 15000 tables you were talking about, though, that doesn't surprise me.

Are you sure more back ends wouldn't be a better answer, if you're really wedded to this design? (I have a feeling that something along the lines of what Tom Lane said would be a better answer -- I think you need to be more clever, because I don't think this will ever work well, on any system.)

A

--
Andrew Sullivan | ajs@crankycanuck.ca
This work was visionary and imaginative, and goes to show that visionary and imaginative work need not end up well. --Dennis Ritchie
Michael Riess writes:

> Sorry, I should have included that info in the initial post. You're
> right in that most of these tables have a similar structure. But they
> are independent and can be customized by the users.

How about creating 50 databases and giving each its own tablespace?

It's not only a question of whether PostgreSQL can be optimized, but also of how well your filesystem handles a directory with a large number of files. By splitting the directories you will likely help the OS, and you will perhaps be better able to determine whether the OS or the DB is at fault for the slowness.
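A sketch of that layout with invented names and paths - one tablespace per group of applications, and a database created on it (8.0/8.1 syntax):

    CREATE TABLESPACE grp01 LOCATION '/mnt/disk1/pg_grp01';
    CREATE DATABASE cms_grp01 WITH TABLESPACE = grp01;

    -- tables created in cms_grp01 now default to /mnt/disk1/pg_grp01,
    -- keeping each group's files in its own directory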