Re: Large # of Tables, Getting ready for the enterprise - Mailing list pgsql-hackers
From | Ross J. Reedstrom |
---|---|
Subject | Re: Large # of Tables, Getting ready for the enterprise |
Date | |
Msg-id | 20000817103547.C17023@rice.edu Whole thread Raw |
In response to | Large # of Tables, Getting ready for the enterprise ("carl garland" <carlhgarland@hotmail.com>) |
List | pgsql-hackers |
On Thu, Aug 17, 2000 at 06:17:01AM -0400, carl garland wrote: > As postgres becomes better and more in the spotlight there are a couple of > issues I think that the hacker group might want to address to better prepare > it for the enterprise and highend production systems. Currently postgres > will support an incredible amount of tables whereas Interbase only supports > 64K, but the efficiency and performance of the pg backend quickly > degenerates after 1000 tables. I know that most people will think that > filesystem will be the bottleneck but as XFS nears completion the problem Realize that pg is a cross platform program. The existance (or not) of a particular filesystem cannot be assumed, nor does the release of a new FS on linux, for example, impact the entire postgresql community. > will shift back to pg. It is my understanding that the system tables where > lookups on tables occur are always done sequentially and not using any more > optimized (btree etc) solution. I also think this may be applicable to the Nope. The pg_class table has indicices on oid and relname, which are used when appropriate. In addition, there is a cache for system tables (syscache). > toastable objects where large # of objects occur. I want to start to look > at the code to maybe help out but have a few questions: > 1) When referencing a table is it only looked up once and then cached or > does a scan of the system table occur only once per session. This would be the syscache, for the system tables, or the relcache, which caches relation (i.e. table) descriptors for the general case. There's been some discussion and work regarding cleaning up the seperation between accessing relations and storing them in particular files. At one point, Tom Lane mentioned that he was wondering if the knowledge of the relcache needs to be move out of the bufmgr/smgr interfaces (that's the buffer manager and storage manager, respectively) as part of that cleanup. > 2) Which files should I look at in tree. Hmm, an awful lot of them. find . -name \*.[chyl] |xargs grep -l syscache | wc -l 98 find . -name \*.[chyl] |xargs grep -l relcache | wc -l 43 And that's not nearly exhaustive. > 3) Any tips, suggestions, pitfalls I should remember. Take a look at: http://postgresql.org/docs/faq-dev-english.html and file:/where/ever/you/put/pgsql/src/tools/backend/index.html Very useful for getting the broad overview and links into your local filetree, with descriptions. This is a very complex bit of code, right at the heart of the database. I've found it a might steep learning curve. Perhaps not the best place to start for a first backend coding project. But dig in: for me, there's no better way to learn than breaking code, and fixing it. The core developers are good at not letting stupid ideas get commited to CVS. One thing not mentioned above: memory management is quite complex, as well, and underlies all the other code (of course). Take a look at the mmgr code. In fact, if you've got any skill at code analysis and documentation, further diagramming and describing of the interrelationships between the different caches and managers (syscache, relcache, bufcache, bufmgr, smgr, mmgr etc.) would be a welcome addition to the work Bruce has done with the developers FAQ, above. Ross -- Ross J. Reedstrom, Ph.D., <reedstrm@rice.edu> NSBRI Research Scientist/Programmer Computer and Information Technology Institute Rice University, 6100 S. Main St., Houston, TX 77005
pgsql-hackers by date: