Some ideas for comment - Mailing list pgsql-performance

From: Chris Hoover
Subject: Some ideas for comment
Msg-id: 1d219a6f05082409563775a8a7@mail.gmail.com
List: pgsql-performance
Responses: Re: Some ideas for comment
Ok, there is always a lot of talk about tuning PostgreSQL on Linux and how PostgreSQL uses the Linux kernel cache to cache tables and indexes. My question is: is there any way to see what files Linux is caching at this moment?

My reasoning behind this question: I have several database systems, each with one PostgreSQL cluster. However, each cluster has a large number of identical databases on it. Since there can be a great amount of size disparity between the databases, I am wondering if some of the slowness we see might be caused by the kernel cache having to reload a lot of data that keeps getting evicted (most systems have at least 100GB of data/indexes on them with 8 or 12GB of RAM). If this is the case, what sort of symptoms would you expect?

To help mitigate this, I have been researching the following and am thinking of proposing it to management. Any comments would be appreciated.

1. Implement a partition-type layout using views and rules.

This will allow me to have one table in each view with the "active" data, and the inactive data stored by year in other tables. So I would have the following (for each major table):

Table View as
select * from active_table
union all select * from table_2005
union all select * from table_2004
etc.

Each table would have identical indexes, but only the "active_table" would have data that is actively being worked. The rules and a nightly job can keep the data correctly stored.

I am thinking that with this setup, the active table indexes should almost always be in memory. And if they do happen to be pushed out, they are much smaller than the indexes I have today (where all data is in one table), so they should load faster with less I/O pressure.

From the testing I have done so far, I believe I can implement this system without having to ask for developer time. This is a "Good Thing".
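A minimal sketch of the setup described above, with a hypothetical "orders" table (column names and index choices are illustrative, not from the original post). On 7.x/8.0 this is all manual views and rules:

```sql
-- Hypothetical schema: one active table plus per-year archive tables,
-- all with identical definitions and matching indexes.
CREATE TABLE orders_active (id int, site_id int, order_date date, amount numeric);
CREATE TABLE orders_2005   (id int, site_id int, order_date date, amount numeric);
CREATE TABLE orders_2004   (id int, site_id int, order_date date, amount numeric);

CREATE INDEX orders_active_site_idx ON orders_active (site_id);
-- ...identical indexes would be created on each archive table...

-- The view unions the pieces back into one logical table for the application.
CREATE VIEW orders AS
    SELECT * FROM orders_active
    UNION ALL SELECT * FROM orders_2005
    UNION ALL SELECT * FROM orders_2004;

-- Rule so that applications inserting through the view hit only the active table;
-- a nightly job would then move rows older than the cutoff into the yearly tables.
CREATE RULE orders_ins AS ON INSERT TO orders DO INSTEAD
    INSERT INTO orders_active
    VALUES (NEW.id, NEW.site_id, NEW.order_date, NEW.amount);
```

Similar rules would be needed for UPDATE and DELETE so that writes through the view land on the right underlying table.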
Also, the database is not normalized and is very ugly. By using the view to partition and abstract the actual data, I will be in a better position to start normalizing some of the tables without developer time (once again, a "Good Thing").

2. Collapse all databases in a cluster into one "mega" database.

I am also thinking of recommending that we collapse all databases in a cluster into one "mega" database. I can then use schemas and views to control database access and ensure that no customer can see another customer's data. This would mean there is only one set of indexes being loaded into the cache. While they would be larger, I think that in combination with the partitioning from idea 1, we would be ahead of the ball game. Since there would only be one set of indexes, everyone would be sharing them, so they should always be in memory.

I don't have real numbers to give you, but we know that our systems are hurting I/O-wise, and we are growing by about 2GB+ per week (net). We actually grow by about 5GB/week/server; however, when I run my weekly maintenance of vacuum full, reindex, and vacuum analyze, we end up getting about 3GB back. Unfortunately, I do not have the I/O bandwidth to vacuum during the day, as it causes major slowdowns on our system. Each night, I do run a vacuum analyze across all DBs to try and help. I also have my FSM parameters set high (8000000 fsm pages and 5000 fsm relations) to try and compensate.

I believe this bloat is only hurting us, as any queries that choose to tablescan are only getting slower and slower. Also, obviously, our indexes are continually growing.

The partitioning should help, as the actual number of records being worked in each table is a very small percentage (a site may have 1 million records but only load and work a few thousand each day). The archive tables would be doing the most growing, while the active tables should stay small.
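The schema-per-customer isolation could be sketched as follows (all names are hypothetical). Each customer gets a role and a schema of views that filter the shared tables by customer id; since views execute with their owner's privileges, a customer's role never needs direct access to the shared tables:

```sql
-- Shared table holds all customers' rows, keyed by customer_id.
CREATE TABLE orders (id int, customer_id int, amount numeric);

-- One schema per customer, containing filtered views of the shared tables.
CREATE SCHEMA cust_42;
CREATE VIEW cust_42.orders AS
    SELECT id, amount FROM public.orders WHERE customer_id = 42;

-- The customer's login can reach only its own schema, not the shared tables.
CREATE USER cust_42_user;
REVOKE ALL ON public.orders FROM cust_42_user;
GRANT USAGE ON SCHEMA cust_42 TO cust_42_user;
GRANT SELECT ON cust_42.orders TO cust_42_user;
```

Each customer's session would also set its search_path to its own schema so that existing queries resolve to the filtered views unchanged.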
Most of the queries that are tablescanning cannot be fixed, as the database clusters have been initialized with a non-C locale and won't use indexes for our queries that use LIKE with a wildcard.

Right now, we are still on 7.3.4. However, these ideas would be implemented as part of an upgrade to 8.x (plus, we'll initialize the new clusters with a C locale).

Anyway, I hope this makes sense, and any comments, ideas, and/or suggestions would be appreciated.

Thanks,
Chris
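As an aside, if re-initializing with a C locale ever proves impractical, 7.4 and 8.x offer a workaround for the LIKE problem: an index built with one of the *_pattern_ops operator classes can serve left-anchored LIKE patterns even under a non-C locale. A sketch with a hypothetical table:

```sql
-- The plain index serves <, =, > comparisons; the pattern_ops index
-- serves left-anchored LIKE even when lc_collate is not C.
CREATE INDEX customers_name_idx     ON customers (name);
CREATE INDEX customers_name_pat_idx ON customers (name varchar_pattern_ops);

-- This can now use customers_name_pat_idx:
SELECT * FROM customers WHERE name LIKE 'Smith%';
```

Note this only helps patterns anchored at the left (LIKE 'foo%'); a leading wildcard (LIKE '%foo') still forces a scan regardless of locale.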