Thread: Some ideas for comment
Ok, there is always a lot of talk about tuning PostgreSQL on Linux and
how PostgreSQL uses the Linux kernel cache to cache the tables and
indexes.  My question is: is there any way to see what files Linux is
caching at this moment?

My reasoning behind this question is: I have several database systems,
each with 1 PostgreSQL cluster.  However, each cluster has a large
number of identical databases on it.  Since there can be a great amount
of size disparity between the databases, I am wondering if some of the
slowness we see might be caused by the kernel cache having to reload a
lot of data that keeps getting swapped out.  (Most systems have at
least 100GB of data/indexes on them with 8 or 12GB RAM.)  If this is
the case, what sort of symptoms would you expect?

To help mitigate this potential, I have been researching the following,
and am thinking of proposing it to management.  Any comments would be
appreciated.

1. Implement a partition-type layout using views and rules - This will
allow me to have one table in each view with the "active" data, and the
inactive data stored by year in other tables.

So I would have the following (for each major table):

Table View as
select * from active_table
union all
select * from table_2005
union all
select * from table_2004
etc.

Each table would have identical indexes; however, only the
"active_table" would have data that is actively being worked.  The
rules and a nightly job can keep the data correctly stored.  I am
thinking that with this setup, the active table indexes should almost
always be in memory.  And, if they do happen to be pushed out, they are
much smaller than the indexes I have today (where all data is in one
table), so they should load faster with less i/o pressure.

From the testing I have done so far, I believe I can implement this
system without having to ask for developer time.  This is a "Good
Thing".
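[The layout above could be sketched in DDL roughly as follows; the
table and column names are hypothetical, and the rule shown only covers
INSERT - UPDATE/DELETE rules and the nightly archiving job are left
out:]

```sql
-- Hypothetical sketch of the view-over-union layout described above.
CREATE TABLE orders_active (id int, cust_id int, created date, amt numeric);
CREATE TABLE orders_2005   (id int, cust_id int, created date, amt numeric);
CREATE TABLE orders_2004   (id int, cust_id int, created date, amt numeric);

CREATE VIEW orders AS
    SELECT * FROM orders_active
    UNION ALL
    SELECT * FROM orders_2005
    UNION ALL
    SELECT * FROM orders_2004;

-- A rule redirects writes against the view to the active table;
-- a nightly job would then move aged rows into the yearly tables.
CREATE RULE orders_insert AS ON INSERT TO orders
    DO INSTEAD
    INSERT INTO orders_active
    VALUES (NEW.id, NEW.cust_id, NEW.created, NEW.amt);
```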
Also, the database is not normalized and is very ugly; by using the
view to partition and abstract the actual data, I will be in a better
position to start normalizing some of the tables w/o developer time
(once again, a "Good Thing").

2. I am also thinking of recommending we collapse all databases in a
cluster into one "mega" database.  I can then use schemas and views to
control database access and ensure that no customer can see another
customer's data.

This would mean that there is only one set of indexes being loaded into
the cache.  While they would be larger, I think in combination with the
partitioning from idea 1, we would be ahead of the ball game.  Since
there would only be one set of indexes, everyone would be sharing them,
so they should always be in memory.

I don't have real numbers to give you, but we know that our systems are
hurting i/o wise, and we are growing by about 2GB+ per week (net).  We
actually grow by about 5GB/week/server.  However, when I run my weekly
maintenance of vacuum full, reindex, and then vacuum analyze, we end up
getting about 3GB back.  Unfortunately, I do not have the i/o bandwidth
to vacuum during the day as it causes major slowdowns on our system.
Each night, I do run a vacuum analyze across all db's to try and help.
I also have my fsm parameters set high (8000000 fsm pages, and 5000 fsm
relations) to try and compensate.

I believe this is only hurting us, as any queries that choose to
tablescan are only getting slower and slower.  Also, obviously, our
indexes are continually growing.  The partitioning should help, as the
actual number of records being worked on in each table is a very small
percentage (a site may have 1 million records, but only load and work a
few thousand each day).  The archive tables would be doing the most
growing, while the active tables should stay small.
Most of the queries that are tablescanning cannot be fixed, as the
database clusters have been initialized with a non-C locale and won't
use indexes on our queries that are using LIKE with a wildcard.

Right now, we are still on 7.3.4.  However, these ideas would be
implemented as part of an upgrade to 8.x (plus, we'll initialize the
new clusters with a C locale).

Anyway, I hope this makes sense, and any comments, ideas, and/or
suggestions would be appreciated.

Thanks,

Chris
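[As an aside on the locale problem: from 7.4 onward, an index built
with the `*_pattern_ops` operator classes lets left-anchored LIKE
queries use an index even in a non-C locale, so the planned 8.x upgrade
could pick this up without re-initdb'ing.  A sketch, with hypothetical
table and column names:]

```sql
-- Index using the pattern operator class (names are hypothetical).
CREATE INDEX customers_name_like
    ON customers (name text_pattern_ops);

-- A left-anchored LIKE can now use the index despite a non-C locale:
SELECT * FROM customers WHERE name LIKE 'Smith%';
```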
> Ok, there is always a lot of talk about tuning PostgreSQL on linux and
> how PostgreSQL uses the linux kernel cache to cache the tables and
> indexes.
[...]
> 1. Implement a partition type layout using views and rules - This
> will allow me to have one table in each view with the "active" data,
> and the inactive data stored by year in other tables.
>
> So I would have the following (for each major table):
>
> Table View as
> select * from active_table
> union all
> select * from table_2005
> union all
> select * from table_2004
> etc.

Linux does a pretty good job of deciding what to cache.  I don't think
this will help much.  You can always look at partial indexes too.

> 2. I am also thinking of recommending we collapse all databases in a
> cluster into one "mega" database.  I can then use schema's and views
> to control database access and ensure that no customer can see another
> customers data.

hm.  keep in mind views are tightly bound to the tables they are created
with (views can't 'float' over tables in different schemas).  pl/pgsql
functions can, though.  This is a more efficient use of server
resources, IMO, but not a windfall.

> This would mean that there are only one set of indexes being loaded
> into the cache.  While they would be larger, I think in combination
> with the partition from idea 1, we would be ahead of the ball game.
> Since there would only be one set of indexes, everyone would be
> sharing them so they should always be in memory.

I would strongly consider adding more memory :).

> I don't have real numbers to give you, but we know that our systems
> are hurting i/o wise and we are growing by about 2GB+ per week (net).
> We actually grow by about 5GB/week/server.  However, when I run my
> weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> end up getting about 3GB back.  Unfortunately, I do not have the i/o
> bandwidth to vacuum during the day as it causes major slowdowns on our
> system.
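[A sketch of the partial-index suggestion, with a hypothetical status
column: only rows matching the predicate are indexed, so the index
stays small and cache-friendly without splitting the table at all.]

```sql
-- Hypothetical partial index over only the "active" rows.
CREATE INDEX orders_active_idx
    ON orders (created)
    WHERE status = 'active';

-- The planner can use it for queries that imply the same predicate:
SELECT * FROM orders
WHERE status = 'active' AND created > '2005-08-01';
```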
> Each night, I do run a vacuum analyze across all db's to try
> and help.  I also have my fsm parameters set high (8000000 fsm pages,
> and 5000 fsm relations) to try and compensate.

Generally, you can reduce data turnover for the same workload by
normalizing your database.  IOW, try and make your database more
efficient in the way it stores data.

> Right now, we are still on 7.3.4.  However, these ideas would be
> implemented as part of an upgrade to 8.x (plus, we'll initialize the
> new clusters with a C locale).

yes, do this!

Merlin
On Wed, Aug 24, 2005 at 12:56:54PM -0400, Chris Hoover wrote:

> I don't have real numbers to give you, but we know that our systems
> are hurting i/o wise and we are growing by about 2GB+ per week (net).
> We actually grow by about 5GB/week/server.  However, when I run my
> weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> end up getting about 3GB back.  Unfortunately, I do not have the i/o
> bandwidth to vacuum during the day as it causes major slowdowns on our
> system.  Each night, I do run a vacuum analyze across all db's to try
> and help.  I also have my fsm parameters set high (8000000 fsm pages,
> and 5000 fsm relations) to try and compensate.
[...]
> Right now, we are still on 7.3.4.  However, these ideas would be
> implemented as part of an upgrade to 8.x (plus, we'll initialize the
> new clusters with a C locale).

If you were on a newer version, I'd suggest that you use the cost-based
vacuum delay, and vacuum at least some of the tables more often.  This
way you can reduce the continual growth of the data files without
affecting day-to-day performance, because you allow the VACUUM-inflicted
I/O to be interleaved with normal query execution.  Sadly (for you), I
think the cost-based vacuum delay feature was only introduced in 8.0.

-- 
Alvaro Herrera (<alvherre[a]alvh.no-ip.org>)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you!  (West Side Story, "Gee, Officer Krupke")
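[For reference, the 8.0 cost-based vacuum delay is controlled by a pair
of postgresql.conf settings roughly like the following; the values
shown are illustrative and would need tuning against the actual i/o
headroom:]

```
# Cost-based vacuum delay (8.0+); 0 disables the throttling.
vacuum_cost_delay = 10      # ms to sleep once the cost limit is hit
vacuum_cost_limit = 200     # accumulated page-cost that triggers the sleep
```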
"Merlin Moncure" <merlin.moncure@rcsonline.com> writes:
>> Right now, we are still on 7.3.4.  However, these ideas would be
>> implemented as part of an upgrade to 8.x (plus, we'll initialize the
>> new clusters with a C locale).

> yes, do this!

Moving from 7.3 to 8.0 is alone likely to give you a noticeable
performance boost.

			regards, tom lane
On 8/24/05, Merlin Moncure <merlin.moncure@rcsonline.com> wrote:
> > Ok, there is always a lot of talk about tuning PostgreSQL on linux and
> > how PostgreSQL uses the linux kernel cache to cache the tables and
> > indexes.
> [...]
> >
> > 1. Implement a partition type layout using views and rules - This
> > will allow me to have one table in each view with the "active" data,
> > and the inactive data stored by year in other tables.
> >
> > So I would have the following (for each major table):
> >
> > Table View as
> > select * from active_table
> > union all
> > select * from table_2005
> > union all
> > select * from table_2004
> > etc.
>
> Linux does a pretty good job of deciding what to cache.  I don't think
> this will help much.  You can always look at partial indexes too.

Yes, but won't this help create the need to store less?  If I have
10,000,000 rows in a table, but only 4,000 are active, and I move those
4,000 to another table and link the tables via a view, should that not
help keep the 9,996,000 inactive rows out of the kernel cache (the
majority of the time, at least)?  This would mean I have more room for
other objects and hopefully less turnover in the cache, and less disk
i/o.  Yes?

[...]
> I would strongly consider adding more memory :).

Unfortunately, it looks like 12GB is all our Dell servers can handle.  :(

> > I don't have real numbers to give you, but we know that our systems
> > are hurting i/o wise and we are growing by about 2GB+ per week (net).
> > We actually grow by about 5GB/week/server.  However, when I run my
> > weekly maintenance of vacuum full, reindex, and the vacuum analyze, we
> > end up getting about 3GB back.  Unfortunately, I do not have the i/o
> > bandwidth to vacuum during the day as it causes major slowdowns on our
> > system.  Each night, I do run a vacuum analyze across all db's to try
> > and help.  I also have my fsm parameters set high (8000000 fsm pages,
> > and 5000 fsm relations) to try and compensate.
> Generally, you can reduce data turnover for the same workload by
> normalizing your database.  IOW, try and make your database more
> efficient in the way it stores data.

That's the ultimate goal, but this database structure was developed and
released into production before I started work here.  I'm trying to
slowly change it into a better db, but it is a slow process.
Unfortunately, normalization does not make it to the top of the
priority list.

> > Right now, we are still on 7.3.4.  However, these ideas would be
> > implemented as part of an upgrade to 8.x (plus, we'll initialize the
> > new clusters with a C locale).

> > 2. I am also thinking of recommending we collapse all databases in a
> > cluster into one "mega" database.  I can then use schema's and views
> > to control database access and ensure that no customer can see another
> > customers data.
>
> hm.  keep in mind views are tightly bound to the tables they are created
> with (views can't 'float' over tables in different schemas).  pl/pgsql
> functions can, though.  This is a more efficient use of server
> resources, IMO, but not a windfall.

This I know.  Each schema would have to have a "custom" set of views
replacing the tables, with each view programmed to only return that
customer's data.  I was thinking of putting all of the tables in a
my_tables schema, and having the views all query the tables stored in
the my_tables schema.  I would add an identifying column to each table
so that I can differentiate the data.

Chris
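[The per-customer schema idea might look something like this; the table,
schema, and user names are hypothetical, and the GRANTs are only the
bare minimum to make the point:]

```sql
-- Shared tables live in one schema and carry an identifying column.
CREATE SCHEMA my_tables;
CREATE TABLE my_tables.orders (id int, customer_id int, amt numeric);

-- Each customer gets a schema whose views filter on that column.
CREATE SCHEMA customer_42;
CREATE VIEW customer_42.orders AS
    SELECT id, amt FROM my_tables.orders WHERE customer_id = 42;

-- The customer's login only gets access to its own schema.
GRANT USAGE ON SCHEMA customer_42 TO customer_42_user;
GRANT SELECT ON customer_42.orders TO customer_42_user;
```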
--On Mittwoch, August 24, 2005 16:26:40 -0400 Chris Hoover
<revoohc@gmail.com> wrote:

> On 8/24/05, Merlin Moncure <merlin.moncure@rcsonline.com> wrote:
>> Linux does a pretty good job of deciding what to cache.  I don't think
>> this will help much.  You can always look at partial indexes too.
>>
> Yes, but won't this help create the need to store less?  If I have
> 10,000,000 rows in a table, but only 4,000 are active, and I move those
> 4,000 to another table and link the tables via a view, should that not
> help keep the 9,996,000 inactive rows out of the kernel cache (the
> majority of the time at least)?

The kernel caches per page, not per file.  It is likely Linux only
caches those pages which contain active rows, as long as no statement
does a seq-scan on that table.

To optimize things, you could consider clustering by some index which
sorts the "active" rows first.  That way, pages with active rows are
likely to contain more than only one active row, and so the cache is
utilized better.  CLUSTER is rather slow, however, and tables need to
be reclustered from time to time.

Mit freundlichem Gruß

Jens Schicke
-- 
Jens Schicke		  j.schicke@asco.de
asco GmbH		  http://www.asco.de
Mittelweg 7		  Tel 0531/3906-127
38106 Braunschweig	  Fax 0531/3906-400
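[A sketch of the clustering suggestion, using the pre-8.3 CLUSTER
syntax; the index and column names are hypothetical, and the "activity"
column is assumed to exist:]

```sql
-- Physically reorder the table by an index so that recently-active
-- rows end up packed onto the same pages.
CREATE INDEX orders_activity_idx ON orders (last_active_date);
CLUSTER orders_activity_idx ON orders;

-- CLUSTER is a one-time reordering: later inserts and updates drift
-- away from index order, so it has to be re-run periodically.
ANALYZE orders;
```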