Re: Reindex taking forever, and 99% CPU - Mailing list pgsql-general
| From | Kevin Grittner |
|---|---|
| Subject | Re: Reindex taking forever, and 99% CPU |
| Date | |
| Msg-id | 1407417059.87618.YahooMailNeo@web122304.mail.ne1.yahoo.com |
| In response to | Re: Reindex taking forever, and 99% CPU (Phoenix Kiula <phoenix.kiula@gmail.com>) |
| List | pgsql-general |
Phoenix Kiula <phoenix.kiula@gmail.com> wrote:

> We spent some time to do some massive cleaning of the data from
> this table. Brought it down to around 630 million rows. Overall
> size of the table including indexes is about 120GB anyway.

Deleting rows that you don't need is good, and once a vacuum has a chance to run (to completion) against the table it should help with performance, but unless there is a lot of free space right at the end of the table, it won't release any disk space back to the OS -- it will be tracked as free space within the table, and reused for future inserts and updates. This is generally a good thing, because it is faster to use space already allocated to the table than to request that new space be added to the table before using it.

> modify_date | timestamp without time zone | default now()

Sooner or later you will realize that this should have been timestamp with time zone, but that will be a different discussion.

> Rules:
> __track_bigtb_deleted AS
> ON DELETE TO bigtb

It is usually safer to create triggers rather than rules.

> Suppose we might have to explore partitioning, which would
> probably be via first letter of the alias? This would lead to
> around 26 + 9 = 35 sub-tables. Is this too many?

No; I wouldn't worry about fewer than about 100. As has already been mentioned, though, this machine is very underpowered for what you seem to want to do with it. Don't expect miracles. In particular, partitioning is not likely to make most queries any faster, but it will help a lot with maintenance operations like vacuuming and indexing.

> max_connections = 180
> temp_buffers = 32MB
> work_mem = 64MB

I just want to make sure you realize that temp_buffers is how much RAM *each connection* is allowed to reserve indefinitely for caching temporary tables. So if all 180 allowed connections were in use, and they had all used temporary tables of significant size, then *even when all connections are idle* they would have 5.76GB of RAM reserved exclusively for caching temp tables. On a machine with 4GB RAM, that would probably cause things to crash.

Also, work_mem is questionable. It is not limited to one allocation per connection; there can be one allocation of that size for each plan node of an active query which needs working memory (sorts, hash maps, etc.). So one connection can be using a number of these at one time, although only while a query is active. Because one connection may be using many while others are using none, it is often a good idea to start by sizing it on the assumption of one allocation per connection. 64MB * 180 = 11.52GB. This is in addition to the 5.76GB you allow for temp_buffers. It is no wonder you are seeing crashes -- you have configured the database so that it is allowed to use over 4x the machine's RAM just for these two things!

In my experience, a good starting point for work_mem is 25% of machine RAM / max_connections. You can adjust from there based on workload. That suggests 5.5MB would be about right on your machine. I would probably set temp_buffers = 2MB or maybe 3MB.

> enable_indexscan = on

These should all be on in the config file, always. (That is the default if the entries are commented out, of course.) The enable_* settings are mostly intended for diagnostic purposes, although in extreme cases people have been known to disable a specific setting just for the duration of a specific query; there is usually a better solution than that, however.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
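[Editor's note: the memory arithmetic in this reply can be sanity-checked with a short Python sketch. The settings are the ones quoted in the thread, the GB figures use the same decimal conversion as the message, and the 25% rule is the heuristic described above, not a limit PostgreSQL enforces.]

```python
# Worst-case memory reserved by temp_buffers and work_mem, using the
# settings quoted in this thread (illustrative, not tuning advice).
max_connections = 180
temp_buffers_mb = 32
work_mem_mb = 64
machine_ram_mb = 4000  # ~4GB machine, decimal units to match the thread

# temp_buffers: each connection may keep this much reserved indefinitely.
temp_total_gb = max_connections * temp_buffers_mb / 1000   # 5.76 GB

# work_mem: sized pessimistically at one allocation per connection; a
# single active query can actually hold several such allocations at once.
work_total_gb = max_connections * work_mem_mb / 1000       # 11.52 GB

combined_gb = temp_total_gb + work_total_gb                # 17.28 GB

# Starting-point heuristic from the reply:
# work_mem ~= 25% of machine RAM / max_connections
suggested_work_mem_mb = 0.25 * machine_ram_mb / max_connections  # ~5.6 MB

print(f"temp_buffers worst case: {temp_total_gb:.2f} GB")
print(f"work_mem worst case:     {work_total_gb:.2f} GB")
print(f"combined:                {combined_gb:.2f} GB "
      f"({combined_gb * 1000 / machine_ram_mb:.1f}x machine RAM)")
print(f"suggested work_mem:      {suggested_work_mem_mb:.1f} MB")
```

The combined worst case comes to over four times the machine's RAM, which matches the conclusion above; the heuristic lands near the 5.5MB figure suggested in the reply.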