Re: create index concurrently - duplicate index to reduce time without an index - Mailing list pgsql-general
From | Alban Hertroys |
---|---|
Subject | Re: create index concurrently - duplicate index to reduce time without an index |
Date | |
Msg-id | A1D7243C-E3CA-47C1-93C8-A7CF1AF3EF5E@solfertje.student.utwente.nl |
In response to | Re: create index concurrently - duplicate index to reduce time without an index (<Gareth.Williams@csiro.au>) |
List | pgsql-general |
> Thanks Greg, Alban and others,
>
> This has cleared up a misunderstanding I had about why one should reindex. Re-reading the documentation http://www.postgresql.org/docs/8.4/interactive/sql-reindex.html it is clear now that reindexing or recreating an index should not normally be needed - certainly not to keep an index up-to-date. I would have guessed that VACUUM or VACUUM ANALYSE on the table that the index is associated with would have been sufficient to reclaim space for a 'bloated' index (maybe only VACUUM FULL would help). In any case we can leave reindexing or full vacuum for outages where we are interrupting service anyway.

VACUUM FULL actually causes bloat to indexes. It rearranges the data in the tables so that any gaps get used, but while doing that it also needs to update the indices related to those tables.

Normal VACUUM and VACUUM ANALYSE don't have this problem though; they just mark table space that's no longer in use (the transaction that deleted the rows has committed) as reusable, so that later INSERTs can put their data in there. This is a bit of a simplification of what's really going on - for the details check the documentation.

Autovacuum does VACUUM ANALYSE in the background, using multiple threads in recent versions. You can (and seeing your use of the database you probably should) tune how often it vacuums tables through several configuration parameters. Of course, running ANALYSE when you _know_ data in a table has changed significantly means you don't have to wait for autovacuum to get around to analysing that table.

> I was heartened by the responses and tried further testing (if it could not hurt, why not try and see if it could be faster), but ran into a problem. A few times when I was trying to drop an index (before or after creating a duplicate index with 'concurrently'), the dropping of the index stalled. It seems that this was because of an existing connection:
> postgres: rods ICAT 130.102.163.141(58061) idle in transaction
> And new clients block.
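A session like that can usually be spotted from the statistics views. A minimal sketch, using the 8.4-era column names from `pg_stat_activity` (`procpid` and `current_query` became `pid` and `query`, and `waiting` was replaced by `wait_event`, in later releases):

```sql
-- List backends sitting idle inside an open transaction,
-- plus any backends currently waiting on a lock (8.4 column names).
SELECT procpid, usename, datname, client_addr,
       current_query, waiting,
       now() - xact_start AS xact_age
FROM pg_stat_activity
WHERE current_query = '<IDLE> in transaction'
   OR waiting
ORDER BY xact_start;
```

The `xact_age` column makes it easy to see which client has been holding its transaction open longest, which is usually the one blocking the DROP INDEX.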
>
> Is that expected? Should we look for 'bugs' in our client that seems to be holding a connection?

I'm not exactly sure why that is (I can't look into your database), but my guess is that the index is locked by a transaction. Apparently the transaction you refer to has uncommitted work that depends on the index at some point.

Keeping transactions open for a long time is usually a bad idea. You saw that you can't drop an index in use by a transaction, for example, but autovacuum runs into similar issues - it can't reclaim space until the transaction finishes, as the transaction locks things that autovacuum will want to touch. That probably means (I'm not sure it works that way, but it seems likely) that that autovacuum thread gets stuck at a lock and can't continue until the transaction holding the lock frees it.

> For the application we have, I'm ready to give up on this train of investigation for optimization and just vacuum analyse key tables regularly and vacuum and maybe reindex more completely during outages - though we aim for outages to be infrequent. The database holds data representing a virtual filesystem structure with millions of files (and associated access controls, and information on underlying storage resources and replication). There is probably not much update or delete of the main data - at least compared with the total holdings and the new data/files which are regularly being added to the system.

In practice VACUUM FULL and REINDEX are used to reclaim disk space. That of itself doesn't look much like it'd improve performance, but using less disk space also means that data gets more tightly packed in your disk cache, for example. REINDEX can mean an index that didn't fit into RAM now does. They're both rather intrusive operations though, so it's a matter of balancing the costs and benefits. Many databases don't need to bother with VACUUM FULL or REINDEX.

> Ps. Greg, I don't understand the issue with 'concurrently rebuild (of) an index that enforces a constraint or unique index'. I don't think I care much right at the moment, but I'm generally interested and others might be too. Would you expect the create index to fail or to cause locking or just transient performance degradation?

I think what Greg was getting at is that there's a dependency tree between indexes and constraints: a primary key is implemented using a unique index. You can create a new (unique) index on the same columns concurrently, but you can't replace the primary key index with it, as you're not allowed to drop the index without dropping the PK constraint. If you have any FK constraints pointing to that table, you can't drop the PK constraint without also dropping the FK constraints. Quite a bit of trouble to go through to replace one index.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
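[The replacement dance described above can be sketched in SQL. The table and index names here are made up, and this relies on `ALTER TABLE ... ADD CONSTRAINT ... USING INDEX`, which only appeared in PostgreSQL 9.1 - on the 8.4 release discussed in this thread the swap is not possible without dropping the PK and any FK constraints that reference it.]

```sql
-- Build the replacement index without blocking writers.
-- (CREATE INDEX CONCURRENTLY cannot run inside a transaction block.)
CREATE UNIQUE INDEX CONCURRENTLY files_pkey_new ON files (id);

BEGIN;
-- Fails if FK constraints reference this PK, unless those are dropped too.
ALTER TABLE files DROP CONSTRAINT files_pkey;
-- Adopt the freshly built index as the primary key (9.1+ only).
ALTER TABLE files ADD CONSTRAINT files_pkey
  PRIMARY KEY USING INDEX files_pkey_new;
COMMIT;
```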