Thread: Best practice when reindexing in production
Hi,

I have a database with quite some data (millions of rows) that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re-cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - so what approach would you suggest that I take on this?

Regards
Niels Kristian
On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> I have a database with quite some data (millions of rows) that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re-cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - so what approach would you suggest that I take on this?

If you have the disk space, it's generally a good idea to do a CREATE INDEX CONCURRENTLY, and then rename the new one into place (typically in a transaction). (If your app, documentation or dba doesn't mind the index changing names, you don't need to rename of course, you can just drop the old one.)

--
Magnus Hagander
Me: http://www.hagander.net/
Work: http://www.redpill-linpro.com/
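The swap Magnus describes can be sketched as follows; the table and index names here are made up for illustration. Note that CREATE INDEX CONCURRENTLY itself cannot run inside a transaction block - only the drop/rename step can.

```sql
-- Build the replacement index online (allows concurrent reads and writes).
CREATE INDEX CONCURRENTLY tmp_cars_owner_idx ON cars (owner_id);

-- Atomically swap it into place under the old name.
BEGIN;
DROP INDEX cars_owner_idx;
ALTER INDEX tmp_cars_owner_idx RENAME TO cars_owner_idx;
COMMIT;
```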
On Wed, May 29, 2013 at 2:26 PM, Magnus Hagander <magnus@hagander.net> wrote:
> On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt
> <nielskristian@autouncle.com> wrote:
>> I have a database with quite some data (millions of rows) that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re-cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - so what approach would you suggest that I take on this?
>
> If you have the disk space, it's generally a good idea to do a CREATE
> INDEX CONCURRENTLY, and then rename the new one into place (typically
> in a transaction). (If your app, documentation or dba doesn't mind the
> index changing names, you don't need to rename of course, you can just
> drop the old one.)

If you wish to recluster it online you can also look into pg_repack - https://github.com/reorg/pg_repack

Great tool that allows you to repack and reindex your database without going offline.
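pg_repack is driven by a command-line tool of the same name. A typical invocation might look like the following; the database and table names are hypothetical, and the flags should be checked against the version you install.

```shell
# Rewrite and reindex a single table online (requires the pg_repack
# extension to be installed in the target database).
pg_repack --table=public.cars --no-order mydb

# Repack the whole database, reordering each table by its cluster index
# where one is set - an online alternative to CLUSTER.
pg_repack mydb
```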
I looked at pg_repack - however - is it "safe" for production?
It seems very intrusive and black-box-like to me...
On 29/05/2013, at 14.30, Armand du Plessis <adp@bank.io> wrote:

> If you wish to recluster it online you can also look into pg_repack -
> https://github.com/reorg/pg_repack
>
> Great tool that allows you to repack and reindex your database without going offline.
Thanks

Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for each of them, so that I don't have to hardcode every index name + create statement?

On 29/05/2013, at 14.26, Magnus Hagander <magnus@hagander.net> wrote:
> If you have the disk space, it's generally a good idea to do a CREATE
> INDEX CONCURRENTLY, and then rename the new one into place (typically
> in a transaction). (If your app, documentation or dba doesn't mind the
> index changing names, you don't need to rename of course, you can just
> drop the old one.)
On Wed, May 29, 2013 at 8:41 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
> Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for each of them, so that I don't have to hardcode every index name + create statement?

You can use something like SELECT pg_get_indexdef(indexrelid) FROM pg_index. You will need to filter it not to include system indexes, toast, etc., and then insert the CONCURRENTLY part, but it should give you a good starting point.

--
Magnus Hagander
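A minimal sketch of that suggestion; the schema filter below is an assumption to adapt to your setup:

```sql
-- Generate CREATE INDEX CONCURRENTLY statements for user indexes only,
-- skipping system catalogs and TOAST. Run the output statement by
-- statement; CREATE INDEX CONCURRENTLY cannot run in a transaction block.
SELECT regexp_replace(pg_get_indexdef(i.indexrelid),
                      '^CREATE( UNIQUE)? INDEX',
                      'CREATE\1 INDEX CONCURRENTLY') || ';'
FROM pg_index i
JOIN pg_class c ON c.oid = i.indexrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE n.nspname NOT IN ('pg_catalog', 'pg_toast', 'information_schema');
```

Note that the generated statements keep the original index names, so you would still need to build the new indexes under temporary names and rename them into place afterwards.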
On Wed, May 29, 2013 at 9:41 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:
Thanks
Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for each of them, so that I don't have to hardcode every index name + create statement ?
You could do something like this (which assumes you use simple names for your indexes, where simple ~ [a-z_][a-z0-9_]*):
SELECT
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';
Although this one is *really simple* and *error prone*, because it does not consider at least two things: indexes that are constraints and indexes that have a FK depending on them. For the first case, you only need to change the constraint to use the new index, and adjust the DROP command. As for the second case, you would need to remove the FKs, drop the old index and recreate the FK (inside a transaction, of course), but this could be really slow; a reindex for this case would be simpler and perhaps faster.
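For the first (constraint) case, PostgreSQL 9.1 and later can attach a freshly built unique index to the constraint directly; a hedged sketch with hypothetical names:

```sql
-- Build the replacement unique index online.
CREATE UNIQUE INDEX CONCURRENTLY tmp_orders_pkey ON orders (id);

BEGIN;
-- Dropping the constraint drops its old index; USING INDEX then renames
-- tmp_orders_pkey to the constraint name automatically.
ALTER TABLE orders
  DROP CONSTRAINT orders_pkey,
  ADD CONSTRAINT orders_pkey PRIMARY KEY USING INDEX tmp_orders_pkey;
COMMIT;
```

If foreign keys reference the constraint, the DROP CONSTRAINT will fail, which is exactly the second case described above.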
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres
On Wed, May 29, 2013, Matheus de Oliveira wrote:
> Although this one is *really simple* and *error prone*, because it does not consider at least two things: indexes that are constraints and indexes that have a FK depending on them.

=================

I must be missing something here.

But how is it that a FK depends on the index?
I understand FK lookup works much faster with an index supporting the FK than without it, but you could have a FK without an index (on the "child" table).
So, what gives?

Regards,
Igor Neyman
On Wed, May 29, 2013 at 10:55 AM, Igor Neyman <ineyman@perceptron.com> wrote:
I must be missing something here.
But how is it that a FK depends on the index?
I understand FK lookup works much faster with an index supporting the FK than without it, but you could have a FK without an index (on the "child" table).
So, what gives?
AFAIK, when you create a FK, PostgreSQL associates it with a UNIQUE INDEX on the target table. It creates an entry in pg_depend (I don't know if somewhere else), and when you try to drop the index, even if there is an identical one that PG could use, it will throw an error.
You can easily check this:
postgres=# CREATE TABLE parent(id int);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx1 ON parent (id);
CREATE INDEX
postgres=# CREATE TABLE child(idparent int REFERENCES parent (id));
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx2 ON parent (id);
CREATE INDEX
postgres=# DROP INDEX parent_idx1;
ERROR: cannot drop index parent_idx1 because other objects depend on it
DETAIL: constraint child_idparent_fkey on table child depends on index parent_idx1
HINT: Use DROP ... CASCADE to drop the dependent objects too.
BTW, I do think PostgreSQL could verify whether there is another candidate index for this FK. Is it on the TODO list? Should it be?
Regards,
--
Matheus de Oliveira
Database Analyst
Dextra Sistemas - MPS.Br level F!
www.dextra.com.br/postgres
On Wed, May 29, 2013, Matheus de Oliveira wrote:
> AFAIK, when you create a FK, PostgreSQL associates it with a UNIQUE INDEX on the target table. It creates an entry in pg_depend, and when you try to drop the index, even if there is an identical one that PG could use, it will throw an error.

So, it's about the index on the parent table that's used for the unique (or PK) constraint and referenced by the FK on the child table.

From your previous email I thought that an index on the child table supporting the FK (which is mostly created for performance purposes) cannot be dropped without disabling the FK. My bad.

Igor Neyman
On 29/05/13 14:24, Niels Kristian Schjødt wrote:
> I have a database with quite some data (millions of rows) that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re-cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - so what approach would you suggest that I take on this?

Hi.

Since you still don't know whether it is worth it or not, I would strongly suggest that you test this first. Simply creating an index next to the old one with the same options (but a different name) and comparing sizes would be a simple test.

Second, if the new index is significantly smaller than the old one, I suggest that you try to crank up the autovacuum daemon instead of blindly dropping and creating indexes; this will help mitigate the bloat you're seeing accumulate in the above test. Cranking up autovacuum will have significantly less impact on concurrent queries and can help keep the database in a shape where regular reindexing shouldn't be necessary. Autovacuum has built-in logic to sleep between operations in order to reduce the IO load on your system for the benefit of concurrent users. The duplicate-index approach will pull all the resources it can get, and concurrent users may suffer while you do it.

--
Jesper
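"Cranking up autovacuum" can be done per table; the table name and the specific values below are illustrative assumptions, not recommendations from the thread:

```sql
-- Make autovacuum visit this heavily-updated table far more aggressively:
-- trigger at ~1% dead/changed rows instead of the default 20%, and shorten
-- the sleep between cost-limited work chunks.
ALTER TABLE cars SET (
  autovacuum_vacuum_scale_factor  = 0.01,
  autovacuum_analyze_scale_factor = 0.01,
  autovacuum_vacuum_cost_delay    = 5
);
```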
On Wed, 2013-05-29 at 19:12 +0200, Jesper Krogh wrote:
> Second, if the new index is significantly smaller than the old one, I
> suggest that you try to crank up the autovacuum daemon instead of
> blindly dropping and creating indexes; this will help mitigate the
> bloat you're seeing accumulate in the above test.

In my experience vacuum/autovacuum just don't reclaim any space from the indexes, which accumulate bloat indefinitely. I've tried to work around that in so many ways: the show-stopper has been the impossibility of dropping FK indexes in a concurrent way, coupled with VALIDATE CONSTRAINT not doing what's advertised and taking an exclusive lock.

My solution has been to become the pg_repack maintainer. YMMV. Just don't expect vacuum to reduce the indexes' size: it doesn't.

--
Daniele
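The FK dance described above (and abandoned as impractical) would look roughly like this, reusing the parent/child names from the earlier demonstration; note that DROP INDEX takes an exclusive lock on the parent table, and in the versions discussed here VALIDATE CONSTRAINT also locked more heavily than the documentation suggested:

```sql
-- Assumes a replacement unique index on parent (id) already exists.
ALTER TABLE child DROP CONSTRAINT child_idparent_fkey;
DROP INDEX parent_idx1;
ALTER TABLE child ADD CONSTRAINT child_idparent_fkey
  FOREIGN KEY (idparent) REFERENCES parent (id) NOT VALID;  -- cheap to add
ALTER TABLE child VALIDATE CONSTRAINT child_idparent_fkey;  -- full recheck
```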
On Wednesday, May 29, 2013 06:25:21 PM Daniele Varrazzo wrote:
> My solution has been to become pg_repack maintainer. YMMV. Just don't
> expect vacuum to reduce the indexes size: it doesn't.

It's not supposed to. It is supposed to keep them from indefinitely growing, though, which it does reasonably well at.
On Wed, May 29, 2013 at 6:47 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On Wednesday, May 29, 2013 06:25:21 PM Daniele Varrazzo wrote:
>> My solution has been to become pg_repack maintainer. YMMV. Just don't
>> expect vacuum to reduce the indexes size: it doesn't.
>
> It's not supposed to. It is supposed to keep them from indefinitely growing,
> though, which it does reasonably well at.

My experience is different. I've repeated this test often. This is PG 9.1:

piro=# create table test (id serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "test_pkey" for table "test"
CREATE TABLE
piro=# insert into test (id) select generate_series(1,10000000);
INSERT 0 10000000

The table size is:

piro=# select pg_size_pretty(pg_relation_size('test'::regclass));
 pg_size_pretty
----------------
 306 MB
(1 row)

...and the index size is:

piro=# select pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty
----------------
 171 MB
(1 row)

piro=# delete from test where id <= 9900000;
DELETE 9900000
piro=# select pg_size_pretty(pg_relation_size('test'::regclass)), pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 306 MB         | 171 MB
(1 row)

My statement is that vacuum doesn't reclaim any space. Maybe sometimes in the tables, but never in the index, in my experience.

piro=# vacuum test;
VACUUM
piro=# select pg_size_pretty(pg_relation_size('test'::regclass)), pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 306 MB         | 171 MB
(1 row)

Vacuum full is a different story, but doesn't work online.
piro=# vacuum full test;
VACUUM
piro=# select pg_size_pretty(pg_relation_size('test'::regclass)), pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 3144 kB        | 1768 kB
(1 row)

In our live system we have a small table of active records in a transient state. No record stays there for a long time. The size of the table stays reasonable (but not really stable), but not the indexes. One of them (friendly labeled "the index of death") is 5-6 columns wide and, given enough time, regularly grows into the gigabytes for a table in the order of ~100k records, only tamed by a pg_repack treatment (previously by a create concurrently and drop).

--
Daniele
On Wednesday, May 29, 2013, Niels Kristian Schjødt wrote:
Hi,
I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach would you suggest that I take on this?
I think the "best practice" is...not to do it in the first place.
There are some cases where it probably makes sense to reindex on a regular schedule. But unless you can specifically identify why you have one of those cases, then you probably don't have one.
Cheers,
Jeff