Thread: Best practice when reindexing in production

Best practice when reindexing in production

From

Niels Kristian Schjødt

Date:

29 May 2013, 12:24:27

Hi,

I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would
liketo reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be
usablewhile doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach
wouldyou suggest that I take on this? 

Regards Niels Kristian

Re: Best practice when reindexing in production

From

Magnus Hagander

Date:

29 May 2013, 12:26:25

On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
> Hi,
>
> I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would
liketo reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be
usablewhile doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach
wouldyou suggest that I take on this? 

If you have the diskspaec, it's generally a good idea to do a CREATE
INDEX CONCURRENTLY, and then rename the new one into place (typically
in a transaction). (If your app, documentation or dba doesn't mind the
index changing names, you don't need to rename of course, you can just
drop the old one).

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Best practice when reindexing in production

From

Armand du Plessis

Date:

29 May 2013, 12:31:14

On Wed, May 29, 2013 at 2:26 PM, Magnus Hagander <magnus@hagander.net> wrote:

On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
> Hi,
>
> I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach would you suggest that I take on this?

If you have the diskspaec, it's generally a good idea to do a CREATE
INDEX CONCURRENTLY, and then rename the new one into place (typically
in a transaction). (If your app, documentation or dba doesn't mind the
index changing names, you don't need to rename of course, you can just
drop the old one).

If you wish to recluster it online you can also look into pg_repack - https://github.com/reorg/pg_repack Great tool allows you to repack and reindex your database without going offline.

Re: Best practice when reindexing in production

From

Niels Kristian Schjødt

Date:

29 May 2013, 12:38:50

I looked at pg_repack - however - is it "safe" for production?

It seems very intrusive and black-box-like to me...

Den 29/05/2013 kl. 14.30 skrev Armand du Plessis <adp@bank.io>:

On Wed, May 29, 2013 at 2:26 PM, Magnus Hagander <magnus@hagander.net> wrote:
On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
> Hi,
>
> I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach would you suggest that I take on this?

If you have the diskspaec, it's generally a good idea to do a CREATE
INDEX CONCURRENTLY, and then rename the new one into place (typically
in a transaction). (If your app, documentation or dba doesn't mind the
index changing names, you don't need to rename of course, you can just
drop the old one).

If you wish to recluster it online you can also look into pg_repack - https://github.com/reorg/pg_repack Great tool allows you to repack and reindex your database without going offline.

Re: Best practice when reindexing in production

From

Niels Kristian Schjødt

Date:

29 May 2013, 12:42:06

Thanks

Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for
eachof them, so that I don't have to hardcode every index name + create statement ? 



Den 29/05/2013 kl. 14.26 skrev Magnus Hagander <magnus@hagander.net>:

> On Wed, May 29, 2013 at 8:24 AM, Niels Kristian Schjødt
> <nielskristian@autouncle.com> wrote:
>> Hi,
>>
>> I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would
liketo reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be
usablewhile doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach
wouldyou suggest that I take on this? 
>
> If you have the diskspaec, it's generally a good idea to do a CREATE
> INDEX CONCURRENTLY, and then rename the new one into place (typically
> in a transaction). (If your app, documentation or dba doesn't mind the
> index changing names, you don't need to rename of course, you can just
> drop the old one).
>
>
> --
> Magnus Hagander
> Me: http://www.hagander.net/
> Work: http://www.redpill-linpro.com/

Re: Best practice when reindexing in production

From

Magnus Hagander

Date:

29 May 2013, 13:08:51

On Wed, May 29, 2013 at 8:41 AM, Niels Kristian Schjødt
<nielskristian@autouncle.com> wrote:
> Thanks
>
> Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for
eachof them, so that I don't have to hardcode every index name + create statement ? 

You can use something like SELECT pg_get_indexdef(indexrelid) FROM
pg_index. You will need to filter it not to include system indexes,
toast, etc, and then insert the CONCURRENCY part, but it should give
you a good startingpoint.

--
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

Re: Best practice when reindexing in production

From

Matheus de Oliveira

Date:

29 May 2013, 13:12:44

On Wed, May 29, 2013 at 9:41 AM, Niels Kristian Schjødt <nielskristian@autouncle.com> wrote:

Thanks

Can you think of a way to select all the indexes programmatically from a table and run CREATE INDEX CONCURRENTLY for each of them, so that I don't have to hardcode every index name + create statement ?

You could do something like this (which considers you use simple names for your indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT

regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'

|| E'BEGIN;\n'

|| 'DROP INDEX ' || i.indexname || E';\n'

|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'

|| E'COMMIT;\n'

FROM pg_indexes i

WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error phrone*, because it does not consider at least two things: index that are constraints and index that has FK depending on it. For the first case, you only need to change the constraint to use the index and the DROP command. As for the second case, you would need to remove the FKs, drop the old one and recreate the FK (inside a transaction, of course), but this could be really slow, a reindex for this case would be simpler and perhaps faster.

Regards,

Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: Best practice when reindexing in production

From

Igor Neyman

Date:

29 May 2013, 13:56:02


You could do something like this (which considers you use simple names for your indexes, where simple
~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 INDEX CONCURRENTLY tmp_\2 ON \3;') ||
E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error phrone*, because it does not consider at least two things: index that
areconstraints and index that has FK depending on it. For the first case, you only need to change the constraint to use
theindex and the DROP command. As for the second case, you would need to remove the FKs, drop the old one and recreate
theFK (inside a transaction, of course), but this could be really slow, a reindex for this case would be simpler and
perhapsfaster. 


Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


I must be missing something here.
But, how is that FK depends on the index?
I understand FK lookup works much faster with the index supporting FK than without it, but you could have FK without
index(on the "child" table). 
So, what gives?

Regards,
Igor Neyman

Re: Best practice when reindexing in production

From

Matheus de Oliveira

Date:

29 May 2013, 14:19:34

On Wed, May 29, 2013 at 10:55 AM, Igor Neyman <ineyman@perceptron.com> wrote:

You could do something like this (which considers you use simple names for your indexes, where simple ~ [a-z_][a-z0-9_]*):

SELECT
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 INDEX CONCURRENTLY tmp_\2 ON \3;') || E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error phrone*, because it does not consider at least two things: index that are constraints and index that has FK depending on it. For the first case, you only need to change the constraint to use the index and the DROP command. As for the second case, you would need to remove the FKs, drop the old one and recreate the FK (inside a transaction, of course), but this could be really slow, a reindex for this case would be simpler and perhaps faster.

=================

I must be missing something here.
But, how is that FK depends on the index?
I understand FK lookup works much faster with the index supporting FK than without it, but you could have FK without index (on the "child" table).
So, what gives?

AFAIK, when you create a FK, PostgreSQL associate it with an UNIQUE INDEX on the target table. It creates an entry on pg_depends (I don't know if somewhere else), and when you try to drop the index, even if there is an identical one that PGs could use, it will throw an error.

You can easily check this:

postgres=# CREATE TABLE parent(id int);

CREATE TABLE

postgres=# CREATE UNIQUE INDEX parent_idx1 ON parent (id);

CREATE INDEX

postgres=# CREATE TABLE child(idparent int REFERENCES parent (id));

CREATE TABLE

postgres=# CREATE UNIQUE INDEX parent_idx2 ON parent (id);

CREATE INDEX

postgres=# DROP INDEX parent_idx1;

ERROR: cannot drop index parent_idx1 because other objects depend on it

DETAIL: constraint child_idparent_fkey on table child depends on index parent_idx1

HINT: Use DROP ... CASCADE to drop the dependent objects too.

BTW, I do think PostgreSQL could verify if there is another candidate to this FK. Is it in TODO list? Should it be?

Regards,

--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

Re: Best practice when reindexing in production

From

Igor Neyman

Date:

29 May 2013, 16:35:53

From: Matheus de Oliveira [mailto:matioli.matheus@gmail.com]
Sent: Wednesday, May 29, 2013 10:19 AM
To: Igor Neyman
Cc: Niels Kristian Schjødt; Magnus Hagander; pgsql-performance@postgresql.org list
Subject: Re: [PERFORM] Best practice when reindexing in production

On Wed, May 29, 2013 at 10:55 AM, Igor Neyman <ineyman@perceptron.com> wrote:

You could do something like this (which considers you use simple names for your indexes, where simple
~ [a-z_][a-z0-9_]*):

SELECT 
regexp_replace(i.indexdef, '^CREATE( UNIQUE)? INDEX (.*) ON (.*)', 'CREATE\1 INDEX CONCURRENTLY tmp_\2 ON \3;') ||
E'\n'
|| E'BEGIN;\n'
|| 'DROP INDEX ' || i.indexname || E';\n'
|| 'ALTER INDEX tmp_' || i.indexname || ' RENAME TO ' || i.indexname || E';\n'
|| E'COMMIT;\n'
FROM pg_indexes i
WHERE schemaname !~ '^(pg_|information_schema$)';

Although this one is *really simple* and *error phrone*, because it does not consider at least two things: index that
areconstraints and index that has FK depending on it. For the first case, you only need to change the constraint to use
theindex and the DROP command. As for the second case, you would need to remove the FKs, drop the old one and recreate
theFK (inside a transaction, of course), but this could be really slow, a reindex for this case would be simpler and
perhapsfaster. 

=================
I must be missing something here.
But, how is that FK depends on the index?
I understand FK lookup works much faster with the index supporting FK than without it, but you could have FK without
index(on the "child" table). 
So, what gives?

AFAIK, when you create a FK, PostgreSQL associate it with an UNIQUE INDEX on the target table. It creates an entry on
pg_depends(I don't know if somewhere else), and when you try to drop the index, even if there is an identical one that
PGscould use, it will throw an error. 

You can easily check this:

postgres=# CREATE TABLE parent(id int);
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx1 ON parent (id);
CREATE INDEX
postgres=# CREATE TABLE child(idparent int REFERENCES parent (id));
CREATE TABLE
postgres=# CREATE UNIQUE INDEX parent_idx2 ON parent (id);
CREATE INDEX
postgres=# DROP INDEX parent_idx1;
ERROR:  cannot drop index parent_idx1 because other objects depend on it
DETAIL:  constraint child_idparent_fkey on table child depends on index parent_idx1
HINT:  Use DROP ... CASCADE to drop the dependent objects too.

BTW, I do think PostgreSQL could verify if there is another candidate to this FK. Is it in TODO list? Should it be?

Regards,
--
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres

So, it's about index on parent table that's used for unique (or PK) constraint and referenced by FK on child table.
From your previous email I thought that index on child table supporting FK (which is mostly created for performance
purposes)cannot be dropped without disabling FK. My bad. 

Igor Neyman

Re: Best practice when reindexing in production

From

Jesper Krogh

Date:

29 May 2013, 17:13:07

On 29/05/13 14:24, Niels Kristian Schjødt wrote:On 29/05/13 14:24, Niels
Kristian Schjødt wrote:
> Hi,
>
> I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would
liketo reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be
usablewhile doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach
wouldyou suggest that I take on this? 

Hi.

Since you still dont know wether it is worth it or not, I would strongly
suggest that you test this out before. Simply just creating an index
next to the old one with the same options (but different name) and
compare sizes would be simple.

Second, if the new index is significantly smaller than the old on, I
suggest that you try to crank up the autovacuum daemon instead of
blindly dropping and creating indexes, this will help to mitigate the
bloat you're seeing accumulating in above test.

Cranking up autovacuum is going to have significan less impact on the
concurrent queries while doing it and can help to maintain the database
in a shape where regular re-indexings shouldnt be nessesary. Autovacuum
has build in logic to sleep inbetween operations in order to reduce the
IO-load of you system for the benefit of concurrent users. The approach
of duplicate indices will pull all the resources it can get and
concurrent users may suffer while you do it..

Jesper

--
Jesper

Re: Best practice when reindexing in production

From

Daniele Varrazzo

Date:

29 May 2013, 17:25:29

On Wed, 2013-05-29 at 19:12 +0200, Jesper Krogh wrote:

> Second, if the new index is significantly smaller than the old on, I
> suggest that you try to crank up the autovacuum daemon instead of
> blindly dropping and creating indexes, this will help to mitigate the
> bloat you're seeing accumulating in above test.

In my experience vacuum/autovacuum just don't reclaim any space from the
indexes, which accumulate bloat indefinitely. I've tried to work around
that in so many ways: the show-stopper has been the impossibility to
drop FK indexes in a concurrent way, coupled with VALIDATE CONSTRAINT
not doing what advertised and taking an exclusive lock.

My solution has been to become pg_repack maintainer. YMMV. Just don't
expect vacuum to reduce the indexes size: it doesn't.

--
Daniele

Re: Best practice when reindexing in production

From

Alan Hodgson

Date:

29 May 2013, 17:47:12

On Wednesday, May 29, 2013 06:25:21 PM Daniele Varrazzo wrote:
> My solution has been to become pg_repack maintainer. YMMV. Just don't
> expect vacuum to reduce the indexes size: it doesn't.

It's not supposed to. It is supposed to keep them from indefinitely growing,
though, which it does reasonably well at.

Re: Best practice when reindexing in production

From

Daniele Varrazzo

Date:

29 May 2013, 19:22:47

On Wed, May 29, 2013 at 6:47 PM, Alan Hodgson <ahodgson@simkin.ca> wrote:
> On Wednesday, May 29, 2013 06:25:21 PM Daniele Varrazzo wrote:
>> My solution has been to become pg_repack maintainer. YMMV. Just don't
>> expect vacuum to reduce the indexes size: it doesn't.
>
> It's not supposed to. It is supposed to keep them from indefinitely growing,
> though, which it does reasonably well at.

My experience is different. I've repeated this test often. This is PG 9.1:

piro=# create table test (id serial primary key);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for
serial column "test.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"test_pkey" for table "test"
CREATE TABLE
piro=# insert into test (id) select generate_series(1,10000000);
INSERT 0 10000000

The table size is:

piro=# select pg_size_pretty(pg_relation_size('test'::regclass));
 pg_size_pretty
----------------
 306 MB
(1 row)

...and the index size is:

piro=# select pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty
----------------
 171 MB
(1 row)

piro=# delete from test where id <= 9900000;
DELETE 9900000

piro=# select pg_size_pretty(pg_relation_size('test'::regclass)),
pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 306 MB         | 171 MB
(1 row)

My statement is that vacuum doesn't reclaim any space. Maybe sometimes
in the tables, but never in the index, in my experience.

piro=# vacuum test;
VACUUM
piro=# select pg_size_pretty(pg_relation_size('test'::regclass)),
pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 306 MB         | 171 MB
(1 row)

Vacuum full is a different story, but doesn't work online.

piro=# vacuum full test;
VACUUM
piro=# select pg_size_pretty(pg_relation_size('test'::regclass)),
pg_size_pretty(pg_relation_size('test_pkey'::regclass));
 pg_size_pretty | pg_size_pretty
----------------+----------------
 3144 kB        | 1768 kB


In our live system we have a small table of active records in a
transient state. No record stages there for a long time. The size of
the table stays reasonable (but not really stable) but not the
indexes. One of them (friendly labeled "the index of death") is 5-6
columns wide and, given enough time, regularly grows into the
gigabytes for a table in the order of the ~100k records, only tamed by
a pg_repack treatment (previously by a create concurrently and drop).


-- Daniele

Re: Best practice when reindexing in production

From

Jeff Janes

Date:

01 June 2013, 02:38:29

On Wednesday, May 29, 2013, Niels Kristian Schjødt wrote:

Hi,

I have a database with quite some data (millions of rows), that is heavily updated all the time. Once a day I would like to reindex my database (and maybe re cluster it - don't know if that's worth it yet?). I need the database to be usable while doing this (both read and write). I see that there is no way to REINDEX CONCURRENTLY - So what approach would you suggest that I take on this?

I think the "best practice" is...not to do it in the first place.

There are some cases where it probably makes sense to reindex on a regular schedule. But unless you can specifically identify why you have one of those cases, then you probably don't have one.

Cheers,

Jeff