Thread: Question about clustering indexes and restores

Question about clustering indexes and restores

From
Harold A. Giménez Ch.
Date:
Hi list,

Clustering my indexes dramatically improves the query performance of many of my queries. Also, the actual clustering takes a very long time for big databases, roughly 20 hours. I have two questions about how to improve this:

1. I've tweaked maintenance_mem_max and effective_cache_size to a point where the cluster operation uses a good chunk of my physical RAM, and the OS does not start swapping. Is there any other parameter I should look at?

2. Reading the documentation for cluster at http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html, I see that all clustering does is reorder the data on disk to 'match' the order of the clustered index. My question is, if I dump a clustered database using pg_dump in custom format, is it necessary to cluster after restoring it? Or does a dump/restore not guarantee that the order of the data restored is the same as the original dumped database?

3. Somewhat related to #2, what is the best way to move data from a staging database on one server, to the production environment on a different server? I've been using pg_dump/pg_restore, but there must be a better way...


Thanks for any pointers,

-Harold

Re: Question about clustering indexes and restores

From
Kenneth Marshall
Date:
On Thu, Jan 22, 2009 at 02:52:12PM -0500, Harold A. Gim?nez Ch. wrote:
> Hi list,
>
> Clustering my indexes dramatically improves the query performance of many of
> my queries. Also, the actual clustering takes a very long time for big
> databases, roughly 20 hours. I have two questions about how to improve this:
>
> 1. I've tweaked maintenance_mem_max and effective_cache_size to a point
> where the cluster operation uses a good chunk of my physical RAM, and the OS
> does not start swapping. Is there any other parameter I should look at?
>
> 2. Reading the documentation for cluster at
> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html, I see that
> all clustering does is reorder the data on disk to 'match' the order of the
> clustered index. My question is, if I dump a clustered database using
> pg_dump in custom format, is it necessary to cluster after restoring it? Or
> does a dump/restore not guarantee that the order of the data restored is the
> same as the original dumped database?
>
> 3. Somewhat related to #2, what is the best way to move data from a staging
> database on one server, to the production environment on a different server?
> I've been using pg_dump/pg_restore, but there must be a better way...
>
>
> Thanks for any pointers,
>
> -Harold

Harold,

There have been discussions on the hackers list about the pessimal
cluster performance. Here is a pointer to the discussion, it seems
that a faster way is to build a new table with the desired orderwith
"CREATE TABLE AS ... ORDER BY ...":

http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg121205.html

Cheers,
Ken

Re: Question about clustering indexes and restores

From
Harold A. Giménez Ch.
Date:
Many thanks for your answer. I did see a comment about this in the documentation on the link I posted below.

My main question remains though: Is it necessary to cluster after a restore?

Thanks again!

On Thu, Jan 22, 2009 at 2:58 PM, Kenneth Marshall <ktm@rice.edu> wrote:
On Thu, Jan 22, 2009 at 02:52:12PM -0500, Harold A. Gim?nez Ch. wrote:
> Hi list,
>
> Clustering my indexes dramatically improves the query performance of many of
> my queries. Also, the actual clustering takes a very long time for big
> databases, roughly 20 hours. I have two questions about how to improve this:
>
> 1. I've tweaked maintenance_mem_max and effective_cache_size to a point
> where the cluster operation uses a good chunk of my physical RAM, and the OS
> does not start swapping. Is there any other parameter I should look at?
>
> 2. Reading the documentation for cluster at
> http://www.postgresql.org/docs/8.3/interactive/sql-cluster.html, I see that
> all clustering does is reorder the data on disk to 'match' the order of the
> clustered index. My question is, if I dump a clustered database using
> pg_dump in custom format, is it necessary to cluster after restoring it? Or
> does a dump/restore not guarantee that the order of the data restored is the
> same as the original dumped database?
>
> 3. Somewhat related to #2, what is the best way to move data from a staging
> database on one server, to the production environment on a different server?
> I've been using pg_dump/pg_restore, but there must be a better way...
>
>
> Thanks for any pointers,
>
> -Harold

Harold,

There have been discussions on the hackers list about the pessimal
cluster performance. Here is a pointer to the discussion, it seems
that a faster way is to build a new table with the desired orderwith
"CREATE TABLE AS ... ORDER BY ...":

http://www.mail-archive.com/pgsql-hackers@postgresql.org/msg121205.html

Cheers,
Ken