Re: [ SOLVED ] select count(*) very slow on an already - Mailing list pgsql-performance

From Shea,Dan [CIS]
Subject Re: [ SOLVED ] select count(*) very slow on an already
Date
Msg-id F2D63B916C88C14D9B59F93C2A5DD33F0B9233@cisxa.cmc.int.ec.gc.ca
Whole thread Raw
List pgsql-performance
Just a note, I was trying the cluster command and was short on space.  I
figured I had enough space for the new table and index.  It failed on me
twice.
The reason is that I noticed for the command to complete, it needed the
space of the new table and 2x the space of the new index.
It looks like it creates the new table, then a new index. Afterwards it
looked like it creates another index in the DB pgsql_tmp.  So for me this is
an important consideration, since the new index size was about 7GB.
I had not anticipated the second index size so that is why it failed.  I
ended up creating a link of pgsql_tmp to another parttion to successfully
complete.

Dan.

-----Original Message-----
From: Bill Moran [mailto:wmoran@potentialtech.com]
Sent: Thursday, April 15, 2004 4:14 PM
To: Shea,Dan [CIS]
Cc: Postgres Performance
Subject: Re: [PERFORM] [ SOLVED ] select count(*) very slow on an
already


Shea,Dan [CIS] wrote:
> Bill, if you had alot of updates and deletions and wanted to optimize your
> table, can you just issue the cluster command.
> Will the cluster command rewrite the table without the obsolete data that
a
> vacuum flags or do you need to issue a vacuum first?

 From the reference docs:

"During the cluster operation, a temporary copy of the table is created that
contains the table data in the index order. Temporary copies of each index
on the table are created as well. Therefore, you need free space on disk at
least equal to the sum of the table size and the index sizes.

"CLUSTER preserves GRANT, inheritance, index, foreign key, and other
ancillary
information about the table.

"Because the optimizer records statistics about the ordering of tables, it
is
advisable to run ANALYZE on the newly clustered table. Otherwise, the
optimizer
may make poor choices of query plans."

The primary reason CLUSTER exists is to allow you to physically reorder a
table
based on a key.  This should provide a performance improvement if data with
the same key is accessed all at once.  (i.e. if you do "SELECT * FROM table
WHERE
key=5" and it returns 100 rows, those 100 rows are guaranteed to be all on
the
same part of the disk after CLUSTER, thus a performance improvement should
result.)

Updates and inserts will add data in the next available space in a table
with no
regard for any keys, and _may_ require running all over the disk to retrieve
the data in the previous example query.

I doubt if CLUSTER is an end-all optimization tool.  The specific reason I
suggested it was because the original poster was asking for an easier way to
drop/recreate a table (as prior experimentation had shown this to improve
performance)  I can't think of anything easier than "CLUSTER <tablename> ON
<keyname>"

Since CLUSTER recreates the table, it implicitly removes the dead tuples.
However, it's going to be a LOT slower than vacuum, so if dead tuples are
the
main problem, vacuum is still the way to go.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com

pgsql-performance by date:

Previous
From: Dirk Lutzebäck
Date:
Subject: RESOLVED: Re: Wierd context-switching issue on Xeon
Next
From: Tom Lane
Date:
Subject: Re: RESOLVED: Re: Wierd context-switching issue on Xeon