Thread: Re: [ SOLVED ] select count(*) very slow on an already

Re: [ SOLVED ] select count(*) very slow on an already

From
"Shea,Dan [CIS]"
Date:
Bill, if you had alot of updates and deletions and wanted to optimize your
table, can you just issue the cluster command.
Will the cluster command rewrite the table without the obsolete data that a
vacuum flags or do you need to issue a vacuum first?
Dan.

-----Original Message-----
From: Bill Moran [mailto:wmoran@potentialtech.com]
Sent: Thursday, April 15, 2004 2:49 PM
To: Rajesh Kumar Mallah
Cc: Postgres Performance
Subject: Re: [PERFORM] [ SOLVED ] select count(*) very slow on an
already


Rajesh Kumar Mallah wrote:
> Bill Moran wrote:
>
>> Rajesh Kumar Mallah wrote:
>>
>>> Hi,
>>>
>>> The problem was solved by reloading the Table.
>>> the query now takes only 3 seconds. But that is
>>> not a solution.
>>
>> If dropping/recreating the table improves things, then we can reasonably
>> assume that the table is pretty active with updates/inserts.  Correct?
>
> Yes the table results from an import process and under goes lots
> of inserts and updates , but thats before the vacuum full operation.
> the table is not accessed during vacuum. What i want to know is
> is there any wat to automate the dumping and reload of a table
> individually. will the below be safe and effective:

The CLUSTER command I described is one way of doing this.  It
essentially automates the task of copying the table, dropping
the old one, and recreating it.

>> If the data gets too fragmented, a vacuum may not be enough.  Also, read
>> up on the recommendations _against_ vacuum full (recommending only using
>> vacuum on databases)  With full, vacuum condenses the database, which may
>> actually hurt performance.  A regular vacuum just fixes things up, and
>> may leave unused space lying around.  However, this should apparently
>> achieve a balance between usage and vacuum.  See the docs, they are much
>> better at describing this than I am.
>>
> i understand simultaneous vacuum and usage detoriates performance mostly.
> but this case is different.

Just want to make sure we're on the same page here.  I'm not talking about
vacuuming simultaneous with anything.  I'm simply saying that "vacuum full"
isn't always the best choice.  You should probably only be doing "vacuum".
The reason and details for this are in the admin docs.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com


---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly

Re: [ SOLVED ] select count(*) very slow on an already

From
Bill Moran
Date:
Shea,Dan [CIS] wrote:
> Bill, if you had alot of updates and deletions and wanted to optimize your
> table, can you just issue the cluster command.
> Will the cluster command rewrite the table without the obsolete data that a
> vacuum flags or do you need to issue a vacuum first?

 From the reference docs:

"During the cluster operation, a temporary copy of the table is created that
contains the table data in the index order. Temporary copies of each index
on the table are created as well. Therefore, you need free space on disk at
least equal to the sum of the table size and the index sizes.

"CLUSTER preserves GRANT, inheritance, index, foreign key, and other ancillary
information about the table.

"Because the optimizer records statistics about the ordering of tables, it is
advisable to run ANALYZE on the newly clustered table. Otherwise, the optimizer
may make poor choices of query plans."

The primary reason CLUSTER exists is to allow you to physically reorder a table
based on a key.  This should provide a performance improvement if data with
the same key is accessed all at once.  (i.e. if you do "SELECT * FROM table WHERE
key=5" and it returns 100 rows, those 100 rows are guaranteed to be all on the
same part of the disk after CLUSTER, thus a performance improvement should result.)

Updates and inserts will add data in the next available space in a table with no
regard for any keys, and _may_ require running all over the disk to retrieve
the data in the previous example query.

I doubt if CLUSTER is an end-all optimization tool.  The specific reason I
suggested it was because the original poster was asking for an easier way to
drop/recreate a table (as prior experimentation had shown this to improve
performance)  I can't think of anything easier than "CLUSTER <tablename> ON
<keyname>"

Since CLUSTER recreates the table, it implicitly removes the dead tuples.
However, it's going to be a LOT slower than vacuum, so if dead tuples are the
main problem, vacuum is still the way to go.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com