Re: Tuning to speed select - Mailing list pgsql-general

From Merlin Moncure
Subject Re: Tuning to speed select
Date
Msg-id b42b73150608110741i742d4ab1v8f1d6e4fa169c33d@mail.gmail.com
Whole thread Raw
In response to Re: Tuning to speed select  (Tom Laudeman <twl8n@virginia.edu>)
List pgsql-general
On 8/11/06, Tom Laudeman <twl8n@virginia.edu> wrote:
> Merlin,
> The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec
> (I ran that in single user mode so there was nothing interfering). A WD
> Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent
> system at runlevel 3. What kind of values does hdparm give for a SATA
> Raptor?

i dont have one handy on a linux box to test, but all raptors are 10k
drives with 16 mb cache. they feel about twice as fast as 7200 rpm ide
drives in general use.

> I think my Dell Precision 650 has SATA on the motherboard. The boss says
> I can order one drive, so what should I get? How much faster is RAID 0+1
> than a single drive?

depends on a lot of factors, near 100% improvement is realistic even
with software raid.  I would tell your boss that you could buy 2 36g
raptors (110$ each) do a simple raid 0.  just be aware that either
drive failing will take you out.  or, you could do raid 1 for
redundancy.

> Aside from size, I can't see much difference between these drives (WD
> Raptors at NewEgg):
>
http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0

the retail parts are more expensive as is the silly drive that you can
look into.  buy the cheapest part at the size level you need.

> CLUSTER certainly helped. Each of the following queries would have
> returned roughly 50,000 records. Note that selecting a single record
> from blast_result using an index is plenty fast ( ~ 50 ms), so my
> primary concern is pulling back larger subsets of data.

maybe. you may have had table bloat as well, cluster does a full table
rebuild like vacuum fuul.

> It appears that count(*) on a CLUSTERed table uses the index (as opposed
> to the old way of doing a sequential scan). Count on the table after
> CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we
> shouldn't count, but we've been too lazy to keep the record counts in
> another table, and our customers occasionally want to know how many
> records are in a certain subset.

no, afaik count(*) uses the table still (try explain analyze). you
just compacted and optimized the table for efficient sequential scans.
 are you vacuuming regulary?

merlin

pgsql-general by date:

Previous
From: Tom Laudeman
Date:
Subject: Re: Tuning to speed select
Next
From: John Purser
Date:
Subject: Re: VACUUM VERBOSE output to STDERR