Thread: Tuning to speed select

Tuning to speed select

From
Tom Laudeman
Date:
Hi,

I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM
and an IDE hard drive. My big table has around 9 million records.

Is there a tuning parameter I can change to increase speed of selects?
Clearly, there's already some buffering going on since selecting an
indexed ~50,000 records takes 17 seconds on the first try, and only 0.5
seconds on the second try (from pgsql).

cowpea=> explain analyze select bs_fk from blast_result where
si_fk=11843254;
                                                                 QUERY
PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87
rows=58118 width=4) (actual time=112.249..17472.935 rows=50283 loops=1)
   Index Cond: (si_fk = 11843254)
 Total runtime: 17642.522 ms
(3 rows)

cowpea=>  explain analyze select bs_fk from blast_result where
si_fk=11843254;
                                                               QUERY
PLAN

----------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87
rows=58118 width=4) (actual time=0.178..341.643 rows=50283 loops=1)
   Index Cond: (si_fk = 11843254)
 Total runtime: 505.011 ms
(3 rows)

cowpea=>



Thanks,
Tom


--
Tom Laudeman
twl8n@virginia.edu
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/


Re: Tuning to speed select

From
Martijn van Oosterhout
Date:
On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:
> Hi,
>
> I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM
> and an IDE hard drive. My big table has around 9 million records.
>
> Is there a tuning parameter I can change to increase speed of selects?
> Clearly, there's already some buffering going on since selecting an
> indexed ~50,000 records takes 17 seconds on the first try, and only 0.5
> seconds on the second try (from pgsql).

Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
try increasing the shared_buffers parameter, but if the delay is
getting data from the disk, that won't really help you.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Attachment

Re: Tuning to speed select

From
Michael Fuhr
Date:
On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote:
> On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:
> > Is there a tuning parameter I can change to increase speed of selects?
> > Clearly, there's already some buffering going on since selecting an
> > indexed ~50,000 records takes 17 seconds on the first try, and only 0.5
> > seconds on the second try (from pgsql).
>
> Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
> try increasing the shared_buffers parameter, but if the delay is
> getting data from the disk, that won't really help you.

If most of your queries use the same index then clustering on that
index might speed up initial (i.e., not-cached) queries by reducing
the number of disk pages that need to be read.  See the documentation
for more information.

http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html

--
Michael Fuhr

Re: Tuning to speed select

From
louis gonzales
Date:
What about creating views on areas of the table that are queried often?
I don't know if you have access or the ability to find what type of
trends the table has, in terms of queries, but if you create some views
on frequently visited information, this could also help.

Tom Laudeman wrote:

> Hi,
>
> I'm running PostgreSQL version 8 on a dual 2.4GHz Xeon with 1GB of RAM
> and an IDE hard drive. My big table has around 9 million records.
>
> Is there a tuning parameter I can change to increase speed of selects?
> Clearly, there's already some buffering going on since selecting an
> indexed ~50,000 records takes 17 seconds on the first try, and only
> 0.5 seconds on the second try (from pgsql).
>
> cowpea=> explain analyze select bs_fk from blast_result where
> si_fk=11843254;
>                                                                 QUERY
> PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------

>
> Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87
> rows=58118 width=4) (actual time=112.249..17472.935 rows=50283 loops=1)
>   Index Cond: (si_fk = 11843254)
> Total runtime: 17642.522 ms
> (3 rows)
>
> cowpea=>  explain analyze select bs_fk from blast_result where
> si_fk=11843254;
>                                                               QUERY
> PLAN
>
----------------------------------------------------------------------------------------------------------------------------------------

>
> Index Scan using si_fk_index on blast_result  (cost=0.00..22874.87
> rows=58118 width=4) (actual time=0.178..341.643 rows=50283 loops=1)
>   Index Cond: (si_fk = 11843254)
> Total runtime: 505.011 ms
> (3 rows)
>
> cowpea=>
>
>
> Thanks,
> Tom
>
>


Re: Tuning to speed select

From
"Harald Armin Massa"
Date:
Louis,

Views certainly help in managing complexity. They do nothing to improve query-speed.

Querying a view gets rewritten to queries to the underlying tables on the fly.
(as long as there are no materialized views, which are still on a the TODO list)

--
GHUM Harald Massa
persuadere et programmare
Harald Armin Massa
Reinsburgstraße 202b
70197 Stuttgart
0173/9409607
-
Let's set so double the killer delete select all.

Re: Tuning to speed select

From
Richard Broersma Jr
Date:
> Views certainly help in managing complexity. They do nothing to improve
> query-speed.
>
> Querying a view gets rewritten to queries to the underlying tables on the
> fly.
> (as long as there are no materialized views, which are still on a the TODO
> list)

Would partial indexs on the most queried regions of the table help in query speed?

Regards,

Richard Broersma Jr.

Re: Tuning to speed select

From
louis gonzales
Date:
I'm not so sure about that, when you create a view on a table - at least
with Oracle - which is a subset(the trivial or 'proper' subset is the
entire table view) of the information on a table, when a select is
issued against a table, Oracle at least, determines if there is a view
already on a the table which potentially has a smaller amount of
information to process - as long as the view contains the proper
constraints that meet your 'select' criteria, the RDBMS engine will have
fewer records to process - which I'd say, certainly constitutes a time
benefit, in terms of 'performance gain.'

Hence my reasoning behind determining IF there is a subset of the 'big
table' that is frequented, I'd create a view on this, assuming
postgresql does this too?  Maybe somebody else can answer that for the
pgsql-general's general information?

query-speed itself is going to be as fast/slow as your system is
configured for, however my point was to shave some time off of a 1M+
record table, but implementing views of 'frequently' visisted/hit
records meeting the same specifications.

Harald Armin Massa wrote:

> Louis,
>
> Views certainly help in managing complexity. They do nothing to
> improve query-speed.
>
> Querying a view gets rewritten to queries to the underlying tables on
> the fly.
> (as long as there are no materialized views, which are still on a the
> TODO list)
>
> --
> GHUM Harald Massa
> persuadere et programmare
> Harald Armin Massa
> Reinsburgstraße 202b
> 70197 Stuttgart
> 0173/9409607
> -
> Let's set so double the killer delete select all.



Re: Tuning to speed select

From
"Scott Marlowe"
Date:

On Wed, 2006-08-09 at 14:58, louis gonzales wrote:
> I'm not so sure about that, when you create a view on a table - at least
> with Oracle - which is a subset(the trivial or 'proper' subset is the
> entire table view) of the information on a table, when a select is
> issued against a table, Oracle at least, determines if there is a view
> already on a the table which potentially has a smaller amount of
> information to process - as long as the view contains the proper
> constraints that meet your 'select' criteria, the RDBMS engine will have
> fewer records to process - which I'd say, certainly constitutes a time
> benefit, in terms of 'performance gain.'
>
> Hence my reasoning behind determining IF there is a subset of the 'big
> table' that is frequented, I'd create a view on this, assuming
> postgresql does this too?  Maybe somebody else can answer that for the
> pgsql-general's general information?
>
> query-speed itself is going to be as fast/slow as your system is
> configured for, however my point was to shave some time off of a 1M+
> record table, but implementing views of 'frequently' visisted/hit
> records meeting the same specifications.

There are basically two ways to do views.  The simple way, is to have a
view represent a query that gets run everytime you call it.  The more
complex way is to "materialize" the view data, and put it into a new
table, and then update that table whenever the source table changes.

PostgreSQL has native support for the first type.  They're cheap and
easy, and work for most of the things people need views for (i.e. hiding
complexity).

PostgreSQL is extensible, and therefore you can institute the second
type (i.e. materialized views) on your own.  Thanksfully, someone else
has already done most of the work for us, by the name of Jonathan
Gardner, and you can find his nifty guide by typing "materialized views
postgresql" into google.

Gardner's materialized views support several update methods depending on
what you need from your mat views.  It's also a danged fine tutorial on
how to write some simple plpgsql functions.

Re: Tuning to speed select

From
Tom Laudeman
Date:
Michael,
Great suggestion. I've read about CLUSTER, but never had a chance to use it. The only problem is that this table with 9 million records has 5 or 6 indexes. It is hard to pick the most used, but I'll bet CLUSTER will make at least one of the queries run very fast, especially for an index with a small number of distinct values.

The speed of the query is (as Michael implies) limited to the rate at which the disk can seek and read.  I have done experiments with views and cursors; there was no improvement in speed. I've also tried only pulling back  primary keys in the hope that a smaller amount of data would more quickly be read into memory. No speed increase. I have also raised all the usual memory limits, with the expected results (slight speed improvements).

I'll try CLUSTER (I'm looking forward to that test), but if we really need speed, it will probably be necessary to create copies of the table, or copy portions of the table elsewhere (essentially creating materialized views, I suppose). I'm still trying to get my science compatriot here to tell me which index he most wants to improve, then I'll CLUSTER the table on that index.

Thanks!
Tom

Michael Fuhr wrote:
On Wed, Aug 09, 2006 at 03:46:38PM +0200, Martijn van Oosterhout wrote: 
On Wed, Aug 09, 2006 at 09:19:31AM -0400, Tom Laudeman wrote:   
Is there a tuning parameter I can change to increase speed of selects? 
Clearly, there's already some buffering going on since selecting an 
indexed ~50,000 records takes 17 seconds on the first try, and only 0.5 
seconds on the second try (from pgsql).     
Your OS is probably buffering, 1GB of RAM holds a lot of data. You can
try increasing the shared_buffers parameter, but if the delay is
getting data from the disk, that won't really help you.   
If most of your queries use the same index then clustering on that
index might speed up initial (i.e., not-cached) queries by reducing
the number of disk pages that need to be read.  See the documentation
for more information.

http://www.postgresql.org/docs/8.1/interactive/sql-cluster.html
 

-- 
Tom Laudeman
twl8n@virginia.edu
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/

Re: Tuning to speed select

From
Michael Fuhr
Date:
On Wed, Aug 09, 2006 at 04:54:00PM -0400, Tom Laudeman wrote:
> I'll try CLUSTER (I'm looking forward to that test), but if we really
> need speed, it will probably be necessary to create copies of the table,
> or copy portions of the table elsewhere (essentially creating
> materialized views, I suppose). I'm still trying to get my science
> compatriot here to tell me which index he most wants to improve, then
> I'll CLUSTER the table on that index.

If you enable statistics collection then you could use those
statistics to see which indexes are used the most.  Those indexes
might be good candidates for clustering.

http://www.postgresql.org/docs/8.1/interactive/monitoring-stats.html

--
Michael Fuhr

Re: Tuning to speed select

From
Reece Hart
Date:
On Wed, 2006-08-09 at 16:54 -0400, Tom Laudeman wrote:
Great suggestion. I've read about CLUSTER, but never had a chance to use it. The only problem is that this table with 9 million records has 5 or 6 indexes. It is hard to pick the most used, but I'll bet CLUSTER will make at least one of the queries run very fast, especially for an index with a small number of distinct values.

Tom-

I found multi-column indexes and clustering to be extremely beneficial in 7.4. I still use them in 8.1, but I haven't compared them extensively with equivalent queries that use bitmap index scans. The obvious downside of having more indexes is the additional time and space overhead during insert, update, or delete.

The approach I took to design multi-column indexes was to run explain on representative queries and look for seq scans. The seq scans indicate which columns /might/ be well-served by indexes. In 7.4, the order of indexed columns was important. (I saw something in the 8.1 release notes that made me think that this was no longer true, but I haven't verified that.)

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0

Re: Tuning to speed select

From
Alban Hertroys
Date:
Richard Broersma Jr wrote:
>> Views certainly help in managing complexity. They do nothing to improve
>> query-speed.
>>
>> Querying a view gets rewritten to queries to the underlying tables on the
>> fly.
>> (as long as there are no materialized views, which are still on a the TODO
>> list)
>
> Would partial indexs on the most queried regions of the table help in query speed?

They would, as long as they can be used in the client application
queries. If all the data is often used (partially or not), then there's
little point to partial indices. But that's a border-case. In common,
they can make major differences.

Also, make sure you add enough constraints to your query so that the
partial index constraints match your query constraints, otherwise the
index will not be used.

Another point to check; make sure you don't cause type casts over
indexed columns. That might (under certain conditions) cause a type cast
on the column value of every indexed row, which does slow down things
significantly. I believe this was solved in PostgreSQL 8.something.

--
Alban Hertroys
alban@magproductions.nl

magproductions b.v.

T: ++31(0)534346874
F: ++31(0)534346876
M:
I: www.magproductions.nl
A: Postbus 416
    7500 AK Enschede

// Integrate Your World //

Re: Tuning to speed select

From
"Merlin Moncure"
Date:
On 8/9/06, Tom Laudeman <twl8n@virginia.edu> wrote:

>  The speed of the query is (as Michael implies) limited to the rate at which
> the disk can seek and read.  I have done experiments with views and cursors;
> there was no improvement in speed. I've also tried only pulling back
> primary keys in the hope that a smaller amount of data would more quickly be
> read into memory. No speed increase. I have also raised all the usual memory
> limits, with the expected results (slight speed improvements).
>

Are your data structures normalized?  Performance problems queying a
single giganto table is  usually (but not necessirly in your case) a
sign of a poorly designed table structure.

otherwise it's pretty clear you get the most bang for the buck with
hardware.  consider upping ram and/or buying better disks.  you could
buy cheap sata controller and 4 raptors in raid 0+1 configuration for
<1000$ and you will feel like you have supercomputer relative to what
you have now :)

merlin

Re: Tuning to speed select

From
Tom Laudeman
Date:
Reece,
We have some multi-column indexes. Speed of insert, update, and delete are not an issue since this data is essentially static: write-once, read-many.

As far as I can tell (from running actual tests) Postgres will not use a multi-column index when the SELECT is constraining on only one of the columns in the index. If I need a single column index, I create one. If I need three columns in an index, I create a specific 3 column index. Granted, my memory is fuzzy, and some of my testing was done on version 7.x and there may be improvements on version 8.x

-Tom


...

I found multi-column indexes and clustering to be extremely beneficial in 7.4. I still use them in 8.1, but I haven't compared them extensively with equivalent queries that use bitmap index scans. The obvious downside of having more indexes is the additional time and space overhead during insert, update, or delete.

...

-Reece

-- 
Reece Hart, http://harts.net/reece/, GPG:0x25EC91A0       

-- 
Tom Laudeman
twl8n@virginia.edu
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/

Re: Tuning to speed select

From
Michael Fuhr
Date:
On Thu, Aug 10, 2006 at 11:00:00AM -0400, Tom Laudeman wrote:
> As far as I can tell (from running actual tests) Postgres will not use a
> multi-column index when the SELECT is constraining on only one of the
> columns in the index.

Excerpt from the 8.1 Release Notes:

  * Allow nonconsecutive index columns to be used in a multicolumn
    index (Tom)

    For example, this allows an index on columns a,b,c to be used in
    a query with WHERE a = 4 and c = 10.

If you're querying only on c then an index on (a,b,c) will probably
be slower than an index on (c), but if you have only (a,b,c) then
the planner will consider using it.

--
Michael Fuhr

Re: Tuning to speed select

From
Tom Lane
Date:
Michael Fuhr <mike@fuhr.org> writes:
> Excerpt from the 8.1 Release Notes:
>   * Allow nonconsecutive index columns to be used in a multicolumn
>     index (Tom)

>     For example, this allows an index on columns a,b,c to be used in
>     a query with WHERE a = 4 and c = 10.

> If you're querying only on c then an index on (a,b,c) will probably
> be slower than an index on (c), but if you have only (a,b,c) then
> the planner will consider using it.

Note that if your query doesn't have a constraint on the first index
column, it's unlikely that that index will get chosen, because the
entire index would have to be scanned.  Before 8.1 it was guaranteed
not to be chosen.

            regards, tom lane

Re: Tuning to speed select

From
Tom Laudeman
Date:
Merlin,
The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec
(I ran that in single user mode so there was nothing interfering). A WD
Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent
system at runlevel 3. What kind of values does hdparm give for a SATA
Raptor?

I think my Dell Precision 650 has SATA on the motherboard. The boss says
I can order one drive, so what should I get? How much faster is RAID 0+1
than a single drive?

Aside from size, I can't see much difference between these drives (WD
Raptors at NewEgg):

http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0


CLUSTER certainly helped. Each of the following queries would have
returned roughly 50,000 records. Note that selecting a single record
from blast_result using an index is plenty fast ( ~ 50 ms), so my
primary concern is pulling back larger subsets of data.

It appears that count(*) on a CLUSTERed table uses the index (as opposed
to the old way of doing a sequential scan). Count on the table after
CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we
shouldn't count, but we've been too lazy to keep the record counts in
another table, and our customers occasionally want to know how many
records are in a certain subset.

Before CLUSTER:
explain analyze select * from blast_result where si_fk=11843253;
Total runtime: 16334.539 ms

explain analyze select * from blast_result where si_fk=11843248;
Total runtime: 31406.999 ms

explain analyze select * from blast_result where si_fk=11218929;
Total runtime: 15319.440 ms


After CLUSTER and vacuum analyze:
explain analyze select * from blast_result where si_fk=11843253;
 Total runtime: 2343.893 ms

explain analyze select * from blast_result where si_fk=11843248;
 Total runtime: 2158.395 ms

explain analyze select * from blast_result where si_fk=11218929;
 Total runtime: 1880.586 ms

explain analyze select * from blast_result where si_fk=11843250;
 Total runtime: 2085.253 ms


Thanks,
Tom



>
> Are your data structures normalized?  Performance problems queying a
> single giganto table is  usually (but not necessirly in your case) a
> sign of a poorly designed table structure.
>
> otherwise it's pretty clear you get the most bang for the buck with
> hardware.  consider upping ram and/or buying better disks.  you could
> buy cheap sata controller and 4 raptors in raid 0+1 configuration for
> <1000$ and you will feel like you have supercomputer relative to what
> you have now :)
>
> merlin


--
Tom Laudeman
twl8n@virginia.edu
(434) 924-2456
http://www.people.virginia.edu/~twl8n/
http://laudeman.com/


Re: Tuning to speed select

From
"Merlin Moncure"
Date:
On 8/11/06, Tom Laudeman <twl8n@virginia.edu> wrote:
> Merlin,
> The problem is disk. I've got a WD Caviar. hdparm says it does 44MB/sec
> (I ran that in single user mode so there was nothing interfering). A WD
> Caviar SE SATA in one of my servers at home gets 56MB/sec on a quiescent
> system at runlevel 3. What kind of values does hdparm give for a SATA
> Raptor?

i dont have one handy on a linux box to test, but all raptors are 10k
drives with 16 mb cache. they feel about twice as fast as 7200 rpm ide
drives in general use.

> I think my Dell Precision 650 has SATA on the motherboard. The boss says
> I can order one drive, so what should I get? How much faster is RAID 0+1
> than a single drive?

depends on a lot of factors, near 100% improvement is realistic even
with software raid.  I would tell your boss that you could buy 2 36g
raptors (110$ each) do a simple raid 0.  just be aware that either
drive failing will take you out.  or, you could do raid 1 for
redundancy.

> Aside from size, I can't see much difference between these drives (WD
> Raptors at NewEgg):
>
http://www.newegg.com/Product/ProductList.asp?DEPA=0&type=&Description=raptor&Submit=ENE&Ntk=all&N=0&minPrice=&maxPrice=&Go.x=0&Go.y=0

the retail parts are more expensive as is the silly drive that you can
look into.  buy the cheapest part at the size level you need.

> CLUSTER certainly helped. Each of the following queries would have
> returned roughly 50,000 records. Note that selecting a single record
> from blast_result using an index is plenty fast ( ~ 50 ms), so my
> primary concern is pulling back larger subsets of data.

maybe. you may have had table bloat as well, cluster does a full table
rebuild like vacuum fuul.

> It appears that count(*) on a CLUSTERed table uses the index (as opposed
> to the old way of doing a sequential scan). Count on the table after
> CLUSTER appears to be a *lot* faster, maybe almost 100x. I know we
> shouldn't count, but we've been too lazy to keep the record counts in
> another table, and our customers occasionally want to know how many
> records are in a certain subset.

no, afaik count(*) uses the table still (try explain analyze). you
just compacted and optimized the table for efficient sequential scans.
 are you vacuuming regulary?

merlin

Re: Tuning to speed select

From
Roman Neuhauser
Date:
# twl8n@virginia.edu / 2006-08-11 10:12:40 -0400:
> I think my Dell Precision 650 has SATA on the motherboard. The boss says
> I can order one drive, so what should I get? How much faster is RAID 0+1
> than a single drive?

    You need 4 disks for 0+1 (or 1+0, also called 10).

--
How many Vietnam vets does it take to screw in a light bulb?
You don't know, man.  You don't KNOW.
Cause you weren't THERE.             http://bash.org/?255991

Re: Tuning to speed select

From
"Joshua D. Drake"
Date:
Roman Neuhauser wrote:
> # twl8n@virginia.edu / 2006-08-11 10:12:40 -0400:
>> I think my Dell Precision 650 has SATA on the motherboard. The boss says
>> I can order one drive, so what should I get? How much faster is RAID 0+1
>> than a single drive?

If you can order one drive, get a drive that is twice the size of what
you have. Use one partition of that drive to create a raid 1. Use the
other partition to push off nightly rsyncs :)

Sincerely,

Joshua D. Drake


>
>     You need 4 disks for 0+1 (or 1+0, also called 10).
>


--

    === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
    Providing the most comprehensive  PostgreSQL solutions since 1997
              http://www.commandprompt.com/