Home > mailing lists

Re: Query Optimization - Mailing list pgsql-performance

From	John A Meinel
Subject	Re: Query Optimization
Date	March 9, 2005 01:46:50
Msg-id	422E557D.7030500@arbash-meinel.com Whole thread Raw
In response to	Query Optimization (James G Wilkinson <jgw@alpinegeophysics.com>)
List	pgsql-performance

Tree view

James G Wilkinson wrote:

> All,
>
...

> Firstly, I  am frankly mystified on how to interpret all this.  If anyone
> could point me to a document or two that will help me decipher this,
> I will greatly appreciate it.
>
I assume you have looked at:
http://www.postgresql.org/docs/8.0/static/performance-tips.html
And didn't find it helpful enough. I'm not really sure what help you are
asking. Are you saying that this query is performing slowly and you want
to speed it up? Or you just want to understand how to interpret the
output of explain?

> Secondly, I have figured out that SEQ SCANs are typically bad.  I am
> concerned that a SEQ SCAN is being performed on 'biogenic_beld3_data'
> which is the largest table in the query.  I would rather have a SEQ SCAN
> be performed on 'tmpgrid' which contains the keys that subset the data
> from 'biogenic_beld3_data.'  Is this naive on my part?

It depends how much data is being extracted. If you have 1,000,000 rows,
and only need 10, then an index scan is wonderful. If you need 999,999,
then a sequential scan is much better (the break even point is <10%)
 From the explain, it thinks it is going to be needing 5,637 rows from
biogenic_beld3_data, what is that portion relative to the total?

The values at least look like you've run vacuum analyze. Have you tried
running "explain analyze" instead of just explain? Then you can see if
the planners estimates are accurate.

If you want some help to force it, you could try a subselect query.
Something like:

select * from biogenic_beld3_data b where b.beld3_icell = (select
b_icell from tmpgrid_pk) and b.beld3_jcell = (select b_jcell from
tmpgrid_pk);

>
> Thirdly, I have run EXPLAIN on other queries that report back a
> GroupAggregate Cost=<low 300,000s> that runs in about 30 minutes
> on my relatively highend linux machine.  But when I run this particular
> query, it takes on the order of 90 minutes to complete.  Any thoughts
> on why this happens will be appreciated.
>
Remember cost is in terms of page fetches, not in seconds.
Probably it is just an issue of postgres mis-estimating the selectivity
of one of your queries.
Also, you have a fairly complex SUM occurring involving 4
multiplications on an estimated 150,000 rows. While doesn't seem like it
should take 90 minutes, it also isn't a trivial operation.

> Finally, if anyone can be so kind as to provide insight on how to better
> optimize this query, I will, again, be deeply grateful.
>
> Thanks in advance.
>
> terrakit
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Don't 'kill -9' the postmaster

John
=:->

Attachment

signature.asc

pgsql-performance by date:

From: John A Meinel
Date: 09 March 2005, 01:28:18
Subject: Re: pl/pgsql faster than raw SQL?

From: Michael Fuhr
Date: 09 March 2005, 02:15:40
Subject: Re: vacuum full, why multiple times ?

Re: Query Optimization - Mailing list pgsql-performance

Attachment

Previous

Next