Thread: Slow Query

Slow Query

From
"Ozer, Pam"
Date:

I am new to Postgres and I am trying to understand the EXPLAIN ANALYZE output so I can tune the following query.  I run the same query using MySQL and it takes less than 50 ms; I run it on Postgres and it takes 10 seconds.  I feel like I am missing something very obvious.  (VehicleUsed is a big table with over 750,000 records, and datasetgroupyearmakemodel has 150,000 records.)

 

It looks like the cost is highest in the Hash Join on PostalCode.  Am I reading this correctly?  I do have indexes on lower(postalcode) in both tables.  Why wouldn't it be using the index?  Thanks in advance for any help.

 

Here is my query:

 

select distinct VehicleMake.VehicleMake
from VehicleUsed
inner join PostalCodeRegionCountyCity
    on ( lower(VehicleUsed.PostalCode) = lower(PostalCodeRegionCountyCity.PostalCode) )
inner join DatasetGroupYearMakeModel
    on ( VehicleUsed.VehicleYear = DatasetGroupYearMakeModel.VehicleYear )
    and ( VehicleUsed.VehicleMakeId = DatasetGroupYearMakeModel.VehicleMakeId )
    and ( VehicleUsed.VehicleModelId = DatasetGroupYearMakeModel.VehicleModelId )
inner join VehicleMake
    on ( VehicleUsed.VehicleMakeId = VehicleMake.VehicleMakeId )
where ( DatasetGroupYearMakeModel.DatasetGroupId = 3 )
  and ( VehicleUsed.DatasetId <> 113 )
  and ( VehicleUsed.ProductGroupId <> 13 )
  and ( PostalCodeRegionCountyCity.RegionId = 36 )
order by VehicleMake.VehicleMake
limit 500000

 

Here is the EXPLAIN ANALYZE output:

 

"Limit  (cost=38292.53..38293.19 rows=261 width=8) (actual time=10675.857..10675.892 rows=42 loops=1)"

"  ->  Sort  (cost=38292.53..38293.19 rows=261 width=8) (actual time=10675.855..10675.868 rows=42 loops=1)"

"        Sort Key: vehiclemake.vehiclemake"

"        Sort Method:  quicksort  Memory: 18kB"

"        ->  HashAggregate  (cost=38279.45..38282.06 rows=261 width=8) (actual time=10675.710..10675.728 rows=42 loops=1)"

"              ->  Hash Join  (cost=436.31..38270.51 rows=3576 width=8) (actual time=4.471..10658.291 rows=10425 loops=1)"

"                    Hash Cond: (vehicleused.vehiclemakeid = vehiclemake.vehiclemakeid)"

"                    ->  Hash Join  (cost=428.43..38213.47 rows=3576 width=4) (actual time=4.152..10639.742 rows=10425 loops=1)"

"                          Hash Cond: (lower((vehicleused.postalcode)::text) = lower((postalcoderegioncountycity.postalcode)::text))"

"                          ->  Nested Loop  (cost=101.81..37776.78 rows=11887 width=10) (actual time=1.172..9876.586 rows=382528 loops=1)"

"                                ->  Bitmap Heap Scan on datasetgroupyearmakemodel  (cost=101.81..948.81 rows=5360 width=6) (actual time=0.988..17.800 rows=5377 loops=1)"

"                                      Recheck Cond: (datasetgroupid = 3)"

"                                      ->  Bitmap Index Scan on datasetgroupyearmakemodel_i04  (cost=0.00..100.47 rows=5360 width=0) (actual time=0.830..0.830 rows=5377 loops=1)"

"                                            Index Cond: (datasetgroupid = 3)"

"                                ->  Index Scan using vehicleused_i10 on vehicleused  (cost=0.00..6.85 rows=1 width=12) (actual time=0.049..1.775 rows=71 loops=5377)"

"                                      Index Cond: ((vehicleused.vehiclemodelid = datasetgroupyearmakemodel.vehiclemodelid) AND (vehicleused.vehiclemakeid = datasetgroupyearmakemodel.vehiclemakeid) AND (vehicleused.vehicleyear = datasetgroupyearmakemodel.vehicleyear))"

"                                      Filter: ((vehicleused.datasetid <> 113) AND (vehicleused.productgroupid <> 13))"

"                          ->  Hash  (cost=308.93..308.93 rows=1416 width=6) (actual time=2.738..2.738 rows=1435 loops=1)"

"                                ->  Bitmap Heap Scan on postalcoderegioncountycity  (cost=27.23..308.93 rows=1416 width=6) (actual time=0.222..0.955 rows=1435 loops=1)"

"                                      Recheck Cond: (regionid = 36)"

"                                      ->  Bitmap Index Scan on postalcoderegioncountycity_i05  (cost=0.00..26.87 rows=1416 width=0) (actual time=0.202..0.202 rows=1435 loops=1)"

"                                            Index Cond: (regionid = 36)"

"                    ->  Hash  (cost=4.61..4.61 rows=261 width=10) (actual time=0.307..0.307 rows=261 loops=1)"

"                          ->  Seq Scan on vehiclemake  (cost=0.00..4.61 rows=261 width=10) (actual time=0.033..0.154 rows=261 loops=1)"

"Total runtime: 10676.058 ms"

 

Pam Ozer

Data Architect

pozer@automotive.com

tel. 949.705.3468

Source Interlink Media

1733 Alton Pkwy Suite 100, Irvine, CA 92606

www.simautomotive.com


Re: Slow Query

From
Scott Marlowe
Date:
On Thu, Aug 26, 2010 at 6:03 PM, Ozer, Pam <pozer@automotive.com> wrote:
>
> I am new to Postgres and I am trying to understand the EXPLAIN ANALYZE output so I can tune the following query.  I
> run the same query using MySQL and it takes less than 50 ms; I run it on Postgres and it takes 10 seconds.  I feel
> like I am missing something very obvious.  (VehicleUsed is a big table with over 750,000 records, and
> datasetgroupyearmakemodel has 150,000 records.)
>
> It looks like the cost is highest in the Hash Join on PostalCode.  Am I reading this correctly?  I do have indexes
> on lower(postalcode) in both tables.  Why wouldn't it be using the index?

No, it's spending most of its time here:



> "                          ->  Nested Loop  (cost=101.81..37776.78 rows=11887 width=10) (actual time=1.172..9876.586
rows=382528loops=1)" 

Note that it expects 11,887 rows but gets 382k rows.

Try turning up default stats target and running analyze again and see
how it runs.
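
A minimal sketch of that (the target of 500 is purely illustrative, and the
column choices here are a guess at what feeds the misestimated join):

-- raise the sample size cluster-wide (put it in postgresql.conf to persist) ...
set default_statistics_target = 500;
-- ... or only on specific columns
alter table vehicleused alter column vehiclemodelid set statistics 500;
alter table vehicleused alter column vehicleyear set statistics 500;
-- then rebuild the statistics the planner sees
analyze vehicleused;
analyze datasetgroupyearmakemodel;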

Re: Slow Query

From
Bob Lunney
Date:
We need more information than that, like:

What version of PostgreSQL?
What does the hardware look like?
What does the disk and tablespace layout look like?
How are your configuration variables set?

Other than that, are the statistics up to date on the VehicleMake table?
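
Most of that can be pulled straight from psql; a quick sketch (add any other
settings you have changed):

select version();
show shared_buffers;
show work_mem;
show effective_cache_size;
show random_page_cost;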

Bob Lunney


Re: Slow Query

From
"Kevin Grittner"
Date:
"Ozer, Pam" <pozer@automotive.com> wrote:
> I am new to Postgres and I am trying to understand the Explain
> Analyze so I can tune the following query.

Fair enough.  If you wanted advice from others on how to tune the
query, you would need to provide additional information, as
described here:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> It looks like the cost is highest in the Hash Join on PostalCode.
> Am I reading this correctly?

No.

Here's how I read it:

It's scanning the datasetgroupyearmakemodel_i04 index for
datasetgroupid = 3, and finding 5377 index entries in 0.830 ms.
Then it reads the related tuples (in the order they exist in the
table's heap space).  Added to the index scan, that brings us to
17.8 ms.

For each of the preceding rows, it uses the vehicleused_i10 index
to find matching vehicleused rows.  On average it finds 71 rows in
1.775 ms per loop, and it does this 5377 times (71 rows x 5377
loops comes to roughly the 382528 rows reported), which brings us
to 382528 rows in 9876.586 ms.

Independently of the above, it uses the
postalcoderegioncountycity_i05 index to find
postalcoderegioncountycity rows where regionid = 36, and puts them
into a hash table.  It takes 2.738 ms to hash the 1435 matching
rows.  Probing this hash for each of the 382528 rows leaves us with
10425 rows, and brings the run time to 10639.742 ms.

Independently of the above, the vehiclemake table is sequentially
scanned to create a hash table with 261 rows.  That takes 0.307 ms.
Probing that for each of the 10425 rows doesn't eliminate anything
and brings the run time to 10658.291 ms.

Since DISTINCT was specified, the results of the preceding are fed
into a hash table, leaving 42 distinct values.  Run time has now
reached 10675.728 ms.

These 42 rows are sorted, based on the ORDER BY clause, bringing us
to 10675.868 ms.

The output of the sort goes through a filter which will cut off
output after 500000 rows, per the LIMIT clause, bringing run time to
10675.892 ms.

> I do have indexes on the lower(postalcode)
> in both tables.  Why wouldn't it be using the index?

Either it doesn't think it can use that index for some reason (e.g.,
one column is char(n) and the other is varchar(n)), or it thinks
that using that index will be slower.  The latter could be due to
bad statistics or improper configuration.  Without more information,
it's hard to guess why it thinks that or whether it's right.
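
One quick check along those lines (a sketch using the standard
information_schema views) is whether the two postalcode columns have
the same declared type:

select table_name, column_name, data_type, character_maximum_length
from information_schema.columns
where column_name = 'postalcode'
  and table_name in ('vehicleused', 'postalcoderegioncountycity');

If one comes back "character" and the other "character varying", the
lower(postalcode) indexes may not match the comparison the planner
actually performs.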

I suggest you read the SlowQueryQuestions link and post with more
information.

I will offer the observation that the numbers above suggest a fair
amount of the data for this query came from cache, and unless you
have tuned your database's costing factors to allow it to expect
that, it's not surprising that it makes bad choices.  Just as an
experiment, try running these statements in a connection before your
query on that same connection -- if they work out well, it might
make sense to make similar changes to the postgresql.conf file:

set effective_cache_size = '3GB'; -- or 1GB less than total RAM
set seq_page_cost = '0.1';
set random_page_cost = '0.1';
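
If that experiment helps, the corresponding postgresql.conf entries
would look like the following (values are illustrative, not
recommendations -- size effective_cache_size to your actual RAM):

# postgresql.conf
effective_cache_size = '3GB'   # roughly total RAM minus 1GB
seq_page_cost = 0.1            # assumes data is mostly cached
random_page_cost = 0.1

A reload (select pg_reload_conf();) is enough; no restart is needed
for these settings.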

-Kevin

Re: Slow Query

From
Greg Smith
Date:
Ozer, Pam wrote:

I am new to Postgres and I am trying to understand the Explain Analyze so I can tune the following query.


You should try http://explain.depesz.com/ , where you can post query plans like this one and see where the time is going in a form that is much easier to work with.  It will highlight row-estimation mistakes for you, too.

-- 
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us

Re: Slow Query

From
"Ozer, Pam"
Date:

Thank you all for your help.  I will also gather the rest of the information that was requested so the picture is clearer.

 
