Re: Slow Query - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: Slow Query
Date
Msg-id 4C7782B30200002500034D0C@gw.wicourts.gov
Whole thread Raw
In response to Slow Query  ("Ozer, Pam" <pozer@automotive.com>)
List pgsql-performance
"Ozer, Pam" <pozer@automotive.com> wrote:
> I am new to Postgres and I am trying to understand the Explain
> Analyze so I can tune the following query.

Fair enough.  If you wanted advice from others on how to tune the
query, you would need to provide additional information, as
described here:

http://wiki.postgresql.org/wiki/SlowQueryQuestions

> It looks like the cost is highest in the Hash Join on Postalcode.
> Am I reading this correctly.?

No.

Here's how I read it:

It's scanning the datasetgroupyearmakemodel_i04 index for
datasetgroupid = 3, and finding 5377 index entries in 1.775 ms.
Then it read the related tuples (in the order they exist in the
table's heap space).  Added to the index scan, that brings us to
17.8 ms.

For each of the preceding rows, it uses the vehicleused_i10 index to
find matching vehicleused rows.  On average it finds 71 rows in
1.775 ms, but it does this 5377 times, which brings us to 382528
rows in 9876.586 ms.

Independently of the above, it uses the
postalcoderegioncountycity_i05 index to find
postalcoderegioncountycity rows where regionid = 36, and puts them
into a hash table.  It takes 2.738 ms to hash the 1435 matching
rows.  Probing this hash for each of the 382528 rows leaves us with
10425 rows, and brings the run time to 10639.742 ms.

Independently of the above, the vehiclemake table is sequentially
scanned to create a hash table with 261 rows.  That takes 0.307 ms.
Probing that for each of the 10425 rows doesn't eliminate anything
and bring the run time to 10658.291 ms.

Since DISTINCT was specified, the results of the preceding are fed
into a hash table, leaving 42 distinct values.  Run time has now
reached 10675.728 ms.

These 42 rows are sorted, based on the ORDER BY clause, bringing us
to 10675.868 ms.

The output of the sort goes through a filter which will cut off
output after 500000 rows, per the LIMIT clause, bringing run time to
10675.892 ms.

> I do have indexes on the lower(postalcode)
> in both tables.  Why wouldn't be using the index?

Either it doesn't think it can use that index for some reason (e.g.,
one column is char(n) and the other is varchar(n)), or it thinks
that using that index will be slower.  The latter could be due to
bad statistics or improper configuration.  Without more information,
it's hard to guess why it thinks that or whether it's right.

I suggest you read the SlowQueryQuestions link and post with more
information.

I will offer the observation that the numbers above suggest a fair
amount of the data for this query came from cache, and unless you
have tuned your database's costing factors to allow it to expect
that, it's not surprising that it makes bad choices.  Just as an
experiment, try running these statements in a connection before your
query on that same connection -- if they work out well, it might
make sense to make similar changes to the postgresql.conf file:

set effective_cache_size = '3GB'; -- or 1GB less than total RAM
set seq_page_cost = '0.1';
set random_page_cost = '0.1';

-Kevin

pgsql-performance by date:

Previous
From: Willy-Bas Loos
Date:
Subject: Re: turn off caching for performance test
Next
From: Greg Smith
Date:
Subject: Re: Performance on new 64bit server compared to my 32bit desktop