Re: INDEX Performance Issue - Mailing list pgsql-performance

From Jeff Janes
Subject Re: INDEX Performance Issue
Date
Msg-id CAMkU=1zbqMMxpJ+nkE=WyCiGiYscDikv10T45O9HnWrnss95=A@mail.gmail.com
Whole thread Raw
In response to Re: INDEX Performance Issue  (Mark Davidson <mark@4each.co.uk>)
List pgsql-performance
On Mon, Apr 8, 2013 at 10:02 AM, Mark Davidson <mark@4each.co.uk> wrote:
Been trying to progress with this today. Decided to setup the database on my local machine to try a few things and I'm getting much more sensible results and a totally different query plan http://explain.depesz.com/s/KGd in this case the query took about a minute but does sometimes take around 80 seconds.

The config is exactly the same between the two database. The databases them selves are identical with all indexes the same on the tables.

The server has an 2 x Intel Xeon E5420 running at 2.5Ghz each, 16GB RAM and the database is just on a SATA HDD which is a Western Digital WD5000AAKS.
My desktop has a single i5-3570K running at 3.4Ghz, 16GB RAM and the database is running on a SATA HDD which is a Western Digital WD1002FAEX-0

Could anyone offer any reasoning as to why the plan would be so different across the two machines?


The estimated costs of the two plans are very close to each other, so it doesn't take much to cause a switch to happen.

Is the test instance a binary copy of the production one (i.e. created from a base backup) or is it only a logical copy (e.g. pg_dump followed by a restore)?  A logical copy will probably be more compact than the original and so will have different slightly estimates.

You could check pg_class for relpages on all relevant tables and indexes on both servers.

Also, since ANALYZE uses a random sampling for large tables, the estimates can move around just by chance. If you repeat the query several times with an ANALYZE in between, does the plan change, or if not how much does the estimated cost change within the plan?  You could check pg_stats for the relevant tables and columns between the two servers to see how similar they are.

The estimated cost of a hash join is very dependent on how frequent the most common value of the hashed column is thought to be.  And the estimate of this number can be very fragile if ANALYZE is based on a small fraction of the table.  Turning up the statistics for those columns might be worthwhile.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Jeff Janes
Date:
Subject: Re: INDEX Performance Issue
Next
From: Anne Rosset
Date:
Subject: Poor performance on an aggregate query