Thread: Slow Query- Bad Row Estimate

Slow Query- Bad Row Estimate

From
"Ozer, Pam"
Date:

Unfortunately I have not received a response on this question.  Is more information needed?  Does anyone have any ideas why the estimates may be bad?  Or what I might be able to do to speed this up?

 

thanks

 

From: Ozer, Pam
Sent: Tuesday, October 26, 2010 4:27 PM
To: 'pgsql-performance@postgresql.org'
Subject: Slow Query- Bad Row Estimate

 

I have the following query:

select distinct Region.RegionShort as RegionShort,
                County.County as County
from Region
join PostalCodeRegionCountyCity on (PostalCodeRegionCountyCity.RegionId = Region.RegionId)
join DealerGroupGeoCache on (DealerGroupGeoCache.RegionId = PostalCodeRegionCountyCity.RegionId)
                        and (DealerGroupGeoCache.CountyId = PostalCodeRegionCountyCity.CountyId)
                        and (DealerGroupGeoCache.CityId = PostalCodeRegionCountyCity.CityId)
join County on (PostalCodeRegionCountyCity.CountyId = County.CountyId)
where (DealerGroupGeoCache.DealerGroupId = 13) and (PostalCodeRegionCountyCity.RegionId = 5)

With the following Explain:

"HashAggregate  (cost=6743.96..6747.36 rows=34 width=11) (actual time=854.407..854.425 rows=57 loops=1)"
"  ->  Nested Loop  (cost=0.00..6743.28 rows=34 width=11) (actual time=0.062..762.698 rows=163491 loops=1)"
"        ->  Nested Loop  (cost=0.00..6679.19 rows=34 width=11) (actual time=0.053..260.001 rows=163491 loops=1)"
"              ->  Index Scan using region_i00 on region  (cost=0.00..3.36 rows=1 width=5) (actual time=0.009..0.011 rows=1 loops=1)"
"                    Index Cond: (regionid = 5)"
"              ->  Merge Join  (cost=0.00..6672.43 rows=34 width=10) (actual time=0.040..189.654 rows=163491 loops=1)"
"                    Merge Cond: ((postalcoderegioncountycity.countyid = dealergroupgeocache.countyid) AND (postalcoderegioncountycity.cityid = dealergroupgeocache.cityid))"
"                    ->  Index Scan using postalcoderegioncountycity_i06 on postalcoderegioncountycity  (cost=0.00..716.05 rows=2616 width=10) (actual time=0.018..1.591 rows=2615 loops=1)"
"                          Index Cond: (regionid = 5)"
"                    ->  Index Scan using dealergroupgeocache_i01 on dealergroupgeocache  (cost=0.00..5719.56 rows=9055 width=10) (actual time=0.015..87.689 rows=163491 loops=1)"
"                          Index Cond: ((dealergroupgeocache.dealergroupid = 13) AND (dealergroupgeocache.regionid = 5))"
"        ->  Index Scan using county_i00 on county  (cost=0.00..1.77 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=163491)"
"              Index Cond: (county.countyid = dealergroupgeocache.countyid)"
"Total runtime: 854.513 ms"

The statistics were updated recently, and that did not change the bad estimates.

The DealerGroupGeoCache table has 765392 rows, and the query returns 57 rows.

I am not at all involved in how the server is set up, so getting the settings changed is unlikely unless it would make a huge difference.

Is there any way for me to speed up this query without changing the settings?

If not, what changes do you think would be needed?

We are currently running Postgres 8.4 with the following settings:

shared_buffers = 500MB          # min 128kB
effective_cache_size = 1000MB

max_connections = 100
temp_buffers = 100MB
work_mem = 100MB
maintenance_work_mem = 500MB
max_files_per_process = 10000
seq_page_cost = 1.0
random_page_cost = 1.1
cpu_tuple_cost = 0.1
cpu_index_tuple_cost = 0.05
cpu_operator_cost = 0.01
default_statistics_target = 1000
autovacuum_max_workers = 1

#log_min_messages = DEBUG1
#log_min_duration_statement = 1000
#log_statement = all
#log_temp_files = 128
#log_lock_waits = on
#log_line_prefix = '%m %u %d %h %p %i %c %l %s'
#log_duration = on
#debug_print_plan = on

Any help is appreciated,

 

Pam

Re: Slow Query- Bad Row Estimate

From
Josh Berkus
Date:
On 10/29/10 1:54 PM, Ozer, Pam wrote:
> "                    ->  Index Scan using dealergroupgeocache_i01 on
> dealergroupgeocache  (cost=0.00..5719.56 rows=9055 width=10) (actual
> time=0.015..87.689 rows=163491 loops=1)"

This appears to be your problem here.

a) when was dealergroupgeocache last ANALYZED?

b) try increasing the stats_target on dealergroupid and regionid, to say
500 and re-analyzing.
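
For example, something along these lines (just a sketch; pick whatever target seems reasonable):

ALTER TABLE dealergroupgeocache ALTER COLUMN dealergroupid SET STATISTICS 500;
ALTER TABLE dealergroupgeocache ALTER COLUMN regionid SET STATISTICS 500;
ANALYZE dealergroupgeocache;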

--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Slow Query- Bad Row Estimate

From
Tom Lane
Date:
"Ozer, Pam" <pozer@automotive.com> writes:
> Unfortunately I have not received a response on this question.  Is more
> information needed?  Does anyone have any ideas why the estimates may be
> bad?  Or what I might be able to do to speed this up?

The most likely explanation for the bad rowcount estimates is that there
is correlation between the regionid/countyid/cityid columns, only the
planner doesn't know it.  Can you reformulate that data representation
at all, or at least avoid depending on it as a join key?
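
A quick way to look for that kind of correlation (an illustrative check against the tables from the plan, not anything definitive; within region 5, far fewer distinct county/city pairs than counties times cities would suggest the columns are not independent):

SELECT count(DISTINCT countyid) AS counties,
       count(DISTINCT cityid)   AS cities,
       count(DISTINCT countyid || ':' || cityid) AS county_city_pairs
FROM postalcoderegioncountycity
WHERE regionid = 5;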

            regards, tom lane

Re: Slow Query- Bad Row Estimate

From
"Kevin Grittner"
Date:
"Ozer, Pam" <pozer@automotive.com> wrote:

> Is more information needed?

Table layouts of the tables involved (including indexes) would be
interesting.  A description of the machine would be useful,
including OS, CPUs, RAM, and disk system.

I know you said you might have trouble changing the config, but some
of these seem problematic.

> shared_buffers = 500MB
> effective_cache_size = 1000MB
> max_connections = 100
> temp_buffers = 100MB

So you will allow up to 10GB to be tied up in space reserved for
temporary tables, but only expect to cache 1GB of your database?
That hardly seems optimal.

> work_mem = 100MB

That could be another 10GB or more in work memory at any moment, if
each connection was running a query which needed one work_mem
allocation.

> seq_page_cost = 1.0
> random_page_cost = 1.1
> cpu_tuple_cost = 0.1
> cpu_index_tuple_cost = 0.05
> cpu_operator_cost = 0.01

Those settings are OK if the active portion of the database is fully
cached.  Is it?

> default_statistics_target = 1000

If plan times get long with complex queries, you might want to back
that off; otherwise, OK.

> autovacuum_max_workers = 1

That seems like a bad idea.  Allowing multiple workers helps reduce
bloat and improve statistics.  If autovacuum is affecting
performance, you would be better off tweaking the autovacuum cost
limits.
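
For example, something along these lines (illustrative values only; the right numbers depend on the hardware and workload):

autovacuum_max_workers = 3            # back to the default
autovacuum_vacuum_cost_delay = 20ms   # throttle each worker ...
autovacuum_vacuum_cost_limit = 200    # ... rather than capping how many can run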

-Kevin

Re: Slow Query- Bad Row Estimate

From
"Ozer, Pam"
Date:
I am not sure what you mean by reformulate the data representation.  Do
you mean do I have to join on all three columns?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 2:18 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

"Ozer, Pam" <pozer@automotive.com> writes:
> Unfortunately I have not received a response on this question.  Is more
> information needed?  Does anyone have any ideas why the estimates may be
> bad?  Or what I might be able to do to speed this up?

The most likely explanation for the bad rowcount estimates is that there
is correlation between the regionid/countyid/cityid columns, only the
planner doesn't know it.  Can you reformulate that data representation
at all, or at least avoid depending on it as a join key?

            regards, tom lane

Re: Slow Query- Bad Row Estimate

From
"Ozer, Pam"
Date:
I had just analyzed the dealergroupgeocache table. Wow.  Thank you. That did the trick. Can you give me an
explanation of how the default_stats work?  I don't think I completely understand what it means when you set it
to 500 instead of 1000?

thanks
-----Original Message-----
From: pgsql-performance-owner@postgresql.org [mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Josh Berkus
Sent: Friday, October 29, 2010 2:10 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

On 10/29/10 1:54 PM, Ozer, Pam wrote:
> "                    ->  Index Scan using dealergroupgeocache_i01 on
> dealergroupgeocache  (cost=0.00..5719.56 rows=9055 width=10) (actual
> time=0.015..87.689 rows=163491 loops=1)"

This appears to be your problem here.

a) when was dealergroupgeocache last ANALYZED?

b) try increasing the stats_target on dealergroupid and regionid, to say
500 and re-analyzing.

-- 
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com


Re: Slow Query- Bad Row Estimate

From
Tom Lane
Date:
"Ozer, Pam" <pozer@automotive.com> writes:
> I am not sure what you mean by reformulate the data representation.  Do
> you mean do I have to join on all three columns?

No, I was wondering if you could change things so that you join on just
one column, instead of two that each tell part of the truth.
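
Purely as a hypothetical sketch (a "geoid" column like this does not exist in the schema as posted; it would have to be added, backfilled, and kept populated), that could look like:

-- one surrogate key per (regionid, countyid, cityid) combination
ALTER TABLE postalcoderegioncountycity ADD COLUMN geoid integer;
ALTER TABLE dealergroupgeocache ADD COLUMN geoid integer;
-- after populating geoid consistently in both tables (and indexing it),
-- the join condition collapses to a single column:
--   join DealerGroupGeoCache
--     on (DealerGroupGeoCache.GeoId = PostalCodeRegionCountyCity.GeoId)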

BTW, did you check your current statistics target?  If it's small
then raising it might possibly fix the problem by itself.
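
For example (a sketch; an attstattarget of -1 just means the column falls back to default_statistics_target):

SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'dealergroupgeocache'::regclass
  AND attnum > 0
  AND NOT attisdropped;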

            regards, tom lane

Re: Slow Query- Bad Row Estimate

From
Josh Berkus
Date:
On 10/29/10 2:47 PM, Ozer, Pam wrote:
> I had just analyzed the dealergroupgeocache table. Wow.  Thank you. That did the trick. Can you give me an
> explanation of how the default_stats work?  I don't think I completely understand what it means when you set it
> to 500 instead of 1000?

You're already at 1000?


--
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Slow Query- Bad Row Estimate

From
"Ozer, Pam"
Date:
Yes.  The default statistics target was at 1000.  So that would be what the column was using, correct?

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com] 
Sent: Friday, October 29, 2010 2:55 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

On 10/29/10 2:47 PM, Ozer, Pam wrote:
> I had just analyzed the dealergroupgeocache table. Wow.  Thank you. That did the trick. Can you give me an
> explanation of how the default_stats work?  I don't think I completely understand what it means when you set it
> to 500 instead of 1000?
 

You're already at 1000?


-- 
                                  -- Josh Berkus
                                     PostgreSQL Experts Inc.
                                     http://www.pgexperts.com

Re: Slow Query- Bad Row Estimate

From
Tom Lane
Date:
"Ozer, Pam" <pozer@automotive.com> writes:
> Yes.  The default statistics target was at 1000.  So that would be what the column was using, correct?

But you evidently didn't have stats.  Perhaps you have autovacuum turned
off?  What PG version is this anyway?

            regards, tom lane

Re: Slow Query- Bad Row Estimate

From
"Ozer, Pam"
Date:
It's 8.4.  On the column, stats_target was -1 before I changed it.  Autovacuum
is set to on.  I actually did a full analyze of the database and then ran it
again.  So what am I missing?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 3:03 PM
To: Ozer, Pam
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate

"Ozer, Pam" <pozer@automotive.com> writes:
> Yes.  The default statistics target was at 1000.  So that would be
> what the column was using, correct?

But you evidently didn't have stats.  Perhaps you have autovacuum turned
off?  What PG version is this anyway?

            regards, tom lane