Re: Slow query with planner row strange estimation

From: phb07
Subject: Re: Slow query with planner row strange estimation
Date:
Msg-id: 4C3B7C0D.6070404@apra.asso.fr
In response to: Re: Slow query with planner row strange estimation  (Dimitri)
Responses: Re: Slow query with planner row strange estimation  (damien hostin)
Re: Slow query with planner row strange estimation  (Robert Haas)
List: pgsql-performance


Dimitri wrote:
> It's probably one of those cases where having HINTS in PostgreSQL would
> be very helpful..
>
> SELECT /*+ enable_nestloop=off */ ... FROM ...
>
> will just fix this query without impacting other queries and without
> adding any additional instructions into the application code..
>
> So, why is there such resistance to implementing hints within SQL
> queries in PG?..
>
> Rgds,
> -Dimitri
>
>
+1.
Another typical case where it would be helpful is setting the
cursor_tuple_fraction GUC variable for a specific statement, without
being obliged to issue two SET statements, one before the SELECT and the
other after it.
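
For what it's worth, the closest workaround today is to scope the settings
to a single transaction with SET LOCAL. A minimal sketch (the table and
column names are made up for illustration):

  BEGIN;
  -- Both settings apply only to this transaction and revert at COMMIT/ROLLBACK.
  SET LOCAL enable_nestloop = off;
  SET LOCAL cursor_tuple_fraction = 0.1;
  SELECT f.* FROM fact f JOIN dim d ON d.id = f.dim_id;  -- the one query needing the overrides
  COMMIT;

It works, but it still wraps the statement in extra commands, which is
exactly what a per-query hint would avoid.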

> On 7/9/10, Robert Haas <> wrote:
>
>> On Fri, Jul 9, 2010 at 6:13 AM, damien hostin <>
>> wrote:
>>
>>>> Have you tried running ANALYZE on the production server?
>>>>
>>>> You might also want to try ALTER TABLE ... SET STATISTICS to a large
>>>> value on some of the join columns involved in the query.
>>>>
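Concretely, that suggestion looks like the following (the table and column
names here are hypothetical):

  ALTER TABLE fact ALTER COLUMN dim_id SET STATISTICS 1000;  -- per-column target, up to 10000
  ANALYZE fact;  -- resample so the new target takes effect
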
>>> Hello,
>>>
>>> Before comparing the test case on the two machines, I ran ANALYZE on
>>> the whole database and looked at the pg_stats table to see whether the
>>> statistics changed for the columns, but on the production server the
>>> stats never became as good as on the desktop computer. I set the
>>> statistics target to 10000 on the columns used by the join, ran ANALYZE
>>> (which takes a 3,000,000-row sample), then looked at the stats. They
>>> are not as good as on the desktop: the row count is nearly the same,
>>> but only 1 or 2 distinct values are found.
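
As an aside, the statistics ANALYZE collected can be inspected directly,
e.g. (with the same hypothetical table and column names as above):

  SELECT n_distinct, null_frac, most_common_vals, most_common_freqs
  FROM pg_stats
  WHERE tablename = 'fact' AND attname = 'dim_id';

Note that ANALYZE samples about 300 rows per unit of statistics target,
which is where the 3,000,000-row sample at a target of 10000 comes from.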
>>>
>>> The data are not balanced the same way on the two machines:
>>> - the desktop has 12,000 rows, with 6,000 involved in the query (50%);
>>> - "production" (actually a dev/test server) has 6 million rows, with
>>> 6,000 involved in the query (0.1%).
>>> The columns used in the query are nullable, and in the 5,994,000 other
>>> rows, which are not involved in the query, those columns are null.
>>>
>>> I don't know if the statistics target is a percentage or a number of
>>> values to obtain,
>>>
>> It's a number of values to obtain.
>>
>>
>>> but even set at the max (10000), it didn't manage to collect good stats
>>> (for this particular query).
>>>
>> I think there's a cutoff where it won't collect values unless they
>> occur significantly more often than the average frequency.  I wonder
>> if that might be biting you here: without the actual values in the MCV
>> table, the join selectivity estimates probably aren't too good.
>>
>>
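
If memory serves, when the join columns have no MCV entries the planner's
equality-join estimator falls back to an ndistinct-based formula, roughly

  selectivity ~= (1 - nullfrac1) * (1 - nullfrac2) / max(nd1, nd2)

so with 99.9% of the production rows NULL in those columns, any error in
the estimated number of distinct non-null values is magnified in the join
row estimate.
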
>>> As I don't know what more to do, my conclusion is that the data need
>>> to be better balanced so that ANALYZE can gather better stats. But if
>>> there is a way to improve the stats/query with this badly balanced
>>> data, I'm open to it!
>>>
>>> I hope that in real production, data will never be loaded this way. If
>>> it happens, we will maybe set enable_nestloop to off, but I don't think
>>> that's a good solution; other queries might get slower.
>>>
>> Yeah, that usually works out poorly.
>>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com
>> The Enterprise Postgres Company
>>
Regards.
Philippe Beaudoin.

