Re: Performance and IN clauses - Mailing list pgsql-performance

From Tomas Vondra
Subject Re: Performance and IN clauses
Date
Msg-id 49260A50.9040904@fuzzy.cz
Whole thread Raw
In response to Re: Performance and IN clauses  (Mark Roberts <mailing_lists@pandapocket.com>)
List pgsql-performance
Mark Roberts napsal(a):
> On Tue, 2008-11-18 at 17:38 +0100, tv@fuzzy.cz wrote:
>> I bet there is no 'critical' length - this is just another case of
>> index
>> scan vs. seqscan. The efficiency depends on the size of the table /
>> row,
>> amount of data in the table, variability of the column used in the IN
>> clause, etc.
>>
>> Splitting the query with 1000 items into 10 separate queries, the
>> smaller
>> queries may be faster but the total time consumed may be actually
>> higher.
>> Something like
>>
>> 10 * (time of small query) + (time to combine them) > (time of large
>> query)
>>
>> If the performance of the 'split' solution is actually better than the
>> original query, it just means that the planner does not use index scan
>> when it actually should. That means that either
>>
>> (a) the planner is not smart enough
>> (b) it has not current statistics of the table (run ANALYZE on the
>> table)
>> (c) the statistics are not detailed enough (ALTER TABLE ... SET
>> STATICTICS)
>> (d) the cost variables are not set properly (do not match the hardware
>> -
>> decreate index scan cost / increase seq scan cost)
>>
>> regards
>> Tomas
>
> I know that it's much faster (for us) to run many smaller queries than
> one large query, and I think that it's primarily because of your reason
> a.  Most of our problems come from Pg misunderstanding the results of a
> join and making a bad plan decision.  Batching dramatically reduces the
> liklihood of this.

As I already said - even the smartest planner won't work without correct
input data. Have you tried fixing the points (b), (c) and (d)?

Fixing them might improve the planner performance so that you don't need
the batchning at all.

regards
Tomas

pgsql-performance by date:

Previous
From: PFC
Date:
Subject: Re: Hash join on int takes 8..114 seconds
Next
From: "Andrus"
Date:
Subject: Re: Hash join on int takes 8..114 seconds