Re: Query-Planer from 6seconds TO DAYS - Mailing list pgsql-performance

From Jeff Janes
Subject Re: Query-Planer from 6seconds TO DAYS
Date
Msg-id CAMkU=1yeAAbeK9mo3nwFRFqpV+EeL1w3a6KDfqrtbq9SS0fQDg@mail.gmail.com
Whole thread Raw
In response to Re: Query-Planer from 6seconds TO DAYS  (Böckler Andreas <andy@boeckler.org>)
Responses Re: Query-Planer from 6seconds TO DAYS  (Böckler Andreas <andy@boeckler.org>)
List pgsql-performance
On Wed, Oct 24, 2012 at 11:51 AM, Böckler Andreas <andy@boeckler.org> wrote:
>>
>> Was there more to the plan that you snipped?  If not, why isn't it
>> checking all the other partitions?
>
> Your right. It's checking all partitions!. So the constraint exclusion doesn't kick in.
> This can be fixed with
> SELECT
>         m.machine_id, s.timestamp, s.errorcode
> FROM
>         events m
>         INNER JOIN spsdata as s ON (m.machine_id=s.machine_id AND s.timestamp BETWEEN m.timestamp - interval '30
seconds'AND m.timestamp) 
> WHERE
>         m.code IN (2024)
>         AND m.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
>         AND s.timestamp  BETWEEN '2012-08-01' AND '2012-08-29'
>         AND s.errorcode in ('2024');

Even checking all the partitions it seemed to be pretty fast (78 ms).
Is it worth adding all of that spinach (which could easily get out of
date) just to improve a query that is already fast?



>
> It doesn't take hours to end, but it's not the performance gain you would expect.
>
> I'v changed the query to one partition spsdata_2012m08  and attached the slow and fast cases with EXPLAIN ANALYZE.
>
> The difference is one day in the WHERE-Clause
> 290.581 ms VS  687887.674 ms !
> Thats 2372 times slower.

From the fast case:

         ->  Bitmap Index Scan on spsdata_2012m08_machine_id_key
(cost=0.00..2338.28 rows=56026 width=0) (actual time=0.262..0.262
rows=6 loops=186)
               Index Cond: ((s.machine_id = m.machine_id) AND
(s."timestamp" > (m."timestamp" - '00:00:30'::interval)) AND
(s."timestamp" <= m."timestamp"))

The difference in predicted rows to actual rows, 56026 to 6, is pretty
impressive.  That is why the cost of the fast method is vastly
overestimated, and making it just slightly bigger yet pushes it over
the edge to looking more expensive than the slower sequential scan.
It does seem to be the case of the range selectivity not being
estimate correctly.

> How can i force the fast query plan in a select?

I'd probably punt and do it in the application code.  Do the select on
the event table, then loop over the results issues the queries on the
spsdata table.  That way the range endpoints would be constants rather
than coming from joins, and the planner should do a better job.

Can you load the data into 9.2 and see if it does better?  (I'm not
optimistic that it will be.)


> I've played with seq_page_cost and enable_seqscan already, but you have to know the right values before SELECT to get
goodresults ;) 

Not sure what you mean here.  If you change the settings just for the
query, it should be safe because when the query is already fast it is
not using the seq scan, so discouraging it from using one even further
is not going to do any harm.

Or do you mean you have lots of queries which are slow other than the
one shown, and you can't track all of them down?

Cheers,

Jeff


pgsql-performance by date:

Previous
From: Claudio Freire
Date:
Subject: Re: How to upgrade from 9.1 to 9.2 with replication?
Next
From: "Kevin Grittner"
Date:
Subject: Re: Query-Planer from 6seconds TO DAYS