Re: slow plan on join when adding where clause - Mailing list pgsql-novice

From L. Loewe
Subject Re: slow plan on join when adding where clause
Date
Msg-id op.vairc8l7b4gxma@oakcreek2
Whole thread Raw
In response to slow plan on join when adding where clause  ("L. Loewe" <lloewe@hotmail.com>)
Responses Re: slow plan on join when adding where clause  (syan tan <kittylitter@people.net.au>)
Re: slow plan on join when adding where clause  (Mladen Gogala <mgogala@vmsinfo.com>)
List pgsql-novice
Thanks.

That did change the plan a bit but it's still not doing it the way
it does it when searching all sensors - and it's still a lot slower.


  Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
              (actual time=30983.301..31827.299 rows=772 loops=1)
    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
    ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
              (actual time=54.095..18136.153 rows=492150 loops=1)
          Filter: ((sensor_id + 0) = 1)
    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
              (actual time=0.000..0.009 rows=124 loops=492150)
          ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
             (actual time=0.010..0.021 rows=124 loops=1)
  Total runtime: 31827.435 ms

I'm not sure the index is the problem - seems more like the way it
assembles the data.
If I read this right the planner takes the rows matching sensor_id=1
first and then joins the result with the time slices of events.

This'd be probably a good idea if events had a lot of rows or the time
slices were large.
But with the data at hand it is a lot faster to take the rows
of events first and then match each one to main.ti (which is what
the planner does without the where clause).

It's understandable that the planner cannot properly figure out
how many rows these start-stop slices will select, however since
it appears to make an assumption I was looking for a way to
tell it which value to assume...

Regards




On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com>
wrote:

> There is an old trick which can help you here. Try doing this:
> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>                                     main.ti < events.stop and
> sensor_id+0=1;
>
> That will stop planner from merging two indexes and using bitmap.
>
> L. Loewe wrote:
>> Hi all
>>
>>
>> I have a fairly large table ("main" ~50M rows) containing a timestamp
>> (indexed),
>>  a sensor_id (indexed) and some sensor data ranging over a few years.
>>
>> The second table ("events" <500 rows) contains a set of interesting
>> events with
>> an events.id and two timestamps: events.start and events.stop plus some
>> additional data.
>>
>> Now I want to join these tables to get the sensor data for those events.
>>
>> The interval between start and stop is quite short for each event
>> (usually a
>> couple of minutes) so that there aren't too many rows from table "main"
>> matching this criteria (~1K) for each event:
>>
>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
>> events.stop;
>>
>> EXPLAIN ANALYZE gives
>>
>> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>>                  (actual time=0.038..42.314 rows=69209 loops=1)
>>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>                 (actual time=0.006..0.025 rows=124 loops=1)
>>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
>> rows=4229880 width=40)
>>                 (actual time=0.016..0.178 rows=558 loops=124)
>>          Index Cond: ((main.ti > events.start) AND (main.ti <
>> events.stop))
>> Total runtime: 47.682 ms
>>
>> So far so good, however if I add a sensor_id constraint the planner
>> chooses a
>> different approach:
>>
>>
>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>                                     main.ti < events.stop and
>> sensor_id=1;
>>
>> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>> width=40)
>>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>>          Recheck Cond: (sensor_id= 1)
>>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>> rows=348087 width=0)
>>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>>                Index Cond: (sensor_id= 1)
>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>                 (actual time=0.000..0.010 rows=124 loops=490984)
>>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>                (actual time=0.005..0.021 rows=124 loops=1)
>> Total runtime: 23886.494 ms
>>
>>
>> Apparently the planner assumes that sensor_id=1 will return fewer rows
>> than
>> the time constraint while the opposite is true:
>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>
>> Is there a way to hint the planner to use plan 1 even with the
>> sensor_id=1
>> clause or am I doing something fundamentally wrong here?
>>
>> Thanks
>>
>

pgsql-novice by date:

Previous
From: "Wright, George"
Date:
Subject: Re: Need Help Enabling Remote Connections
Next
From: EvilJonny
Date:
Subject: Read only postgres server