Thread: slow plan on join when adding where clause

slow plan on join when adding where clause

From
"L. Loewe"
Date:
Hi all


I have a fairly large table ("main" ~50M rows) containing a timestamp
(indexed),
  a sensor_id (indexed) and some sensor data ranging over a few years.

The second table ("events" <500 rows) contains a set of interesting events
with
an events.id and two timestamps: events.start and events.stop plus some
additional data.

Now I want to join these tables to get the sensor data for those events.

The interval between start and stop is quite short for each event (usually
a
couple of minutes) so that there aren't too many rows from table "main"
matching this criteria (~1K) for each event:

SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
events.stop;

EXPLAIN ANALYZE gives

Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
                  (actual time=0.038..42.314 rows=69209 loops=1)
    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
                 (actual time=0.006..0.025 rows=124 loops=1)
    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
rows=4229880 width=40)
                 (actual time=0.016..0.178 rows=558 loops=124)
          Index Cond: ((main.ti > events.start) AND (main.ti < events.stop))
Total runtime: 47.682 ms

So far so good, however if I add a sensor_id constraint the planner
chooses a
different approach:


SELECT  * FROM  events, main WHERE  main.ti > events.start and
                                     main.ti < events.stop and sensor_id=1;

Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
                 (actual time=23427.599..23886.276 rows=772 loops=1)
    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
width=40)
                 (actual time=3771.719..9508.728 rows=490984 loops=1)
          Recheck Cond: (sensor_id= 1)
          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
rows=348087 width=0)
                 (actual time=3769.075..3769.075 rows=491102 loops=1)
                Index Cond: (sensor_id= 1)
    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
                 (actual time=0.000..0.010 rows=124 loops=490984)
          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
                (actual time=0.005..0.021 rows=124 loops=1)
Total runtime: 23886.494 ms


Apparently the planner assumes that sensor_id=1 will return fewer rows than
the time constraint while the opposite is true:
sensor_id=1 -> ~ 500K , time window -> ~ 1K.

Is there a way to hint the planner to use plan 1 even with the sensor_id=1
clause or am I doing something fundamentally wrong here?

Thanks

Re: slow plan on join when adding where clause

From
Mladen Gogala
Date:
There is an old trick which can help you here. Try doing this:
SELECT  * FROM  events, main WHERE  main.ti > events.start and
                                    main.ti < events.stop and sensor_id+0=1;

That will stop planner from merging two indexes and using bitmap.

L. Loewe wrote:
> Hi all
>
>
> I have a fairly large table ("main" ~50M rows) containing a timestamp
> (indexed),
>  a sensor_id (indexed) and some sensor data ranging over a few years.
>
> The second table ("events" <500 rows) contains a set of interesting
> events with
> an events.id and two timestamps: events.start and events.stop plus some
> additional data.
>
> Now I want to join these tables to get the sensor data for those events.
>
> The interval between start and stop is quite short for each event
> (usually a
> couple of minutes) so that there aren't too many rows from table "main"
> matching this criteria (~1K) for each event:
>
> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
> events.stop;
>
> EXPLAIN ANALYZE gives
>
> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>                  (actual time=0.038..42.314 rows=69209 loops=1)
>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>                 (actual time=0.006..0.025 rows=124 loops=1)
>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
> rows=4229880 width=40)
>                 (actual time=0.016..0.178 rows=558 loops=124)
>          Index Cond: ((main.ti > events.start) AND (main.ti <
> events.stop))
> Total runtime: 47.682 ms
>
> So far so good, however if I add a sensor_id constraint the planner
> chooses a
> different approach:
>
>
> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>                                     main.ti < events.stop and
> sensor_id=1;
>
> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
> width=40)
>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>          Recheck Cond: (sensor_id= 1)
>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
> rows=348087 width=0)
>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>                Index Cond: (sensor_id= 1)
>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>                 (actual time=0.000..0.010 rows=124 loops=490984)
>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>                (actual time=0.005..0.021 rows=124 loops=1)
> Total runtime: 23886.494 ms
>
>
> Apparently the planner assumes that sensor_id=1 will return fewer rows
> than
> the time constraint while the opposite is true:
> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>
> Is there a way to hint the planner to use plan 1 even with the
> sensor_id=1
> clause or am I doing something fundamentally wrong here?
>
> Thanks
>


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: slow plan on join when adding where clause

From
Jasen Betts
Date:
On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote:
> Hi all

> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>                                      main.ti < events.stop and sensor_id=1;

create index main_ti_sid on main(sensor_id,ti);


Re: slow plan on join when adding where clause

From
Mladen Gogala
Date:
Jasen, he does have the index, that precisely is a problem. According to
his statement, full table scan is faster than an index search.

Jasen Betts wrote:
> On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote:
>
>> Hi all
>>
>
>
>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>                                      main.ti < events.stop and sensor_id=1;
>>
>
> create index main_ti_sid on main(sensor_id,ti);
>
>
>

--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: slow plan on join when adding where clause

From
"L. Loewe"
Date:
Thanks.

That did change the plan a bit but it's still not doing it the way
it does it when searching all sensors - and it's still a lot slower.


  Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
              (actual time=30983.301..31827.299 rows=772 loops=1)
    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
    ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
              (actual time=54.095..18136.153 rows=492150 loops=1)
          Filter: ((sensor_id + 0) = 1)
    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
              (actual time=0.000..0.009 rows=124 loops=492150)
          ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
             (actual time=0.010..0.021 rows=124 loops=1)
  Total runtime: 31827.435 ms

I'm not sure the index is the problem - seems more like the way it
assembles the data.
If I read this right the planner takes the rows matching sensor_id=1
first and then joins the result with the time slices of events.

This'd be probably a good idea if events had a lot of rows or the time
slices were large.
But with the data at hand it is a lot faster to take the rows
of events first and then match each one to main.ti (which is what
the planner does without the where clause).

It's understandable that the planner cannot properly figure out
how many rows these start-stop slices will select, however since
it appears to make an assumption I was looking for a way to
tell it which value to assume...

Regards




On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com>
wrote:

> There is an old trick which can help you here. Try doing this:
> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>                                     main.ti < events.stop and
> sensor_id+0=1;
>
> That will stop planner from merging two indexes and using bitmap.
>
> L. Loewe wrote:
>> Hi all
>>
>>
>> I have a fairly large table ("main" ~50M rows) containing a timestamp
>> (indexed),
>>  a sensor_id (indexed) and some sensor data ranging over a few years.
>>
>> The second table ("events" <500 rows) contains a set of interesting
>> events with
>> an events.id and two timestamps: events.start and events.stop plus some
>> additional data.
>>
>> Now I want to join these tables to get the sensor data for those events.
>>
>> The interval between start and stop is quite short for each event
>> (usually a
>> couple of minutes) so that there aren't too many rows from table "main"
>> matching this criteria (~1K) for each event:
>>
>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
>> events.stop;
>>
>> EXPLAIN ANALYZE gives
>>
>> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>>                  (actual time=0.038..42.314 rows=69209 loops=1)
>>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>                 (actual time=0.006..0.025 rows=124 loops=1)
>>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
>> rows=4229880 width=40)
>>                 (actual time=0.016..0.178 rows=558 loops=124)
>>          Index Cond: ((main.ti > events.start) AND (main.ti <
>> events.stop))
>> Total runtime: 47.682 ms
>>
>> So far so good, however if I add a sensor_id constraint the planner
>> chooses a
>> different approach:
>>
>>
>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>                                     main.ti < events.stop and
>> sensor_id=1;
>>
>> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>> width=40)
>>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>>          Recheck Cond: (sensor_id= 1)
>>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>> rows=348087 width=0)
>>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>>                Index Cond: (sensor_id= 1)
>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>                 (actual time=0.000..0.010 rows=124 loops=490984)
>>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>                (actual time=0.005..0.021 rows=124 loops=1)
>> Total runtime: 23886.494 ms
>>
>>
>> Apparently the planner assumes that sensor_id=1 will return fewer rows
>> than
>> the time constraint while the opposite is true:
>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>
>> Is there a way to hint the planner to use plan 1 even with the
>> sensor_id=1
>> clause or am I doing something fundamentally wrong here?
>>
>> Thanks
>>
>

Re: slow plan on join when adding where clause

From
syan tan
Date:
Why doesn't the planner do a pruning lower cost ratio threshold
alternative join implementation search ? There's only seq scan, and
3 types of index scans ( hash, btree, and bitmap) , so how hard would
it be for a planner to look at the alternatives e.g. 3 joins ,
4^3  choices of joins , compare 64 choices of plans for less complicated
queries ?
Also, has anyone ever tried to put some sort of functional
dependencies / normalization tool in contrib of postgresql, or is it
enough that postgresql encourages writing hacked up queries as well as
anomaly fixing triggers capability , making it feature complete for the
market ?
On Thu, 2010-04-01 at 22:11 -0600, L. Loewe wrote:
> Thanks.
>
> That did change the plan a bit but it's still not doing it the way
> it does it when searching all sensors - and it's still a lot slower.
>
>
>   Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
>               (actual time=30983.301..31827.299 rows=772 loops=1)
>     Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>     ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
>               (actual time=54.095..18136.153 rows=492150 loops=1)
>           Filter: ((sensor_id + 0) = 1)
>     ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>               (actual time=0.000..0.009 rows=124 loops=492150)
>           ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
>              (actual time=0.010..0.021 rows=124 loops=1)
>   Total runtime: 31827.435 ms
>
> I'm not sure the index is the problem - seems more like the way it
> assembles the data.
> If I read this right the planner takes the rows matching sensor_id=1
> first and then joins the result with the time slices of events.
>
> This'd be probably a good idea if events had a lot of rows or the time
> slices were large.
> But with the data at hand it is a lot faster to take the rows
> of events first and then match each one to main.ti (which is what
> the planner does without the where clause).
>
> It's understandable that the planner cannot properly figure out
> how many rows these start-stop slices will select, however since
> it appears to make an assumption I was looking for a way to
> tell it which value to assume...
>
> Regards
>
>
>
>
> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com>
> wrote:
>
> > There is an old trick which can help you here. Try doing this:
> > SELECT  * FROM  events, main WHERE  main.ti > events.start and
> >                                     main.ti < events.stop and
> > sensor_id+0=1;
> >
> > That will stop planner from merging two indexes and using bitmap.
> >
> > L. Loewe wrote:
> >> Hi all
> >>
> >>
> >> I have a fairly large table ("main" ~50M rows) containing a timestamp
> >> (indexed),
> >>  a sensor_id (indexed) and some sensor data ranging over a few years.
> >>
> >> The second table ("events" <500 rows) contains a set of interesting
> >> events with
> >> an events.id and two timestamps: events.start and events.stop plus some
> >> additional data.
> >>
> >> Now I want to join these tables to get the sensor data for those events.
> >>
> >> The interval between start and stop is quite short for each event
> >> (usually a
> >> couple of minutes) so that there aren't too many rows from table "main"
> >> matching this criteria (~1K) for each event:
> >>
> >> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
> >> events.stop;
> >>
> >> EXPLAIN ANALYZE gives
> >>
> >> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
> >>                  (actual time=0.038..42.314 rows=69209 loops=1)
> >>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
> >>                 (actual time=0.006..0.025 rows=124 loops=1)
> >>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
> >> rows=4229880 width=40)
> >>                 (actual time=0.016..0.178 rows=558 loops=124)
> >>          Index Cond: ((main.ti > events.start) AND (main.ti <
> >> events.stop))
> >> Total runtime: 47.682 ms
> >>
> >> So far so good, however if I add a sensor_id constraint the planner
> >> chooses a
> >> different approach:
> >>
> >>
> >> SELECT  * FROM  events, main WHERE  main.ti > events.start and
> >>                                     main.ti < events.stop and
> >> sensor_id=1;
> >>
> >> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
> >>                 (actual time=23427.599..23886.276 rows=772 loops=1)
> >>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
> >>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
> >> width=40)
> >>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
> >>          Recheck Cond: (sensor_id= 1)
> >>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
> >> rows=348087 width=0)
> >>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
> >>                Index Cond: (sensor_id= 1)
> >>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
> >>                 (actual time=0.000..0.010 rows=124 loops=490984)
> >>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
> >>                (actual time=0.005..0.021 rows=124 loops=1)
> >> Total runtime: 23886.494 ms
> >>
> >>
> >> Apparently the planner assumes that sensor_id=1 will return fewer rows
> >> than
> >> the time constraint while the opposite is true:
> >> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
> >>
> >> Is there a way to hint the planner to use plan 1 even with the
> >> sensor_id=1
> >> clause or am I doing something fundamentally wrong here?
> >>
> >> Thanks
> >>
> >
>



Re: slow plan on join when adding where clause

From
Mladen Gogala
Date:
How about creating index on both columns and drop the separate index?
The composite index on ti and sensor_id could still be used for the
original query but would probably give you a killer performance when the
columns are used together.

L. Loewe wrote:
> Thanks.
>
> That did change the plan a bit but it's still not doing it the way
> it does it when searching all sensors - and it's still a lot slower.
>
>
>  Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
>              (actual time=30983.301..31827.299 rows=772 loops=1)
>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>    ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
>              (actual time=54.095..18136.153 rows=492150 loops=1)
>          Filter: ((sensor_id + 0) = 1)
>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>              (actual time=0.000..0.009 rows=124 loops=492150)
>          ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
>              (actual time=0.010..0.021 rows=124 loops=1)
>  Total runtime: 31827.435 ms
>
> I'm not sure the index is the problem - seems more like the way it
> assembles the data.
> If I read this right the planner takes the rows matching sensor_id=1
> first and then joins the result with the time slices of events.
>
> This'd be probably a good idea if events had a lot of rows or the time
> slices were large.
> But with the data at hand it is a lot faster to take the rows
> of events first and then match each one to main.ti (which is what
> the planner does without the where clause).
>
> It's understandable that the planner cannot properly figure out
> how many rows these start-stop slices will select, however since
> it appears to make an assumption I was looking for a way to
> tell it which value to assume...
>
> Regards
>
>
>
>
> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala
> <mgogala@vmsinfo.com> wrote:
>
>> There is an old trick which can help you here. Try doing this:
>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>                                     main.ti < events.stop and
>> sensor_id+0=1;
>>
>> That will stop planner from merging two indexes and using bitmap.
>>
>> L. Loewe wrote:
>>> Hi all
>>>
>>>
>>> I have a fairly large table ("main" ~50M rows) containing a
>>> timestamp (indexed),
>>>  a sensor_id (indexed) and some sensor data ranging over a few years.
>>>
>>> The second table ("events" <500 rows) contains a set of interesting
>>> events with
>>> an events.id and two timestamps: events.start and events.stop plus some
>>> additional data.
>>>
>>> Now I want to join these tables to get the sensor data for those
>>> events.
>>>
>>> The interval between start and stop is quite short for each event
>>> (usually a
>>> couple of minutes) so that there aren't too many rows from table "main"
>>> matching this criteria (~1K) for each event:
>>>
>>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti
>>> < events.stop;
>>>
>>> EXPLAIN ANALYZE gives
>>>
>>> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>>>                  (actual time=0.038..42.314 rows=69209 loops=1)
>>>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>                 (actual time=0.006..0.025 rows=124 loops=1)
>>>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
>>> rows=4229880 width=40)
>>>                 (actual time=0.016..0.178 rows=558 loops=124)
>>>          Index Cond: ((main.ti > events.start) AND (main.ti <
>>> events.stop))
>>> Total runtime: 47.682 ms
>>>
>>> So far so good, however if I add a sensor_id constraint the planner
>>> chooses a
>>> different approach:
>>>
>>>
>>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>>                                     main.ti < events.stop and
>>> sensor_id=1;
>>>
>>> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>>>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>>> width=40)
>>>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>>>          Recheck Cond: (sensor_id= 1)
>>>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>>> rows=348087 width=0)
>>>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>>>                Index Cond: (sensor_id= 1)
>>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>>                 (actual time=0.000..0.010 rows=124 loops=490984)
>>>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>                (actual time=0.005..0.021 rows=124 loops=1)
>>> Total runtime: 23886.494 ms
>>>
>>>
>>> Apparently the planner assumes that sensor_id=1 will return fewer
>>> rows than
>>> the time constraint while the opposite is true:
>>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>>
>>> Is there a way to hint the planner to use plan 1 even with the
>>> sensor_id=1
>>> clause or am I doing something fundamentally wrong here?
>>>
>>> Thanks
>>>
>>
>


--

Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
http://www.vmsinfo.com
The Leader in Integrated Media Intelligence Solutions




Re: slow plan on join when adding where clause

From
"L. Loewe"
Date:
Nope. Still slow with sensor_id = 1.
The problem appears to be unrelated to the indexes but the the planners
assumption that
sensor_id=1 will return about 500K rows, which is correct and
start<ti<stop will return 5000K rows, which is way off of the actual 800.

My current solution is to create a (temporary) table with all sensors and
use it for sensor based queries - it works but requires some maintenance
which I wanted to avoid.

regards

On Thu, 08 Apr 2010 10:40:14 -0600, Mladen Gogala <mgogala@vmsinfo.com>
wrote:

> How about creating index on both columns and drop the separate index?
> The composite index on ti and sensor_id could still be used for the
> original query but would probably give you a killer performance when the
> columns are used together.
>
> L. Loewe wrote:
>> Thanks.
>>
>> That did change the plan a bit but it's still not doing it the way
>> it does it when searching all sensors - and it's still a lot slower.
>>
>>
>>  Nested Loop  (cost=2.36..1482129.06 rows=2629241 width=68)
>>              (actual time=30983.301..31827.299 rows=772 loops=1)
>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>    ->  Seq Scan on main  (cost=0.00..890547.50 rows=190832 width=40)
>>              (actual time=54.095..18136.153 rows=492150 loops=1)
>>          Filter: ((sensor_id + 0) = 1)
>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>              (actual time=0.000..0.009 rows=124 loops=492150)
>>          ->  Seq Scan on events  (cost=0.00..2.24 rows=124 width=28)
>>              (actual time=0.010..0.021 rows=124 loops=1)
>>  Total runtime: 31827.435 ms
>>
>> I'm not sure the index is the problem - seems more like the way it
>> assembles the data.
>> If I read this right the planner takes the rows matching sensor_id=1
>> first and then joins the result with the time slices of events.
>>
>> This'd be probably a good idea if events had a lot of rows or the time
>> slices were large.
>> But with the data at hand it is a lot faster to take the rows
>> of events first and then match each one to main.ti (which is what
>> the planner does without the where clause).
>>
>> It's understandable that the planner cannot properly figure out
>> how many rows these start-stop slices will select, however since
>> it appears to make an assumption I was looking for a way to
>> tell it which value to assume...
>>
>> Regards
>>
>>
>>
>>
>> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com>
>> wrote:
>>
>>> There is an old trick which can help you here. Try doing this:
>>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>>                                     main.ti < events.stop and
>>> sensor_id+0=1;
>>>
>>> That will stop planner from merging two indexes and using bitmap.
>>>
>>> L. Loewe wrote:
>>>> Hi all
>>>>
>>>>
>>>> I have a fairly large table ("main" ~50M rows) containing a timestamp
>>>> (indexed),
>>>>  a sensor_id (indexed) and some sensor data ranging over a few years.
>>>>
>>>> The second table ("events" <500 rows) contains a set of interesting
>>>> events with
>>>> an events.id and two timestamps: events.start and events.stop plus
>>>> some
>>>> additional data.
>>>>
>>>> Now I want to join these tables to get the sensor data for those
>>>> events.
>>>>
>>>> The interval between start and stop is quite short for each event
>>>> (usually a
>>>> couple of minutes) so that there aren't too many rows from table
>>>> "main"
>>>> matching this criteria (~1K) for each event:
>>>>
>>>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti <
>>>> events.stop;
>>>>
>>>> EXPLAIN ANALYZE gives
>>>>
>>>> Nested Loop  (cost=0.00..27621542.27 rows=524505120 width=68)
>>>>                  (actual time=0.038..42.314 rows=69209 loops=1)
>>>>    ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>>                 (actual time=0.006..0.025 rows=124 loops=1)
>>>>    ->  Index Scan using idx_main_ti on main  (cost=0.00..159306.16
>>>> rows=4229880 width=40)
>>>>                 (actual time=0.016..0.178 rows=558 loops=124)
>>>>          Index Cond: ((main.ti > events.start) AND (main.ti <
>>>> events.stop))
>>>> Total runtime: 47.682 ms
>>>>
>>>> So far so good, however if I add a sensor_id constraint the planner
>>>> chooses a
>>>> different approach:
>>>>
>>>>
>>>> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>>>>                                     main.ti < events.stop and
>>>> sensor_id=1;
>>>>
>>>> Nested Loop  (cost=7309.32..1422246.30 rows=4795865 width=68)
>>>>                 (actual time=23427.599..23886.276 rows=772 loops=1)
>>>>    Join Filter: ((main.ti > events.start) AND (main.ti < events.stop))
>>>>    ->  Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087
>>>> width=40)
>>>>                 (actual time=3771.719..9508.728 rows=490984 loops=1)
>>>>          Recheck Cond: (sensor_id= 1)
>>>>          ->  Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94
>>>> rows=348087 width=0)
>>>>                 (actual time=3769.075..3769.075 rows=491102 loops=1)
>>>>                Index Cond: (sensor_id= 1)
>>>>    ->  Materialize  (cost=2.36..3.60 rows=124 width=28)
>>>>                 (actual time=0.000..0.010 rows=124 loops=490984)
>>>>          ->  Seq Scan on events (cost=0.00..2.24 rows=124 width=28)
>>>>                (actual time=0.005..0.021 rows=124 loops=1)
>>>> Total runtime: 23886.494 ms
>>>>
>>>>
>>>> Apparently the planner assumes that sensor_id=1 will return fewer
>>>> rows than
>>>> the time constraint while the opposite is true:
>>>> sensor_id=1 -> ~ 500K , time window -> ~ 1K.
>>>>
>>>> Is there a way to hint the planner to use plan 1 even with the
>>>> sensor_id=1
>>>> clause or am I doing something fundamentally wrong here?
>>>>
>>>> Thanks
>>>>
>>>
>>
>
>


--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

Re: slow plan on join when adding where clause

From
syan
Date:
Is the index clustered?
On Thu, 2010-04-01 at 14:19 -0400, Mladen Gogala wrote:
> Jasen, he does have the index, that precisely is a problem. According to
> his statement, full table scan is faster than an index search.
>
> Jasen Betts wrote:
> > On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote:
> >
> >> Hi all
> >>
> >
> >
> >> SELECT  * FROM  events, main WHERE  main.ti > events.start and
> >>                                      main.ti < events.stop and sensor_id=1;
> >>
> >
> > create index main_ti_sid on main(sensor_id,ti);
> >
> >
> >
>
> --
>
> Mladen Gogala
> Sr. Oracle DBA
> 1500 Broadway
> New York, NY 10036
> (212) 329-5251
> http://www.vmsinfo.com
> The Leader in Integrated Media Intelligence Solutions
>
>
>
>



Re: slow plan on join when adding where clause

From
"L. Loewe"
Date:
No it is not, but that's not the issue I think - see my other posts.




On Mon, 12 Apr 2010 19:19:34 -0600, syan <kittylitter@people.net.au> wrote:

> Is the index clustered?

> On Thu, 2010-04-01 at 14:19 -0400, Mladen Gogala wrote:
>> Jasen, he does have the index, that precisely is a problem. According to
>> his statement, full table scan is faster than an index search.
>>
>> Jasen Betts wrote:
>> > On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote:
>> >
>> >> Hi all
>> >>
>> >
>> >
>> >> SELECT  * FROM  events, main WHERE  main.ti > events.start and
>> >>                                      main.ti < events.stop and
>> sensor_id=1;
>> >>
>> >
>> > create index main_ti_sid on main(sensor_id,ti);
>> >
>> >