Thread: slow plan on join when adding where clause
Hi all I have a fairly large table ("main" ~50M rows) containing a timestamp (indexed), a sensor_id (indexed) and some sensor data ranging over a few years. The second table ("events" <500 rows) contains a set of interesting events with an events.id and two timestamps: events.start and events.stop plus some additional data. Now I want to join these tables to get the sensor data for those events. The interval between start and stop is quite short for each event (usually a couple of minutes) so that there aren't too many rows from table "main" matching this criteria (~1K) for each event: SELECT * FROM events, main WHERE main.ti > events.start and main.ti < events.stop; EXPLAIN ANALYZE gives Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) (actual time=0.038..42.314 rows=69209 loops=1) -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) (actual time=0.006..0.025 rows=124 loops=1) -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 rows=4229880 width=40) (actual time=0.016..0.178 rows=558 loops=124) Index Cond: ((main.ti > events.start) AND (main.ti < events.stop)) Total runtime: 47.682 ms So far so good, however if I add a sensor_id constraint the planner chooses a different approach: SELECT * FROM events, main WHERE main.ti > events.start and main.ti < events.stop and sensor_id=1; Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) (actual time=23427.599..23886.276 rows=772 loops=1) Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 width=40) (actual time=3771.719..9508.728 rows=490984 loops=1) Recheck Cond: (sensor_id= 1) -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 rows=348087 width=0) (actual time=3769.075..3769.075 rows=491102 loops=1) Index Cond: (sensor_id= 1) -> Materialize (cost=2.36..3.60 rows=124 width=28) (actual time=0.000..0.010 rows=124 loops=490984) -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) (actual time=0.005..0.021 rows=124 loops=1) Total runtime: 23886.494 ms Apparently the planner assumes that sensor_id=1 will return fewer rows than the time constraint while the opposite is true: sensor_id=1 -> ~ 500K , time window -> ~ 1K. Is there a way to hint the planner to use plan 1 even with the sensor_id=1 clause or am I doing something fundamentally wrong here? Thanks
There is an old trick which can help you here. Try doing this: SELECT * FROM events, main WHERE main.ti > events.start and main.ti < events.stop and sensor_id+0=1; That will stop planner from merging two indexes and using bitmap. L. Loewe wrote: > Hi all > > > I have a fairly large table ("main" ~50M rows) containing a timestamp > (indexed), > a sensor_id (indexed) and some sensor data ranging over a few years. > > The second table ("events" <500 rows) contains a set of interesting > events with > an events.id and two timestamps: events.start and events.stop plus some > additional data. > > Now I want to join these tables to get the sensor data for those events. > > The interval between start and stop is quite short for each event > (usually a > couple of minutes) so that there aren't too many rows from table "main" > matching this criteria (~1K) for each event: > > SELECT * FROM events, main WHERE main.ti > events.start and main.ti < > events.stop; > > EXPLAIN ANALYZE gives > > Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) > (actual time=0.038..42.314 rows=69209 loops=1) > -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > (actual time=0.006..0.025 rows=124 loops=1) > -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 > rows=4229880 width=40) > (actual time=0.016..0.178 rows=558 loops=124) > Index Cond: ((main.ti > events.start) AND (main.ti < > events.stop)) > Total runtime: 47.682 ms > > So far so good, however if I add a sensor_id constraint the planner > chooses a > different approach: > > > SELECT * FROM events, main WHERE main.ti > events.start and > main.ti < events.stop and > sensor_id=1; > > Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) > (actual time=23427.599..23886.276 rows=772 loops=1) > Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 > width=40) > (actual time=3771.719..9508.728 rows=490984 loops=1) > Recheck Cond: (sensor_id= 1) > -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 > rows=348087 width=0) > (actual time=3769.075..3769.075 rows=491102 loops=1) > Index Cond: (sensor_id= 1) > -> Materialize (cost=2.36..3.60 rows=124 width=28) > (actual time=0.000..0.010 rows=124 loops=490984) > -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > (actual time=0.005..0.021 rows=124 loops=1) > Total runtime: 23886.494 ms > > > Apparently the planner assumes that sensor_id=1 will return fewer rows > than > the time constraint while the opposite is true: > sensor_id=1 -> ~ 500K , time window -> ~ 1K. > > Is there a way to hint the planner to use plan 1 even with the > sensor_id=1 > clause or am I doing something fundamentally wrong here? > > Thanks > -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote: > Hi all > SELECT * FROM events, main WHERE main.ti > events.start and > main.ti < events.stop and sensor_id=1; create index main_ti_sid on main(sensor_id,ti);
Jasen, he does have the index, that precisely is a problem. According to his statement, full table scan is faster than an index search. Jasen Betts wrote: > On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote: > >> Hi all >> > > >> SELECT * FROM events, main WHERE main.ti > events.start and >> main.ti < events.stop and sensor_id=1; >> > > create index main_ti_sid on main(sensor_id,ti); > > > -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Thanks. That did change the plan a bit but it's still not doing it the way it does it when searching all sensors - and it's still a lot slower. Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) (actual time=30983.301..31827.299 rows=772 loops=1) Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) (actual time=54.095..18136.153 rows=492150 loops=1) Filter: ((sensor_id + 0) = 1) -> Materialize (cost=2.36..3.60 rows=124 width=28) (actual time=0.000..0.009 rows=124 loops=492150) -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) (actual time=0.010..0.021 rows=124 loops=1) Total runtime: 31827.435 ms I'm not sure the index is the problem - seems more like the way it assembles the data. If I read this right the planner takes the rows matching sensor_id=1 first and then joins the result with the time slices of events. This'd be probably a good idea if events had a lot of rows or the time slices were large. But with the data at hand it is a lot faster to take the rows of events first and then match each one to main.ti (which is what the planner does without the where clause). It's understandable that the planner cannot properly figure out how many rows these start-stop slices will select, however since it appears to make an assumption I was looking for a way to tell it which value to assume... Regards On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com> wrote: > There is an old trick which can help you here. Try doing this: > SELECT * FROM events, main WHERE main.ti > events.start and > main.ti < events.stop and > sensor_id+0=1; > > That will stop planner from merging two indexes and using bitmap. > > L. Loewe wrote: >> Hi all >> >> >> I have a fairly large table ("main" ~50M rows) containing a timestamp >> (indexed), >> a sensor_id (indexed) and some sensor data ranging over a few years. >> >> The second table ("events" <500 rows) contains a set of interesting >> events with >> an events.id and two timestamps: events.start and events.stop plus some >> additional data. >> >> Now I want to join these tables to get the sensor data for those events. >> >> The interval between start and stop is quite short for each event >> (usually a >> couple of minutes) so that there aren't too many rows from table "main" >> matching this criteria (~1K) for each event: >> >> SELECT * FROM events, main WHERE main.ti > events.start and main.ti < >> events.stop; >> >> EXPLAIN ANALYZE gives >> >> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) >> (actual time=0.038..42.314 rows=69209 loops=1) >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >> (actual time=0.006..0.025 rows=124 loops=1) >> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 >> rows=4229880 width=40) >> (actual time=0.016..0.178 rows=558 loops=124) >> Index Cond: ((main.ti > events.start) AND (main.ti < >> events.stop)) >> Total runtime: 47.682 ms >> >> So far so good, however if I add a sensor_id constraint the planner >> chooses a >> different approach: >> >> >> SELECT * FROM events, main WHERE main.ti > events.start and >> main.ti < events.stop and >> sensor_id=1; >> >> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) >> (actual time=23427.599..23886.276 rows=772 loops=1) >> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 >> width=40) >> (actual time=3771.719..9508.728 rows=490984 loops=1) >> Recheck Cond: (sensor_id= 1) >> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 >> rows=348087 width=0) >> (actual time=3769.075..3769.075 rows=491102 loops=1) >> Index Cond: (sensor_id= 1) >> -> Materialize (cost=2.36..3.60 rows=124 width=28) >> (actual time=0.000..0.010 rows=124 loops=490984) >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >> (actual time=0.005..0.021 rows=124 loops=1) >> Total runtime: 23886.494 ms >> >> >> Apparently the planner assumes that sensor_id=1 will return fewer rows >> than >> the time constraint while the opposite is true: >> sensor_id=1 -> ~ 500K , time window -> ~ 1K. >> >> Is there a way to hint the planner to use plan 1 even with the >> sensor_id=1 >> clause or am I doing something fundamentally wrong here? >> >> Thanks >> >
Why doesn't the planner do a pruning lower cost ratio threshold alternative join implementation search ? There's only seq scan, and 3 types of index scans ( hash, btree, and bitmap) , so how hard would it be for a planner to look at the alternatives e.g. 3 joins , 4^3 choices of joins , compare 64 choices of plans for less complicated queries ? Also, has anyone ever tried to put some sort of functional dependencies / normalization tool in contrib of postgresql, or is it enough that postgresql encourages writing hacked up queries as well as anomaly fixing triggers capability , making it feature complete for the market ? On Thu, 2010-04-01 at 22:11 -0600, L. Loewe wrote: > Thanks. > > That did change the plan a bit but it's still not doing it the way > it does it when searching all sensors - and it's still a lot slower. > > > Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) > (actual time=30983.301..31827.299 rows=772 loops=1) > Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) > (actual time=54.095..18136.153 rows=492150 loops=1) > Filter: ((sensor_id + 0) = 1) > -> Materialize (cost=2.36..3.60 rows=124 width=28) > (actual time=0.000..0.009 rows=124 loops=492150) > -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > (actual time=0.010..0.021 rows=124 loops=1) > Total runtime: 31827.435 ms > > I'm not sure the index is the problem - seems more like the way it > assembles the data. > If I read this right the planner takes the rows matching sensor_id=1 > first and then joins the result with the time slices of events. > > This'd be probably a good idea if events had a lot of rows or the time > slices were large. > But with the data at hand it is a lot faster to take the rows > of events first and then match each one to main.ti (which is what > the planner does without the where clause). > > It's understandable that the planner cannot properly figure out > how many rows these start-stop slices will select, however since > it appears to make an assumption I was looking for a way to > tell it which value to assume... > > Regards > > > > > On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com> > wrote: > > > There is an old trick which can help you here. Try doing this: > > SELECT * FROM events, main WHERE main.ti > events.start and > > main.ti < events.stop and > > sensor_id+0=1; > > > > That will stop planner from merging two indexes and using bitmap. > > > > L. Loewe wrote: > >> Hi all > >> > >> > >> I have a fairly large table ("main" ~50M rows) containing a timestamp > >> (indexed), > >> a sensor_id (indexed) and some sensor data ranging over a few years. > >> > >> The second table ("events" <500 rows) contains a set of interesting > >> events with > >> an events.id and two timestamps: events.start and events.stop plus some > >> additional data. > >> > >> Now I want to join these tables to get the sensor data for those events. > >> > >> The interval between start and stop is quite short for each event > >> (usually a > >> couple of minutes) so that there aren't too many rows from table "main" > >> matching this criteria (~1K) for each event: > >> > >> SELECT * FROM events, main WHERE main.ti > events.start and main.ti < > >> events.stop; > >> > >> EXPLAIN ANALYZE gives > >> > >> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) > >> (actual time=0.038..42.314 rows=69209 loops=1) > >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > >> (actual time=0.006..0.025 rows=124 loops=1) > >> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 > >> rows=4229880 width=40) > >> (actual time=0.016..0.178 rows=558 loops=124) > >> Index Cond: ((main.ti > events.start) AND (main.ti < > >> events.stop)) > >> Total runtime: 47.682 ms > >> > >> So far so good, however if I add a sensor_id constraint the planner > >> chooses a > >> different approach: > >> > >> > >> SELECT * FROM events, main WHERE main.ti > events.start and > >> main.ti < events.stop and > >> sensor_id=1; > >> > >> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) > >> (actual time=23427.599..23886.276 rows=772 loops=1) > >> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > >> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 > >> width=40) > >> (actual time=3771.719..9508.728 rows=490984 loops=1) > >> Recheck Cond: (sensor_id= 1) > >> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 > >> rows=348087 width=0) > >> (actual time=3769.075..3769.075 rows=491102 loops=1) > >> Index Cond: (sensor_id= 1) > >> -> Materialize (cost=2.36..3.60 rows=124 width=28) > >> (actual time=0.000..0.010 rows=124 loops=490984) > >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > >> (actual time=0.005..0.021 rows=124 loops=1) > >> Total runtime: 23886.494 ms > >> > >> > >> Apparently the planner assumes that sensor_id=1 will return fewer rows > >> than > >> the time constraint while the opposite is true: > >> sensor_id=1 -> ~ 500K , time window -> ~ 1K. > >> > >> Is there a way to hint the planner to use plan 1 even with the > >> sensor_id=1 > >> clause or am I doing something fundamentally wrong here? > >> > >> Thanks > >> > > >
How about creating index on both columns and drop the separate index? The composite index on ti and sensor_id could still be used for the original query but would probably give you a killer performance when the columns are used together. L. Loewe wrote: > Thanks. > > That did change the plan a bit but it's still not doing it the way > it does it when searching all sensors - and it's still a lot slower. > > > Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) > (actual time=30983.301..31827.299 rows=772 loops=1) > Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) > (actual time=54.095..18136.153 rows=492150 loops=1) > Filter: ((sensor_id + 0) = 1) > -> Materialize (cost=2.36..3.60 rows=124 width=28) > (actual time=0.000..0.009 rows=124 loops=492150) > -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > (actual time=0.010..0.021 rows=124 loops=1) > Total runtime: 31827.435 ms > > I'm not sure the index is the problem - seems more like the way it > assembles the data. > If I read this right the planner takes the rows matching sensor_id=1 > first and then joins the result with the time slices of events. > > This'd be probably a good idea if events had a lot of rows or the time > slices were large. > But with the data at hand it is a lot faster to take the rows > of events first and then match each one to main.ti (which is what > the planner does without the where clause). > > It's understandable that the planner cannot properly figure out > how many rows these start-stop slices will select, however since > it appears to make an assumption I was looking for a way to > tell it which value to assume... > > Regards > > > > > On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala > <mgogala@vmsinfo.com> wrote: > >> There is an old trick which can help you here. Try doing this: >> SELECT * FROM events, main WHERE main.ti > events.start and >> main.ti < events.stop and >> sensor_id+0=1; >> >> That will stop planner from merging two indexes and using bitmap. >> >> L. Loewe wrote: >>> Hi all >>> >>> >>> I have a fairly large table ("main" ~50M rows) containing a >>> timestamp (indexed), >>> a sensor_id (indexed) and some sensor data ranging over a few years. >>> >>> The second table ("events" <500 rows) contains a set of interesting >>> events with >>> an events.id and two timestamps: events.start and events.stop plus some >>> additional data. >>> >>> Now I want to join these tables to get the sensor data for those >>> events. >>> >>> The interval between start and stop is quite short for each event >>> (usually a >>> couple of minutes) so that there aren't too many rows from table "main" >>> matching this criteria (~1K) for each event: >>> >>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti >>> < events.stop; >>> >>> EXPLAIN ANALYZE gives >>> >>> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) >>> (actual time=0.038..42.314 rows=69209 loops=1) >>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>> (actual time=0.006..0.025 rows=124 loops=1) >>> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 >>> rows=4229880 width=40) >>> (actual time=0.016..0.178 rows=558 loops=124) >>> Index Cond: ((main.ti > events.start) AND (main.ti < >>> events.stop)) >>> Total runtime: 47.682 ms >>> >>> So far so good, however if I add a sensor_id constraint the planner >>> chooses a >>> different approach: >>> >>> >>> SELECT * FROM events, main WHERE main.ti > events.start and >>> main.ti < events.stop and >>> sensor_id=1; >>> >>> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) >>> (actual time=23427.599..23886.276 rows=772 loops=1) >>> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >>> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 >>> width=40) >>> (actual time=3771.719..9508.728 rows=490984 loops=1) >>> Recheck Cond: (sensor_id= 1) >>> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 >>> rows=348087 width=0) >>> (actual time=3769.075..3769.075 rows=491102 loops=1) >>> Index Cond: (sensor_id= 1) >>> -> Materialize (cost=2.36..3.60 rows=124 width=28) >>> (actual time=0.000..0.010 rows=124 loops=490984) >>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>> (actual time=0.005..0.021 rows=124 loops=1) >>> Total runtime: 23886.494 ms >>> >>> >>> Apparently the planner assumes that sensor_id=1 will return fewer >>> rows than >>> the time constraint while the opposite is true: >>> sensor_id=1 -> ~ 500K , time window -> ~ 1K. >>> >>> Is there a way to hint the planner to use plan 1 even with the >>> sensor_id=1 >>> clause or am I doing something fundamentally wrong here? >>> >>> Thanks >>> >> > -- Mladen Gogala Sr. Oracle DBA 1500 Broadway New York, NY 10036 (212) 329-5251 http://www.vmsinfo.com The Leader in Integrated Media Intelligence Solutions
Nope. Still slow with sensor_id = 1. The problem appears to be unrelated to the indexes but the the planners assumption that sensor_id=1 will return about 500K rows, which is correct and start<ti<stop will return 5000K rows, which is way off of the actual 800. My current solution is to create a (temporary) table with all sensors and use it for sensor based queries - it works but requires some maintenance which I wanted to avoid. regards On Thu, 08 Apr 2010 10:40:14 -0600, Mladen Gogala <mgogala@vmsinfo.com> wrote: > How about creating index on both columns and drop the separate index? > The composite index on ti and sensor_id could still be used for the > original query but would probably give you a killer performance when the > columns are used together. > > L. Loewe wrote: >> Thanks. >> >> That did change the plan a bit but it's still not doing it the way >> it does it when searching all sensors - and it's still a lot slower. >> >> >> Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) >> (actual time=30983.301..31827.299 rows=772 loops=1) >> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >> -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) >> (actual time=54.095..18136.153 rows=492150 loops=1) >> Filter: ((sensor_id + 0) = 1) >> -> Materialize (cost=2.36..3.60 rows=124 width=28) >> (actual time=0.000..0.009 rows=124 loops=492150) >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >> (actual time=0.010..0.021 rows=124 loops=1) >> Total runtime: 31827.435 ms >> >> I'm not sure the index is the problem - seems more like the way it >> assembles the data. >> If I read this right the planner takes the rows matching sensor_id=1 >> first and then joins the result with the time slices of events. >> >> This'd be probably a good idea if events had a lot of rows or the time >> slices were large. >> But with the data at hand it is a lot faster to take the rows >> of events first and then match each one to main.ti (which is what >> the planner does without the where clause). >> >> It's understandable that the planner cannot properly figure out >> how many rows these start-stop slices will select, however since >> it appears to make an assumption I was looking for a way to >> tell it which value to assume... >> >> Regards >> >> >> >> >> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com> >> wrote: >> >>> There is an old trick which can help you here. Try doing this: >>> SELECT * FROM events, main WHERE main.ti > events.start and >>> main.ti < events.stop and >>> sensor_id+0=1; >>> >>> That will stop planner from merging two indexes and using bitmap. >>> >>> L. Loewe wrote: >>>> Hi all >>>> >>>> >>>> I have a fairly large table ("main" ~50M rows) containing a timestamp >>>> (indexed), >>>> a sensor_id (indexed) and some sensor data ranging over a few years. >>>> >>>> The second table ("events" <500 rows) contains a set of interesting >>>> events with >>>> an events.id and two timestamps: events.start and events.stop plus >>>> some >>>> additional data. >>>> >>>> Now I want to join these tables to get the sensor data for those >>>> events. >>>> >>>> The interval between start and stop is quite short for each event >>>> (usually a >>>> couple of minutes) so that there aren't too many rows from table >>>> "main" >>>> matching this criteria (~1K) for each event: >>>> >>>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti < >>>> events.stop; >>>> >>>> EXPLAIN ANALYZE gives >>>> >>>> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) >>>> (actual time=0.038..42.314 rows=69209 loops=1) >>>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>>> (actual time=0.006..0.025 rows=124 loops=1) >>>> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 >>>> rows=4229880 width=40) >>>> (actual time=0.016..0.178 rows=558 loops=124) >>>> Index Cond: ((main.ti > events.start) AND (main.ti < >>>> events.stop)) >>>> Total runtime: 47.682 ms >>>> >>>> So far so good, however if I add a sensor_id constraint the planner >>>> chooses a >>>> different approach: >>>> >>>> >>>> SELECT * FROM events, main WHERE main.ti > events.start and >>>> main.ti < events.stop and >>>> sensor_id=1; >>>> >>>> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) >>>> (actual time=23427.599..23886.276 rows=772 loops=1) >>>> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >>>> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 >>>> width=40) >>>> (actual time=3771.719..9508.728 rows=490984 loops=1) >>>> Recheck Cond: (sensor_id= 1) >>>> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 >>>> rows=348087 width=0) >>>> (actual time=3769.075..3769.075 rows=491102 loops=1) >>>> Index Cond: (sensor_id= 1) >>>> -> Materialize (cost=2.36..3.60 rows=124 width=28) >>>> (actual time=0.000..0.010 rows=124 loops=490984) >>>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>>> (actual time=0.005..0.021 rows=124 loops=1) >>>> Total runtime: 23886.494 ms >>>> >>>> >>>> Apparently the planner assumes that sensor_id=1 will return fewer >>>> rows than >>>> the time constraint while the opposite is true: >>>> sensor_id=1 -> ~ 500K , time window -> ~ 1K. >>>> >>>> Is there a way to hint the planner to use plan 1 even with the >>>> sensor_id=1 >>>> clause or am I doing something fundamentally wrong here? >>>> >>>> Thanks >>>> >>> >> > > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
Is the index clustered? On Thu, 2010-04-01 at 14:19 -0400, Mladen Gogala wrote: > Jasen, he does have the index, that precisely is a problem. According to > his statement, full table scan is faster than an index search. > > Jasen Betts wrote: > > On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote: > > > >> Hi all > >> > > > > > >> SELECT * FROM events, main WHERE main.ti > events.start and > >> main.ti < events.stop and sensor_id=1; > >> > > > > create index main_ti_sid on main(sensor_id,ti); > > > > > > > > -- > > Mladen Gogala > Sr. Oracle DBA > 1500 Broadway > New York, NY 10036 > (212) 329-5251 > http://www.vmsinfo.com > The Leader in Integrated Media Intelligence Solutions > > > >
No it is not, but that's not the issue I think - see my other posts. On Mon, 12 Apr 2010 19:19:34 -0600, syan <kittylitter@people.net.au> wrote: > Is the index clustered? > On Thu, 2010-04-01 at 14:19 -0400, Mladen Gogala wrote: >> Jasen, he does have the index, that precisely is a problem. According to >> his statement, full table scan is faster than an index search. >> >> Jasen Betts wrote: >> > On 2010-03-29, L. Loewe <lloewe@hotmail.com> wrote: >> > >> >> Hi all >> >> >> > >> > >> >> SELECT * FROM events, main WHERE main.ti > events.start and >> >> main.ti < events.stop and >> sensor_id=1; >> >> >> > >> > create index main_ti_sid on main(sensor_id,ti); >> > >> >