Re: slow plan on join when adding where clause - Mailing list pgsql-novice
From | L. Loewe |
---|---|
Subject | Re: slow plan on join when adding where clause |
Date | |
Msg-id | op.vauzww0lb4gxma@oakcreek2 Whole thread Raw |
In response to | slow plan on join when adding where clause ("L. Loewe" <lloewe@hotmail.com>) |
List | pgsql-novice |
Nope. Still slow with sensor_id = 1. The problem appears to be unrelated to the indexes but the the planners assumption that sensor_id=1 will return about 500K rows, which is correct and start<ti<stop will return 5000K rows, which is way off of the actual 800. My current solution is to create a (temporary) table with all sensors and use it for sensor based queries - it works but requires some maintenance which I wanted to avoid. regards On Thu, 08 Apr 2010 10:40:14 -0600, Mladen Gogala <mgogala@vmsinfo.com> wrote: > How about creating index on both columns and drop the separate index? > The composite index on ti and sensor_id could still be used for the > original query but would probably give you a killer performance when the > columns are used together. > > L. Loewe wrote: >> Thanks. >> >> That did change the plan a bit but it's still not doing it the way >> it does it when searching all sensors - and it's still a lot slower. >> >> >> Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) >> (actual time=30983.301..31827.299 rows=772 loops=1) >> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >> -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) >> (actual time=54.095..18136.153 rows=492150 loops=1) >> Filter: ((sensor_id + 0) = 1) >> -> Materialize (cost=2.36..3.60 rows=124 width=28) >> (actual time=0.000..0.009 rows=124 loops=492150) >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >> (actual time=0.010..0.021 rows=124 loops=1) >> Total runtime: 31827.435 ms >> >> I'm not sure the index is the problem - seems more like the way it >> assembles the data. >> If I read this right the planner takes the rows matching sensor_id=1 >> first and then joins the result with the time slices of events. >> >> This'd be probably a good idea if events had a lot of rows or the time >> slices were large. >> But with the data at hand it is a lot faster to take the rows >> of events first and then match each one to main.ti (which is what >> the planner does without the where clause). >> >> It's understandable that the planner cannot properly figure out >> how many rows these start-stop slices will select, however since >> it appears to make an assumption I was looking for a way to >> tell it which value to assume... >> >> Regards >> >> >> >> >> On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com> >> wrote: >> >>> There is an old trick which can help you here. Try doing this: >>> SELECT * FROM events, main WHERE main.ti > events.start and >>> main.ti < events.stop and >>> sensor_id+0=1; >>> >>> That will stop planner from merging two indexes and using bitmap. >>> >>> L. Loewe wrote: >>>> Hi all >>>> >>>> >>>> I have a fairly large table ("main" ~50M rows) containing a timestamp >>>> (indexed), >>>> a sensor_id (indexed) and some sensor data ranging over a few years. >>>> >>>> The second table ("events" <500 rows) contains a set of interesting >>>> events with >>>> an events.id and two timestamps: events.start and events.stop plus >>>> some >>>> additional data. >>>> >>>> Now I want to join these tables to get the sensor data for those >>>> events. >>>> >>>> The interval between start and stop is quite short for each event >>>> (usually a >>>> couple of minutes) so that there aren't too many rows from table >>>> "main" >>>> matching this criteria (~1K) for each event: >>>> >>>> SELECT * FROM events, main WHERE main.ti > events.start and main.ti < >>>> events.stop; >>>> >>>> EXPLAIN ANALYZE gives >>>> >>>> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) >>>> (actual time=0.038..42.314 rows=69209 loops=1) >>>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>>> (actual time=0.006..0.025 rows=124 loops=1) >>>> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 >>>> rows=4229880 width=40) >>>> (actual time=0.016..0.178 rows=558 loops=124) >>>> Index Cond: ((main.ti > events.start) AND (main.ti < >>>> events.stop)) >>>> Total runtime: 47.682 ms >>>> >>>> So far so good, however if I add a sensor_id constraint the planner >>>> chooses a >>>> different approach: >>>> >>>> >>>> SELECT * FROM events, main WHERE main.ti > events.start and >>>> main.ti < events.stop and >>>> sensor_id=1; >>>> >>>> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) >>>> (actual time=23427.599..23886.276 rows=772 loops=1) >>>> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) >>>> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 >>>> width=40) >>>> (actual time=3771.719..9508.728 rows=490984 loops=1) >>>> Recheck Cond: (sensor_id= 1) >>>> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 >>>> rows=348087 width=0) >>>> (actual time=3769.075..3769.075 rows=491102 loops=1) >>>> Index Cond: (sensor_id= 1) >>>> -> Materialize (cost=2.36..3.60 rows=124 width=28) >>>> (actual time=0.000..0.010 rows=124 loops=490984) >>>> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) >>>> (actual time=0.005..0.021 rows=124 loops=1) >>>> Total runtime: 23886.494 ms >>>> >>>> >>>> Apparently the planner assumes that sensor_id=1 will return fewer >>>> rows than >>>> the time constraint while the opposite is true: >>>> sensor_id=1 -> ~ 500K , time window -> ~ 1K. >>>> >>>> Is there a way to hint the planner to use plan 1 even with the >>>> sensor_id=1 >>>> clause or am I doing something fundamentally wrong here? >>>> >>>> Thanks >>>> >>> >> > > -- Using Opera's revolutionary e-mail client: http://www.opera.com/mail/
pgsql-novice by date: