Re: slow plan on join when adding where clause - Mailing list pgsql-novice
From | syan tan |
---|---|
Subject | Re: slow plan on join when adding where clause |
Date | |
Msg-id | 1270599042.3241.11.camel@syan-desktop Whole thread Raw |
In response to | Re: slow plan on join when adding where clause ("L. Loewe" <lloewe@hotmail.com>) |
List | pgsql-novice |
Why doesn't the planner do a pruning lower cost ratio threshold alternative join implementation search ? There's only seq scan, and 3 types of index scans ( hash, btree, and bitmap) , so how hard would it be for a planner to look at the alternatives e.g. 3 joins , 4^3 choices of joins , compare 64 choices of plans for less complicated queries ? Also, has anyone ever tried to put some sort of functional dependencies / normalization tool in contrib of postgresql, or is it enough that postgresql encourages writing hacked up queries as well as anomaly fixing triggers capability , making it feature complete for the market ? On Thu, 2010-04-01 at 22:11 -0600, L. Loewe wrote: > Thanks. > > That did change the plan a bit but it's still not doing it the way > it does it when searching all sensors - and it's still a lot slower. > > > Nested Loop (cost=2.36..1482129.06 rows=2629241 width=68) > (actual time=30983.301..31827.299 rows=772 loops=1) > Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > -> Seq Scan on main (cost=0.00..890547.50 rows=190832 width=40) > (actual time=54.095..18136.153 rows=492150 loops=1) > Filter: ((sensor_id + 0) = 1) > -> Materialize (cost=2.36..3.60 rows=124 width=28) > (actual time=0.000..0.009 rows=124 loops=492150) > -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > (actual time=0.010..0.021 rows=124 loops=1) > Total runtime: 31827.435 ms > > I'm not sure the index is the problem - seems more like the way it > assembles the data. > If I read this right the planner takes the rows matching sensor_id=1 > first and then joins the result with the time slices of events. > > This'd be probably a good idea if events had a lot of rows or the time > slices were large. > But with the data at hand it is a lot faster to take the rows > of events first and then match each one to main.ti (which is what > the planner does without the where clause). > > It's understandable that the planner cannot properly figure out > how many rows these start-stop slices will select, however since > it appears to make an assumption I was looking for a way to > tell it which value to assume... > > Regards > > > > > On Tue, 30 Mar 2010 12:06:46 -0600, Mladen Gogala <mgogala@vmsinfo.com> > wrote: > > > There is an old trick which can help you here. Try doing this: > > SELECT * FROM events, main WHERE main.ti > events.start and > > main.ti < events.stop and > > sensor_id+0=1; > > > > That will stop planner from merging two indexes and using bitmap. > > > > L. Loewe wrote: > >> Hi all > >> > >> > >> I have a fairly large table ("main" ~50M rows) containing a timestamp > >> (indexed), > >> a sensor_id (indexed) and some sensor data ranging over a few years. > >> > >> The second table ("events" <500 rows) contains a set of interesting > >> events with > >> an events.id and two timestamps: events.start and events.stop plus some > >> additional data. > >> > >> Now I want to join these tables to get the sensor data for those events. > >> > >> The interval between start and stop is quite short for each event > >> (usually a > >> couple of minutes) so that there aren't too many rows from table "main" > >> matching this criteria (~1K) for each event: > >> > >> SELECT * FROM events, main WHERE main.ti > events.start and main.ti < > >> events.stop; > >> > >> EXPLAIN ANALYZE gives > >> > >> Nested Loop (cost=0.00..27621542.27 rows=524505120 width=68) > >> (actual time=0.038..42.314 rows=69209 loops=1) > >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > >> (actual time=0.006..0.025 rows=124 loops=1) > >> -> Index Scan using idx_main_ti on main (cost=0.00..159306.16 > >> rows=4229880 width=40) > >> (actual time=0.016..0.178 rows=558 loops=124) > >> Index Cond: ((main.ti > events.start) AND (main.ti < > >> events.stop)) > >> Total runtime: 47.682 ms > >> > >> So far so good, however if I add a sensor_id constraint the planner > >> chooses a > >> different approach: > >> > >> > >> SELECT * FROM events, main WHERE main.ti > events.start and > >> main.ti < events.stop and > >> sensor_id=1; > >> > >> Nested Loop (cost=7309.32..1422246.30 rows=4795865 width=68) > >> (actual time=23427.599..23886.276 rows=772 loops=1) > >> Join Filter: ((main.ti > events.start) AND (main.ti < events.stop)) > >> -> Bitmap Heap Scan on main (cost=7306.96..343174.23 rows=348087 > >> width=40) > >> (actual time=3771.719..9508.728 rows=490984 loops=1) > >> Recheck Cond: (sensor_id= 1) > >> -> Bitmap Index Scan on idx_main_sens (cost=0.00..7219.94 > >> rows=348087 width=0) > >> (actual time=3769.075..3769.075 rows=491102 loops=1) > >> Index Cond: (sensor_id= 1) > >> -> Materialize (cost=2.36..3.60 rows=124 width=28) > >> (actual time=0.000..0.010 rows=124 loops=490984) > >> -> Seq Scan on events (cost=0.00..2.24 rows=124 width=28) > >> (actual time=0.005..0.021 rows=124 loops=1) > >> Total runtime: 23886.494 ms > >> > >> > >> Apparently the planner assumes that sensor_id=1 will return fewer rows > >> than > >> the time constraint while the opposite is true: > >> sensor_id=1 -> ~ 500K , time window -> ~ 1K. > >> > >> Is there a way to hint the planner to use plan 1 even with the > >> sensor_id=1 > >> clause or am I doing something fundamentally wrong here? > >> > >> Thanks > >> > > >
pgsql-novice by date: