Re: Propogating conditions into a query - Mailing list pgsql-general
From | Kim Bisgaard |
---|---|
Subject | Re: Propogating conditions into a query |
Date | |
Msg-id | 42BA6653.2090201@dmi.dk Whole thread Raw |
In response to | Re: Propogating conditions into a query (Kim Bisgaard <kib+pg@dmi.dk>) |
Responses |
Re: Propogating conditions into a query
|
List | pgsql-general |
Hi Tom, I have now completed the move to PG8.0.3, and feel that I have confirmed that this problem is related to the problem I'm having: Formulated like this, it is not performing: SELECT station_id, timeobs,temp_grass, temp_dry_at_2m FROM temp_dry_at_2m a FULL OUTER JOIN temp_grass b USING (station_id, timeobs) WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0'; Merge Full Join (cost=1598312.83..11032924.48 rows=6956994 width=32) (actual time=119061.098..133314.306 rows=1 loops=1) Merge Cond: (("outer".timeobs = "inner".timeobs) AND ("outer".station_id = "inner".station_id)) Filter: ((COALESCE("inner".station_id, "outer".station_id) = 52981) AND (COALESCE("inner".timeobs, "outer".timeobs) = '2004-01-0100:00:00'::timestamp without time zone)) -> Sort (cost=346429.38..352445.11 rows=2406292 width=16) (actual time=20315.241..23850.529 rows=2406292 loops=1) Sort Key: b.timeobs, b.station_id -> Seq Scan on temp_grass b (cost=0.00..41756.92 rows=2406292 width=16) (actual time=10.517..7003.468 rows=2406292loops=1) -> Sort (cost=1251883.44..1269275.93 rows=6956994 width=16) (actual time=82122.354..92027.850 rows=6956994 loops=1) Sort Key: a.timeobs, a.station_id -> Seq Scan on temp_dry_at_2m a (cost=0.00..117549.94 rows=6956994 width=16) (actual time=23.759..39930.741 rows=6956994loops=1) Total runtime: 133623.422 ms But Postgresql can do the work, if it is reformulated into: SELECT station_id, timeobs, temp_grass, temp_dry_at_2m FROM (SELECT station_id, timeobs, temp_dry_at_2m FROM temp_dry_at_2m WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') a FULL OUTER JOIN (SELECT station_id, timeobs, temp_grass FROM temp_grass WHERE station_id = 52981 AND timeobs = '2004-1-1 0:0:0') b USING (station_id, timeobs) Merge Full Join (cost=0.00..43023.64 rows=10614 width=32) (actual time=0.056..0.064 rows=1 loops=1) -> Index Scan using temp_grass_idx on temp_grass (cost=0.00..246.55 rows=61 width=16) (actual time=0.029..0.031 rows=1loops=1) Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981)) -> Index Scan using temp_dry_at_2m_idx on temp_dry_at_2m (cost=0.00..699.52 rows=174 width=16) (actual time=0.017..0.020rows=1 loops=1) Index Cond: ((timeobs = '2004-01-01 00:00:00'::timestamp without time zone) AND (station_id = 52981)) Total runtime: 0.163 ms The reason the first query is not performing is because the query optimizer does not push the conditions down into the sub-queries - right?? The reason that I do not just use the reformulated query, is that e.g. the station_id comes from another table (and there can be more of them), so it is bloody inconvenient to first select them, and then repeat them a number of time in the above transformation (I need to outer join more than two tables) ........ Best regards, Kim Bisgaard wrote: > Hi Tom, > > This sounds like the same "problem" which prevented PG from using the > indices, and thus giving abyssmal performance in this other thread: > >> I have two BIG tables (virtually identical) with 3 NOT NULL columns >> Station_id, TimeObs, Temp_XXXX, with unique indexes on (Station_id, >> TimeObs) and valid ANALYSE (set statistics=100). I want to join the >> two tables with a FULL OUTER JOIN. >> >> When I specify the query as: >> >> SELECT station_id, timeobs,temp_grass, temp_dry_at_2m >> FROM temp_dry_at_2m a >> FULL OUTER JOIN temp_grass b USING (station_id, timeobs) >> WHERE station_id = 52981 >> AND timeobs = '2004-1-1 0:0:0' > > > Then I would also vote for improving the inteligence of the optimizer! > :-) > > Regards, > Kim. > > Tom Lane wrote: > >> Phil Endecott <spam_from_postgresql_general@chezphil.org> writes: >> >> >>> I don't see anything in there about LEFT OUTER JOIN though. Any ideas? >>> >> >> >> Oh, I missed that part of your message. Hmm, I think the issue is >> that in >> >> >> >>>> D join (M join G on (M.g=G.id)) on (D.id=M.b) where D.id=nnn >>>> >>> >> >> the planner deduces M.b=nnn by transitivity, but when the join is an >> outer join it can't make the same deduction. >> >> [ thinks some more... ] If we distinguished conditions that hold below >> the join from those that hold above it, we could deduce that M.b=nnn can >> be enforced below the join even though it might not be true above it. >> There's no such mechanism in existence now, though. >> >> A possible workaround is to generate your query like >> >> D left join (M join G on (M.g=G.id)) on (D.id=M.b AND M.b=nnn) where >> D.id=nnn >> >> but I don't know how practical that is for you. >> >> regards, tom lane >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 5: Have you checked our extensive FAQ? >> >> http://www.postgresql.org/docs/faq >> >> >> > > ---------------------------(end of broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq >
pgsql-general by date: