Re: Forcing filter/join order? - Mailing list pgsql-performance
From: Peter Darley
Subject: Re: Forcing filter/join order?
Date:
Msg-id: PDEOIIFFBIAABMGNJAGPIENFCNAA.pdarley@kinesis-cem.com
In response to: Forcing filter/join order? (Josh Berkus <josh@agliodbs.com>)
Responses: Re: Forcing filter/join order?
List: pgsql-performance
Josh,

I'm sure the big brains have a better suggestion, but in the meantime could
you do something as simple as:

    SELECT *
    FROM (SELECT * FROM events
          WHERE event_date BETWEEN 'date1' AND 'date2') e
    LEFT OUTER JOIN cases ON e.case_id = cases.case_id;

Thanks,
Peter Darley

-----Original Message-----
From: pgsql-performance-owner@postgresql.org
[mailto:pgsql-performance-owner@postgresql.org] On Behalf Of Josh Berkus
Sent: Wednesday, February 18, 2004 4:11 PM
To: pgsql-performance
Subject: [PERFORM] Forcing filter/join order?

Folks,

I have an interesting issue with a complex query, where apparently I need to
twist the query planner's arm, and I am looking for advice on how to do so.

The situation: I have a table, events, with about 300,000 records. It does an
outer join to a second table, cases, with about 150,000 records. A very
simplified version of the query would be:

    SELECT *
    FROM events
    LEFT OUTER JOIN cases ON events.case_id = cases.case_id
    WHERE events.event_date BETWEEN 'date1' AND 'date2';

This join is very expensive, as you can imagine. Yet I can't seem to force
the query planner to apply the filter conditions to the events table *before*
attempting to join it to cases. Here are the crucial explain lines:

    ->  Merge Left Join  (cost=0.00..11880.82 rows=15879 width=213)
                         (actual time=5.777..901.899 rows=648 loops=1)
          Merge Cond: ("outer".case_id = "inner".case_id)
          Join Filter: (("outer".link_type)::text = 'case'::text)
          ->  Index Scan using idx_event_ends on events
                (cost=0.00..4546.15 rows=15879 width=80)
                (actual time=4.144..333.769 rows=648 loops=1)
                Filter: ((status <> 0) AND ((event_date + duration) >=
                '2004-02-18 00:00:00'::timestamp without time zone) AND
                (event_date <= '2004-03-05 23:59:00'::timestamp without time zone))
          ->  Index Scan using cases_pkey on cases
                (cost=0.00..6802.78 rows=117478 width=137)
                (actual time=0.139..402.363 rows=116835 loops=1)

As you can see, part of the problem is a pretty drastic (20x) mis-estimation
of the selectivity of the date limits on events, which accounts for 90% of my
query's execution time on this one join.

I've tried raising the statistics target on event_date, duration, and case_id
(to 1000), but this doesn't seem to affect the estimate or the query plan.

In the above test, idx_event_ends indexes (case_id, status, event_date,
(event_date + duration)), but as you can see the planner uses only the first
column. This was an attempt to circumvent the planner's tendency to
completely ignore any index on (event_date, (event_date + duration)), even
though that index is the most selective combination on the events table.

Is there anything I can do to force the query planner to filter on events
before joining cases, other than waiting for version 7.5?

--
-Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 7: don't forget to increase your free space map settings
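
A note on Peter's subselect suggestion: the planner of this era will normally
pull the subselect up and flatten it back into the outer query, so the rewrite
by itself may produce the same plan. A commonly cited workaround is to add
OFFSET 0 to the subselect, which acts as an optimization fence. A minimal
sketch, keeping the 'date1'/'date2' placeholders from the thread:

```sql
-- Sketch only: OFFSET 0 keeps the planner from flattening the subselect,
-- so the date filter is applied to events before the outer join to cases.
SELECT *
FROM (SELECT *
      FROM events
      WHERE event_date BETWEEN 'date1' AND 'date2'
      OFFSET 0) e
LEFT OUTER JOIN cases ON e.case_id = cases.case_id;
```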
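
For reference, the statistics-target change Josh describes is set per column
and only takes effect after a fresh ANALYZE. A minimal sketch, assuming the
column names from his message:

```sql
-- Raise the per-column statistics targets (the default was 10 in this era),
-- then re-ANALYZE so the planner works from the new histograms.
ALTER TABLE events ALTER COLUMN event_date SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN duration   SET STATISTICS 1000;
ALTER TABLE events ALTER COLUMN case_id    SET STATISTICS 1000;
ANALYZE events;
```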
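
The thread does not show the index definitions themselves; assuming standard
expression-index syntax and the column lists Josh describes, they would look
roughly like the sketch below (idx_event_window is a hypothetical name for the
ignored (event_date, event_date + duration) index):

```sql
-- Hypothetical reconstructions of the two indexes discussed in the thread;
-- the actual DDL is not shown in the original messages.
CREATE INDEX idx_event_ends
    ON events (case_id, status, event_date, (event_date + duration));
CREATE INDEX idx_event_window
    ON events (event_date, (event_date + duration));
```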