Re: 8.3.1 query plan - Mailing list pgsql-general
From | Steve Clark |
---|---|
Subject | Re: 8.3.1 query plan |
Date | |
Msg-id | 48B80D27.9040805@netwolves.com Whole thread Raw |
In response to | Re: 8.3.1 query plan (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-general |
Tom Lane wrote: > Steve Clark <sclark@netwolves.com> writes: > >>Tom Lane wrote: >> >>>Consider testing the conditions on A at the top level, instead of >>>redundantly checking them inside the sub-query on B. > > >>Thanks for the response Tom, I am a SQL neophyte, so I'll try to >>rework the query. > > > What I meant to suggest was just > > explain insert into myevents select * from t_unit_event_log a where > a.event_status = 1 and a.event_ref_log_no IS NOT NULL > and a.event_log_no not in (select event_log_no from myevents) > and exists (select b.event_log_no from myevents b > where a.event_ref_log_no = b.event_log_no) > > ie, pull everything out of the subquery that doesn't depend on B. > > Although, looking at it in this form, it seems like you'd be well > advised to then replace the EXISTS with an IN: > > ... and a.event_ref_log_no in (select b.event_log_no from myevents b) > > Although those two forms should be equivalent, reality is that current > releases of PG are generally smarter about optimizing IN than EXISTS. > (The difference should largely go away as of 8.4.) > > regards, tom lane > > Thanks again Tom. It appears to me that in both cases the planner ends up doing a seq scan of the myevents table for each proposed new insertion from the t_unit_event_log table which make the query take to long to be feasible. What I need to do is somehow immediately eliminate the candidate row from the t_unit_event_log if the row is already in the myevents table, but I am at a loss as to how to do that. Regards, Steve srm2=# explain insert into myevents select * from t_unit_event_log a where srm2-# a.event_status = 1 and a.event_ref_log_no IS NOT NULL srm2-# and a.event_log_no not in (select event_log_no from myevents) srm2-# and exists (select b.event_log_no from myevents b srm2(# where a.event_ref_log_no = b.event_log_no); QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Index Scan using indx_tuel_usn_ec_es on t_unit_event_log a (cost=31711.73..3597826539.34 rows=51479 width=145) Index Cond: (event_status = 1) Filter: ((event_ref_log_no IS NOT NULL) AND (subplan) AND (NOT (subplan))) SubPlan -> Materialize (cost=31711.73..42857.85 rows=830612 width=4) -> Seq Scan on myevents (cost=0.00..28041.12 rows=830612 width=4) -> Index Scan using indx1myevents on myevents b (cost=0.00..8.37 rows=1 width=4) Index Cond: ($1 = event_log_no) (8 rows) srm2=# explain insert into myevents select * from t_unit_event_log a where a.event_status = 1 and a.event_ref_log_no IS NOT NULL and a.event_log_no not in (select event_log_no from myevents) and a.event_ref_log_no in (select b.event_log_no from myevents b); QUERY PLAN ------------------------------------------------------------------------------------------------------------------- Merge IN Join (cost=40821629.90..3265326013.41 rows=39021 width=145) Merge Cond: (a.event_ref_log_no = b.event_log_no) -> Index Scan using indx_tuel_erln on t_unit_event_log a (cost=31711.73..8616020218.25 rows=102959 width=145) Filter: ((event_ref_log_no IS NOT NULL) AND (event_status = 1) AND (NOT (subplan))) SubPlan -> Materialize (cost=31711.73..42857.85 rows=830612 width=4) -> Seq Scan on myevents (cost=0.00..28041.12 rows=830612 width=4) -> Materialize (cost=119646.12..130028.77 rows=830612 width=4) -> Sort (cost=119646.12..121722.65 rows=830612 width=4) Sort Key: b.event_log_no -> Seq Scan on myevents b (cost=0.00..28041.12 rows=830612 width=4) (11 rows)
pgsql-general by date: