Thread: 8.3.1 query plan

8.3.1 query plan

From
Steve Clark
Date:
Hello List,

I have inherited an existing application. It has a table of events and some events may reference an earlier
event. We have a script that saves the first N number of events for each device, but we also want to save
any additional event that is referenced by one of the first N events. The following query was developed
to do this. It seemed to work ok for a while but one time when it was run it never finished after running
for a day.

So I did an explain and I see it is doing a seq scan in the last sub plan - there are about 375,000 rows
in myevent - why isn't it using the index instead of doing a seq scan?

create unique index indx1myevents on myevents (event_log_no)
CREATE INDEX
vacuum analyze
VACUUM

explain insert into myevents select * from t_unit_event_log a where exists
        (select b.event_log_no from myevents b
         where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
                and a.event_ref_log_no = b.event_log_no and a.event_log_no not in
                        (select event_log_no from myevents)
        )
 Seq Scan on t_unit_event_log a  (cost=0.00..25863477934.49 rows=645692 width=145)
   Filter: (subplan)
   SubPlan
     ->  Result  (cost=20019.39..20027.70 rows=1 width=4)
           One-Time Filter: (($1 = 1) AND ($2 IS NOT NULL) AND (NOT (subplan)))
           ->  Index Scan using indx1myevents on myevents b  (cost=20019.39..20027.70 rows=1 width=4)
                 Index Cond: ($2 = event_log_no)
           SubPlan
             ->  Materialize  (cost=16579.16..22379.12 rows=432196 width=4)
                   ->  Seq Scan on myevents  (cost=0.00..14668.96 rows=432196 width=4)

Why wouldn't the planner use the index instead of doing a seq scan?

Any advice would be greatly appreciated.

Thanks,
Steve


Re: 8.3.1 query plan

From
Martijn van Oosterhout
Date:
On Wed, Aug 27, 2008 at 11:00:54AM -0400, Steve Clark wrote:
> So I did an explain and I see it is doing a seq scan in the last sub plan -
> there are about 375,000 rows
> in myevent - why isn't it using the index instead of doing a seq scan?
>
> create unique index indx1myevents on myevents (event_log_no)
> CREATE INDEX

Are there NULLs in this subtable? NOT IN must check the entire table
for NULLs. Try changing the query to use NOT EXISTS.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Attachment

Re: 8.3.1 query plan

From
Steve Clark
Date:
Martijn van Oosterhout wrote:
> On Wed, Aug 27, 2008 at 11:00:54AM -0400, Steve Clark wrote:
>
>>So I did an explain and I see it is doing a seq scan in the last sub plan -
>>there are about 375,000 rows
>>in myevent - why isn't it using the index instead of doing a seq scan?
>>
>>create unique index indx1myevents on myevents (event_log_no)
>>CREATE INDEX
>
>
> Are there NULLs in this subtable? NOT IN must check the entire table
> for NULLs. Try changing the query to use NOT EXISTS.
>
> Have a nice day,

Hi Martijn,

I guess I misunderstand what you are saying because this is what I get now:

srm2=# explain insert into myevents select * from t_unit_event_log a where exists
srm2-#         (select b.event_log_no from myevents b
srm2(#          where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
srm2(#                 and a.event_ref_log_no = b.event_log_no and a.event_log_no not exists
srm2(#                         (select event_log_no from myevents)
srm2(#         );
ERROR:  syntax error at or near "exists"
LINE 4: ...nt_ref_log_no = b.event_log_no and a.event_log_no not exists

Regards,
Steve

Re: 8.3.1 query plan

From
Tom Lane
Date:
Steve Clark <sclark@netwolves.com> writes:
> explain insert into myevents select * from t_unit_event_log a where exists
>         (select b.event_log_no from myevents b
>          where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
>                 and a.event_ref_log_no = b.event_log_no and a.event_log_no not in
>                         (select event_log_no from myevents)
>         )

Consider testing the conditions on A at the top level, instead of
redundantly checking them inside the sub-query on B.  I'm not certain
exactly how much that's hurting you (EXPLAIN ANALYZE output would've
been more informative), but it can't be good.

            regards, tom lane

Re: 8.3.1 query plan

From
Steve Clark
Date:
Tom Lane wrote:
> Steve Clark <sclark@netwolves.com> writes:
>
>>explain insert into myevents select * from t_unit_event_log a where exists
>>        (select b.event_log_no from myevents b
>>         where a.event_status = 1 and a.event_ref_log_no IS NOT NULL
>>                and a.event_ref_log_no = b.event_log_no and a.event_log_no not in
>>                        (select event_log_no from myevents)
>>        )
>
>
> Consider testing the conditions on A at the top level, instead of
> redundantly checking them inside the sub-query on B.  I'm not certain
> exactly how much that's hurting you (EXPLAIN ANALYZE output would've
> been more informative), but it can't be good.
>
>             regards, tom lane
>
>
Thanks for the response Tom, I am a SQL neophyte, so I'll try to rework the query. The explain analyze
runs forever, or longer than a day anyway.

Regards,
Steve

Re: 8.3.1 query plan

From
Tom Lane
Date:
Steve Clark <sclark@netwolves.com> writes:
> Tom Lane wrote:
>> Consider testing the conditions on A at the top level, instead of
>> redundantly checking them inside the sub-query on B.

> Thanks for the response Tom, I am a SQL neophyte, so I'll try to
> rework the query.

What I meant to suggest was just

explain insert into myevents select * from t_unit_event_log a where
a.event_status = 1 and a.event_ref_log_no IS NOT NULL
and a.event_log_no not in (select event_log_no from myevents)
and exists (select b.event_log_no from myevents b
where a.event_ref_log_no = b.event_log_no)

ie, pull everything out of the subquery that doesn't depend on B.

Although, looking at it in this form, it seems like you'd be well
advised to then replace the EXISTS with an IN:

... and a.event_ref_log_no in (select b.event_log_no from myevents b)

Although those two forms should be equivalent, reality is that current
releases of PG are generally smarter about optimizing IN than EXISTS.
(The difference should largely go away as of 8.4.)

            regards, tom lane

Re: 8.3.1 query plan

From
Steve Clark
Date:
Tom Lane wrote:
> Steve Clark <sclark@netwolves.com> writes:
>
>>Tom Lane wrote:
>>
>>>Consider testing the conditions on A at the top level, instead of
>>>redundantly checking them inside the sub-query on B.
>
>
>>Thanks for the response Tom, I am a SQL neophyte, so I'll try to
>>rework the query.
>
>
> What I meant to suggest was just
>
> explain insert into myevents select * from t_unit_event_log a where
> a.event_status = 1 and a.event_ref_log_no IS NOT NULL
> and a.event_log_no not in (select event_log_no from myevents)
> and exists (select b.event_log_no from myevents b
> where a.event_ref_log_no = b.event_log_no)
>
> ie, pull everything out of the subquery that doesn't depend on B.
>
> Although, looking at it in this form, it seems like you'd be well
> advised to then replace the EXISTS with an IN:
>
> ... and a.event_ref_log_no in (select b.event_log_no from myevents b)
>
> Although those two forms should be equivalent, reality is that current
> releases of PG are generally smarter about optimizing IN than EXISTS.
> (The difference should largely go away as of 8.4.)
>
>             regards, tom lane
>
>
Thanks again Tom.

It appears to me that in both cases the planner ends up doing a seq scan of the myevents table
for each proposed new insertion from the t_unit_event_log table which make the query take to
long to be feasible. What I need to do is somehow immediately eliminate the candidate row from
the t_unit_event_log if the row is already in the myevents table, but I am at a loss as to how to do
that.

Regards,
Steve

srm2=# explain insert into myevents select * from t_unit_event_log a where
srm2-# a.event_status = 1 and a.event_ref_log_no IS NOT NULL
srm2-# and a.event_log_no not in (select event_log_no from myevents)
srm2-# and exists (select b.event_log_no from myevents b
srm2(# where a.event_ref_log_no = b.event_log_no);
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Index Scan using indx_tuel_usn_ec_es on t_unit_event_log a  (cost=31711.73..3597826539.34 rows=51479 width=145)
   Index Cond: (event_status = 1)
   Filter: ((event_ref_log_no IS NOT NULL) AND (subplan) AND (NOT (subplan)))
   SubPlan
     ->  Materialize  (cost=31711.73..42857.85 rows=830612 width=4)
           ->  Seq Scan on myevents  (cost=0.00..28041.12 rows=830612 width=4)
     ->  Index Scan using indx1myevents on myevents b  (cost=0.00..8.37 rows=1 width=4)
           Index Cond: ($1 = event_log_no)
(8 rows)

srm2=# explain insert into myevents select * from t_unit_event_log a where
a.event_status = 1 and a.event_ref_log_no IS NOT NULL
and a.event_log_no not in (select event_log_no from myevents)
and a.event_ref_log_no in (select b.event_log_no from myevents b);
                                                    QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
 Merge IN Join  (cost=40821629.90..3265326013.41 rows=39021 width=145)
   Merge Cond: (a.event_ref_log_no = b.event_log_no)
   ->  Index Scan using indx_tuel_erln on t_unit_event_log a  (cost=31711.73..8616020218.25 rows=102959 width=145)
         Filter: ((event_ref_log_no IS NOT NULL) AND (event_status = 1) AND (NOT (subplan)))
         SubPlan
           ->  Materialize  (cost=31711.73..42857.85 rows=830612 width=4)
                 ->  Seq Scan on myevents  (cost=0.00..28041.12 rows=830612 width=4)
   ->  Materialize  (cost=119646.12..130028.77 rows=830612 width=4)
         ->  Sort  (cost=119646.12..121722.65 rows=830612 width=4)
               Sort Key: b.event_log_no
               ->  Seq Scan on myevents b  (cost=0.00..28041.12 rows=830612 width=4)
(11 rows)