Thread: strange explain

strange explain

From
Oleg Bartunov
Date:
Hi,


I've got performance problem and while I'dont ready to describe it
I'd like to ask about strange explain:

tour=# explain analyze  select * from tours  where            ( operator_id in (2,3,4,5,7) and type_id = 2 )  or
   ( operator_id = 8 and type_id=4 );
 

NOTICE:  QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=0.00..12.25 rows=1
width=1091)(actual time=0.26..0.26 rows=0 loops=1)
 
Total runtime: 0.45 msec

EXPLAIN

What does many 'type_idx' means ?

Theare are 2 indices - operator_idx and type_idx.

Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: strange explain

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> tour=# explain analyze  select * from tours  where
>              ( operator_id in (2,3,4,5,7) and type_id = 2 )  or
>              ( operator_id = 8 and type_id=4 );

> Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=0.00..12.25 rows=1
width=1091)(actual time=0.26..0.26 rows=0 loops=1)
 

> What does many 'type_idx' means ?

Multiple indexscans.

It looks to me like your WHERE clause is being flattened into
            ( operator_id = 2 and type_id=2 ) or            ( operator_id = 3 and type_id=2 ) or            (
operator_id= 4 and type_id=2 ) or            ( operator_id = 5 and type_id=2 ) or            ( operator_id = 7 and
type_id=2) or            ( operator_id = 8 and type_id=4 )
 

and then it has a choice of repeated indexscans on operator_id or
type_id.  Depending on the selectivity stats it might pick either.
You might find that a 2-column index on both would be a win.
        regards, tom lane


Re: strange explain

From
"Rod Taylor"
Date:
It appears it scanes the type_idx once per opereator.  IN gets broken
down into ORs

Is this what the TODO entry 'Make IN / NOT IN have similar performance
as EXISTS' means?
--
Rod
----- Original Message -----
From: "Oleg Bartunov" <oleg@sai.msu.su>
To: "Pgsql Hackers" <pgsql-hackers@postgresql.org>; "Tom Lane"
<tgl@sss.pgh.pa.us>
Sent: Monday, May 13, 2002 9:42 AM
Subject: [HACKERS] strange explain


> Hi,
>
>
> I've got performance problem and while I'dont ready to describe it
> I'd like to ask about strange explain:
>
> tour=# explain analyze  select * from tours  where
>              ( operator_id in (2,3,4,5,7) and type_id = 2 )  or
>              ( operator_id = 8 and type_id=4 );
>
> NOTICE:  QUERY PLAN:
>
> Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx,
type_idx on tours  (cost=0.00..12.25 rows=1 width=1091) (actual
time=0.26..0.26 rows=0 loops=1)
> Total runtime: 0.45 msec
>
> EXPLAIN
>
> What does many 'type_idx' means ?
>
> Theare are 2 indices - operator_idx and type_idx.
>
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83
>
>
> ---------------------------(end of
broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to
majordomo@postgresql.org)
>



Re: strange explain

From
Tom Lane
Date:
"Rod Taylor" <rbt@zort.ca> writes:
> Is this what the TODO entry 'Make IN / NOT IN have similar performance
> as EXISTS' means?

No.  The TODO item is talking about IN with a sub-SELECT, which is not
optimized at all at the moment.  IN with a list of scalar values is
converted to ((x = value1) OR (x = value2) OR ...), which we can do
something with.
        regards, tom lane


Re: strange explain

From
Tom Lane
Date:
Oleg Bartunov <oleg@sai.msu.su> writes:
> What's the difference for planner between 2 queries ?

> tour=# explain analyze  select * from tours  where
>               ( operator_id in (2,3,4,5,7) and type_id = 2 );

> tour=# explain analyze  select * from tours  where
>            ( operator_id in (2,3,4,5,7) and type_id = 4 ) or
>            ( operator_id = 8 and type_id = 3);

The first one's already in normal form and doesn't need any more
flattening.  I believe the system will consider a multiple indexscan
on operator_idx for it, but probably the cost estimator is concluding
that that's a loser compared to one indexscan using type_id = 2.
Without any info on the selectivity of these conditions it's hard to say
whether that's a correct choice or not.
        regards, tom lane


Re: strange explain

From
Oleg Bartunov
Date:
Thanks Tom,


On Mon, 13 May 2002, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
> > tour=# explain analyze  select * from tours  where
> >              ( operator_id in (2,3,4,5,7) and type_id = 2 )  or
> >              ( operator_id = 8 and type_id=4 );
>
> > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=0.00..12.25 rows=1
width=1091)(actual time=0.26..0.26 rows=0 loops=1)
 
>
> > What does many 'type_idx' means ?
>
> Multiple indexscans.
>
> It looks to me like your WHERE clause is being flattened into
>
>              ( operator_id = 2 and type_id=2 ) or
>              ( operator_id = 3 and type_id=2 ) or
>              ( operator_id = 4 and type_id=2 ) or
>              ( operator_id = 5 and type_id=2 ) or
>              ( operator_id = 7 and type_id=2 ) or
>              ( operator_id = 8 and type_id=4 )
>

this is what I assume.

> and then it has a choice of repeated indexscans on operator_id or
> type_id.  Depending on the selectivity stats it might pick either.
> You might find that a 2-column index on both would be a win.
>

Yes, we've went exactly this way.

I'm very exited how planner could be smart. When I played with the query
and specify different values of type_id I notice it's chose plans depends
on is value exists or not.


>             regards, tom lane
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: strange explain

From
Oleg Bartunov
Date:
Tom,

one more question.

What's the difference for planner between 2 queries ?
For the first query I have plain index scan, but multiple
index scan for second.

tour=# explain analyze  select * from tours  where             ( operator_id in (2,3,4,5,7) and type_id = 2 );
NOTICE:  QUERY PLAN:

Index Scan using type_idx on tours  (cost=0.00..2.03 rows=1 width=1091) (actual time=0.03..0.03 rows=0 loops=1)
Total runtime: 0.16 msec

EXPLAIN
tour=# explain analyze  select * from tours  where          ( operator_id in (2,3,4,5,7) and type_id = 4 ) or
(operator_id = 8 and type_id = 3);
 
NOTICE:  QUERY PLAN:

Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=0.00..12.25 rows=1
width=1091)(actual time=0.27..0.27 rows=0 loops=1)
 
Total runtime: 0.44 msec

EXPLAIN




On Mon, 13 May 2002, Tom Lane wrote:

> Oleg Bartunov <oleg@sai.msu.su> writes:
> > tour=# explain analyze  select * from tours  where
> >              ( operator_id in (2,3,4,5,7) and type_id = 2 )  or
> >              ( operator_id = 8 and type_id=4 );
>
> > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=0.00..12.25 rows=1
width=1091)(actual time=0.26..0.26 rows=0 loops=1)
 
>
> > What does many 'type_idx' means ?
>
> Multiple indexscans.
>
> It looks to me like your WHERE clause is being flattened into
>
>              ( operator_id = 2 and type_id=2 ) or
>              ( operator_id = 3 and type_id=2 ) or
>              ( operator_id = 4 and type_id=2 ) or
>              ( operator_id = 5 and type_id=2 ) or
>              ( operator_id = 7 and type_id=2 ) or
>              ( operator_id = 8 and type_id=4 )
>
> and then it has a choice of repeated indexscans on operator_id or
> type_id.  Depending on the selectivity stats it might pick either.
> You might find that a 2-column index on both would be a win.
>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
>
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: strange explain

From
"Zeugswetter Andreas SB SD"
Date:
> EXPLAIN
> tour=# explain analyze  select * from tours  where
>            ( operator_id in (2,3,4,5,7) and type_id = 4 ) or
>            ( operator_id = 8 and type_id = 3);
> NOTICE:  QUERY PLAN:
>
> Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours  (cost=>
> 0.00..12.25 rows=1 width=1091) (actual time=0.27..0.27 rows=0 loops=1)
> Total runtime: 0.44 msec

Actually this plan looks very strange to me. One would expect it to only use
type_idx twice (in lack of a better index (type_id, operator_id)).
Fetch all rows with type_id=4 and then filter the result on operator_id in (...),
then do type_id=3 and filter operator_id=8.
Seems there is room for another performance improvement here :-)

Andreas


Re: strange explain

From
Tom Lane
Date:
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes:
>> tour=# explain analyze  select * from tours  where
>> ( operator_id in (2,3,4,5,7) and type_id = 4 ) or
>> ( operator_id = 8 and type_id = 3);

> Actually this plan looks very strange to me. One would expect it to only use 
> type_idx twice (in lack of a better index (type_id, operator_id)). 
> Seems there is room for another performance improvement here :-)

Yeah, this demonstrates that reducing the quals to canonical form isn't
always the best thing to do.

Or maybe we could just look for duplicate indexqual conditions at the
end of the process?
        regards, tom lane