Thread: strange explain
Hi, I've got performance problem and while I'dont ready to describe it I'd like to ask about strange explain: tour=# explain analyze select * from tours where ( operator_id in (2,3,4,5,7) and type_id = 2 ) or ( operator_id = 8 and type_id=4 ); NOTICE: QUERY PLAN: Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091)(actual time=0.26..0.26 rows=0 loops=1) Total runtime: 0.45 msec EXPLAIN What does many 'type_idx' means ? Theare are 2 indices - operator_idx and type_idx. Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > tour=# explain analyze select * from tours where > ( operator_id in (2,3,4,5,7) and type_id = 2 ) or > ( operator_id = 8 and type_id=4 ); > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091)(actual time=0.26..0.26 rows=0 loops=1) > What does many 'type_idx' means ? Multiple indexscans. It looks to me like your WHERE clause is being flattened into ( operator_id = 2 and type_id=2 ) or ( operator_id = 3 and type_id=2 ) or ( operator_id= 4 and type_id=2 ) or ( operator_id = 5 and type_id=2 ) or ( operator_id = 7 and type_id=2) or ( operator_id = 8 and type_id=4 ) and then it has a choice of repeated indexscans on operator_id or type_id. Depending on the selectivity stats it might pick either. You might find that a 2-column index on both would be a win. regards, tom lane
It appears it scanes the type_idx once per opereator. IN gets broken down into ORs Is this what the TODO entry 'Make IN / NOT IN have similar performance as EXISTS' means? -- Rod ----- Original Message ----- From: "Oleg Bartunov" <oleg@sai.msu.su> To: "Pgsql Hackers" <pgsql-hackers@postgresql.org>; "Tom Lane" <tgl@sss.pgh.pa.us> Sent: Monday, May 13, 2002 9:42 AM Subject: [HACKERS] strange explain > Hi, > > > I've got performance problem and while I'dont ready to describe it > I'd like to ask about strange explain: > > tour=# explain analyze select * from tours where > ( operator_id in (2,3,4,5,7) and type_id = 2 ) or > ( operator_id = 8 and type_id=4 ); > > NOTICE: QUERY PLAN: > > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091) (actual time=0.26..0.26 rows=0 loops=1) > Total runtime: 0.45 msec > > EXPLAIN > > What does many 'type_idx' means ? > > Theare are 2 indices - operator_idx and type_idx. > > > Regards, > Oleg > _____________________________________________________________ > Oleg Bartunov, sci.researcher, hostmaster of AstroNet, > Sternberg Astronomical Institute, Moscow University (Russia) > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ > phone: +007(095)939-16-83, +007(095)939-23-83 > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: you can get off all lists at once with the unregister command > (send "unregister YourEmailAddressHere" to majordomo@postgresql.org) >
"Rod Taylor" <rbt@zort.ca> writes: > Is this what the TODO entry 'Make IN / NOT IN have similar performance > as EXISTS' means? No. The TODO item is talking about IN with a sub-SELECT, which is not optimized at all at the moment. IN with a list of scalar values is converted to ((x = value1) OR (x = value2) OR ...), which we can do something with. regards, tom lane
Oleg Bartunov <oleg@sai.msu.su> writes: > What's the difference for planner between 2 queries ? > tour=# explain analyze select * from tours where > ( operator_id in (2,3,4,5,7) and type_id = 2 ); > tour=# explain analyze select * from tours where > ( operator_id in (2,3,4,5,7) and type_id = 4 ) or > ( operator_id = 8 and type_id = 3); The first one's already in normal form and doesn't need any more flattening. I believe the system will consider a multiple indexscan on operator_idx for it, but probably the cost estimator is concluding that that's a loser compared to one indexscan using type_id = 2. Without any info on the selectivity of these conditions it's hard to say whether that's a correct choice or not. regards, tom lane
Thanks Tom, On Mon, 13 May 2002, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: > > tour=# explain analyze select * from tours where > > ( operator_id in (2,3,4,5,7) and type_id = 2 ) or > > ( operator_id = 8 and type_id=4 ); > > > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091)(actual time=0.26..0.26 rows=0 loops=1) > > > What does many 'type_idx' means ? > > Multiple indexscans. > > It looks to me like your WHERE clause is being flattened into > > ( operator_id = 2 and type_id=2 ) or > ( operator_id = 3 and type_id=2 ) or > ( operator_id = 4 and type_id=2 ) or > ( operator_id = 5 and type_id=2 ) or > ( operator_id = 7 and type_id=2 ) or > ( operator_id = 8 and type_id=4 ) > this is what I assume. > and then it has a choice of repeated indexscans on operator_id or > type_id. Depending on the selectivity stats it might pick either. > You might find that a 2-column index on both would be a win. > Yes, we've went exactly this way. I'm very exited how planner could be smart. When I played with the query and specify different values of type_id I notice it's chose plans depends on is value exists or not. > regards, tom lane > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
Tom, one more question. What's the difference for planner between 2 queries ? For the first query I have plain index scan, but multiple index scan for second. tour=# explain analyze select * from tours where ( operator_id in (2,3,4,5,7) and type_id = 2 ); NOTICE: QUERY PLAN: Index Scan using type_idx on tours (cost=0.00..2.03 rows=1 width=1091) (actual time=0.03..0.03 rows=0 loops=1) Total runtime: 0.16 msec EXPLAIN tour=# explain analyze select * from tours where ( operator_id in (2,3,4,5,7) and type_id = 4 ) or (operator_id = 8 and type_id = 3); NOTICE: QUERY PLAN: Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091)(actual time=0.27..0.27 rows=0 loops=1) Total runtime: 0.44 msec EXPLAIN On Mon, 13 May 2002, Tom Lane wrote: > Oleg Bartunov <oleg@sai.msu.su> writes: > > tour=# explain analyze select * from tours where > > ( operator_id in (2,3,4,5,7) and type_id = 2 ) or > > ( operator_id = 8 and type_id=4 ); > > > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=0.00..12.25 rows=1 width=1091)(actual time=0.26..0.26 rows=0 loops=1) > > > What does many 'type_idx' means ? > > Multiple indexscans. > > It looks to me like your WHERE clause is being flattened into > > ( operator_id = 2 and type_id=2 ) or > ( operator_id = 3 and type_id=2 ) or > ( operator_id = 4 and type_id=2 ) or > ( operator_id = 5 and type_id=2 ) or > ( operator_id = 7 and type_id=2 ) or > ( operator_id = 8 and type_id=4 ) > > and then it has a choice of repeated indexscans on operator_id or > type_id. Depending on the selectivity stats it might pick either. > You might find that a 2-column index on both would be a win. > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
> EXPLAIN > tour=# explain analyze select * from tours where > ( operator_id in (2,3,4,5,7) and type_id = 4 ) or > ( operator_id = 8 and type_id = 3); > NOTICE: QUERY PLAN: > > Index Scan using type_idx, type_idx, type_idx, type_idx, type_idx, type_idx on tours (cost=> > 0.00..12.25 rows=1 width=1091) (actual time=0.27..0.27 rows=0 loops=1) > Total runtime: 0.44 msec Actually this plan looks very strange to me. One would expect it to only use type_idx twice (in lack of a better index (type_id, operator_id)). Fetch all rows with type_id=4 and then filter the result on operator_id in (...), then do type_id=3 and filter operator_id=8. Seems there is room for another performance improvement here :-) Andreas
"Zeugswetter Andreas SB SD" <ZeugswetterA@spardat.at> writes: >> tour=# explain analyze select * from tours where >> ( operator_id in (2,3,4,5,7) and type_id = 4 ) or >> ( operator_id = 8 and type_id = 3); > Actually this plan looks very strange to me. One would expect it to only use > type_idx twice (in lack of a better index (type_id, operator_id)). > Seems there is room for another performance improvement here :-) Yeah, this demonstrates that reducing the quals to canonical form isn't always the best thing to do. Or maybe we could just look for duplicate indexqual conditions at the end of the process? regards, tom lane