Thread: query with subquery abnormally slow?
I'm doing a SELECT query with a subquery on a table with 12K rows but it is very slow (10 seconds+). The query looks like this: select * from items where package in (select package from items where ...blah... group by package) Where package is an indexed varchar field. The subquery generates 50 rows, and completes quickly when I run it alone. Now if I manually replace the subquery with these 50 strings - like select * from items where package in ('r1', 'r2', 'r3', .., 'r50') this new query completes in less than a second. The result of the complete query is 500 rows. Is this natural? Is there anything I can do to speed things up? (Besides doing N+1 queries where N is the number of rows the subquery would produce.) Oskar Liljeblad (osk@hem.passagen.se)
On Mon, Nov 01, 1999 at 07:47:07PM +0100, Oskar Liljeblad wrote: > I'm doing a SELECT query with a subquery on a table with 12K rows > but it is very slow (10 seconds+). The query looks like this: Ok, I should have read the latest FAQ and checked the mail archive before posting this. I've read about the EXISTS-workaround, but it seems it can't be applied to my query because it deals with one table only. Or can it? Oskar Liljeblad (osk@hem.passagen.se)
> select * > from items > where package in > (select package > from items > where ...blah... > group by package) Can't see why you don't rewrite this as one query: select * from items where ... blah ... order by package; (is it aggregates in the where clause?) Assuming you do need to do it the way you have done it , SELECT * FROM items WHERE NOT EXISTS (SELECT package FROM items itemscopy WHERE ... blah ... AND itemscopy.itemid=items.itemid GROUP BY package); should do it. itemid should be replaced by whatever the primary key of the items table is. Note that in blah, fields must be referred to as itemcopy.field1,itemcopy.field2, etc. Yours, Moray McConnachie
On Mon, Nov 01, 1999 at 07:47:07PM +0100, Oskar Liljeblad wrote: > I'm doing a SELECT query with a subquery on a table with 12K rows > but it is very slow (10 seconds+). The query looks like this: > > select * > from items > where package in > (select package > from items > where ...blah... > group by package) I could do it much faster with this query: select distinct i2.package, i1.* from items i1, items i2 where i1.package = i2.package and ...blah... but it is still at least three times slower than doing the subquery first, and for each row in the result another query. Any ideas? Oskar Liljeblad (osk@hem.passagen.se)
have you tried to: explain <your select statement> to see the cost of the parts? I would try this. Then I would try a : create index <indexname> on items (package) and repeat the explain to see if it changes the cost. Bill On Mon, Nov 01, 1999 at 09:10:30PM +0100, Oskar Liljeblad wrote: >On Mon, Nov 01, 1999 at 07:47:07PM +0100, Oskar Liljeblad wrote: >> I'm doing a SELECT query with a subquery on a table with 12K rows >> but it is very slow (10 seconds+). The query looks like this: >> >> select * >> from items >> where package in >> (select package >> from items >> where ...blah... >> group by package) > >I could do it much faster with this query: > > select distinct i2.package, i1.* > from items i1, items i2 > where i1.package = i2.package > and ...blah... > >but it is still at least three times slower than >doing the subquery first, and for each row in the >result another query. Any ideas? > >Oskar Liljeblad (osk@hem.passagen.se) > >************ > Bill Brandt brandtwr@draaw.net http://www.draaw.net/
On Mon, Nov 01, 1999 at 07:59:20PM -0000, Moray McConnachie wrote: > > select * > > from items > > where package in > > (select package > > from items > > where ...blah... > > group by package) > > Can't see why you don't rewrite this as one query: > > select * from items where ... blah ... order by package; > (is it aggregates in the where clause?) The items table consists of some columns `performer', `title', and `package'. Each row corresponds to a song (or similar), and a `package' is the album (or similar) the song is on. (This design could certainly be improved with multiple tables, but it wasn't created that way.) My query is supposed to list all songs on all albums (packages) which has at least one song by a specific artist (performer). ("Can you show me some stats on the albums which artist X has participated in?".) > Assuming you do need to do it the way you have done it , ... > should do it. itemid should be replaced by whatever the primary key of the > items table is. Note that in blah, fields must be referred to as > itemcopy.field1,itemcopy.field2, etc. This query works: select *from items i1where exists (select package from items i2 where i2.performer ~ '...' and i1.package = i2.package group by i2.package) However this was also very slow (>30 seconds). When I replaced "and i1.package = i2.package" with "and i1.id = i2.id" only some songs from the matching albums were returned. (The songs which matched performer ~ '...'.) Oskar Liljeblad (osk@hem.passagen.se)
Oskar Liljeblad <osk@hem.passagen.se> writes: > I'm doing a SELECT query with a subquery on a table with 12K rows > but it is very slow (10 seconds+). The query looks like this: > select * > from items > where package in > (select package > from items > where ...blah... > group by package) Have you considered something like select i1.* from items i1, items i2 where i1.package = i2.package and i2.performer ~ '...'; This would only be fast given an index on package, I think, but you said you had one... regards, tom lane
I inadvertently deleted Oskar's message where he described what he is trying to do at a higher level, but I don't think I'm completely missing the point with the following: The table looks like so: select * from test; [ ssmldb=> select * from test; package |artist |song ----------------------+-------------------+----------------- Surf Comp |Mermen |Pulpin Surf Comp |Bambi Molesters |Tremor Surf Comp |The Squares |Squaranoid Dumb Loud Hollow Twang|Bambi Molesters |Point Break Dumb Loud Hollow Twang|Bambi Molesters |Glider Songs of the Cows |Mermen |Songs of the Cows Surfmania |The Aqua Velvets |Surfmania Surf Comp 2 |Mermen |Slo Mo H50 Surf Comp 2 |Los Straightjackets|Caveman ] select t2.* from test t1, test t2 where t1.package = t2.package and t1.artist = 'Mermen' and t2.artist != 'Mermen'; [ package |artist |song -----------+-------------------+---------- Surf Comp |Bambi Molesters |Tremor Surf Comp |The Squares |Squaranoid Surf Comp 2|Los Straightjackets|Caveman (3 rows) ] The query above shows all songs by a different artist that share an album with a song by the artist in question. It however omits the songs by the artist in question. To get those back, you could try: select distinct t1.* from test t1, test t2 where t1.package = t2.package and ((t1.artist = 'Mermen' and t2.artist != 'Mermen')or (t1.artist != 'Mermen' and t2.artist = 'Mermen')); or use a union clause: select t1.* from test t1, test t2 where t1.package = t2.package and (t2.artist = 'Mermen' and t1.artist != 'Mermen') union select t3.* from test t3, test t4 where t3.package = t4.package and (t3.artist = 'Mermen' and t4.artist!= 'Mermen') ; I don't know how these do for speed as I don't care to create a big table and indices and all that, but they do not use EXITS and it seems with suitable indices they should be fairly fast. -Z-