Thread: Damn slow query
Hello, i've got this query that's really slow... Figure this: testdb=> select now() ; select gid from bs where gid not in ( select x from z2test ); select now(); now -------------------------------2002-10-09 22:37:21.234627+02 (1 row) gid ----------<lotsa rows> (524 rows) now -------------------------------2002-10-09 23:20:53.227844+02 (1 row) That's 45 minutes i don't wanna spend in there... I got indexes: testdb=> \d bs_gid_idx Index "bs_gid_idx"Column | Type --------+-----------------------gid | character varying(16)online | smallint btree testdb=> \d z2test_x_idx; Index "z2test_x_idx"Column | Type --------+-----------------------x | character varying(16) btree Rowcounts are: testdb=> select count(*) from bs ; select count(*) from z2test ;count -------25376 (1 row) count -------19329 (1 row) The bs table have many other columns besides the gid one, the z2test table only has the x column. How can i speed this query up? It never scans by the indexes. I know it's a lot of iterations anyway i do it, but this is too damn slow. I can't profile anything at this box, because it's in production state, but if you really want me to, i'll do it tomorrow on another box. Magnus -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-Programmer/Networker [|] Magnus Naeslund -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
On Wed, 9 Oct 2002, Magnus Naeslund(f) wrote: > Hello, i've got this query that's really slow... > Figure this: > > testdb=> select now() ; select gid from bs where gid not in ( select x > from z2test ); select now(); Per FAQ suggestion, try something like select gid from bs where not exists (select * from z2test wherez2test.x=bs.gid); to see if it is faster.
Magnus Naeslund(f) wrote: > Hello, i've got this query that's really slow... > Figure this: > > testdb=> select now() ; select gid from bs where gid not in ( select x > from z2test ); select now(); "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you rewrite this as: select b.gid from bs b where not exists (select 1 from z2test z where z.x = b.gid); or possibly: select b.gid from bs b left join z2test z on z.x = b.gid where z.x IS NULL; HTH, Joe
Joe Conway <mail@joeconway.com> wrote: > "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you > rewrite this as: > ... Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > Per FAQ suggestion, try something like ... Thanks alot, below are the results on your suggestions, quite an dramatic differance (but this is another box, faster, and running 7.3b2 so the 45 minutes doesn't hold here, but it took more than 10 minutes before i stopped the original query). Is this an todo item, or should every user figure this out (yeah i know i should have read the FAQ when it went so totally bad). The NOT IN it seems quite natural here, but then again, i don't think as the db as you do :) mag=> \timing Timing is on. mag=> explain analyze select count(gid) from bs where not exists ( select * from z2test where z2test.x=bs.gid );Aggregate (cost=129182.18..129182.18 rows=1 width=9) (actual time=590.90..590.90 rows=1 loops=1) -> Seq Scan on bs (cost=0.00..129150.46 rows=12688 width=9) (actual time=42.57..590.46 rows=524 loops=1) Filter: (NOT (subplan)) SubPlan -> Index Scan using z2temp_x_idxon z2test (cost=0.00..5.07 rows=1 width=9) (actual time=0.02..0.02 rows=1 loops=25376) Index Cond: (x = $0)Total runtime: 591.01 msec Time: 592.25 ms mag=> EXPLAIN analyze select count(b.gid) from bs b left join z2test z on z.x = b.gid where z.x IS NULL;Aggregate (cost=1703.65..1703.65 rows=1 width=18) (actual time=370.31..370.31 rows=1 loops=1) -> Hash Join (cost=346.61..1640.21 rows=25376 width=18) (actual time=75.45..369.91 rows=524 loops=1) Hash Cond: ("outer".gid = "inner".x) Filter: ("inner".x IS NULL) -> Seq Scan on bs b (cost=0.00..595.76 rows=25376 width=9) (actual time=0.01..34.20 rows=25376 loops=1) -> Hash (cost=298.29..298.29 rows=19329 width=9) (actual time=43.82..43.82 rows=0 loops=1) -> Seq Scan on z2test z (cost=0.00..298.29 rows=19329 width=9) (actual time=0.02..22.69 rows=19329 loops=1)Total runtime: 370.42 msec Time: 371.90 ms mag=> Magnus
Magnus Naeslund(f) wrote: > Joe Conway <mail@joeconway.com> wrote: > > "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you > > rewrite this as: > > > > ... > > Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote: > > Per FAQ suggestion, try something like > > ... > > Thanks alot, below are the results on your suggestions, quite an > dramatic differance (but this is another box, faster, and running 7.3b2 > so the 45 minutes doesn't hold here, but it took more than 10 minutes > before i stopped the original query). > > Is this an todo item, or should every user figure this out (yeah i know > i should have read the FAQ when it went so totally bad). > The NOT IN it seems quite natural here, but then again, i don't think as > the db as you do :) We already have a TODO item: * Allow Subplans to use efficient joins(hash, merge) with upper variable -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
Bruce Momjian <pgman@candle.pha.pa.us> wrote: > > We already have a TODO item: > > * Allow Subplans to use efficient joins(hash, merge) with upper > variable Cool. One thing to note here is that the JOIN query that Joe suggested is both faster than the subselect thing (no suprise) but also don't care if z2test has an index on it or not. The subselect query started taking huge amount of time again if i dropped the z2test_x_idx ... So if the todo could somehow figure out that that subselect should be an JOIN instead of an NOT EXISTS query, that would be great, because the index on z2test isn't that super-obvious (i think, because i know the data is tiny). Magnus -- -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-Programmer/Networker [|] Magnus Naeslund -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Magnus Naeslund(f) wrote: > One thing to note here is that the JOIN query that Joe suggested is both > faster than the subselect thing (no suprise) but also don't care if > z2test has an index on it or not. It's worth noting though that JOIN is not always the fastest method. I've found situations where NOT EXISTS was significantly faster than the LEFT JOIN method (although both are usually orders of magnatude faster than NOT IN). Joe