Thread: Damn slow query

Damn slow query

From
"Magnus Naeslund(f)"
Date:
Hello, i've got this query that's really slow...
Figure this:

testdb=> select now() ; select gid from bs where gid not in ( select x
from z2test ); select now();             now
-------------------------------2002-10-09 22:37:21.234627+02
(1 row)
  gid
----------<lotsa rows>
(524 rows)             now
-------------------------------2002-10-09 23:20:53.227844+02
(1 row)

That's 45 minutes i don't wanna spend in there...
I got indexes:

testdb=> \d bs_gid_idx Index "bs_gid_idx"Column |         Type
--------+-----------------------gid    | character varying(16)online | smallint
btree

testdb=> \d z2test_x_idx;     Index "z2test_x_idx"Column |         Type
--------+-----------------------x      | character varying(16)
btree

Rowcounts are:

testdb=> select count(*) from bs ; select count(*) from z2test ;count
-------25376
(1 row)
count
-------19329
(1 row)

The bs table have many other columns besides the gid one, the z2test
table only has the x column.

How can i speed this query up?
It never scans by the indexes.
I know it's a lot of iterations anyway i do it, but this is too damn
slow.

I can't profile anything at this box, because it's in production state,
but if you really want me to, i'll do it tomorrow on another box.

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



Re: Damn slow query

From
Stephan Szabo
Date:
On Wed, 9 Oct 2002, Magnus Naeslund(f) wrote:

> Hello, i've got this query that's really slow...
> Figure this:
>
> testdb=> select now() ; select gid from bs where gid not in ( select x
> from z2test ); select now();

Per FAQ suggestion, try something like
select gid from bs where not exists (select * from z2test wherez2test.x=bs.gid);
to see if it is faster.




Re: Damn slow query

From
Joe Conway
Date:
Magnus Naeslund(f) wrote:
> Hello, i've got this query that's really slow...
> Figure this:
> 
> testdb=> select now() ; select gid from bs where gid not in ( select x
> from z2test ); select now();

"IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you rewrite 
this as:

select b.gid from bs b where not exists (select 1 from z2test z where z.x = 
b.gid);

or possibly:

select b.gid from bs b left join z2test z on z.x = b.gid where z.x IS NULL;

HTH,

Joe



Re: Damn slow query

From
"Magnus Naeslund(f)"
Date:
Joe Conway <mail@joeconway.com> wrote:
> "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
> rewrite this as:
>

...

Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> Per FAQ suggestion, try something like

...

Thanks alot, below are the results on your suggestions, quite an
dramatic differance (but this is another box, faster, and running 7.3b2
so the 45 minutes doesn't hold here, but it took more than 10 minutes
before i stopped the original query).

Is this an todo item, or should every user figure this out (yeah i know
i should have read the FAQ when it went so totally bad).
The NOT IN it seems quite natural here, but then again, i don't think as
the db as you do :)

mag=> \timing
Timing is on.
mag=> explain analyze select count(gid) from bs where not exists (
select * from z2test where z2test.x=bs.gid );Aggregate  (cost=129182.18..129182.18 rows=1 width=9) (actual
time=590.90..590.90 rows=1 loops=1)  ->  Seq Scan on bs  (cost=0.00..129150.46 rows=12688 width=9) (actual
time=42.57..590.46 rows=524 loops=1)        Filter: (NOT (subplan))        SubPlan          ->  Index Scan using
z2temp_x_idxon z2test  (cost=0.00..5.07
 
rows=1 width=9) (actual time=0.02..0.02 rows=1 loops=25376)                Index Cond: (x = $0)Total runtime: 591.01
msec

Time: 592.25 ms

mag=> EXPLAIN analyze select count(b.gid) from bs b left join z2test z
on z.x = b.gid where z.x IS NULL;Aggregate  (cost=1703.65..1703.65 rows=1 width=18) (actual
time=370.31..370.31 rows=1 loops=1)  ->  Hash Join  (cost=346.61..1640.21 rows=25376 width=18) (actual
time=75.45..369.91 rows=524 loops=1)        Hash Cond: ("outer".gid = "inner".x)        Filter: ("inner".x IS NULL)
  ->  Seq Scan on bs b  (cost=0.00..595.76 rows=25376 width=9)
 
(actual time=0.01..34.20 rows=25376 loops=1)        ->  Hash  (cost=298.29..298.29 rows=19329 width=9) (actual
time=43.82..43.82 rows=0 loops=1)              ->  Seq Scan on z2test z  (cost=0.00..298.29 rows=19329
width=9) (actual time=0.02..22.69 rows=19329 loops=1)Total runtime: 370.42 msec

Time: 371.90 ms
mag=>


Magnus




Re: Damn slow query

From
Bruce Momjian
Date:
Magnus Naeslund(f) wrote:
> Joe Conway <mail@joeconway.com> wrote:
> > "IN (subselect)" is notoriously slow (in fact it is an FAQ). Can you
> > rewrite this as:
> >
> 
> ...
> 
> Stephan Szabo <sszabo@megazone23.bigpanda.com> wrote:
> > Per FAQ suggestion, try something like
> 
> ...
> 
> Thanks alot, below are the results on your suggestions, quite an
> dramatic differance (but this is another box, faster, and running 7.3b2
> so the 45 minutes doesn't hold here, but it took more than 10 minutes
> before i stopped the original query).
> 
> Is this an todo item, or should every user figure this out (yeah i know
> i should have read the FAQ when it went so totally bad).
> The NOT IN it seems quite natural here, but then again, i don't think as
> the db as you do :)

We already have a TODO item:
* Allow Subplans to use efficient joins(hash, merge) with upper variable

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: Damn slow query

From
"Magnus Naeslund(f)"
Date:
Bruce Momjian <pgman@candle.pha.pa.us> wrote:
>
> We already have a TODO item:
>
> * Allow Subplans to use efficient joins(hash, merge) with upper
> variable

Cool.
One thing to note here is that the JOIN query that Joe suggested is both
faster than the subselect thing (no suprise) but also don't care if
z2test has an index on it or not.
The subselect query started taking huge amount of time again if i
dropped the z2test_x_idx ...

So if the todo could somehow figure out that that subselect should be an
JOIN instead of an NOT EXISTS query, that would be great, because the
index on z2test isn't that super-obvious (i think, because i know the
data is tiny).

Magnus

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-Programmer/Networker [|] Magnus Naeslund
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



Re: Damn slow query

From
Joe Conway
Date:
Magnus Naeslund(f) wrote:
> One thing to note here is that the JOIN query that Joe suggested is both
> faster than the subselect thing (no suprise) but also don't care if
> z2test has an index on it or not.

It's worth noting though that JOIN is not always the fastest method. I've 
found situations where NOT EXISTS was significantly faster than the LEFT JOIN 
method (although both are usually orders of magnatude faster than NOT IN).

Joe