Re: need some help understanding sloq query - Mailing list pgsql-sql

From Esger Abbink
Subject Re: need some help understanding sloq query
Date
Msg-id 200112061334.OAA03446@fikkie.vesc.nl
Whole thread Raw
In response to need some help understanding sloq query  (Esger Abbink <esger@vesc.nl>)
Responses Re: need some help understanding sloq query
List pgsql-sql

> Esger,
> 
> > select some_other_fields from ents e, qtys q where e.set_id =
> > q.set_id and
> > e.e_id = q.e_id and e.set_id in (select set_id from datasets
> > where update_id in (select cur_update_id from current)) and
> > q.other_field = some_const ;
> > 
> > this query takes ages :(
> 
> I'm not surprised.  You're doing a nested subselect with the IN
> operator; frankly, you couldn't make a relatively simple query any
> slower than the above.  It looks like the query parser is doing its best
> to optimize, but you've forced it to compare every row in ents JOIN qtys
> against the whole datasets table.  What's wrong with:
> 
> SELECT some_fields 
> FROM ents JOIN qtys USING (set_id)
>     JOIN datasets USING (set_id)
>     JOIN current ON datasets.update_id = current.cur_update_id
> WHERE other_field = some_constant;
> 

i figured i wasnt doing things "right" ;)

unfortunately the query above takes down my postgres (7.0.3 tried on 2 servers). 1 join no problem, 2 joins results
in:

DEBUG:  Data Base System is in production state at Thu Dec  6 11:42:18 2001
Server process (pid 1095) exited with status 11 at Thu Dec  6 11:45:08 2001
Terminating any active server processes...
Server processes were terminated at Thu Dec  6 11:45:08 2001
Reinitializing shared memory and semaphores
The Data Base System is starting up

this is directly related to executing the query (or just the part with the first 2 joins) and is 100% reproducible
sofar.


secondly, from the costs explain gives for the query part i can get through (1st join: merge join costs 7800, index
scancosts 2000, index scan costs 4200) it seems that doing a simpler variant of my query (see the other post
-followup-)with IN replaced by '=' is a lot faster. the resulting query would be something like:
 

select some_other_fields from ents e, qtys q where 
e.set_id = q.set_id and
e.e_id = q.e_id and 
e.set_id = X and
q.other_field = some_const ;

this query has negligable costs (nested loop cost 200, index scan cost 50, index scan cost 4 on test db) but should be
envelopedin some for-loop construct to feed it the values for X after which the answers should be "glued" together. the
setof X's has no theoritical maximum that i can recall atm but in practice it currently is always size 1 and i dont
expectmuch if any growth. i cant however assume its size 1.
 


(the above does sound like a complex answer to a simple problem again though..)

> That allows Tom's magic query engine to do its work optimizing.
> Assuming, of course, that there *are* indexes on update_id and
> datasets.set_id ...
> 
> I heartily recommend that you read Celko's "SQL for Smarties" (book
> reviews: http://techdocs.postgresql.org/bookreviews.php ).  You seem to
> have fallen into the trap of using complex queries to answer simple
> questions, and your database performance is suffering because of it.

will investigate.

> 
> -Josh Berkus
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly
> 
> 
> 

-- 
NeoMail - Webmail that doesn't suck... as much.
http://neomail.sourceforge.net


pgsql-sql by date:

Previous
From: Stephan Szabo
Date:
Subject: Re: Select into
Next
From: James Thornton
Date:
Subject: update returns 1, but no changes have been made