Re: need some help understanding sloq query - Mailing list pgsql-sql
From | Esger Abbink |
---|---|
Subject | Re: need some help understanding sloq query |
Date | |
Msg-id | 200112061334.OAA03446@fikkie.vesc.nl Whole thread Raw |
In response to | need some help understanding sloq query (Esger Abbink <esger@vesc.nl>) |
Responses |
Re: need some help understanding sloq query
|
List | pgsql-sql |
> Esger, > > > select some_other_fields from ents e, qtys q where e.set_id = > > q.set_id and > > e.e_id = q.e_id and e.set_id in (select set_id from datasets > > where update_id in (select cur_update_id from current)) and > > q.other_field = some_const ; > > > > this query takes ages :( > > I'm not surprised. You're doing a nested subselect with the IN > operator; frankly, you couldn't make a relatively simple query any > slower than the above. It looks like the query parser is doing its best > to optimize, but you've forced it to compare every row in ents JOIN qtys > against the whole datasets table. What's wrong with: > > SELECT some_fields > FROM ents JOIN qtys USING (set_id) > JOIN datasets USING (set_id) > JOIN current ON datasets.update_id = current.cur_update_id > WHERE other_field = some_constant; > i figured i wasnt doing things "right" ;) unfortunately the query above takes down my postgres (7.0.3 tried on 2 servers). 1 join no problem, 2 joins results in: DEBUG: Data Base System is in production state at Thu Dec 6 11:42:18 2001 Server process (pid 1095) exited with status 11 at Thu Dec 6 11:45:08 2001 Terminating any active server processes... Server processes were terminated at Thu Dec 6 11:45:08 2001 Reinitializing shared memory and semaphores The Data Base System is starting up this is directly related to executing the query (or just the part with the first 2 joins) and is 100% reproducible sofar. secondly, from the costs explain gives for the query part i can get through (1st join: merge join costs 7800, index scancosts 2000, index scan costs 4200) it seems that doing a simpler variant of my query (see the other post -followup-)with IN replaced by '=' is a lot faster. the resulting query would be something like: select some_other_fields from ents e, qtys q where e.set_id = q.set_id and e.e_id = q.e_id and e.set_id = X and q.other_field = some_const ; this query has negligable costs (nested loop cost 200, index scan cost 50, index scan cost 4 on test db) but should be envelopedin some for-loop construct to feed it the values for X after which the answers should be "glued" together. the setof X's has no theoritical maximum that i can recall atm but in practice it currently is always size 1 and i dont expectmuch if any growth. i cant however assume its size 1. (the above does sound like a complex answer to a simple problem again though..) > That allows Tom's magic query engine to do its work optimizing. > Assuming, of course, that there *are* indexes on update_id and > datasets.set_id ... > > I heartily recommend that you read Celko's "SQL for Smarties" (book > reviews: http://techdocs.postgresql.org/bookreviews.php ). You seem to > have fallen into the trap of using complex queries to answer simple > questions, and your database performance is suffering because of it. will investigate. > > -Josh Berkus > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly > > > -- NeoMail - Webmail that doesn't suck... as much. http://neomail.sourceforge.net