Thread: improvements to query with hierarchical elements
<div class="Section1"><p class="MsoNormal">Greetings,<p class="MsoNormal"> <p class="MsoNormal">I have a complex query whichI am trying to figure out the most efficient way of performing.<p class="MsoNormal"> <p class="MsoNormal">My databaseis laid out as follows:<p class="MsoNormal">items –have_many-> events –have_many-> event_locations –have_many->locations<p class="MsoNormal"> <p class="MsoNormal">also rows in the location_links table link two locationstogether in a parent-child relationship and rows in the location_descendants table provide a full list of the descendantsof a<p class="MsoNormal">particular location.<p class="MsoNormal"> <p class="MsoNormal">I am trying to find alllocations which both are direct children of a given parent location, and are associated with at least one item in a constrainedsubset of items.<p class="MsoNormal">(eg. Find all states of the USA in which at least one wooden axe was made.Also find the number of wooden axes made in each state.)<p class="MsoNormal"> <p class="MsoNormal">I have developedthe following query:<p class="MsoNormal"> <p class="MsoNormal">SELECT locations.*,<p class="MsoNormal"> location_ids.item_count AS item_count<p class="MsoNormal">FROM locations<p class="MsoNormal"> JOIN<p class="MsoNormal"> (SELECT immediate_descendants.ancestor_id AS id,<p class="MsoNormal"> COUNT(DISTINCT creation_events.item_id) AS item_count<p class="MsoNormal"> FROM event_locations<p class="MsoNormal"> JOIN<p class="MsoNormal"> (SELECT *<p class="MsoNormal"> FROM location_descendants<pclass="MsoNormal"> WHERE ancestor_id IN<p class="MsoNormal"> (SELECT child_id<p class="MsoNormal"> FROM location_links<p class="MsoNormal"> WHERE parent_id = *<b>note 1</b>*<p class="MsoNormal"> )<p class="MsoNormal"> ) AS immediate_descendants<pclass="MsoNormal"> ON event_locations.location_id = immediate_descendants.descendant_id<pclass="MsoNormal"> JOIN<p class="MsoNormal"> (SELECT *<p class="MsoNormal"> FROM events<pclass="MsoNormal"> WHERE item_id IN (*note 2*) AND association = 'creation'<p class="MsoNormal"> ) AS creation_events<p class="MsoNormal"> ON event_locations.event_id = creation_events.id<p class="MsoNormal"> GROUP BY immediate_descendants.ancestor_id<pclass="MsoNormal"> ) AS location_ids ON locations.id = location_ids.id<pclass="MsoNormal"> <p class="MsoNormal">*<b>note 1</b>* - the id of the parent location.<p class="MsoNormal">*<b>note2</b>* - the query which returns a list of constrained item ids<p class="MsoNormal"> <p class="MsoNormal">Thisworks but I am looking for any way to improve the performance of the query (including changing thelayout of the tables). Any ideas, suggestions or general pointers would be greatly appreciated.<p class="MsoNormal"> <pclass="MsoNormal">Thanks very much,<p class="MsoNormal">Ryan</div>
>Date: Sun, 20 Jan 2008 20:01:08 -0800 >From: Ryan Wallace <rywall@interchange.ubc.ca> >To: pgsql-sql@postgresql.org >Subject: improvements to query with hierarchical elements >Message-ID: <002601c85be2$410aea30$c320be90$@ubc.ca> >Greetings, > >I have a complex query which I am trying to figure out the most >efficient >way of performing. > >My database is laid out as follows: >items -have_many-> events -have_many-> event_locations -have_many-> >locations > >also rows in the location_links table link two locations together in a >parent-child relationship and rows in the location_descendants table >provide >a full list of the descendants of a >particular location. > >I am trying to find all locations which both are direct children of a >given >parent location, and are associated with at least one item in a >constrained >subset of items. >(eg. Find all states of the USA in which at least one wooden axe was >made. >Also find the number of wooden axes made in each state.) > >I have developed the following query: > >SELECT locations.*, > location_ids.item_count AS item_count >FROM locations > JOIN > (SELECT immediate_descendants.ancestor_id AS id, > COUNT(DISTINCT creation_events.item_id) AS >item_count > FROM event_locations > JOIN > (SELECT * > FROM location_descendants > WHERE ancestor_id IN > (SELECT child_id > FROM location_links > WHERE parent_id = *note 1* > ) > ) AS immediate_descendants > ON event_locations.location_id = >immediate_descendants.descendant_id > JOIN > (SELECT * > FROM events > WHERE item_id IN (*note 2*) AND >association = 'creation' > ) AS creation_events > ON event_locations.event_id = >creation_events.id > GROUP BY immediate_descendants.ancestor_id > ) AS location_ids ON locations.id = location_ids.id > >*note 1* - the id of the parent location. >*note 2* - the query which returns a list of constrained item ids > >This works but I am looking for any way to improve the performance of >the >query (including changing the layout of the tables). Any ideas, >suggestions >or general pointers would be greatly appreciated. > >Thanks very much, >Ryan Hi Ryan, I have built some similar queries so I might be able to help you. But it's a little hard (for me) to dig into your query without a test set. Could you please post some create table and insert statements to give us a little test bed to run your query in? I realize that may be a fair bit of work for you but it would help me to give you some ideas. Without seeing a more formal schema and being able to toy with it, I'm not sure I can give good advice. Others may have different opinions which I would welcome. Sincerely, Steve
At 07:24 PM 1/22/2008, you wrote: >Hi all, > >I have created a little test database to help illustrate my situation. > >CREATE TABLE categories ( > id integer NOT NULL, > name character varying(255) NOT NULL, > description character varying(255), > vocabulary_id integer, > derived boolean >); > >CREATE TABLE category_descendants ( > id integer NOT NULL, > ancestor_id integer, > descendant_id integer, > distance integer, > derived boolean >); > >CREATE TABLE category_links ( > id integer NOT NULL, > parent_id integer, > child_id integer, > derived boolean >); >[snip..] >As stated in my last post, any help you can give on how to improve >queries of this type would be very much appreciated. > >Thanks! >Ryan > Hi Ryan, I've been toying with your sample data for a bit and I apologize but your query has me baffled. Not that it's wrong - it actually looks very sophisticated, but it seems super complex to me - kind of like how I usually feel reading perl.. :) I'm sure real sql-heads would get it right away but I'm not able to. If you're looking to optimize the use-case you provided in your first email, the best thing I can suggest from what I understand would make an assumption: Are the data in your tables are slowly changing? So could you build some analytic/pre-calculated data into these tables or related supporting ones to guide your searches/queries? For example, if you want to find only records which are immediate children of other records, why not make a table which stores just that information? Your current tables are fully hierarchical which is great, but you want to look things up quickly based on a specific relationship: records who are direct children of a particular record.. So if you made a calculated table that stores this information, you could keep it up to date either by running the calculation script periodically or by attaching updates to relevant triggers / rules. I'm sorry I'm not able to get into the SQL / example you sent further. I got lost in the code, which I'm a little embarrassed to admit but there you are. If you're interested in this idea of precalculating values to optimize your search, I'd be happy to discuss further. Also, Ralph Kimball's Data Warehousing books are excellent on this subject (one of the few authors who truly changed the way I think about data). Steve