Thread: improvements to query with hierarchical elements

improvements to query with hierarchical elements

From
Ryan Wallace
Date:
<div class="Section1"><p class="MsoNormal">Greetings,<p class="MsoNormal"> <p class="MsoNormal">I have a complex query
whichI am trying to figure out the most efficient way of performing.<p class="MsoNormal"> <p class="MsoNormal">My
databaseis laid out as follows:<p class="MsoNormal">items –have_many-> events –have_many-> event_locations
–have_many->locations<p class="MsoNormal"> <p class="MsoNormal">also rows in the location_links table link two
locationstogether in a parent-child relationship and rows in the location_descendants table provide a full list of the
descendantsof a<p class="MsoNormal">particular location.<p class="MsoNormal"> <p class="MsoNormal">I am trying to find
alllocations which both are direct children of a given parent location, and are associated with at least one item in a
constrainedsubset of items.<p class="MsoNormal">(eg. Find all states of the USA in which at least one wooden axe was
made.Also find the number of wooden axes made in each state.)<p class="MsoNormal"> <p class="MsoNormal">I have
developedthe following query:<p class="MsoNormal"> <p class="MsoNormal">SELECT  locations.*,<p
class="MsoNormal">       location_ids.item_count AS item_count<p class="MsoNormal">FROM    locations<p
class="MsoNormal">       JOIN<p class="MsoNormal">                (SELECT immediate_descendants.ancestor_id AS id,<p
class="MsoNormal">                       COUNT(DISTINCT creation_events.item_id) AS item_count<p
class="MsoNormal">               FROM    event_locations<p class="MsoNormal">                        JOIN<p
class="MsoNormal">                               (SELECT *<p class="MsoNormal">                                FROM   
location_descendants<pclass="MsoNormal">                                WHERE   ancestor_id IN<p
class="MsoNormal">                                       (SELECT child_id<p
class="MsoNormal">                                       FROM    location_links<p
class="MsoNormal">                                       WHERE   parent_id = *<b>note 1</b>*<p
class="MsoNormal">                                       )<p class="MsoNormal">                                ) AS
immediate_descendants<pclass="MsoNormal">                        ON      event_locations.location_id =
immediate_descendants.descendant_id<pclass="MsoNormal">                        JOIN<p
class="MsoNormal">                               (SELECT *<p class="MsoNormal">                                FROM   
events<pclass="MsoNormal">                                WHERE   item_id IN (*note 2*) AND association = 'creation'<p
class="MsoNormal">                               ) AS creation_events<p class="MsoNormal">                       
ON     event_locations.event_id = creation_events.id<p class="MsoNormal">                GROUP BY
immediate_descendants.ancestor_id<pclass="MsoNormal">                ) AS location_ids ON locations.id =
location_ids.id<pclass="MsoNormal"> <p class="MsoNormal">*<b>note 1</b>* - the id of the parent location.<p
class="MsoNormal">*<b>note2</b>* - the query which returns a list of constrained item ids<p class="MsoNormal"> <p
class="MsoNormal">Thisworks but I am looking for any way to improve the performance of the query (including changing
thelayout of the tables). Any ideas, suggestions or general pointers would be greatly appreciated.<p
class="MsoNormal"> <pclass="MsoNormal">Thanks very much,<p class="MsoNormal">Ryan</div> 

Re: improvements to query with hierarchical elements

From
Steve Midgley
Date:
>Date: Sun, 20 Jan 2008 20:01:08 -0800
>From: Ryan Wallace <rywall@interchange.ubc.ca>
>To: pgsql-sql@postgresql.org
>Subject: improvements to query with hierarchical elements
>Message-ID: <002601c85be2$410aea30$c320be90$@ubc.ca>
>Greetings,
>
>I have a complex query which I am trying to figure out the most 
>efficient
>way of performing.
>
>My database is laid out as follows:
>items -have_many-> events -have_many-> event_locations -have_many->
>locations
>
>also rows in the location_links table link two locations together in a
>parent-child relationship and rows in the location_descendants table 
>provide
>a full list of the descendants of a
>particular location.
>
>I am trying to find all locations which both are direct children of a 
>given
>parent location, and are associated with at least one item in a 
>constrained
>subset of items.
>(eg. Find all states of the USA in which at least one wooden axe was 
>made.
>Also find the number of wooden axes made in each state.)
>
>I have developed the following query:
>
>SELECT  locations.*,
>         location_ids.item_count AS item_count
>FROM    locations
>         JOIN
>                 (SELECT immediate_descendants.ancestor_id AS id,
>                         COUNT(DISTINCT creation_events.item_id) AS
>item_count
>                 FROM    event_locations
>                         JOIN
>                                 (SELECT *
>                                 FROM    location_descendants
>                                 WHERE   ancestor_id IN
>                                         (SELECT child_id
>                                         FROM    location_links
>                                         WHERE   parent_id = *note 1*
>                                         )
>                                 ) AS immediate_descendants
>                         ON      event_locations.location_id =
>immediate_descendants.descendant_id
>                         JOIN
>                                 (SELECT *
>                                 FROM    events
>                                 WHERE   item_id IN (*note 2*) AND
>association = 'creation'
>                                 ) AS creation_events
>                         ON      event_locations.event_id =
>creation_events.id
>                 GROUP BY immediate_descendants.ancestor_id
>                 ) AS location_ids ON locations.id = location_ids.id
>
>*note 1* - the id of the parent location.
>*note 2* - the query which returns a list of constrained item ids
>
>This works but I am looking for any way to improve the performance of 
>the
>query (including changing the layout of the tables). Any ideas, 
>suggestions
>or general pointers would be greatly appreciated.
>
>Thanks very much,
>Ryan

Hi Ryan,

I have built some similar queries so I might be able to help you. But 
it's a little hard (for me) to dig into your query without a test set. 
Could you please post some create table and insert statements to give 
us a little test bed to run your query in? I realize that may be a fair 
bit of work for you but it would help me to give you some ideas.

Without seeing a more formal schema and being able to toy with it, I'm 
not sure I can give good advice. Others may have different opinions 
which I would welcome.

Sincerely,

Steve




Re: improvements to query with hierarchical elements

From
Steve Midgley
Date:
At 07:24 PM 1/22/2008, you wrote:
>Hi all,
>
>I have created a little test database to help illustrate my situation.
>
>CREATE TABLE categories (
>     id integer NOT NULL,
>     name character varying(255) NOT NULL,
>     description character varying(255),
>     vocabulary_id integer,
>     derived boolean
>);
>
>CREATE TABLE category_descendants (
>     id integer NOT NULL,
>     ancestor_id integer,
>     descendant_id integer,
>     distance integer,
>     derived boolean
>);
>
>CREATE TABLE category_links (
>     id integer NOT NULL,
>     parent_id integer,
>     child_id integer,
>     derived boolean
>);
>[snip..]
>As stated in my last post, any help you can give on how to improve 
>queries of this type would be very much appreciated.
>
>Thanks!
>Ryan
>

Hi Ryan,

I've been toying with your sample data for a bit and I apologize but 
your query has me baffled. Not that it's wrong - it actually looks very 
sophisticated, but it seems super complex to me - kind of like how I 
usually feel reading perl.. :)

I'm sure real sql-heads would get it right away but I'm not able to.

If you're looking to optimize the use-case you provided in your first 
email, the best thing I can suggest from what I understand would make 
an assumption:

Are the data in your tables are slowly changing? So could you build 
some analytic/pre-calculated data into these tables or related 
supporting ones to guide your searches/queries?

For example, if you want to find only records which are immediate 
children of other records, why not make a table which stores just that 
information? Your current tables are fully hierarchical which is great, 
but you want to look things up quickly based on a specific 
relationship: records who are direct children of a particular record..

So if you made a calculated table that stores this information, you 
could keep it up to date either by running the calculation script 
periodically or by attaching updates to relevant triggers / rules.

I'm sorry I'm not able to get into the SQL / example you sent further. 
I got lost in the code, which I'm a little embarrassed to admit but 
there you are.

If you're interested in this idea of precalculating values to optimize 
your search, I'd be happy to discuss further. Also, Ralph Kimball's 
Data Warehousing books are excellent on this subject (one of the few 
authors who truly changed the way I think about data).

Steve