Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree? - Mailing list pgsql-general

From Rob Sargentg
Subject Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date
Msg-id 4FB508FA.40009@gmail.com
Whole thread Raw
In response to Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?  (Seref Arikan <serefarikan@kurumsalteknoloji.com>)
List pgsql-general
On 05/17/2012 03:06 AM, Seref Arikan wrote:
I have a key value table in my Postgresql db, which represents hierarchical data through parent_feature_mapping column that points to id of feature_mapping_id column of the same table.

I need to select root nodes that has children which satisfy various conditions. The conditions may extend to children of children, so I'm trying to find roots of trees that contain paths that satisfy  the given constraints.

An example is finding the trees where the root node has type 'COMPOSITION' and root node's archetypeNodeId attribute has value 'openEHR-EHR-COMPOSITION.discharge.v1' another constraint is root node having a child of type 'CONTENTITEM' that in turn has a child of type 'ITEMSTRUCTURE'
All nodes in a tree have the same payload Id. The fastest query that I could write so far is given below.


SELECT root.id from path_value as root
    WHERE
        root.rm_type_name = 'COMPOSITION'
    AND
        root.feature_name = 'composition'
    AND
    EXISTS (SELECT 1 from path_value as anodeid
            WHERE
                    anodeId.parent_feature_mapping_id = root.feature_mapping_id
            AND       
                    anodeId.payload_id = root.payload_id
            AND
                    anodeId.feature_name = 'archetypeNodeId'
            AND
                    anodeId.val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'
            LIMIT 1
            )
           
    AND
    EXISTS (SELECT 1 from path_value as node1
             WHERE
                node1.payload_id = root.payload_id             
            AND
                node1.parent_feature_mapping_id = root.feature_mapping_id
            AND
                node1.feature_name = 'content'
            AND
                node1.rm_type_name = 'CONTENTITEM'
            AND
            EXISTS (SELECT 1 from path_value as node2
                    WHERE
                    node2.payload_id = node1.payload_id
                    AND
                    node2.parent_feature_mapping_id = node1.feature_mapping_id
                    AND
                    node2.rm_type_name = 'ITEMSTRUCTURE'
                    LIMIT 1)
            LIMIT 1)

My question is: is this the best approach in terms of performance? This is an attempt to identify XML payloads that fit certain criteria. I have also considered using an ltree column that will contain the tree in a from that I can query as an alternative to sql based method, or I can use xpath queries on XML payload.

The create statement for my table is as follows:

CREATE TABLE public.path_value (
  val_string TEXT,
  feature_mapping_id INTEGER NOT NULL,
  parent_feature_mapping_id INTEGER,
  feature_name TEXT,
  rm_type_name TEXT,
  path INTEGER NOT NULL,
  payload_id INTEGER NOT NULL,
  id INTEGER NOT NULL,
  ehr_id INTEGER,
  CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;


Best regards
Seref

Any other constraints or indexes on that table?

pgsql-general by date:

Previous
From: Seref Arikan
Date:
Subject: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Next
From: Paulo Correia
Date:
Subject: Re: Postgres 9.0 Streaming Replication and Load Balancing?