Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree? - Mailing list pgsql-general

From Seref Arikan
Subject Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date
Msg-id CAG1bHGO-2uXAP_xhL6A1dkUhz6vtW=GvV7qvomaNEgw02mvkHQ@mail.gmail.com
Whole thread Raw
In response to Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?  (Seref Arikan <serefarikan@kurumsalteknoloji.com>)
Responses Re: Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
List pgsql-general
Trying to reply to Rob:

Apologies if this does not end up in the thread (gmail is just driving me mad, I can't seem to receive messages, so I've subscribed again)

For some reason Limit 1 cause my query to go on for minutes without a response, which was not the case.

The following query takes about 10.6 seconds, with a table that has about 20 million rows. First, the indices of the table with create statement:

CREATE TABLE public.path_value (
  val_string TEXT,
  feature_mapping_id INTEGER NOT NULL,
  parent_feature_mapping_id INTEGER,
  feature_name TEXT,
  rm_type_name TEXT,
  path INTEGER NOT NULL,
  payload_id INTEGER NOT NULL,
  id INTEGER NOT NULL,
  ehr_id INTEGER,
  CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;

CREATE INDEX path_value_idx ON public.path_value
  USING btree (rm_type_name COLLATE pg_catalog."default", feature_name COLLATE pg_catalog."default");

CREATE INDEX path_value_idx1 ON public.path_value
  USING btree (feature_mapping_id, parent_feature_mapping_id);

CREATE INDEX path_value_idx2 ON public.path_value
  USING btree (payload_id);


Now results of EXPLAIN ANALYZE:

"QUERY PLAN"
"Nested Loop Semi Join  (cost=543422.34..571013.45 rows=5 width=4) (actual time=5854.267..10684.798 rows=82003 loops=1)"
"  Join Filter: (root.feature_mapping_id = anodeid.parent_feature_mapping_id)"
"  ->  Merge Join  (cost=543422.34..543426.29 rows=12 width=24) (actual time=5854.182..5953.531 rows=82003 loops=1)"
"        Merge Cond: ((root.feature_mapping_id = node1.parent_feature_mapping_id) AND (root.payload_id = node2.payload_id))"
"        ->  Sort  (cost=1150.65..1151.35 rows=283 width=12) (actual time=359.940..380.707 rows=82003 loops=1)"
"              Sort Key: root.feature_mapping_id, root.payload_id"
"              Sort Method: external merge  Disk: 1768kB"
"              ->  Index Scan using path_value_idx on path_value root  (cost=0.00..1139.12 rows=283 width=12) (actual time=0.073..261.240 rows=82003 loops=1)"
"                    Index Cond: ((rm_type_name = 'COMPOSITION'::text) AND (feature_name = 'composition'::text))"
"        ->  Sort  (cost=542271.69..542272.31 rows=247 width=12) (actual time=5494.236..5516.788 rows=82003 loops=1)"
"              Sort Key: node1.parent_feature_mapping_id, node2.payload_id"
"              Sort Method: external sort  Disk: 2088kB"
"              ->  HashAggregate  (cost=542259.40..542261.87 rows=247 width=12) (actual time=5377.761..5394.453 rows=82003 loops=1)"
"                    ->  Hash Semi Join  (cost=506297.37..542253.38 rows=1205 width=12) (actual time=2788.758..5325.838 rows=164004 loops=1)"
"                          Hash Cond: ((node1.payload_id = node2.payload_id) AND (node1.feature_mapping_id = node2.parent_feature_mapping_id))"
"                          ->  Bitmap Heap Scan on path_value node1  (cost=77.61..9421.62 rows=2464 width=12) (actual time=76.628..2430.473 rows=246005 loops=1)"
"                                Recheck Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
"                                ->  Bitmap Index Scan on path_value_idx  (cost=0.00..77.00 rows=2464 width=0) (actual time=75.640..75.640 rows=246005 loops=1)"
"                                      Index Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
"                          ->  Hash  (cost=498674.38..498674.38 rows=399092 width=8) (actual time=2711.653..2711.653 rows=410015 loops=1)"
"                                Buckets: 4096  Batches: 16  Memory Usage: 1013kB"
"                                ->  Bitmap Heap Scan on path_value node2  (cost=10345.32..498674.38 rows=399092 width=8) (actual time=71.718..2607.467 rows=410015 loops=1)"
"                                      Recheck Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
"                                      ->  Bitmap Index Scan on path_value_idx  (cost=0.00..10245.55 rows=399092 width=0) (actual time=69.634..69.634 rows=410015 loops=1)"
"                                            Index Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
"  ->  Index Scan using path_value_idx2 on path_value anodeid  (cost=0.00..2298.91 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=82003)"
"        Index Cond: (payload_id = node2.payload_id)"
"        Filter: ((feature_name = 'archetypeNodeId'::text) AND (val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'::text))"
"Total runtime: 10692.277 ms"

and finaly, the query that gave this result:

EXPLAIN ANALYZE SELECT root.id from path_value as root
    WHERE
        root.rm_type_name = 'COMPOSITION'
    AND
        root.feature_name = 'composition'
    AND
    EXISTS (SELECT anodeid.id from path_value as anodeid
            WHERE
                    anodeId.parent_feature_mapping_id = root.feature_mapping_id
            AND       
                    anodeId.payload_id = root.payload_id
            AND
                    anodeId.feature_name = 'archetypeNodeId'
            AND
                    anodeId.val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'           
            )
           
    AND
    EXISTS (SELECT 1 from path_value as node1
             WHERE
                node1.payload_id = root.payload_id             
            AND
                node1.parent_feature_mapping_id = root.feature_mapping_id
            AND
                node1.feature_name = 'content'
            AND
                node1.rm_type_name = 'CONTENTITEM'
            AND
            EXISTS (SELECT 1 from path_value as node2
                    WHERE
                    node2.payload_id = node1.payload_id
                    AND
                    node2.parent_feature_mapping_id = node1.feature_mapping_id
                    AND
                    node2.rm_type_name = 'ITEMSTRUCTURE'
                    )
            )

Is there a glaring error in my approach? Should I be better off with another SQL query, or Ltree/XPATH queries?

Best regards
Seref







pgsql-general by date:

Previous
From: Adrian Schreyer
Date:
Subject: cannot compile www_fdw Foreign Data Wrapper
Next
From: Tom Lane
Date:
Subject: Re: cannot compile www_fdw Foreign Data Wrapper