Home > mailing lists

Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree? - Mailing list pgsql-general

From	Seref Arikan
Subject	Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
Date	May 17, 2012 17:47:31
Msg-id	CAG1bHGO-2uXAP_xhL6A1dkUhz6vtW=GvV7qvomaNEgw02mvkHQ@mail.gmail.com Whole thread Raw
In response to	Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree? (Seref Arikan <serefarikan@kurumsalteknoloji.com>)
Responses	Re: Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree?
List	pgsql-general

Tree view

Trying to reply to Rob:

Apologies if this does not end up in the thread (gmail is just driving me mad, I can't seem to receive messages, so I've subscribed again)

For some reason Limit 1 cause my query to go on for minutes without a response, which was not the case.

The following query takes about 10.6 seconds, with a table that has about 20 million rows. First, the indices of the table with create statement:

CREATE TABLE public.path_value (
val_string TEXT,
feature_mapping_id INTEGER NOT NULL,
parent_feature_mapping_id INTEGER,
feature_name TEXT,
rm_type_name TEXT,
path INTEGER NOT NULL,
payload_id INTEGER NOT NULL,
id INTEGER NOT NULL,
ehr_id INTEGER,
CONSTRAINT path_value_pkey PRIMARY KEY(id)
) WITHOUT OIDS;

CREATE INDEX path_value_idx ON public.path_value
USING btree (rm_type_name COLLATE pg_catalog."default", feature_name COLLATE pg_catalog."default");

CREATE INDEX path_value_idx1 ON public.path_value
USING btree (feature_mapping_id, parent_feature_mapping_id);

CREATE INDEX path_value_idx2 ON public.path_value
USING btree (payload_id);

Now results of EXPLAIN ANALYZE:

"QUERY PLAN"
"Nested Loop Semi Join (cost=543422.34..571013.45 rows=5 width=4) (actual time=5854.267..10684.798 rows=82003 loops=1)"
" Join Filter: (root.feature_mapping_id = anodeid.parent_feature_mapping_id)"
" -> Merge Join (cost=543422.34..543426.29 rows=12 width=24) (actual time=5854.182..5953.531 rows=82003 loops=1)"
"        Merge Cond: ((root.feature_mapping_id = node1.parent_feature_mapping_id) AND (root.payload_id = node2.payload_id))"
"        -> Sort (cost=1150.65..1151.35 rows=283 width=12) (actual time=359.940..380.707 rows=82003 loops=1)"
"              Sort Key: root.feature_mapping_id, root.payload_id"
"              Sort Method: external merge Disk: 1768kB"
"              -> Index Scan using path_value_idx on path_value root (cost=0.00..1139.12 rows=283 width=12) (actual time=0.073..261.240 rows=82003 loops=1)"
"                    Index Cond: ((rm_type_name = 'COMPOSITION'::text) AND (feature_name = 'composition'::text))"
"        -> Sort (cost=542271.69..542272.31 rows=247 width=12) (actual time=5494.236..5516.788 rows=82003 loops=1)"
"              Sort Key: node1.parent_feature_mapping_id, node2.payload_id"
"              Sort Method: external sort Disk: 2088kB"
"              -> HashAggregate (cost=542259.40..542261.87 rows=247 width=12) (actual time=5377.761..5394.453 rows=82003 loops=1)"
"                    -> Hash Semi Join (cost=506297.37..542253.38 rows=1205 width=12) (actual time=2788.758..5325.838 rows=164004 loops=1)"
"                          Hash Cond: ((node1.payload_id = node2.payload_id) AND (node1.feature_mapping_id = node2.parent_feature_mapping_id))"
"                          -> Bitmap Heap Scan on path_value node1 (cost=77.61..9421.62 rows=2464 width=12) (actual time=76.628..2430.473 rows=246005 loops=1)"
"                                Recheck Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
"                                -> Bitmap Index Scan on path_value_idx (cost=0.00..77.00 rows=2464 width=0) (actual time=75.640..75.640 rows=246005 loops=1)"
"                                      Index Cond: ((rm_type_name = 'CONTENTITEM'::text) AND (feature_name = 'content'::text))"
"                          -> Hash (cost=498674.38..498674.38 rows=399092 width=8) (actual time=2711.653..2711.653 rows=410015 loops=1)"
"                                Buckets: 4096 Batches: 16 Memory Usage: 1013kB"
"                                -> Bitmap Heap Scan on path_value node2 (cost=10345.32..498674.38 rows=399092 width=8) (actual time=71.718..2607.467 rows=410015 loops=1)"
"                                      Recheck Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
"                                      -> Bitmap Index Scan on path_value_idx (cost=0.00..10245.55 rows=399092 width=0) (actual time=69.634..69.634 rows=410015 loops=1)"
"                                            Index Cond: (rm_type_name = 'ITEMSTRUCTURE'::text)"
" -> Index Scan using path_value_idx2 on path_value anodeid (cost=0.00..2298.91 rows=1 width=8) (actual time=0.057..0.057 rows=1 loops=82003)"
"        Index Cond: (payload_id = node2.payload_id)"
"        Filter: ((feature_name = 'archetypeNodeId'::text) AND (val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'::text))"
"Total runtime: 10692.277 ms"

and finaly, the query that gave this result:

EXPLAIN ANALYZE SELECT root.id from path_value as root
    WHERE
        root.rm_type_name = 'COMPOSITION'
    AND
        root.feature_name = 'composition'
    AND
    EXISTS (SELECT anodeid.id from path_value as anodeid
            WHERE
                    anodeId.parent_feature_mapping_id = root.feature_mapping_id
            AND
                    anodeId.payload_id = root.payload_id
            AND
                    anodeId.feature_name = 'archetypeNodeId'
            AND
                    anodeId.val_string = 'openEHR-EHR-COMPOSITION.discharge.v1'
            )

    AND
    EXISTS (SELECT 1 from path_value as node1
             WHERE
                node1.payload_id = root.payload_id
            AND
                node1.parent_feature_mapping_id = root.feature_mapping_id
            AND
                node1.feature_name = 'content'
            AND
                node1.rm_type_name = 'CONTENTITEM'
            AND
            EXISTS (SELECT 1 from path_value as node2
                    WHERE
                    node2.payload_id = node1.payload_id
                    AND
                    node2.parent_feature_mapping_id = node1.feature_mapping_id
                    AND
                    node2.rm_type_name = 'ITEMSTRUCTURE'
                    )
            )

Is there a glaring error in my approach? Should I be better off with another SQL query, or Ltree/XPATH queries?

Best regards
Seref

pgsql-general by date:

From: Adrian Schreyer
Date: 17 May 2012, 15:46:33
Subject: cannot compile www_fdw Foreign Data Wrapper

From: Tom Lane
Date: 17 May 2012, 18:12:34
Subject: Re: cannot compile www_fdw Foreign Data Wrapper

Re: Is EXISTS the most efficient approach for PostgreSql to check for existence of nodes in a tree? - Mailing list pgsql-general

Previous

Next