Efficient "filter query" with positive and/or negative conditions - Mailing list pgsql-general
From | Anders Steinlein |
---|---|
Subject | Efficient "filter query" with positive and/or negative conditions |
Date | |
Msg-id | 7DCE73D5-B4D6-4AAB-86F7-900E73254B4C@steinlein.no Whole thread Raw |
List | pgsql-general |
We have a database with contacts within lists, which can have any number of tags associated with them. I would like to segment the contacts based on any number of tag-based conditions, all of which must be fulfilled (i.e. an ALL rule/filter). For instance, I want to extract all contacts which have the tags "customer" AND "newsletter" but NOT the "invited" tag. However, segments need not contain "positive" predicates, so a segment defined as only NOT "invited" tag should include ALL contacts except those with the "invited" tag. I have found a working query using CTE, but it's way too slow when tens of thousands of contacts are involved. Suggestions for an alternative query and/or optimizations to this one? Server: ------- PostgreSQL 8.4.2 on a shared server running Apache/PHP/MySQL/PostgreSQL shared_buffers: 128MB work_mem: 8MB effective_cache_size: 256MB Current query: -------------- WITH segments_contacts_tags AS ( SELECT segmentid, tagname, tagtype, email FROM segments_tags st LEFT JOIN contacts_tags ct USING (tagname) WHERE segmentid = 93 ), positive_segment_matches AS ( SELECT segmentid, email, COUNT(email) AS tags FROM segments_contacts_tags WHERE tagtype = 1 GROUP BY 1, 2 ), positive_predicates_count AS ( SELECT segmentid, COUNT(*) AS count FROM segments_tags WHERE tagtype = 1 GROUP BY 1 ) SELECT s.segmentid, cl.email FROM segments s INNER JOIN contacts_lists cl USING (lid) LEFT JOIN positive_segment_matches psm USING (segmentid, email) LEFT JOIN positive_predicates_count ppc USING (segmentid) WHERE segmentid = 93 AND ((psm.email IS NULL AND ppc.count IS NULL) OR psm.tags = ppc.count) EXCEPT SELECT segmentid, email FROM segments_contacts_tags st WHERE st.tagtype = 0; Explain analyze: ---------------- HashSetOp Except (cost=1817.12..5822.48 rows=200 width=36) (actual time=5827.885..6051.771 rows=64672 loops=1) CTE segments_contacts_tags -> Nested Loop Left Join (cost=0.00..1323.86 rows=21819 width=61) (actual time=0.143..696.589 rows=64672 loops=1) -> Seq Scan on segments_tags st (cost=0.00..1.05 rows=1 width=29) (actual time=0.014..0.025 rows=1 loops=1) Filter: (segmentid = 93) -> Index Scan using contacts_tags_tagname on contacts_tags ct (cost=0.00..1050.07 rows=21819 width=49) (actual time=0.110..265.453 rows=64672 loops=1) Index Cond: (st.tagname = ct.tagname) CTE positive_segment_matches -> HashAggregate (cost=491.75..491.88 rows=11 width=36) (actual time=1733.143..1995.489 rows=64672 loops=1) -> CTE Scan on segments_contacts_tags (cost=0.00..490.93 rows=109 width=36) (actual time=0.161..1245.836 rows=64672 loops=1) Filter: (tagtype = 1) CTE positive_predicates_count -> HashAggregate (cost=1.07..1.12 rows=4 width=4) (actual time=0.069..0.088 rows=4 loops=1) -> Seq Scan on segments_tags (cost=0.00..1.05 rows=4 width=4) (actual time=0.015..0.032 rows=4 loops=1) Filter: (tagtype = 1) -> Append (cost=0.26..3681.56 rows=64811 width=36) (actual time=2734.790..5363.556 rows=64672 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.26..3189.55 rows=64702 width=36) (actual time=2734.782..4923.849 rows=64672 loops=1) -> Hash Left Join (cost=0.26..2542.53 rows=64702 width=36) (actual time=2734.773..4483.547 rows=64672 loops=1) Hash Cond: ((s.segmentid = psm.segmentid) AND ((cl.email)::text = (psm.email)::text)) Filter: (((psm.email IS NULL) AND (ppc.count IS NULL)) OR (psm.tags = ppc.count)) -> Nested Loop (cost=0.00..2052.95 rows=64702 width=44) (actual time=0.169..687.204 rows=64702 loops=1) Join Filter: (s.lid = cl.lid) -> Nested Loop Left Join (cost=0.00..1.15 rows=1 width=16) (actual time=0.130..0.189 rows=1 loops=1) Join Filter: (s.segmentid = ppc.segmentid) -> Seq Scan on segments s (cost=0.00..1.05 rows=1 width=8) (actual time=0.025..0.032 rows=1 loops=1) Filter: (segmentid = 93) -> CTE Scan on positive_predicates_count ppc (cost=0.00..0.09 rows=1 width=12) (actual time=0.086..0.129 rows=1 loops=1) Filter: (ppc.segmentid = 93) -> Seq Scan on contacts_lists cl (cost=0.00..1243.02 rows=64702 width=36) (actual time=0.014..231.331 rows=64702 loops=1) -> Hash (cost=0.25..0.25 rows=1 width=44) (actual time=2734.535..2734.535 rows=64672 loops=1) -> CTE Scan on positive_segment_matches psm (cost=0.00..0.25 rows=1 width=44) (actual time=1733.169..2481.359 rows=64672 loops=1) Filter: (segmentid = 93) -> Subquery Scan "*SELECT* 2" (cost=0.00..492.02 rows=109 width=36) (actual time=23.767..23.767 rows=0 loops=1) -> CTE Scan on segments_contacts_tags st (cost=0.00..490.93 rows=109 width=36) (actual time=23.758..23.758 rows=0 loops=1) Filter: (tagtype = 0) Total runtime: 6278.849 ms Involved tables: ---------------- Table "public.contacts_lists" Column | Type | Modifiers ----------------+----------------------------- +------------------------------ email | email | not null lid | integer | not null lstatus | character(1) | not null default 'a'::bpchar ladded | timestamp without time zone | not null default now() lstatuschanged | timestamp without time zone | not null default now() Indexes: "contacts_lists_pkey" PRIMARY KEY, btree (email, lid) "contacts_lists_statchanged" btree (lstatuschanged) "contacts_lists_status" btree (lstatus) Check constraints: "status_check" CHECK (lstatus = 'a'::bpchar OR lstatus = 'b'::bpchar OR lstatus = 'u'::bpchar) Foreign-key constraints: "contacts_lists_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE "contacts_lists_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE Table "public.contacts_tags" Column | Type | Modifiers ---------+-------+----------- email | email | not null tagname | text | not null Indexes: "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname) "contacts_tags_tagname" btree (tagname) Foreign-key constraints: "contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE Table "public.tags" Column | Type | Modifiers -----------+-----------------------------+------------------------ tagname | text | not null createdat | timestamp without time zone | not null default now() Indexes: "tags_pkey" PRIMARY KEY, btree (tagname) Referenced by: TABLE "contacts_tags" CONSTRAINT "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE TABLE "segments_tags" CONSTRAINT "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE Table "public.segments" Column | Type | Modifiers -------------+----------------------------- +-------------------------------------------------------------- segmentid | integer | not null default nextval('segments_segmentid_seq'::regclass) lid | integer | not null segmentname | text | not null createdat | timestamp without time zone | not null default now() Indexes: "segments_pkey" PRIMARY KEY, btree (segmentid) Foreign-key constraints: "segments_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE Referenced by: TABLE "segments_tags" CONSTRAINT "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE Table "public.segments_tags" Column | Type | Modifiers -----------+---------+----------- segmentid | integer | not null tagname | text | not null tagtype | integer | not null Indexes: "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname) Foreign-key constraints: "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE -- anders
pgsql-general by date: