Bad plan for join to aggregate of join. - Mailing list pgsql-performance
From | Mischa Sandberg |
---|---|
Subject | Bad plan for join to aggregate of join. |
Date | |
Msg-id | 45071FCC.8090509@ca.sophos.com Whole thread Raw |
In response to | Abysmal hash join (Florian Weimer <fweimer@bfk.de>) |
Responses |
Re: Bad plan for join to aggregate of join.
|
List | pgsql-performance |
PG 8.0.3 is choosing a bad plan between a query. I'm going to force the plan (by making one join into a function). I'd like to know if this is unexpected; in general, can PG see that a join on an grouped-by field can be pushed down into the query as an indexable filter? The query below joins a table "message", to an aggregate of "message_recipient" joined to "recipient". The joins are all on indexed PK-FK columns. "message_recipient" is an intersect table. message :<: message_recipient :>: recipient In the query plan below, the right side of the join returns one row of "message", and PG knows it. The left side of the join compute the entire aggregate of "message_recipient" (est 700K rows), then does a merge join against the single message row. I would have hoped for a nested-loop join, where the message "id" field would be used to index-scan "message_recipient", which in turn would index-scan "recipient" by recipient "id". This is PG 8.0.3. All tables have been (very) recently analyzed. The query plans estimated rowcounts all look bang-on. "message" and "message_recipient" are tables of about 3M rows each. As usual, this is on a system to which I only have restricted access. But I'd be happy to expand on the info below with anything short of the pg_dump. -----------------------------------======================================================== EXPLAIN SELECT message.id AS m_db_id, message.m_global_id AS id, m_global_id, m_queue_id, h_message_id, m_date AS c_date_iso, m_date, c_subject_utf8, message.reason_id AS reason_id, m_reason.name AS m_reason, m_spam_probability, m_spam_level, h_to, m_message_size, m_header_size, date_part('epoch', message.m_date) AS c_qdate_time, h_from_local || '@' || h_from_domain AS h_from, env_from_local || '@' || env_from_domain AS env_from, env_from_local || '@' || env_from_domain AS m_envelope_from, location_name AS location, m_milter_host, m_relay, virus_name AS m_virus_name, m_all_recipients FROM message JOIN m_reason ON message.reason_id = m_reason.reason_id JOIN message_all_recipients ON message.id = message_all_recipients.m_id WHERE message.m_global_id = '2211000-1'; QUERY PLAN ------------------------------------------------------------------------------------------- Nested Loop (cost=254538.42..283378.44 rows=1 width=425) Join Filter: ("outer".reason_id = "inner".reason_id) -> Merge Join (cost=254538.42..283377.33 rows=1 width=416) Merge Cond: ("outer".m_id = "inner".id) -> Subquery Scan message_all_recipients (cost=254535.40..281604.95 rows=707735 width=40) -> GroupAggregate (cost=254535.40..274527.60 rows=707735 width=36) -> Sort (cost=254535.40..258250.57 rows=1486069 width=36) Sort Key: message_recipient.message_id -> Merge Join (cost=0.00..78970.52 rows=1486069 width=36) Merge Cond: ("outer".id = "inner".recipient_id) -> Index Scan using pk_recipient on recipient (cost=0.00..5150.65 rows=204514 width=36) -> Index Scan using pk_message_recipient on message_recipient (cost=0.00..56818.25 rows=1486069 width=16) Filter: (is_mapped = 1) -> Sort (cost=3.02..3.03 rows=1 width=384) Sort Key: message.id -> Index Scan using unq_message_m_global_id on message (cost=0.00..3.01 rows=1 width=384) Index Cond: ((m_global_id)::text = '2211000-1'::text) -> Seq Scan on m_reason (cost=0.00..1.04 rows=4 width=13) ----------------------------------- Relevant tables and view: # \d message Table "public.message" Column | Type | Modifiers --------------------+-----------------------------+--------------------------------------------------------- id | bigint | not null default nextval('public.message_id_seq'::text) m_global_id | character varying(255) | not null reason_id | smallint | not null location_name | character varying(255) | not null m_date | timestamp without time zone | m_queue_id | character varying(255) | h_message_id | character varying(255) | c_subject_utf8 | character varying(255) | env_from_local | character varying(255) | env_from_domain | character varying(255) | h_from_local | character varying(255) | h_from_domain | character varying(255) | h_from | character varying(255) | h_to | character varying(255) | m_milter_host | character varying(255) | m_relay | character varying(255) | m_spam_probability | double precision | m_message_size | integer | m_header_size | integer | m_spam_level | character varying(255) | virus_name | text | Indexes: "pk_message" PRIMARY KEY, btree (id) "unq_message_m_global_id" UNIQUE, btree (m_global_id) "message_h_message_id_index" btree (h_message_id) "message_m_date_index" btree (m_date) "message_m_queue_id_index" btree (m_queue_id) # \d message_recipient Table "public.message_recipient" Column | Type | Modifiers ---------------+----------+-------------------- recipient_id | bigint | not null message_id | bigint | not null is_mapped | smallint | not null default 0 is_calculated | smallint | not null default 0 is_envelope | smallint | not null default 0 reason_id | smallint | not null action | smallint | Indexes: "pk_message_recipient" PRIMARY KEY, btree (recipient_id, message_id) "message_recipient_message_id_index" btree (message_id) Foreign-key constraints: "rc_rcpnt_map_msg_id" FOREIGN KEY (message_id) REFERENCES message(id) ON DELETE CASCADE CREATE AGGREGATE catenate ( BASETYPE = text, SFUNC = textcat, STYPE = text, INITCOND = '' ); CREATE OR REPLACE VIEW message_all_recipients AS SELECT message_id AS m_id, substr(catenate(','||local||'@'||domain),2) AS m_all_recipients FROM message_recipient JOIN recipient ON id = recipient_id WHERE is_mapped = 1 GROUP BY message_id; ----------------------------------- pg_statistics info, problably not of much interest Object DiskIO CacheIO Ins Upd Del SeqScan TupRead IdxScan IdxFetch m_reason 308 599679 1 0 0 599985 2399935 0 0 message 4658766 14977816 2210967 0 933643 7299 81428503 5855900 8833404 message.pk_~ 227834 31683671 0 0 0 0 3897054 5850229 3897054 message.unq_~_m_global_id 252753 8591251 0 0 0 0 5552 5564 5552 message.~_h_~_id_index 1879172 8496722 0 0 0 0 0 0 0 message.~_m_date_index 245405 8526765 0 0 0 0 4930798 107 4930798 message.~_m_queue_id_index 245719 8598360 0 0 0 0 0 0 0 message_recipient 41862572 81546465 2703260 104 1144977 0 0 2648101 117192003 message_recipient.pk_~ 4541776 16430539 0 0 0 0 116042206 1710555 116042206 message_recipient.~_message_id_index 243379 14235956 0 0 0 0 1149797 937546 1149797 recipient 55288623 955926871 223057 0 112158 584592 103499192990 5726999 62036712 recipient.pk_~ 180080 1125073 0 0 0 0 7440446 117045 7440446 recipient.unq_~ 2205366 21513447 0 0 0 0 54166857 5609472 54166857 recipient.~_domain_index 191722 734683 0 0 0 0 429409 482 429409 ----------------------------------- output of "pgdisk", showing actual disk space vs pg_class info: ..DISK-KB ..DATA-KB ...EST-KB .EST-ROWS ...OID.... NAME 1625360 1021104 979000 1315620 17261 public.message 369208 159200 159032 1558240 17272 public.message_recipient 45752 16408 14552 181646 17293 public.recipient
pgsql-performance by date: