IN vs EXISTS equivalence - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | IN vs EXISTS equivalence |
Date | |
Msg-id | 471C6DEB.EE98.0025.0@wicourts.gov Whole thread Raw |
Responses |
Re: IN vs EXISTS equivalence
|
List | pgsql-hackers |
I've requested this before without response, but I'm asking again because it just caused me pain again: could we get a TODO added to have the planner recognize equivalent IN and EXISTS constructs and have them compete on cost estimates? I know it's not a trivial improvement, but if it's on the list maybe someone will pick it up, and I see it as the single biggest weakness in PostgreSQL performance. I don't need help resolving this particular case, because the fix is always blinding obvious when we hit this, and it doesn't even break portability because no other database we've tested fails to recognize these equivalent cases. step=# explain DELETE FROM "Body" WHERE "bodySeqNo" NOT IN (SELECT "bodySeqNo" FROM "Message"); QUERY PLAN ----------------------------------------------------------------------------------Seq Scan on "Body" (cost=90277.43..285235351699.39rows=3313379 width=6) Filter: (NOT (subplan)) SubPlan -> Materialize (cost=90277.43..159793.40rows=6627957 width=11) -> Seq Scan on "Message" (cost=0.00..80413.07 rows=6627957 width=11) (5 rows) step=# explain DELETE FROM "Body" WHERE NOT EXISTS (SELECT * FROM "Message" m WHERE m."bodySeqNo" = "Body"."bodySeqNo"); QUERY PLAN --------------------------------------------------------------------------------------------Seq Scan on "Body" (cost=0.00..3401760.88rows=3313416 width=6) Filter: (NOT (subplan)) SubPlan -> Index Scan using "Message_Body" on "Message"m (cost=0.00..0.49 rows=1 width=136) Index Cond: (("bodySeqNo")::numeric = ($0)::numeric) (5 rows) The bodySeqNo column is NOT NULL in both tables, and is the primary key in the Body table. The Message table has a non-unique index on it. (\d lists will follow at the bottom.) I cancelled the first query after it had been running for 54 hours over our slowest hours (the weekend). The second form ran in four minutes in competition with peak time queries. -Kevin step=# \d "Body" Table "public.Body" Column | Type | Modifiers -------------+------------------------+-----------bodySeqNo | "SequenceT" | not nullcontentType | charactervarying(255) | not nullencoding | character varying(255) |body | "BodyT" | Indexes: "Body_pkey" PRIMARY KEY, btree ("bodySeqNo") step=# \d "Message" Table "public.Message" Column | Type | Modifiers -----------------+--------------------------+-----------messageId | "SequenceT" | not nullclientMessageId| "ClientMessageIdT" | not nullcorrelationId | "SequenceT" |destQueue | "QueueNameT" | not nullreplyToQueue | "QueueNameT" | not nulltypeCode | character(2) |expiration | timestamp with time zone |priority | smallint | not nullstatus | character(2) | not nullcreated | timestamp with time zone | not nulllastModified | timestamp withtime zone | not nullbodySeqNo | "SequenceT" | not nullmessageIdSearch | "PrioritySequenceT" |not null Indexes: "Message_pkey" PRIMARY KEY, btree ("messageId") "MessageIndex2" UNIQUE, btree ("destQueue", "clientMessageId") "Message_MessageIdSearch" UNIQUE, btree ("destQueue", status, "messageIdSearch") CLUSTER "Message_Body"btree ("bodySeqNo") "Message_Created" btree ("destQueue", status, created) "Message_Created2" btree ("destQueue",created) "Message_Expiration" btree (expiration) "Message_LastModified" btree ("destQueue", "lastModified") "Message_ReplyToQueue" btree ("replyToQueue") Foreign-key constraints: "Message_fk1" FOREIGN KEY ("destQueue") REFERENCES "Queue"(name) "Message_fk2" FOREIGN KEY ("replyToQueue")REFERENCES "Queue"(name)
pgsql-hackers by date: