IN vs EXISTS equivalence - Mailing list pgsql-hackers

From Kevin Grittner
Subject IN vs EXISTS equivalence
Date
Msg-id 471C6DEB.EE98.0025.0@wicourts.gov
Whole thread Raw
Responses Re: IN vs EXISTS equivalence
List pgsql-hackers
I've requested this before without response, but I'm asking again
because it just caused me pain again: could we get a TODO added to
have the planner recognize equivalent IN and EXISTS constructs and
have them compete on cost estimates?  I know it's not a trivial
improvement, but if it's on the list maybe someone will pick it up,
and I see it as the single biggest weakness in PostgreSQL
performance.
I don't need help resolving this particular case, because the fix is
always blinding obvious when we hit this, and it doesn't even break
portability because no other database we've tested fails to recognize
these equivalent cases.
step=# explain DELETE FROM "Body" WHERE "bodySeqNo" NOT IN (SELECT "bodySeqNo" FROM "Message");
         QUERY PLAN 
----------------------------------------------------------------------------------Seq Scan on "Body"
(cost=90277.43..285235351699.39rows=3313379 width=6)  Filter: (NOT (subplan))  SubPlan    ->  Materialize
(cost=90277.43..159793.40rows=6627957 width=11)          ->  Seq Scan on "Message"  (cost=0.00..80413.07 rows=6627957
width=11)
(5 rows)

step=# explain DELETE FROM "Body" WHERE NOT EXISTS (SELECT * FROM "Message" m WHERE m."bodySeqNo" =
"Body"."bodySeqNo");                                       QUERY PLAN 
--------------------------------------------------------------------------------------------Seq Scan on "Body"
(cost=0.00..3401760.88rows=3313416 width=6)  Filter: (NOT (subplan))  SubPlan    ->  Index Scan using "Message_Body" on
"Message"m  (cost=0.00..0.49 rows=1 width=136)          Index Cond: (("bodySeqNo")::numeric = ($0)::numeric) 
(5 rows)
The bodySeqNo column is NOT NULL in both tables, and is the primary
key in the Body table.  The Message table has a non-unique index on
it.  (\d lists will follow at the bottom.)
I cancelled the first query after it had been running for 54 hours
over our slowest hours (the weekend).  The second form ran in four
minutes in competition with peak time queries.
-Kevin
step=# \d "Body"              Table "public.Body"  Column    |          Type          | Modifiers
-------------+------------------------+-----------bodySeqNo   | "SequenceT"            | not nullcontentType |
charactervarying(255) | not nullencoding    | character varying(255) |body        | "BodyT"                | 
Indexes:   "Body_pkey" PRIMARY KEY, btree ("bodySeqNo")

step=# \d "Message"                Table "public.Message"    Column      |           Type           | Modifiers
-----------------+--------------------------+-----------messageId       | "SequenceT"              | not
nullclientMessageId| "ClientMessageIdT"       | not nullcorrelationId   | "SequenceT"              |destQueue       |
"QueueNameT"            | not nullreplyToQueue    | "QueueNameT"             | not nulltypeCode        | character(2)
         |expiration      | timestamp with time zone |priority        | smallint                 | not nullstatus
  | character(2)             | not nullcreated         | timestamp with time zone | not nulllastModified    | timestamp
withtime zone | not nullbodySeqNo       | "SequenceT"              | not nullmessageIdSearch | "PrioritySequenceT"
|not null 
Indexes:   "Message_pkey" PRIMARY KEY, btree ("messageId")   "MessageIndex2" UNIQUE, btree ("destQueue",
"clientMessageId")  "Message_MessageIdSearch" UNIQUE, btree ("destQueue", status, "messageIdSearch") CLUSTER
"Message_Body"btree ("bodySeqNo")   "Message_Created" btree ("destQueue", status, created)   "Message_Created2" btree
("destQueue",created)   "Message_Expiration" btree (expiration)   "Message_LastModified" btree ("destQueue",
"lastModified")  "Message_ReplyToQueue" btree ("replyToQueue") 
Foreign-key constraints:   "Message_fk1" FOREIGN KEY ("destQueue") REFERENCES "Queue"(name)   "Message_fk2" FOREIGN KEY
("replyToQueue")REFERENCES "Queue"(name) 



pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Ready for beta2?
Next
From: Tom Lane
Date:
Subject: Re: Latin vs non-Latin words in text search parsing