anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From: Kevin Grittner
Subject: anti-join chosen even when slower than old plan
Date:
Msg-id: 4CD9663502000025000374B1@gw.wicourts.gov
List: pgsql-performance
The semi-join and anti-join support has helped us quite a bit, but
we have seen a situation where an anti-join is chosen even though it
is slower than the "old fashioned" plan.  I know there have been
other reports of this, but I just wanted to go on record with my
details.

The query:

delete from "DbTranLogRecord"
  where not exists
        (select * from "DbTranRepository" r
           where r."countyNo" = "DbTranLogRecord"."countyNo"
             and r."tranImageSeqNo"
                 = "DbTranLogRecord"."tranImageSeqNo");

Old plan on 8.3.7:

 Seq Scan on "DbTranLogRecord"  (cost=0.00..1224227790.06 rows=333387520 width=6)
   Filter: (NOT (subplan))
   SubPlan
     ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..1.83 rows=1 width=974)
           Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("tranImageSeqNo")::numeric = ($1)::numeric))

Deletes about 9.2 million rows in 7 hours and 20 minutes.

New plan on 9.0.1:

 Delete  (cost=0.00..93918390.38 rows=1 width=12)
   ->  Merge Anti Join  (cost=0.00..93918390.38 rows=1 width=12)
         Merge Cond: ((("DbTranLogRecord"."countyNo")::smallint = (r."countyNo")::smallint) AND (("DbTranLogRecord"."tranImageSeqNo")::numeric = (r."tranImageSeqNo")::numeric))
         ->  Index Scan using "DbTranLogRecordPK" on "DbTranLogRecord"  (cost=0.00..73143615.91 rows=675405504 width=20)
         ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..16328700.43 rows=152541168 width=20)

Cancelled after 39 hours and 25 minutes.
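
Just to confirm it's purely a matter of plan choice, something along
these lines, run in a throwaway transaction, should push 9.0 back
toward probing "DbTranRepositoryPK" once per row, much like the 8.3
SubPlan.  This is a sketch from memory, not what we actually ran:

begin;
-- keep the planner overrides local to this transaction
set local enable_mergejoin = off;
set local enable_hashjoin = off;
explain
delete from "DbTranLogRecord"
  where not exists
        (select * from "DbTranRepository" r
           where r."countyNo" = "DbTranLogRecord"."countyNo"
             and r."tranImageSeqNo"
                 = "DbTranLogRecord"."tranImageSeqNo");
rollback;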

I know how to work around it by putting OFFSET 0 in the subquery or
by tweaking the costing for that one query; just sharing the
information.
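
For anyone finding this in the archives, the OFFSET 0 form of the
workaround looks roughly like this (OFFSET 0 acts as an optimization
fence, so the NOT EXISTS isn't flattened into an anti-join and you
get the 8.3-style SubPlan back):

delete from "DbTranLogRecord"
  where not exists
        (select * from "DbTranRepository" r
           where r."countyNo" = "DbTranLogRecord"."countyNo"
             and r."tranImageSeqNo"
                 = "DbTranLogRecord"."tranImageSeqNo"
         offset 0);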

Also, we know these tables might be good candidates for
partitioning, but that's an issue for another day.

         Table "public.DbTranLogRecord"
     Column     |       Type        | Modifiers
----------------+-------------------+-----------
 countyNo       | "CountyNoT"       | not null
 tranImageSeqNo | "TranImageSeqNoT" | not null
 logRecordSeqNo | "LogRecordSeqNoT" | not null
 operation      | "OperationT"      | not null
 tableName      | "TableNameT"      | not null
Indexes:
    "DbTranLogRecordPK" PRIMARY KEY, btree ("countyNo",
"tranImageSeqNo", "logRecordSeqNo")
    "DbTranLogRecord_TableNameSeqNo" btree ("countyNo", "tableName",
"tranImageSeqNo", operation)

            Table "public.DbTranRepository"
      Column      |          Type          | Modifiers
------------------+------------------------+-----------
 countyNo         | "CountyNoT"            | not null
 tranImageSeqNo   | "TranImageSeqNoT"      | not null
 timestampValue   | "TimestampT"           | not null
 transactionImage | "ImageT"               |
 status           | character(1)           | not null
 queryName        | "QueryNameT"           |
 runDuration      | numeric(15,0)          |
 userId           | "UserIdT"              |
 functionalArea   | "FunctionalAreaT"      |
 sourceRef        | character varying(255) |
 url              | "URLT"                 |
 tranImageSize    | numeric(15,0)          |
Indexes:
    "DbTranRepositoryPK" PRIMARY KEY, btree ("countyNo",
"tranImageSeqNo") CLUSTER
    "DbTranRepository_UserId" btree ("countyNo", "userId",
"tranImageSeqNo")
    "DbTranRepository_timestamp" btree ("countyNo", "timestampValue")

            relname             | relpages |  reltuples  | pg_relation_size
--------------------------------+----------+-------------+------------------
 DbTranLogRecord                |  5524411 | 6.75406e+08 | 42 GB
 DbTranLogRecordPK              |  6581122 | 6.75406e+08 | 50 GB
 DbTranLogRecord_TableNameSeqNo |  6803441 | 6.75406e+08 | 52 GB
 DbTranRepository               | 22695447 | 1.52376e+08 | 173 GB
 DbTranRepositoryPK             |  1353643 | 1.52376e+08 | 10 GB
 DbTranRepository_UserId        |  1753793 | 1.52376e+08 | 13 GB
 DbTranRepository_timestamp     |  1353682 | 1.52376e+08 | 10 GB
(7 rows)
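
(That table came from something along these lines against pg_class;
the exact filter is from memory, not the query I actually used:)

select c.relname, c.relpages, c.reltuples,
       pg_size_pretty(pg_relation_size(c.oid)) as pg_relation_size
  from pg_class c
 where c.relname like 'DbTran%'
 order by c.relname;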

oprofile output captured while not much besides this delete was running:

samples  %        symbol name
2320174  33.7617  index_getnext
367268    5.3443  LWLockAcquire
299131    4.3528  hash_search_with_hash_value
249459    3.6300  HeapTupleSatisfiesMVCC
229558    3.3404  PinBuffer
222673    3.2402  _bt_checkkeys
204416    2.9745  LWLockRelease
194336    2.8279  heap_page_prune_opt
152353    2.2169  XidInMVCCSnapshot
121131    1.7626  AllocSetAlloc
91123     1.3260  SearchCatCache
88394     1.2863  nocache_index_getattr
85936     1.2505  pglz_compress
76531     1.1136  heap_hot_search_buffer
69532     1.0118  _mdfd_getseg
68743     1.0003  FunctionCall2
64720     0.9418  TransactionIdPrecedes
45298     0.6591  texteq
43183     0.6284  UnpinBuffer
40666     0.5917  base_yyparse

If you want more details or the opannotate-level output, let me know.

-Kevin
