Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance
From: Kevin Grittner
Subject: Re: anti-join chosen even when slower than old plan
Date:
Msg-id: 4CDA62C60200002500037512@gw.wicourts.gov
In response to: Re: anti-join chosen even when slower than old plan (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: anti-join chosen even when slower than old plan;
           Re: anti-join chosen even when slower than old plan
List: pgsql-performance
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The semi-join and anti-join have helped us quite a bit, but we
>> have seen a situation where anti-join is chosen even though it is
>> slower than the "old fashioned" plan. I know there have been
>> other reports of this, but I just wanted to go on record with my
>> details.
>
> In principle, the old-style plan ought to be equivalent to a
> nestloop antijoin with a seqscan of DbTranLogRecord on the outside
> and an indexscan of DbTranRepository on the inside. Can you force
> it to choose such a plan by setting enable_mergejoin off (and
> maybe enable_hashjoin too)?

Well, I got what I think is the equivalent plan by adding OFFSET 0
to the subquery:

 Delete  (cost=0.00..1239005015.67 rows=337702752 width=6)
   ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..1239005015.67 rows=337702752 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Limit  (cost=0.00..1.82 rows=1 width=974)
                 ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..1.82 rows=1 width=974)
                       Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("tranImageSeqNo")::numeric = ($1)::numeric))

> If so, it'd be interesting to see the estimated costs and actual
> runtime on 9.0 for that plan.

Unfortunately, based on the oprofile information, I decided to
check out the plan I would get by boosting cpu_index_tuple_cost by
a factor of 20. The resulting plan was:

 Delete  (cost=132623778.83..139506491.18 rows=1 width=12)
   ->  Merge Anti Join  (cost=132623778.83..139506491.18 rows=1 width=12)
         Merge Cond: ((("DbTranLogRecord"."tranImageSeqNo")::numeric = (r."tranImageSeqNo")::numeric) AND (("DbTranLogRecord"."countyNo")::smallint = (r."countyNo")::smallint))
         ->  Sort  (cost=107941675.79..109630189.55 rows=675405504 width=20)
               Sort Key: "DbTranLogRecord"."tranImageSeqNo", "DbTranLogRecord"."countyNo"
               ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..7306496.14 rows=675405504 width=20)
         ->  Materialize  (cost=24682103.04..25443983.12 rows=152376016 width=20)
               ->  Sort  (cost=24682103.04..25063043.08 rows=152376016 width=20)
                     Sort Key: r."tranImageSeqNo", r."countyNo"
                     ->  Seq Scan on "DbTranRepository" r  (cost=0.00..3793304.86 rows=152376016 width=20)

That looked like it had potential, so I started that off and went
home before I got your post. It finished in 3 hours and 31 minutes
-- more than twice as fast as the nestloop plan used under 8.3.

But wait -- it turns out that this pain was self-inflicted. Based
on heavy testing of the interactive queries which users run against
this database, we tuned the database for "fully-cached" settings,
with both random_page_cost and seq_page_cost at 0.1. In a practical
sense, the users are almost always running these queries against
very recent data which is, in fact, heavily cached -- so it's no
surprise that the queries they run perform best with plans based on
such costing. The problem is that these weekly maintenance runs
need to pass the entire database, so caching effects are far less
pronounced. If I set seq_page_cost = 1 and random_page_cost = 2, I
get exactly the same (fast) plan as above.

I guess the lesson here is not to use the same costing for
database-wide off-hours maintenance queries as for ad hoc queries
against a smaller set of recent data by users who expect quick
response time. I'm fine with tweaking the costs in our maintenance
scripts, but it does tend to make me daydream about how the
optimizer might possibly auto-tweak such things....

I assume there's now no need to get timings for the old plan?
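For the archives: the statement itself isn't quoted in this
message, but reconstructed from the plan above it would have looked
roughly like the following -- a sketch, not necessarily the exact
query we ran. The OFFSET 0 is what keeps the planner from
flattening the NOT EXISTS subquery into an anti-join:

 -- Reconstructed from the plan; illustrative, not verbatim.
 -- OFFSET 0 acts as an optimization fence: the planner cannot
 -- pull the subquery up, so it falls back to the old-style
 -- SubPlan with an inner index scan on "DbTranRepositoryPK".
 DELETE FROM "DbTranLogRecord"
 WHERE NOT EXISTS (
     SELECT 1
     FROM "DbTranRepository" r
     WHERE r."countyNo" = "DbTranLogRecord"."countyNo"
       AND r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo"
     OFFSET 0
 );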
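And in case it helps anyone else in the same spot: the cost
overrides can be scoped to just the maintenance transaction with
SET LOCAL, so the cluster-wide "fully-cached" settings stay in
effect for the interactive users. A minimal sketch (the DELETE is
the same reconstruction as above, without the fence, so the planner
is free to pick the merge anti-join):

 BEGIN;
 -- SET LOCAL reverts automatically at COMMIT or ROLLBACK, so the
 -- 0.1 page costs remain in place for every other session.
 SET LOCAL seq_page_cost = 1;     -- whole-table passes are not cached
 SET LOCAL random_page_cost = 2;
 DELETE FROM "DbTranLogRecord"
 WHERE NOT EXISTS (
     SELECT 1
     FROM "DbTranRepository" r
     WHERE r."countyNo" = "DbTranLogRecord"."countyNo"
       AND r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo"
 );
 COMMIT;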
-Kevin