Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: anti-join chosen even when slower than old plan
Msg-id 4CDA62C60200002500037512@gw.wicourts.gov
In response to Re: anti-join chosen even when slower than old plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The semi-join and anti-join have helped us quite a bit, but we
>> have seen a situation where anti-join is chosen even though it is
>> slower than the "old fashioned" plan.  I know there have been
>> other reports of this, but I just wanted to go on record with my
>> details.
>
> In principle, the old-style plan ought to be equivalent to a
> nestloop antijoin with a seqscan of DbTranLogRecord on the outside
> and an indexscan of DbTranRepository on the inside.  Can you force
> it to choose such a plan by setting enable_mergejoin off (and
> maybe enable_hashjoin too)?

Well, I got what I think is the equivalent plan by adding OFFSET 0
to the subquery:

 Delete  (cost=0.00..1239005015.67 rows=337702752 width=6)
   ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..1239005015.67 rows=337702752 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Limit  (cost=0.00..1.82 rows=1 width=974)
                 ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..1.82 rows=1 width=974)
                       Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("tranImageSeqNo")::numeric = ($1)::numeric))
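
For anyone following along, the query is roughly of this shape
(reconstructed from the plan above, so the exact text may differ).
The OFFSET 0 keeps the planner from pulling the subquery up into an
anti-join, so it stays a correlated SubPlan -- which is what I think
gives the old-style plan:

 DELETE FROM "DbTranLogRecord"
 WHERE NOT EXISTS (
         SELECT *
         FROM "DbTranRepository" r
         WHERE r."countyNo" = "DbTranLogRecord"."countyNo"
           AND r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo"
         OFFSET 0
       );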

> If so, it'd be interesting to see the estimated costs and actual
> runtime on 9.0 for that plan.

Unfortunately, I had gone down a different path before I saw that:
based on the oprofile information, I decided to check out the plan I
would get by boosting cpu_index_tuple_cost by a factor of 20.  The
resulting plan was:

 Delete  (cost=132623778.83..139506491.18 rows=1 width=12)
   ->  Merge Anti Join  (cost=132623778.83..139506491.18 rows=1 width=12)
         Merge Cond: ((("DbTranLogRecord"."tranImageSeqNo")::numeric = (r."tranImageSeqNo")::numeric) AND (("DbTranLogRecord"."countyNo")::smallint = (r."countyNo")::smallint))
         ->  Sort  (cost=107941675.79..109630189.55 rows=675405504 width=20)
               Sort Key: "DbTranLogRecord"."tranImageSeqNo", "DbTranLogRecord"."countyNo"
               ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..7306496.14 rows=675405504 width=20)
         ->  Materialize  (cost=24682103.04..25443983.12 rows=152376016 width=20)
               ->  Sort  (cost=24682103.04..25063043.08 rows=152376016 width=20)
                     Sort Key: r."tranImageSeqNo", r."countyNo"
                     ->  Seq Scan on "DbTranRepository" r  (cost=0.00..3793304.86 rows=152376016 width=20)
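
The tweak itself was nothing exotic -- something along these lines at
the session level (the default cpu_index_tuple_cost is 0.005, so 20x
is 0.1):

 SET cpu_index_tuple_cost = 0.1;  -- 20x the 0.005 default, this session only
 -- then re-run EXPLAIN on the same DELETE as above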

That looked like it had potential, so I started that off and went
home before I got your post.  It finished in 3 hours and 31 minutes
-- more than twice as fast as the nestloop plan used under 8.3.

But wait -- it turns out that this pain was self-inflicted.  Based
on heavy testing of the interactive queries which users run against
this database, we tuned the database for "fully-cached" settings,
with both random_page_cost and seq_page_cost at 0.1.  In a
practical sense, the users are almost always running these queries
against very recent data which is, in fact, heavily cached -- so
it's no surprise that the queries they run perform best with plans
based on such costing.  The problem is that these weekly maintenance
runs need to pass over the entire database, so caching effects are
far less pronounced.  If I set seq_page_cost = 1 and
random_page_cost = 2, I get exactly the same (fast) plan as above.
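
In our case that's easy enough to scope to the maintenance job
itself -- something along these lines at the top of the weekly
script, leaving postgresql.conf alone:

 SET seq_page_cost = 1;      -- override the fully-cached 0.1 setting
 SET random_page_cost = 2;   -- for this session only
 -- ... weekly purge runs here ...
 RESET seq_page_cost;        -- back to the postgresql.conf values
 RESET random_page_cost;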

I guess the lesson here is not to use the same costing for
database-wide off-hours maintenance queries as for ad hoc queries
against a smaller set of recent data by users who expect quick
response time.  I'm fine with tweaking the costs in our maintenance
scripts, but it does tend to make me daydream about how the
optimizer might possibly auto-tweak such things....
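
(If the maintenance jobs all connect as their own role, another
option -- just a thought, and the role name here is made up -- would
be to hang the settings off that role rather than each script:

 ALTER ROLE maint SET seq_page_cost = 1;
 ALTER ROLE maint SET random_page_cost = 2;

so any new maintenance script picks them up automatically.)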

I assume there's now no need to get timings for the old plan?

-Kevin
