Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: anti-join chosen even when slower than old plan
Msg-id 4CDA62C60200002500037512@gw.wicourts.gov
In response to Re: anti-join chosen even when slower than old plan  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
>> The semi-join and anti-join have helped us quite a bit, but we
>> have seen a situation where anti-join is chosen even though it is
>> slower than the "old fashioned" plan.  I know there have been
>> other reports of this, but I just wanted to go on record with my
>> details.
>
> In principle, the old-style plan ought to be equivalent to a
> nestloop antijoin with a seqscan of DbTranLogRecord on the outside
> and an indexscan of DbTranRepository on the inside.  Can you force
> it to choose such a plan by setting enable_mergejoin off (and
> maybe enable_hashjoin too)?

Well, I got what I think is the equivalent plan by adding OFFSET 0
to the subquery:

 Delete  (cost=0.00..1239005015.67 rows=337702752 width=6)
   ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..1239005015.67 rows=337702752 width=6)
         Filter: (NOT (SubPlan 1))
         SubPlan 1
           ->  Limit  (cost=0.00..1.82 rows=1 width=974)
                 ->  Index Scan using "DbTranRepositoryPK" on "DbTranRepository" r  (cost=0.00..1.82 rows=1 width=974)
                       Index Cond: ((("countyNo")::smallint = ($0)::smallint) AND (("tranImageSeqNo")::numeric = ($1)::numeric))
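
For anyone following along, the query is roughly of this shape
(reconstructed from the plan above, so the exact text may differ).
The OFFSET 0 keeps the planner from pulling the subquery up into an
anti-join, so it stays a correlated SubPlan -- which is what I think
gives the old-style plan:

 DELETE FROM "DbTranLogRecord"
 WHERE NOT EXISTS (
         SELECT *
         FROM "DbTranRepository" r
         WHERE r."countyNo" = "DbTranLogRecord"."countyNo"
           AND r."tranImageSeqNo" = "DbTranLogRecord"."tranImageSeqNo"
         OFFSET 0
       );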

> If so, it'd be interesting to see the estimated costs and actual
> runtime on 9.0 for that plan.

Unfortunately, I had gone down a different path before I saw that:
based on the oprofile information, I decided to check out the plan I
would get by boosting cpu_index_tuple_cost by a factor of 20.  The
resulting plan was:

 Delete  (cost=132623778.83..139506491.18 rows=1 width=12)
   ->  Merge Anti Join  (cost=132623778.83..139506491.18 rows=1 width=12)
         Merge Cond: ((("DbTranLogRecord"."tranImageSeqNo")::numeric = (r."tranImageSeqNo")::numeric) AND (("DbTranLogRecord"."countyNo")::smallint = (r."countyNo")::smallint))
         ->  Sort  (cost=107941675.79..109630189.55 rows=675405504 width=20)
               Sort Key: "DbTranLogRecord"."tranImageSeqNo", "DbTranLogRecord"."countyNo"
               ->  Seq Scan on "DbTranLogRecord"  (cost=0.00..7306496.14 rows=675405504 width=20)
         ->  Materialize  (cost=24682103.04..25443983.12 rows=152376016 width=20)
               ->  Sort  (cost=24682103.04..25063043.08 rows=152376016 width=20)
                     Sort Key: r."tranImageSeqNo", r."countyNo"
                     ->  Seq Scan on "DbTranRepository" r  (cost=0.00..3793304.86 rows=152376016 width=20)
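
The tweak itself was nothing exotic -- something along these lines at
the session level (the default cpu_index_tuple_cost is 0.005, so 20x
is 0.1):

 SET cpu_index_tuple_cost = 0.1;  -- 20x the 0.005 default, this session only
 -- then re-run EXPLAIN on the same DELETE as above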

That looked like it had potential, so I started that off and went
home before I got your post.  It finished in 3 hours and 31 minutes
-- more than twice as fast as the nestloop plan used under 8.3.

But wait -- it turns out that this pain was self-inflicted.  Based
on heavy testing of the interactive queries which users run against
this database, we tuned the database for "fully-cached" settings,
with both random_page_cost and seq_page_cost at 0.1.  In a
practical sense, the users are almost always running these queries
against very recent data which is, in fact, heavily cached -- so
it's no surprise that the queries they run perform best with plans
based on such costing.  The problem is that these weekly maintenance
runs need to pass over the entire database, so caching effects are
far less pronounced.  If I set seq_page_cost = 1 and
random_page_cost = 2, I get exactly the same (fast) plan as above.
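
In our case that's easy enough to scope to the maintenance job
itself -- something along these lines at the top of the weekly
script, leaving postgresql.conf alone:

 SET seq_page_cost = 1;      -- override the fully-cached 0.1 setting
 SET random_page_cost = 2;   -- for this session only
 -- ... weekly purge runs here ...
 RESET seq_page_cost;        -- back to the postgresql.conf values
 RESET random_page_cost;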

I guess the lesson here is not to use the same costing for
database-wide off-hours maintenance queries as for ad hoc queries
against a smaller set of recent data by users who expect quick
response time.  I'm fine with tweaking the costs in our maintenance
scripts, but it does tend to make me daydream about how the
optimizer might possibly auto-tweak such things....
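
(If the maintenance jobs all connect as their own role, another
option -- just a thought, and the role name here is made up -- would
be to hang the settings off that role rather than each script:

 ALTER ROLE maint SET seq_page_cost = 1;
 ALTER ROLE maint SET random_page_cost = 2;

so any new maintenance script picks them up automatically.)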

I assume there's now no need to get timings for the old plan?

-Kevin
