Re: anti-join chosen even when slower than old plan - Mailing list pgsql-performance

From Tom Lane
Subject Re: anti-join chosen even when slower than old plan
Date
Msg-id 20961.1289347680@sss.pgh.pa.us
Whole thread Raw
In response to anti-join chosen even when slower than old plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: anti-join chosen even when slower than old plan  (Grzegorz Jaśkiewicz <gryzman@gmail.com>)
Re: anti-join chosen even when slower than old plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes:
> The semi-join and anti-join have helped us quite a bit, but we have
> seen a situation where anti-join is chosen even though it is slower
> than the "old fashioned" plan.  I know there have been other reports
> of this, but I just wanted to go on record with my details.

In principle, the old-style plan ought to be equivalent to a nestloop
antijoin with a seqscan of DbTranLogRecord on the outside and an
indexscan of DbTranRepository on the inside.  Can you force it to choose
such a plan by setting enable_mergejoin off (and maybe enable_hashjoin
too)?  If so, it'd be interesting to see the estimated costs and actual
runtime on 9.0 for that plan.

It would also be interesting to check estimated and actual costs for the
SELECT COUNT(*) versions of these queries, ie, no actual delete.  I'm
suspicious that the cost differential has nothing to do with antijoin
vs. subplan, and everything to do with whether the targeted tuples are
being deleted in physical order (thus improving locality of access for
the deletions).  If it's the latter, see previous discussions about
possibly sorting update/delete targets by CTID before applying the
actions.

            regards, tom lane

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: Huge overestimation in rows expected results in bad plan
Next
From: bricklen
Date:
Subject: Re: Huge overestimation in rows expected results in bad plan