Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance

From Simon Riggs
Subject Re: hashjoin chosen over 1000x faster plan
Date
Msg-id 1192035955.4233.306.camel@ebony.site
In response to Re: hashjoin chosen over 1000x faster plan  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses Re: hashjoin chosen over 1000x faster plan
List pgsql-performance
On Wed, 2007-10-10 at 09:15 -0500, Kevin Grittner wrote:
> >>> On Wed, Oct 10, 2007 at  1:31 AM, in message
> <1191997904.4233.125.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com>
> wrote:
> > On Tue, 2007-10-09 at 15:09 -0500, Kevin Grittner wrote:
> >
> >> I have a situation where a query is running much slower than I would
> >> expect.  The ANALYZE showed that it is hashing some information which
> >> is rarely needed.  When I set enable_hashjoin = off for the
> >> connection the query ran in 1/1000 the time.
> >
> > Can you confirm the two queries give identical outputs?
>
> I checked; the output is identical.
>
> > It isn't clear
> > to me why the second sort is (never executed) in your second plan, which
> > I would only expect to see for an inner merge join.
>
> I assume that is because there were no rows to sort.  The
> CaseTypeHistEvent view is only needed if there is a link to an event
> which reopens the charge after it is disposed.  This only happens for
> about 1% of the Charge records.

So CHST.EventType is mostly NULL? The good news is that the default
plan works best when it does actually find a match. So for 1% of cases
you will have an execution time of about 1s, and <1ms for the others if
you adjust the planner method settings (e.g. enable_hashjoin).
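
To put rough numbers on that, here is a back-of-envelope sketch. The 1%
match rate, the ~1s and the <1ms figures are from this thread; treating
the hash plan as costing ~1s for every case, and taking "<1ms" as a flat
1ms, are my assumptions for illustration only.

```python
# Figures quoted in the thread; see the assumptions noted on each line.
MATCH_FRACTION = 0.01   # about 1% of Charge records find a match
SLOW_SECONDS = 1.0      # ~1s when the hashed data is built (assumed for every hash-plan run)
FAST_SECONDS = 0.001    # "<1ms" taken as a 1ms upper bound (assumption)


def expected_seconds(match_fraction, match_cost, no_match_cost):
    """Average execution time per query, weighted by how often a match occurs."""
    return match_fraction * match_cost + (1.0 - match_fraction) * no_match_cost


# Hash join chosen for every execution, matched or not.
always_hash = expected_seconds(MATCH_FRACTION, SLOW_SECONDS, SLOW_SECONDS)

# Best plan picked per case: slow path only for the 1% that match.
best_per_case = expected_seconds(MATCH_FRACTION, SLOW_SECONDS, FAST_SECONDS)

print(f"always hash join : {always_hash:.5f} s on average")
print(f"best plan per case: {best_per_case:.5f} s on average")
```

Under those assumptions the average is dominated by the rare matching
case, not by the common one.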

The planner thinks every row will find a match, yet the actual number is
only 1%. Hmmm, same section of code as last week.

Basically the planner doesn't ever optimise for the possibility of the
never-executed case because even a single row returned would destroy
that assumption.

If we had an Option node in there, we could run the first part of the
plan before deciding whether to do a merge join (MJ) or a hash join
(HJ). Doing that would avoid doing 2 sorts and return even quicker in
the common case (about 80% time) without being slower in the slowest
case.
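
A toy sketch of that idea, in Python rather than executor code. Nothing
here exists in PostgreSQL: option_join, load_inner and the threshold
are hypothetical names, and picking MJ for small outer inputs is just
an assumption to show the shape of the decision.

```python
from collections import defaultdict


def hash_join(outer, inner, key):
    """Build a hash table on the inner side, then probe it with each outer row."""
    table = defaultdict(list)
    for row in inner:
        table[row[key]].append(row)
    return [(o, i) for o in outer for i in table.get(o[key], [])]


def merge_join(outer, inner, key):
    """Sort both sides on the key, then walk them in step."""
    outer = sorted(outer, key=lambda r: r[key])
    inner = sorted(inner, key=lambda r: r[key])
    out, j = [], 0
    for o in outer:
        # Skip inner rows that sort before the current outer key.
        while j < len(inner) and inner[j][key] < o[key]:
            j += 1
        # Emit every inner row sharing the key; j stays put for duplicate outer keys.
        k = j
        while k < len(inner) and inner[k][key] == o[key]:
            out.append((o, inner[k]))
            k += 1
    return out


def option_join(outer_rows, load_inner, key, threshold=100):
    """Hypothetical Option node: run the outer side first, then pick a join method."""
    outer = list(outer_rows)      # the "first part of the plan"
    if not outer:
        return []                 # inner side is never executed
    inner = load_inner()          # only now pay for the inner side
    if len(outer) <= threshold:   # assumed rule: few outer rows, use MJ
        return merge_join(outer, inner, key)
    return hash_join(outer, inner, key)
```

The point is only that the inner side and the choice of join are
deferred until the outer row count is known, so the never-executed case
costs nothing on the inner side.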

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com

