
Re: hashjoin chosen over 1000x faster plan - Mailing list pgsql-performance

From	Simon Riggs
Subject	Re: hashjoin chosen over 1000x faster plan
Date	October 10, 2007 14:05:57
Msg-id	1192035955.4233.306.camel@ebony.site
In response to	Re: hashjoin chosen over 1000x faster plan ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
Responses	Re: hashjoin chosen over 1000x faster plan
List	pgsql-performance


On Wed, 2007-10-10 at 09:15 -0500, Kevin Grittner wrote:
> >>> On Wed, Oct 10, 2007 at  1:31 AM, in message
> <1191997904.4233.125.camel@ebony.site>, Simon Riggs <simon@2ndquadrant.com>
> wrote:
> > On Tue, 2007-10-09 at 15:09 -0500, Kevin Grittner wrote:
> >
> >> I have a situation where a query is running much slower than I would
> >> expect.  The ANALYZE showed that it is hashing some information which
> >> is rarely needed.  When I set enable_hashjoin = off for the
> >> connection the query ran in 1/1000 of the time.
> >
> > Can you confirm the two queries give identical outputs?
>
> I checked; the output is identical.
>
> > It isn't clear
> > to me why the second sort is (never executed) in your second plan, which
> > I would only expect to see for an inner merge join.
>
> I assume that is because there were no rows to sort.  The
> CaseTypeHistEvent view is only needed if there is a link to an event
> which reopens the charge after it is disposed.  This only happens for
> about 1% of the Charge records.

So CHST.EventType is mostly NULL? The good news is that the default
plan works best when it actually does find a match. So in about 1% of
cases you will see an execution time of about 1s, and <1ms in the
others if you fiddle with the planner methods.
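
As a rough illustration of that arithmetic (a sketch, not a measurement),
the figures above can be combined into an average per-query time. It
assumes each case runs at its quoted speed and treats "<1ms" as an upper
bound of 1 ms; the names below are made up for the example.

```python
# Figures quoted in this thread; "<1ms" is treated as an upper bound of 1 ms.
MATCH_FRACTION = 0.01   # about 1% of cases find a match
SLOW_SECONDS = 1.0      # about 1s when a match exists
FAST_SECONDS = 0.001    # <1ms otherwise (assumed upper bound)


def expected_seconds(match_fraction, slow, fast):
    """Average time per query if each case ran at its own quoted speed."""
    return match_fraction * slow + (1.0 - match_fraction) * fast


average = expected_seconds(MATCH_FRACTION, SLOW_SECONDS, FAST_SECONDS)
print(f"average per query: about {average * 1000:.0f} ms")
```

Under those assumptions the average is dominated by the rare slow case,
which is why paying the slow price on every execution hurts so much.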

The planner thinks every row will find a match, yet the actual number is
only 1%. Hmmm, same section of code as last week.

Basically the planner doesn't ever optimise for the possibility of the
never-executed case because even a single row returned would destroy
that assumption.

If we had an Option node in there, we could run the first part of the
plan before deciding whether to do a merge join (MJ) or a hash join
(HJ). Doing that would avoid doing two sorts and return even more
quickly in the common case (about 80% time) without being slower in the
slowest case.
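
The idea of running the first part of the plan before committing to a
join method can be sketched in a few lines of Python. This is purely a
hypothetical illustration, not PostgreSQL executor code and not an
existing node type: the function name, arguments, and strategy labels
are invented, and the merge-join-versus-hash-join choice is replaced by
a simpler choice between skipping the inner side and hashing it.

```python
from collections import defaultdict


def adaptive_left_join(outer_rows, scan_inner, key):
    """Left-join outer_rows to an inner relation, choosing the strategy late.

    outer_rows: list of dicts (the first part of the plan, already run).
    scan_inner: zero-argument callable returning the inner rows; it is only
                called if at least one outer row has a non-NULL join key.
    key:        name of the join column present on both sides.
    Returns (strategy_name, list of (outer_row, inner_row_or_None)).
    """
    wanted = {row[key] for row in outer_rows if row[key] is not None}

    if not wanted:
        # Common case: nothing can match, so the inner side is never executed.
        return "skip-inner", [(row, None) for row in outer_rows]

    # Rare case: pay for the inner scan and build a hash table on it.
    table = defaultdict(list)
    for inner in scan_inner():
        if inner[key] in wanted:
            table[inner[key]].append(inner)

    joined = []
    for row in outer_rows:
        matches = table.get(row[key]) or [None]
        joined.extend((row, match) for match in matches)
    return "hash", joined


# Usage: count how often the (expensive) inner scan is actually triggered.
inner_scans = []


def scan_events():
    inner_scans.append(1)
    return [{"event": 7, "type": "reopen"}]


no_links = [{"id": 1, "event": None}, {"id": 2, "event": None}]
one_link = [{"id": 1, "event": None}, {"id": 3, "event": 7}]

strategy_a, rows_a = adaptive_left_join(no_links, scan_events, "event")
scans_after_a = len(inner_scans)
strategy_b, rows_b = adaptive_left_join(one_link, scan_events, "event")
```

The point of the sketch is only that the expensive side is deferred
until the cheap side shows it is needed; a real planner node would also
have to cost the alternatives.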

--
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
