Home > mailing lists

Re: Index condition in a Nested Loop - Mailing list pgsql-performance

From	Mark Hills
Subject	Re: Index condition in a Nested Loop
Date	February 27, 2012 19:14:22
Msg-id	alpine.LNX.2.01.1202272211230.12105@stax.localdomain Whole thread
In response to	Re: Index condition in a Nested Loop (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Index condition in a Nested Loop
List	pgsql-performance

Tree view

On Sun, 26 Feb 2012, Tom Lane wrote:

> Mark Hills <mark@pogo.org.uk> writes:
> > What is that prevents the index condition from being used in earlier parts
> > of the query? Only where a single condition is present is it be used below
> > the final join.
>
> "WHERE job.jid IN (1234)" is simplified to "WHERE job.jid = 1234", and
> that in combination with "JOIN ON job.jid = middle.jid" allows deduction
> of "middle.jid = 1234" a/k/a "task.jid = 1234", leading to the
> recognition that only one row from "task" is needed.  There is no such
> transitive propagation of general IN clauses.  The problem with your
> slower queries is not that they're using merge joins, it's that there's
> no scan-level restriction on the task table so that whole table has to
> be scanned.
>
> Another thing that's biting you is that the GROUP BY in the view acts as
> a partial optimization fence: there's only a limited amount of stuff
> that can get pushed down through that.  You might consider rewriting the
> view to avoid that, along the lines of
>
> create view middle2 as
>   SELECT task.jid, task.tid,
>     (select count(resource.name) from resource where task.tid = resource.tid) AS nresource
>   FROM task;
>
> This is not perfect: this formulation forces the system into essentially
> a nestloop join between task and resource.  In cases where you actually
> want results for a lot of task rows, that's going to lose badly.  But in
> the examples you're showing here, it's going to work better.

Thanks for this. Indeed it does work better, and it's exactly the method I
was hoping the planner could use to execute the query.

I modified the report on the previous week's data, and it now runs 6x
faster (in a database containing approx. 2 years of data). There are
several similar reports. Some queries work on only a hanful of jobs and
this change ensures they are instant.

I hadn't realised that sub-queries restrict the planner so much. Although
at some point I've picked up a habit of avoiding them, presumably for this
reason.

If you have time to explain, I'd be interested in a suggestion for any
change to the planner that could make a small contribution towards
improving this. eg. a small project that could get me into the planner
code.

Many thanks for your help,

--
Mark

pgsql-performance by date:

From: Filippos Kalamidas
Date: 27 February 2012, 18:00:12
Subject: Re: set autovacuum=off

From: Tom Lane
Date: 27 February 2012, 20:17:14
Subject: Re: Index condition in a Nested Loop

Re: Index condition in a Nested Loop - Mailing list pgsql-performance

Previous

Next