Home > mailing lists

Re: multiple joins + Order by + LIMIT query performance issue - Mailing list pgsql-performance

From	Shaun Thomas
Subject	Re: multiple joins + Order by + LIMIT query performance issue
Date	May 6, 2008 13:49:22
Msg-id	1210092220.14833.33.camel@berners-lee Whole thread Raw
In response to	multiple joins + Order by + LIMIT query performance issue (Antoine Baudoux <ab@taktik.be>)
Responses	Re: multiple joins + Order by + LIMIT query performance issue Re: multiple joins + Order by + LIMIT query performance issue
List	pgsql-performance

Tree view

On Tue, 2008-05-06 at 16:03 +0100, Antoine Baudoux wrote:

> My understanding is that in the first case the sort is
> done after all the table joins and filtering, but in the
> second case ALL the rows in t_event are scanned and sorted
> before the join.

You've actually run into a problem that's bitten us in the ass a couple
of times.  The problem with your second query is that it's *too*
efficient.  You'll notice the first plan uses a bevy of nest-loops,
which is very risky if the row estimates are not really really accurate.
The planner says "Hey, customer_id=1 could be several rows in the
t_network table, but not too many... I better check them one by one."
I've turned off nest-loops sometimes to avoid queries that would run
several hours due to mis-estimation, but it looks like yours was just
fine.

The second query says "Awesome!  Only one network... I can just search
the index of t_event backwards for this small result set!"

But here's the rub... try your query *without* the limit clause, and you
may find it's actually faster, because the planner suddenly thinks it
will have to scan the whole table, so it choses an alternate plan
(probably back to the nest-loop).  Alternatively, take off the order-by
clause, and it'll remove the slow backwards index-scan.

I'm not sure what causes this, but the problem with indexes is that
they're not necessarily in the order you want unless you also cluster
them, so a backwards index scan is almost always the wrong answer.
Personally I consider this a bug, and it's been around since at least
the 8.1 tree.  The only real answer is that you have a fast version of
the query, so try and play with it until it acts the way you want.

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com

pgsql-performance by date:

From: Antoine Baudoux
Date: 06 May 2008, 13:42:42
Subject: Re: multiple joins + Order by + LIMIT query performance issue

From: Shaun Thomas
Date: 06 May 2008, 13:50:33
Subject: Re: need to speed up query

Re: multiple joins + Order by + LIMIT query performance issue - Mailing list pgsql-performance

Previous

Next