Re: multiple joins + Order by + LIMIT query performance issue - Mailing list pgsql-performance

From Shaun Thomas
Subject Re: multiple joins + Order by + LIMIT query performance issue
Date
Msg-id 1210092220.14833.33.camel@berners-lee
Whole thread Raw
In response to multiple joins + Order by + LIMIT query performance issue  (Antoine Baudoux <ab@taktik.be>)
Responses Re: multiple joins + Order by + LIMIT query performance issue  (Antoine Baudoux <ab@taktik.be>)
Re: multiple joins + Order by + LIMIT query performance issue  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
On Tue, 2008-05-06 at 16:03 +0100, Antoine Baudoux wrote:

> My understanding is that in the first case the sort is
> done after all the table joins and filtering, but in the
> second case ALL the rows in t_event are scanned and sorted
> before the join.

You've actually run into a problem that's bitten us in the ass a couple
of times.  The problem with your second query is that it's *too*
efficient.  You'll notice the first plan uses a bevy of nest-loops,
which is very risky if the row estimates are not really really accurate.
The planner says "Hey, customer_id=1 could be several rows in the
t_network table, but not too many... I better check them one by one."
I've turned off nest-loops sometimes to avoid queries that would run
several hours due to mis-estimation, but it looks like yours was just
fine.

The second query says "Awesome!  Only one network... I can just search
the index of t_event backwards for this small result set!"

But here's the rub... try your query *without* the limit clause, and you
may find it's actually faster, because the planner suddenly thinks it
will have to scan the whole table, so it choses an alternate plan
(probably back to the nest-loop).  Alternatively, take off the order-by
clause, and it'll remove the slow backwards index-scan.

I'm not sure what causes this, but the problem with indexes is that
they're not necessarily in the order you want unless you also cluster
them, so a backwards index scan is almost always the wrong answer.
Personally I consider this a bug, and it's been around since at least
the 8.1 tree.  The only real answer is that you have a fast version of
the query, so try and play with it until it acts the way you want.

--

Shaun Thomas
Database Administrator

Leapfrog Online
807 Greenwood Street
Evanston, IL 60201
Tel. 847-440-8253
Fax. 847-570-5750
www.leapfrogonline.com



pgsql-performance by date:

Previous
From: Antoine Baudoux
Date:
Subject: Re: multiple joins + Order by + LIMIT query performance issue
Next
From: Shaun Thomas
Date:
Subject: Re: need to speed up query