Thread: Yet another question on LIMIT performance :/

Yet another question on LIMIT performance :/

From: Hannes Dorbath
Date:
Though I've read recent threads, I'm unsure if any matches my case.

We have 2 tables: revisions and revisions_active. revisions contains
117707 rows, revisions_active 17827 rows.

DDL: http://hannes.imos.net/ddl.sql.txt

Joining the 2 tables without an additional condition performs acceptably
for me (given our outdated hardware): http://hannes.imos.net/query_1.sql.txt

What worries me is the performance when limiting the result set:
http://hannes.imos.net/query_2.sql.txt

Though it should only have to join a few rows, it seems to scan all rows.
From experience I thought that adding an ORDER BY on the indexed columns
would speed it up, but it has no effect: http://hannes.imos.net/query_3.sql.txt

I'm on 8.1.5; statistics (ANALYZE) are up to date, each table has been
CLUSTERed on its PK, and the statistics target for the join columns has
been raised to 100 (with no effect).


Thanks in advance!


--
Regards,
Hannes Dorbath

Re: Yet another question on LIMIT performance :/

From: "Heikki Linnakangas"
Date:
Hannes Dorbath wrote:
> Though it should only have to join a few rows it seems to scan all rows.

What makes you think that's the case?

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

Re: Yet another question on LIMIT performance :/

From: Hannes Dorbath
Date:
On 06.11.2006 15:13, Heikki Linnakangas wrote:
> Hannes Dorbath wrote:
>> Though it should only have to join a few rows it seems to scan all rows.
>
> What makes you think that's the case?

Sorry, not all rows, but 80753. It's not clear to me why this number is
so high with LIMIT 10.


--
Regards,
Hannes Dorbath

Re: Yet another question on LIMIT performance :/

From: Tom Lane
Date:
"Heikki Linnakangas" <heikki@enterprisedb.com> writes:
> Hannes Dorbath wrote:
>> Though it should only have to join a few rows it seems to scan all rows.

> What makes you think that's the case?

What it looks like to me is that the range of keys present in
pk_revisions_active corresponds to just the upper end of the range of
keys present in pk_revisions (somehow not too surprising).  So the
mergejoin isn't the most effective plan possible for this case --- it
has to scan through much of pk_revisions before it starts getting
matches.  The planner doesn't have any model for that though, and is
costing the plan on the assumption of uniformly-distributed matches.

A nestloop plan would be faster for this specific case, but much
slower if a large number of rows were requested.
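The effect Tom describes can be sketched with a toy model. This is not
PostgreSQL's executor, and the key ranges below are invented to mimic the
skew (active keys clustered at the top of the full range), not taken from
the actual tables:

```python
# Toy model: a merge join under LIMIT must walk past every outer key
# below the inner side's smallest key before emitting its first match,
# while a nested loop only probes the index once per inner row tried.
# Key ranges are invented to mimic the described skew.

def mergejoin_rows_scanned(outer, inner, limit):
    """Step a textbook sort-merge join over two sorted, unique key
    lists; return how many steps it takes to produce `limit` matches."""
    i = j = steps = matches = 0
    while matches < limit and i < len(outer) and j < len(inner):
        steps += 1
        if outer[i] < inner[j]:
            i += 1          # outer key has no partner: skip it
        elif outer[i] > inner[j]:
            j += 1
        else:
            matches += 1    # keys line up: emit a joined row
            i += 1
            j += 1
    return steps

def nestloop_probes(outer_keys, inner, limit):
    """One index probe into the outer table per inner row tried;
    stop as soon as `limit` matches are found."""
    probes = matches = 0
    for key in inner:
        probes += 1
        if key in outer_keys:
            matches += 1
            if matches == limit:
                break
    return probes

revisions = list(range(1, 100_001))       # full key range
active    = list(range(90_001, 100_001))  # only the upper end

print(mergejoin_rows_scanned(revisions, active, 10))  # 90010 steps
print(nestloop_probes(set(revisions), active, 10))    # 10 probes
```

Under LIMIT 10 the merge join pays for the whole non-matching prefix of
pk_revisions, while the nested loop's cost grows with the number of rows
requested, which is why it would lose if many rows were asked for.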

            regards, tom lane