Re: Optimizer: limit not taken into account - Mailing list pgsql-performance

From Bruno Wolff III
Subject Re: Optimizer: limit not taken into account
Date
Msg-id 20060517174426.GA14567@wolff.to
Whole thread Raw
In response to Optimizer: limit not taken into account  ("Craig A. James" <cjames@modgraph-usa.com>)
Responses Re: Optimizer: limit not taken into account  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
Please don't reply to previous messages to start new threads. This makes it
harder to find stuff in the archives and may keep people from noticing your
message.

On Wed, May 17, 2006 at 08:54:52 -0700,
  "Craig A. James" <cjames@modgraph-usa.com> wrote:
> Here's a "corner case" that might interest someone.  It tripped up one of
> our programmers.
>
> We have a table with > 10 million rows.  The ID column is indexed, the
> table has been vacuum/analyzed.  Compare these two queries:
>
>   select * from tbl where id >= 10000000 limit 1;
>   select * from tbl where id >= 10000000 order by id limit 1;
>
> The first takes 4 seconds, and uses a full table scan.  The second takes 32
> msec and uses the index.  Details are below.

I suspect it wasn't intended to be a full table scan. But rather a sequential
scan until it found a matching row. If the data in the table is ordered by
by id, this strategy may not work out well. Where as if the data is randomly
ordered, it would be expected to find a match quickly.

Have you analyzed the table recently? If the planner has bad stats on the
table, that is going to make it more likely to choose a bad plan.


> I understand why the planner makes the choices it does -- the "id >
> 10000000" isn't very selective and under normal circumstances a full table
> scan is probably the right choice.  But the "limit 1" apparently doesn't
> alter the planner's strategy at all.  We were surprised by this.
>
> Adding the "order by" was a simple solution.
>
> Craig
>
>
>
> pg=> explain analyze select url, url_digest from url_queue where priority
> >= 10000000 limit 1;
>                       QUERY PLAN
> ------------------------------------------------------------------------------------------
> Limit  (cost=0.00..0.65 rows=1 width=108) (actual time=4036.113..4036.117
> rows=1 loops=1)
>   ->  Seq Scan on url_queue  (cost=0.00..391254.35 rows=606176 width=108)
>   (actual time=4036.101..4036.101 rows=1 loops=1)
>         Filter: (priority >= 10000000)
> Total runtime: 4036.200 ms
> (4 rows)
>
> pg=> explain analyze select url, url_digest from url_queue where priority
> >= 10000000 order by priority limit 1;
>                       QUERY PLAN
> --------------------------------------------------------------------------------------
> Limit  (cost=0.00..2.38 rows=1 width=112) (actual time=32.445..32.448
> rows=1 loops=1)
>   ->  Index Scan using url_queue_priority on url_queue
>   (cost=0.00..1440200.41 rows=606176 width=112) (actual time=32.434..32.434
>   rows=1 loops=1)
>         Index Cond: (priority >= 10000000)
> Total runtime: 32.566 ms
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

pgsql-performance by date:

Previous
From: john_oshea@wordbank.com
Date:
Subject: Re: SQL CPU time usage
Next
From: Chris Mckenzie
Date:
Subject: Performance/Maintenance test result collection