Thread: ORDER BY, LIMIT and indexes
Hello,

Assuming I have a huge table (doesn't fit in RAM), of which the most important fields are "id", a SERIAL PRIMARY KEY, and "active", a boolean, and I'm issuing a query like:

SELECT * FROM table ORDER BY id DESC LIMIT 10

... is pgsql smart enough to use the index to fetch only the 10 required rows instead of reading the whole table, then sorting it, then trimming the result set? How about in the following queries:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Or, more generally, is there some set of circumstances under which the catastrophic scenario will happen?
On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>
> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Did you try explain?
On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>
> Did you try explain?
And did you run ANALYZE on your table to be sure that you generate correct plans?
Michael
Ivan,

> Or, more generally, is there some set of circumstances under which the
> catastrophic scenario will happen?

Yes:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000

This is the "high offset" problem, and it affects all databases which support applications with paginated results, including non-relational ones like SOLR. The basic problem is that you can't figure out what is at OFFSET 100000 without first sorting the first 100000 results.

The easiest solution is to limit the number of pages your users can "flip through". Generally anyone asking for page 10,000 is a bot screen-scraping your site, anyway.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
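A minimal sketch of that page cap, assuming ten-row pages, an arbitrary cap of 1000 pages enforced in the statement itself, and a hypothetical items table (standing in for the thread's placeholder "table", which is a reserved word):

PREPARE capped_page(int) AS
    SELECT * FROM items
    ORDER BY id DESC
    LIMIT 10 OFFSET LEAST($1, 1000) * 10;

EXECUTE capped_page(3);   -- page 3; any page number above 1000 is clamped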
On Mon, Aug 5, 2013 at 6:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000
>
> This is the "high offset" problem, and it affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR. The basic problem is that you can't figure out what is
> at OFFSET 100000 without first sorting the first 100000 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through". Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.

In addition to Josh's answer I would like to mention that it might be worth using a partial index like this

CREATE INDEX i_table_id_active ON table (id) WHERE active

in this particular case

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

so it will avoid filtering through tons of rows in case of long "NOT active" gaps at the beginning of the scanning sequence.

As an alternative solution for the pagination (OFFSET) problem you might also use the "prev/next" technique, like

SELECT * FROM table
WHERE id > :current_last_id
ORDER BY id LIMIT 10

for "next", and

SELECT * FROM (
    SELECT * FROM table
    WHERE id < :current_first_id
    ORDER BY id DESC
    LIMIT 10
) AS sq ORDER BY id

for "prev". It will be very fast.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
Sergey Konoplev-2 wrote
> As an alternative solution for the pagination (OFFSET) problem you might
> also use the "prev/next" technique, like
>
> SELECT * FROM table
> WHERE id > :current_last_id
> ORDER BY id LIMIT 10
>
> for "next", and
>
> SELECT * FROM (
>     SELECT * FROM table
>     WHERE id < :current_first_id
>     ORDER BY id DESC
>     LIMIT 10
> ) AS sq ORDER BY id
>
> for "prev". It will be very fast.

Even though I am fairly experienced at SQL generally, I haven't explored pagination that much, so my awareness of the OFFSET issue had led me to conclude bad things. Thank you for taking the time for a brief moment of enlightenment about something you likely take for granted by now.

Curious how much slower/faster these queries would run if you added:

SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
-- note the window specifications need to overcome the "ORDER BY"
-- limitation noted in the documentation

to the query. Using the window functions you know at each record what the first and last ids are for its window. Applicability would be application/need specific, but it would avoid having to calculate/maintain these two values in a separate part of the application.

David J.
On 6 August 2013 02:20, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>>> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>>>
>>> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> Did you try explain?
>
> And did you run ANALYZE on your table to be sure that you generate correct plans?

My question was a theoretical one - about general pgsql abilities, not a specific case. But after being prodded by you, here's an EXPLAIN ANALYZE for the first case:

ivoras=# explain analyze select * from lt order by id desc limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
 Total runtime: 0.115 ms
(3 rows)

(This test table has only 1 mil. records.)

I'm not sure how to interpret the last line:

(cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)

It seems to me like the planner thinks the Index Scan operation will return the entire table, and expects both a huge cost and all of the records, but its actual implementation does return only 10 records. The Limit operation is a NOP in this case. Is this correct?

In the second case (with OFFSET):

ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.040..0.061 rows=20 loops=1)
 Total runtime: 0.136 ms
(3 rows)

It looks like the Index Scan implementation actually returns the first 20 records - which means that for the last "page" of the supposed paginated display, pgsql will actually internally read the entire table as an operation result in memory and discard almost all of it in the Limit operation. Right?

And for the third case (with another column in WHERE):

ivoras=# update lt set active = false where (id % 2) = 0;
UPDATE 500000
ivoras=# explain analyze select * from lt where active order by id desc limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43 rows=1500000 width=9) (actual time=0.132..0.174 rows=10 loops=1)
         Filter: active
 Total runtime: 0.244 ms
(4 rows)

It looks like filtering is performed in the Index Scan operation - which I never expected. Is this usually done, or only for simple queries? In other words, is there a circumstance where it is NOT done in this way, and should I care?

Based on the results with OFFSET it looks like it would be a bad idea to use pgsql for allowing the user to browse through the records in a paginated form if the table is huge. I would very much like to be proven wrong :)
Here are two more unexpected results. Same test table (1 mil. records, "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed before the experiments):

ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
   ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
         Filter: (id > 900000)
 Total runtime: 142.735 ms
(4 rows)

Note the Seq Scan.

ivoras=# explain analyze select * from lt where id > 900000;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9) (actual time=38.462..85.780 rows=100000 loops=1)
   Recheck Cond: (id > 900000)
   ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630 width=0) (actual time=38.310..38.310 rows=100000 loops=1)
         Index Cond: (id > 900000)
 Total runtime: 115.674 ms
(5 rows)

This somewhat explains the above case - we are simply fetching 100,000 records here, and it's slow enough even with the index scan, so the planner skips the index in the former case. BUT, if it did use the index, it would have been expectedly fast:

ivoras=# set enable_seqscan to off;
SET
ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112 rows=10 loops=1)
   ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17 rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
         Index Cond: (id > 900000)
 Total runtime: 0.175 ms
(4 rows)

It looks like the problem is in the difference between what the planner expects and what the Filter or Index operations deliver: (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1).
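One way to see how much of the table that seqscan actually walked before producing its first row (a sketch; EXPLAIN's BUFFERS option is available in 9.1):

ivoras=# explain (analyze, buffers) select * from lt where id > 900000 limit 10;

Since the serial ids correlate with the physical row order here, the "Buffers:" counts on the Seq Scan node should show it reading on the order of 90% of the table's pages before the first id > 900000 turns up, while the index scan variant touches only a handful of pages.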
On Tue, Aug 6, 2013 at 7:46 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)

I think the problem lies in assuming the sequential scan will have 0 startup cost, which is not the case here (it will have to scan up to the first page with an id > 900000). If that were to be fixed, the index scan would be chosen instead.

I don't see a way to fix it without querying the index, which could be a costly operation... except with the newly proposed minmax indexes. I guess here's another case where those indexes would be useful.
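The numbers in the plans above bear that out; a back-of-the-envelope check, assuming the Limit node's cost is simply the child's total cost times the fetched fraction when startup cost is zero:

    seq scan:   17402.00 * 10 / 101630  =  1.71
    index scan: 17644.17 * 10 / 101630  =  1.74

which match the 1.71 and 1.74 Limit costs shown in the earlier EXPLAINs. With both startup costs pegged at 0.00, the seqscan wins by a hair on paper, even though its real startup cost is enormous.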
On Mon, Aug 5, 2013 at 9:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Ivan,
>
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000
>
> This is the "high offset" problem, and it affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR. The basic problem is that you can't figure out what is
> at OFFSET 100000 without first sorting the first 100000 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through". Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.
Another solution is to build pages from the maximum id you pulled in the last page so page one is:
SELECT * FROM table ORDER BY id LIMIT 10
and page 2 is:
SELECT * FROM table WHERE id > 19 ORDER BY id LIMIT 10
and page 3 is:
SELECT * FROM table WHERE id > 37 ORDER BY id LIMIT 10
and so on. You build your URLs like this:
http://yousite.com/browse
http://yousite.com/browse?after=19
http://yousite.com/browse?after=37
and so on.
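A sketch of that approach as a prepared statement (the name next_page, the example value 19, and the hypothetical items table - "table" itself is a reserved word - are assumptions):

PREPARE next_page(int) AS
    SELECT * FROM items
    WHERE id > $1          -- the highest id shown on the previous page
    ORDER BY id
    LIMIT 10;

EXECUTE next_page(19);     -- serves http://yousite.com/browse?after=19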
On Tue, Aug 6, 2013 at 3:04 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>
> ivoras=# explain analyze select * from lt order by id desc limit 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
>  Total runtime: 0.115 ms
> (3 rows)
>
> (This test table has only 1 mil. records.)
>
> I'm not sure how to interpret the last line:
> (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
>
> It seems to me like the planner thinks the Index Scan operation will
> return the entire table, and expects both a huge cost and all of the
> records, but its actual implementation does return only 10 records.
> The Limit operation is a NOP in this case. Is this correct?

The Index Scan line reports the cost estimate of the entire scan; the Limit line chops that cost estimate down based on the limit.

> In the second case (with OFFSET):
>
> ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.040..0.061 rows=20 loops=1)
>  Total runtime: 0.136 ms
> (3 rows)
>
> It looks like the Index Scan implementation actually returns the first
> 20 records - which means that for the last "page" of the supposed
> paginated display, pgsql will actually internally read the entire
> table as an operation result in memory and discard almost all of it in
> the Limit operation. Right?

It would not necessarily do it in memory. If it used an index scan (which it probably would not), it would just need to count the rows skipped over, not store them. If it used a sort, it might spill to disk.

> And for the third case (with another column in WHERE):
>
> ivoras=# update lt set active = false where (id % 2) = 0;
> UPDATE 500000
> ivoras=# explain analyze select * from lt where active order by id desc limit 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43 rows=1500000 width=9) (actual time=0.132..0.174 rows=10 loops=1)
>          Filter: active
>  Total runtime: 0.244 ms
> (4 rows)
>
> It looks like filtering is performed in the Index Scan operation -
> which I never expected. Is this usually done, or only for simple
> queries? In other words, is there a circumstance where it is NOT done
> in this way, and should I care?

I would expect the filter to be done at the first point it can conveniently be done at. You probably shouldn't care.

> Based on the results with OFFSET it looks like it would be a bad idea
> to use pgsql for allowing the user to browse through the records in a
> paginated form if the table is huge. I would very much like to be
> proven wrong :)

Do you expect to have a lot of users who will hit "next page" 50,000 times and carefully read through each set of results?
I think this is a problem I would worry about when it knocked on my door. Unless I was mostly worried about denial of service attacks. Cheers, Jeff
On 06/08/13 22:46, Ivan Voras wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)
>
> Note the Seq Scan.

[skipped]

> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).

Hmm - I wonder if the lack of ORDER BY is part of the problem here. Consider a similar query on pgbench_accounts:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 limit 20;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 rows=20 loops=1)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..499187.31 rows=10994846 width=4) (actual time=0.005..0.463 rows=20 loops=1)
         Filter: (aid > 100000)
 Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 limit 20;
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.030 ms
(4 rows)

So at some point you get index scans.
Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 order by aid limit 20;
                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 rows=20 loops=1)
         Index Cond: (aid > 100000)
 Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 order by aid limit 20;
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.029 ms
(4 rows)

...and we have index scans for both cases.

Cheers

Mark
On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
> Consider a similar query on pgbench_accounts:

[skipped]

> ...and we have index scans for both cases.

Yes, but those index scan decisions are driven by the wrong factor. In the last two cases, the need for rows to be ordered. In the second case, the estimated number of tuples in the scan.

In both cases, that's not the driving factor for the right decision. The driving factor *should* be startup cost, which is nonzero because there is a filter being applied to that sequential scan that filters many of the initial tuples. With a nonzero startup cost, the cost of the limited plan would be "startup cost + scan cost * scanned fraction". When the scanned fraction is low enough, startup cost dominates the equation.

With a min/max index, a cheap query to that index could estimate at least a lower bound to that initial zero-rows-output cost. With b-tree indexes, not so much (the b-tree would have to be traversed until the first filter-passing tuple, which could be a while).
On Tue, Aug 6, 2013 at 8:03 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
>> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
>> Consider a similar query on pgbench_accounts:

[skipped]

> In both cases, that's not the driving factor for the right decision.
> The driving factor *should* be startup cost, which is nonzero because
> there is a filter being applied to that sequential scan that filters
> many of the initial tuples. With a nonzero startup cost, the cost of
> the limited plan would be "startup cost + scan cost * scanned
> fraction". When the scanned fraction is low enough, startup cost
> dominates the equation.
>
> With a min/max index, a cheap query to that index could estimate at
> least a lower bound to that initial zero-rows-output cost. With b-tree
> indexes, not so much (the b-tree would have to be traversed until the
> first filter-passing tuple, which could be a while).

Ok, silly me. A min/max index would *eliminate* the startup cost. So, what's a good way to estimate it? Perhaps value-page correlation could be used, but it would cover a really narrow case (monotonic sequences).

Alternatively, a token startup cost could be added to those kinds of filtered sequential scans, when the filtering term is selective enough. That would offset the cost just a little bit, but enough to favor index over sequential in the right cases.
On Mon, Aug 5, 2013 at 6:54 PM, David Johnston <polobo@yahoo.com> wrote:
> Curious how much slower/faster these queries would run if you added:
>
> SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
> -- note the window specifications need to overcome the "ORDER BY"
> -- limitation noted in the documentation

To be honest, I cannot understand how you are going to specify the partition here. Or are you talking about wrapping the original query, like this?

SELECT *, first_value(id) OVER (), last_value(id) OVER ()
FROM (
    SELECT * FROM table
    WHERE id > :current_last_id
    ORDER BY id
    LIMIT 10
) AS sq2;

However, in this case using min()/max() instead of first_value()/last_value() will be faster, as it does not require an additional scan on the subquery results. In general I do not think it would be much slower if we are not talking about thousands of results on one page.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
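A sketch of that min()/max() variant, reusing the same placeholder table and the application-supplied :current_last_id bound from the earlier queries:

SELECT sq.*,
       min(id) OVER () AS first_id,   -- lowest id on this page
       max(id) OVER () AS last_id     -- highest id on this page
FROM (
    SELECT * FROM table
    WHERE id > :current_last_id      -- bound by the application
    ORDER BY id
    LIMIT 10
) AS sq;

With an empty OVER () the window is the whole 10-row subquery result, so both aggregates are computed in a single pass over the page.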
On Tue, Aug 6, 2013 at 3:46 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)

[skipped]

> ivoras=# set enable_seqscan to off;
> SET
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112 rows=10 loops=1)
>    ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17 rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
>          Index Cond: (id > 900000)
>  Total runtime: 0.175 ms
> (4 rows)
>
> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).

This might be caused by an inaccurate random_page_cost setting. This parameter gives the planner a hint of how much it would cost to perform a random page read, as used by index scans. It looks like you need to decrease random_page_cost.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
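A sketch of that experiment (4.0 is the default random_page_cost; the 1.1 figure is an assumption, in the range often suggested for well-cached or SSD-backed data):

SET random_page_cost = 1.1;   -- session-local; default is 4.0
EXPLAIN ANALYZE SELECT * FROM lt WHERE id > 900000 LIMIT 10;
RESET random_page_cost;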
Claudio Freire <klaussfreire@gmail.com> writes:
> Alternatively, a token startup cost could be added to those kinds of
> filtered sequential scans, when the filtering term is selective
> enough. That would offset the cost just a little bit, but enough to
> favor index over sequential in the right cases.

Maybe not so "token". Really, if there's a filter condition having a selectivity of say 1/N, we should expect to have to skip over O(N) tuples before finding a match. (Not sure at this late hour if the expectation is N, or N/2, or something else ... but anyway it's in that ballpark.) We don't take that into account in computing the startup time of a plan node, but I'm thinking we should. In this particular example, that would penalize the seqscan plan and not the indexscan plan, because in the indexscan case the condition is being applied by the index; it's not an after-the-fact filter.

			regards, tom lane
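(A quick check of that ballpark, assuming each tuple passes the filter independently with probability 1/N: the number of tuples scanned up to and including the first match is geometrically distributed, so the expectation is exactly N. If instead there is exactly one matching tuple sitting uniformly at random among N, the expected position is (N+1)/2, which is where the N/2 figure would come from. Either way it is O(N), consistent with the argument above.)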