Thread: ORDER BY, LIMIT and indexes
Hello,

Assuming I have a huge table (doesn't fit in RAM), of which the most important fields are "id", a SERIAL PRIMARY KEY, and "active", a boolean, and I'm issuing a query like:

SELECT * FROM table ORDER BY id DESC LIMIT 10

... is pgsql smart enough to use the index to fetch only the 10 required rows instead of reading the whole table, then sorting it, then trimming the result set? How about in the following queries:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Or, more generally, is there some set of circumstances under which the catastrophic scenario will happen?
On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>
> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

Did you try explain?
On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>
> Did you try explain?
And did you run ANALYZE on your table to be sure that you generate correct plans?
Michael
Ivan,

> Or, more generally, is there some set of circumstances under which the
> catastrophic scenario will happen?

Yes:

SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000

This is the "high offset" problem, and it affects all databases which support applications with paginated results, including non-relational ones like SOLR. The basic problem is that you can't figure out what is at OFFSET 100000 without first sorting the first 100000 results.

The easiest solution is to limit the number of pages your users can "flip through". Generally anyone asking for page 10,000 is a bot screen-scraping your site, anyway.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
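A minimal sketch of that page cap, assuming ten-row pages, an arbitrary cap of 1000 pages enforced in the statement itself, and a hypothetical items table (standing in for the thread's placeholder "table", which is a reserved word):

PREPARE capped_page(int) AS
    SELECT * FROM items
    ORDER BY id DESC
    LIMIT 10 OFFSET LEAST($1, 1000) * 10;

EXECUTE capped_page(3);   -- page 3; any page number above 1000 is clamped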
On Mon, Aug 5, 2013 at 6:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000
>
> This is the "high offset" problem, and it affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR. The basic problem is that you can't figure out what is
> at OFFSET 100000 without first sorting the first 100000 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through". Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.

In addition to Josh's answer I would like to mention that it might be worth using a partial index like this

CREATE INDEX i_table_id_active ON table (id) WHERE active

in this particular case

SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10

so it will avoid filtering through tons of rows in case of long "NOT active" gaps at the beginning of the scanning sequence.

As an alternative solution for the pagination (OFFSET) problem you might also use the "prev/next" technique, like

SELECT * FROM table
WHERE id > :current_last_id
ORDER BY id LIMIT 10

for "next", and

SELECT * FROM (
    SELECT * FROM table
    WHERE id < :current_first_id
    ORDER BY id DESC
    LIMIT 10
) AS sq ORDER BY id

for "prev". It will be very fast.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
Sergey Konoplev-2 wrote
> As an alternative solution for the pagination (OFFSET) problem you might
> also use the "prev/next" technique, like
>
> SELECT * FROM table
> WHERE id > :current_last_id
> ORDER BY id LIMIT 10
>
> for "next", and
>
> SELECT * FROM (
>     SELECT * FROM table
>     WHERE id < :current_first_id
>     ORDER BY id DESC
>     LIMIT 10
> ) AS sq ORDER BY id
>
> for "prev". It will be very fast.

Even though I am fairly experienced at SQL generally, I haven't explored pagination that much, so my awareness of the OFFSET issue had led me to conclude bad things. Thank you for taking the time for a brief moment of enlightenment about something you likely take for granted by now.

Curious how much slower/faster these queries would run if you added:

SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
-- note the window specifications need to overcome the "ORDER BY"
-- limitation noted in the documentation

to the query. Using the window functions you know at each record what the first and last ids are for its window. Applicability would be application/need specific, but it would avoid having to calculate/maintain these two values in a separate part of the application.

David J.
On 6 August 2013 02:20, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 8:25 AM, Claudio Freire <klaussfreire@gmail.com> wrote:
>> On Mon, Aug 5, 2013 at 8:04 PM, Ivan Voras <ivoras@freebsd.org> wrote:
>>> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>>>
>>> SELECT * FROM table WHERE active ORDER BY id DESC LIMIT 10 OFFSET 10
>>
>> Did you try explain?
>
> And did you run ANALYZE on your table to be sure that you generate correct plans?

My question was a theoretical one - about general pgsql abilities, not a specific case. But after being prodded by you, here's an EXPLAIN ANALYZE for the first case:

ivoras=# explain analyze select * from lt order by id desc limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
 Total runtime: 0.115 ms
(3 rows)

(This test table has only 1 mil. records.)

I'm not sure how to interpret the last line:

(cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)

It seems to me like the planner thinks the Index Scan operation will return the entire table, and expects both a huge cost and all of the records, but its actual implementation does return only 10 records. The Limit operation is a NOP in this case. Is this correct?

In the second case (with OFFSET):

ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.040..0.061 rows=20 loops=1)
 Total runtime: 0.136 ms
(3 rows)

It looks like the Index Scan implementation actually returns the first 20 records - which means that for the last "page" of the supposed paginated display, pgsql will actually internally read the entire table as an operation result in memory and discard almost all of it in the Limit operation. Right?

And for the third case (with another column in WHERE):

ivoras=# update lt set active = false where (id % 2) = 0;
UPDATE 500000
ivoras=# explain analyze select * from lt where active order by id desc limit 10;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186 rows=10 loops=1)
   ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43 rows=1500000 width=9) (actual time=0.132..0.174 rows=10 loops=1)
         Filter: active
 Total runtime: 0.244 ms
(4 rows)

It looks like filtering is performed in the Index Scan operation - which I never expected. Is this usually done, or only for simple queries? In other words, is there a circumstance where it is NOT done in this way, and should I care?

Based on the results with OFFSET it looks like it would be a bad idea to use pgsql for allowing the user to browse through the records in a paginated form if the table is huge. I would very much like to be proven wrong :)
Here are two more unexpected results. Same test table (1 mil. records, "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed before the experiments):

ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
   ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
         Filter: (id > 900000)
 Total runtime: 142.735 ms
(4 rows)

Note the Seq Scan.

ivoras=# explain analyze select * from lt where id > 900000;
                                                           QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on lt  (cost=1683.97..7856.35 rows=101630 width=9) (actual time=38.462..85.780 rows=100000 loops=1)
   Recheck Cond: (id > 900000)
   ->  Bitmap Index Scan on lt_pkey  (cost=0.00..1658.56 rows=101630 width=0) (actual time=38.310..38.310 rows=100000 loops=1)
         Index Cond: (id > 900000)
 Total runtime: 115.674 ms
(5 rows)

This somewhat explains the above case - we are simply fetching 100,000 records here, and it's slow enough even with the index scan, so the planner skips the index in the former case. BUT, if it did use the index, it would have been expectedly fast:

ivoras=# set enable_seqscan to off;
SET
ivoras=# explain analyze select * from lt where id > 900000 limit 10;
                                                          QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112 rows=10 loops=1)
   ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17 rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
         Index Cond: (id > 900000)
 Total runtime: 0.175 ms
(4 rows)

It looks like the problem is in the difference between what the planner expects and what the Filter or Index operations deliver: (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1).
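One way to see how much of the table that seqscan actually walked before producing its first row (a sketch; EXPLAIN's BUFFERS option is available in 9.1):

ivoras=# explain (analyze, buffers) select * from lt where id > 900000 limit 10;

Since the serial ids correlate with the physical row order here, the "Buffers:" counts on the Seq Scan node should show it reading on the order of 90% of the table's pages before the first id > 900000 turns up, while the index scan variant touches only a handful of pages.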
On Tue, Aug 6, 2013 at 7:46 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)

I think the problem lies in assuming the sequential scan will have 0 startup cost, which is not the case here (it will have to scan up to the first page with an id > 900000). If that were to be fixed, the index scan would be chosen instead.

I don't see a way to fix it without querying the index, which could be a costly operation... except with the newly proposed minmax indexes. I guess here's another case where those indexes would be useful.
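The numbers in the plans above bear that out; a back-of-the-envelope check, assuming the Limit node's cost is simply the child's total cost times the fetched fraction when startup cost is zero:

    seq scan:   17402.00 * 10 / 101630  =  1.71
    index scan: 17644.17 * 10 / 101630  =  1.74

which match the 1.71 and 1.74 Limit costs shown in the earlier EXPLAINs. With both startup costs pegged at 0.00, the seqscan wins by a hair on paper, even though its real startup cost is enormous.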
On Mon, Aug 5, 2013 at 9:22 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Ivan,
>
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 100000
>
> This is the "high offset" problem, and it affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR. The basic problem is that you can't figure out what is
> at OFFSET 100000 without first sorting the first 100000 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through". Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.
Another solution is to build pages from the maximum id you pulled in the last page so page one is:
SELECT * FROM table ORDER BY id LIMIT 10
and page 2 is:
SELECT * FROM table WHERE id > 19 ORDER BY id LIMIT 10
and page 3 is:
SELECT * FROM table WHERE id > 37 ORDER BY id LIMIT 10
and so on. You build your URLs like this:
http://yousite.com/browse
http://yousite.com/browse?after=19
http://yousite.com/browse?after=37
and so on.
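A sketch of that approach as a prepared statement (the name next_page, the example value 19, and the hypothetical items table - "table" itself is a reserved word - are assumptions):

PREPARE next_page(int) AS
    SELECT * FROM items
    WHERE id > $1          -- the highest id shown on the previous page
    ORDER BY id
    LIMIT 10;

EXECUTE next_page(19);     -- serves http://yousite.com/browse?after=19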
On Tue, Aug 6, 2013 at 3:04 AM, Ivan Voras <ivoras@freebsd.org> wrote:
>
> ivoras=# explain analyze select * from lt order by id desc limit 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.29 rows=10 width=9) (actual time=0.034..0.053 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
>  Total runtime: 0.115 ms
> (3 rows)
>
> (This test table has only 1 mil. records.)
>
> I'm not sure how to interpret the last line:
> (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.031..0.042 rows=10 loops=1)
>
> It seems to me like the planner thinks the Index Scan operation will
> return the entire table, and expects both a huge cost and all of the
> records, but its actual implementation does return only 10 records.
> The Limit operation is a NOP in this case. Is this correct?

The Index Scan line reports the cost estimate of the entire scan; the Limit line chops that cost estimate down based on the limit.

> In the second case (with OFFSET):
>
> ivoras=# explain analyze select * from lt order by id desc limit 10 offset 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.29..0.57 rows=10 width=9) (actual time=0.060..0.082 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..28673.34 rows=1000000 width=9) (actual time=0.040..0.061 rows=20 loops=1)
>  Total runtime: 0.136 ms
> (3 rows)
>
> It looks like the Index Scan implementation actually returns the first
> 20 records - which means that for the last "page" of the supposed
> paginated display, pgsql will actually internally read the entire
> table as an operation result in memory and discard almost all of it in
> the Limit operation. Right?

It would not necessarily do it in memory. If it used an index scan (which it probably would not), it would just need to count the rows skipped over, not store them. If it used a sort, it might spill to disk.

> And for the third case (with another column in WHERE):
>
> ivoras=# update lt set active = false where (id % 2) = 0;
> UPDATE 500000
> ivoras=# explain analyze select * from lt where active order by id desc limit 10;
>                                                               QUERY PLAN
> --------------------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..0.32 rows=10 width=9) (actual time=0.135..0.186 rows=10 loops=1)
>    ->  Index Scan Backward using lt_pkey on lt  (cost=0.00..47380.43 rows=1500000 width=9) (actual time=0.132..0.174 rows=10 loops=1)
>          Filter: active
>  Total runtime: 0.244 ms
> (4 rows)
>
> It looks like filtering is performed in the Index Scan operation -
> which I never expected. Is this usually done, or only for simple
> queries? In other words, is there a circumstance where it is NOT done
> in this way, and should I care?

I would expect the filter to be done at the first point it can conveniently be done at. You probably shouldn't care.

> Based on the results with OFFSET it looks like it would be a bad idea
> to use pgsql for allowing the user to browse through the records in a
> paginated form if the table is huge. I would very much like to be
> proven wrong :)

Do you expect to have a lot of users who will hit "next page" 50,000 times and carefully read through each set of results?
I think this is a problem I would worry about when it knocked on my door. Unless I was mostly worried about denial of service attacks. Cheers, Jeff
On 06/08/13 22:46, Ivan Voras wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)
>
> Note the Seq Scan.

[skipped]

> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).

Hmm - I wonder if the lack of ORDER BY is part of the problem here. Consider a similar query on pgbench_accounts:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 limit 20;
                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.91 rows=20 width=4) (actual time=0.005..0.464 rows=20 loops=1)
   ->  Seq Scan on pgbench_accounts  (cost=0.00..499187.31 rows=10994846 width=4) (actual time=0.005..0.463 rows=20 loops=1)
         Filter: (aid > 100000)
 Total runtime: 0.474 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 limit 20;
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.017 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.030 ms
(4 rows)

So at some point you get index scans.
Now add an ORDER BY:

bench=# explain analyze select aid from pgbench_accounts where aid > 100000 order by aid limit 20;
                                                                          QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.008..0.012 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..1235355.34 rows=10994846 width=4) (actual time=0.008..0.011 rows=20 loops=1)
         Index Cond: (aid > 100000)
 Total runtime: 0.023 ms
(4 rows)

bench=# explain analyze select aid from pgbench_accounts where aid > 10000000 order by aid limit 20;
                                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..2.25 rows=20 width=4) (actual time=0.014..0.018 rows=20 loops=1)
   ->  Index Scan using pgbench_accounts_pkey on pgbench_accounts  (cost=0.00..207204.06 rows=1844004 width=4) (actual time=0.014..0.016 rows=20 loops=1)
         Index Cond: (aid > 10000000)
 Total runtime: 0.029 ms
(4 rows)

...and we have index scans for both cases.

Cheers

Mark
On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood <mark.kirkwood@catalyst.net.nz> wrote:
> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
> Consider a similar query on pgbench_accounts:

[skipped]

> ...and we have index scans for both cases.

Yes, but those index scan decisions are driven by the wrong factor. In the last two cases, the need for rows to be ordered. In the second case, the estimated number of tuples in the scan.

In both cases, that's not the driving factor for the right decision. The driving factor *should* be startup cost, which is nonzero because there is a filter being applied to that sequential scan that filters many of the initial tuples. With a nonzero startup cost, the cost of the limited plan would be "startup cost + scan cost * scanned fraction". When the scanned fraction is low enough, startup cost dominates the equation.

With a min/max index, a cheap query to that index could estimate at least a lower bound to that initial zero-rows-output cost. With b-tree indexes, not so much (the b-tree would have to be traversed until the first filter-passing tuple, which could be a while).
On Tue, Aug 6, 2013 at 8:03 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Tue, Aug 6, 2013 at 7:56 PM, Mark Kirkwood
> <mark.kirkwood@catalyst.net.nz> wrote:
>> Hmm - I wonder if the lack of ORDER BY is part of the problem here.
>> Consider a similar query on pgbench_accounts:

[skipped]

> In both cases, that's not the driving factor for the right decision.
> The driving factor *should* be startup cost, which is nonzero because
> there is a filter being applied to that sequential scan that filters
> many of the initial tuples. With a nonzero startup cost, the cost of
> the limited plan would be "startup cost + scan cost * scanned
> fraction". When the scanned fraction is low enough, startup cost
> dominates the equation.
>
> With a min/max index, a cheap query to that index could estimate at
> least a lower bound to that initial zero-rows-output cost. With b-tree
> indexes, not so much (the b-tree would have to be traversed until the
> first filter-passing tuple, which could be a while).

Ok, silly me. A min/max index would *eliminate* the startup cost. So, what's a good way to estimate it? Perhaps value-page correlation could be used, but it would cover a really narrow case (monotonic sequences).

Alternatively, a token startup cost could be added to those kinds of filtered sequential scans, when the filtering term is selective enough. That would offset the cost just a little bit, but enough to favor index over sequential in the right cases.
On Mon, Aug 5, 2013 at 6:54 PM, David Johnston <polobo@yahoo.com> wrote:
> Curious how much slower/faster these queries would run if you added:
>
> SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
> -- note the window specifications need to overcome the "ORDER BY"
> -- limitation noted in the documentation

To be honest, I cannot understand how you are going to specify the partition here. Or are you talking about wrapping the original query, like this?

SELECT *, first_value(id) OVER (), last_value(id) OVER ()
FROM (
    SELECT * FROM table
    WHERE id > :current_last_id
    ORDER BY id
    LIMIT 10
) AS sq2;

However, in this case using min()/max() instead of first_value()/last_value() will be faster, as it does not require an additional scan on the subquery results. In general I do not think it would be much slower if we are not talking about thousands of results on one page.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
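A sketch of that min()/max() variant, reusing the same placeholder table and the application-supplied :current_last_id bound from the earlier queries:

SELECT sq.*,
       min(id) OVER () AS first_id,   -- lowest id on this page
       max(id) OVER () AS last_id     -- highest id on this page
FROM (
    SELECT * FROM table
    WHERE id > :current_last_id      -- bound by the application
    ORDER BY id
    LIMIT 10
) AS sq;

With an empty OVER () the window is the whole 10-row subquery result, so both aggregates are computed in a single pass over the page.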
On Tue, Aug 6, 2013 at 3:46 AM, Ivan Voras <ivoras@freebsd.org> wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                     QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual time=142.669..142.680 rows=10 loops=1)
>    ->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9) (actual time=142.665..142.672 rows=10 loops=1)
>          Filter: (id > 900000)
>  Total runtime: 142.735 ms
> (4 rows)

[skipped]

> ivoras=# set enable_seqscan to off;
> SET
> ivoras=# explain analyze select * from lt where id > 900000 limit 10;
>                                                           QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------------------
>  Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112 rows=10 loops=1)
>    ->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17 rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
>          Index Cond: (id > 900000)
>  Total runtime: 0.175 ms
> (4 rows)
>
> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).

This might be caused by an inaccurate random_page_cost setting. This parameter gives the planner a hint of how much it would cost to perform a random page read, as used by index scans. It looks like you need to decrease random_page_cost.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray.ru@gmail.com
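A sketch of that experiment (4.0 is the default random_page_cost; the 1.1 figure is an assumption, in the range often suggested for well-cached or SSD-backed data):

SET random_page_cost = 1.1;   -- session-local; default is 4.0
EXPLAIN ANALYZE SELECT * FROM lt WHERE id > 900000 LIMIT 10;
RESET random_page_cost;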
Claudio Freire <klaussfreire@gmail.com> writes:
> Alternatively, a token startup cost could be added to those kinds of
> filtered sequential scans, when the filtering term is selective
> enough. That would offset the cost just a little bit, but enough to
> favor index over sequential in the right cases.

Maybe not so "token". Really, if there's a filter condition having a selectivity of say 1/N, we should expect to have to skip over O(N) tuples before finding a match. (Not sure at this late hour if the expectation is N, or N/2, or something else ... but anyway it's in that ballpark.) We don't take that into account in computing the startup time of a plan node, but I'm thinking we should. In this particular example, that would penalize the seqscan plan and not the indexscan plan, because in the indexscan case the condition is being applied by the index; it's not an after-the-fact filter.

			regards, tom lane
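(A quick check of that ballpark, assuming each tuple passes the filter independently with probability 1/N: the number of tuples scanned up to and including the first match is geometrically distributed, so the expectation is exactly N. If instead there is exactly one matching tuple sitting uniformly at random among N, the expected position is (N+1)/2, which is where the N/2 figure would come from. Either way it is O(N), consistent with the argument above.)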