Thread: quickly getting the top N rows
If I have this:

    create table foo (bar int primary key);

...then in my ideal world, Postgres would be able to use that index on bar
to help me with this:

    select bar from foo order by bar desc limit 20;

But in my experience, PG8.2 is doing a full table scan on foo, then
sorting it, then doing the limit. I have a more complex primary key, but I
was hoping the same concept would still apply. Am I doing something wrong,
or just expecting something that doesn't exist?
On Thu, 2007-10-04 at 11:00 -0700, Ben wrote:
> If I have this:
>
>     create table foo (bar int primary key);
>
> ...then in my ideal world, Postgres would be able to use that index on bar
> to help me with this:
>
>     select bar from foo order by bar desc limit 20;
>
> But in my experience, PG8.2 is doing a full table scan on foo, then
> sorting it, then doing the limit. I have a more complex primary key, but I
> was hoping the same concept would still apply. Am I doing something wrong,
> or just expecting something that doesn't exist?

It has to do with the way that NULL values are stored in the index. This
page has details and instructions for how to get it to work:

http://developer.postgresql.org/pgdocs/postgres/indexes-ordering.html

-- 
Mark Lewis
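[A rough sketch of what the linked docs describe, using the table from the
thread; the index name below is hypothetical. A plain btree can be walked
in either direction, so no special index is strictly required for this
query, and from 8.3 onward the ordering can also be declared explicitly:]

```sql
-- A btree index can be scanned in either direction, so the primary key
-- index on bar can already serve ORDER BY bar DESC via a backward scan:
CREATE TABLE foo (bar int PRIMARY KEY);

-- From 8.3 onward, index ordering (and NULL placement) can also be
-- declared explicitly at index creation time:
CREATE INDEX foo_bar_desc ON foo (bar DESC NULLS LAST);
```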
In response to Ben <bench@silentmedia.com>:
> If I have this:
>
>     create table foo (bar int primary key);
>
> ...then in my ideal world, Postgres would be able to use that index on bar
> to help me with this:
>
>     select bar from foo order by bar desc limit 20;
>
> But in my experience, PG8.2 is doing a full table scan on foo, then
> sorting it, then doing the limit. I have a more complex primary key, but I
> was hoping the same concept would still apply. Am I doing something wrong,
> or just expecting something that doesn't exist?

Show us the explain. However, 2 guesses:

1) You never analyzed the table, thus PG has awful statistics and
   doesn't know how to pick a good plan.
2) You have so few rows in the table that a seq scan is actually
   faster than an index scan, which is why PG uses it instead.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
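[Both of Bill's guesses can be checked directly; a sketch against the
thread's example table:]

```sql
-- Refresh the statistics the planner relies on, then inspect the plan:
ANALYZE foo;
EXPLAIN ANALYZE SELECT bar FROM foo ORDER BY bar DESC LIMIT 20;

-- On a tiny table, a "Seq Scan" node here is expected and is genuinely
-- the cheaper plan; on a large, analyzed table one would hope to see
-- an "Index Scan Backward" over the primary key index instead.
```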
Ben <bench@silentmedia.com> schrieb:
> If I have this:
>
>     create table foo (bar int primary key);
>
> ...then in my ideal world, Postgres would be able to use that index on bar
> to help me with this:
>
>     select bar from foo order by bar desc limit 20;
>
> But in my experience, PG8.2 is doing a full table scan on foo, then
> sorting it, then doing the limit. I have a more complex primary key, but I

Please show us the output from

    EXPLAIN ANALYSE select bar from foo order by bar desc limit 20;

I tried it with 8.1, and I can see an index scan. You may have wrong
statistics, or not enough (too few) rows in your table.

Andreas
-- 
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect. (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly." (unknown)
Kaufbach, Saxony, Germany, Europe. N 51.05082°, E 13.56889°
On Thu, 4 Oct 2007, Bill Moran wrote:

> However, 2 guesses:
> 1) You never analyzed the table, thus PG has awful statistics and
>    doesn't know how to pick a good plan.
> 2) You have so few rows in the table that a seq scan is actually
>    faster than an index scan, which is why PG uses it instead.

No, the tables are recently analyzed and there are a couple hundred
thousand rows in there. But I think I just figured it out... it's a
3-column index, and two columns of that index are the same for every row.
When I drop those two columns from the ordering restriction, the index
gets used and things speed up 5 orders of magnitude.

Maybe the planner is smart enough to think that if a column in the order
by clause is identical for most rows, then using an index won't help...
but not smart enough to realize that if said column is at the *end* of the
order by arguments, after columns which do sort quite well, then it should
use an index after all.
Ben <bench@silentmedia.com> writes:
> No, the tables are recently analyzed and there are a couple hundred
> thousand rows in there. But I think I just figured it out... it's a
> 3-column index, and two columns of that index are the same for every row.
> When I drop those two columns from the ordering restriction, the index
> gets used and things speed up 5 orders of magnitude.
>
> Maybe the planner is smart enough to think that if a column in the order
> by clause is identical for most rows, then using an index won't help...
> but not smart enough to realize that if said column is at the *end* of the
> order by arguments, after columns which do sort quite well, then it should
> use an index after all.

You're being about as clear as mud here, except that you obviously lied
about what you were doing in your first message. If you have a planner
problem, show us the *exact* query, the *exact* table definition, and
unfaked EXPLAIN ANALYZE output.

			regards, tom lane
On Thu, 4 Oct 2007, Tom Lane wrote:

> You're being about as clear as mud here, except that you obviously lied
> about what you were doing in your first message. If you have a planner
> problem, show us the *exact* query, the *exact* table definition, and
> unfaked EXPLAIN ANALYZE output.

I didn't realize that simplification was viewed as so sinister, but
thanks, I'll remember that in the future.

The table:

                     Table "public.log"
     Column     |            Type             |      Modifiers
 ---------------+-----------------------------+---------------------
  clientkey     | character(30)               | not null
  premiseskey   | character(30)               | not null
  logkey        | character(30)               | not null
  logicaldel    | character(1)                | default 'N'::bpchar
  lockey        | character(30)               |
  devlockey     | character(30)               |
  eventkey      | character(30)               |
  logshorttext  | character varying(255)      |
  logdesc       | character varying(255)      |
  loguserkey    | character(30)               |
  logassetkey   | character(30)               |
  logresourcekey| character(30)               |
  logtime       | timestamp without time zone |
  logip         | character varying(50)       |
  logarchived   | character(1)                |
  logarchivedate| timestamp without time zone |
  loghasvideo   | character(1)                |
  loghasaudio   | character(1)                |
  resvehiclekey | character(30)               |
  synccreated   | character(1)                |
  logtypekey    | character(30)               |
 Indexes:
     "log_pkey" PRIMARY KEY, btree (clientkey, premiseskey, logkey)
     "eventkey_idx" btree (eventkey)
     "log_ak1" btree (clientkey, premiseskey, logtime, logkey)

The original, slow query:

explain analyze SELECT * FROM log
WHERE clientkey in ('000000004500000000010000000001')
  AND premiseskey in ('000000004500000000010000000001')
  and logicaldel = 'N'
ORDER BY logtime desc, logkey desc, clientkey desc, premiseskey desc
LIMIT 20 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=356402.58..356402.63 rows=20 width=563) (actual time=215858.481..215858.527 rows=20 loops=1)
   ->  Sort  (cost=356402.58..357598.25 rows=478267 width=563) (actual time=215858.478..215858.498 rows=20 loops=1)
         Sort Key: logtime, logkey, clientkey, premiseskey
         ->  Seq Scan on log  (cost=0.00..52061.67 rows=478267 width=563) (actual time=29.340..100043.313 rows=475669 loops=1)
               Filter: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar) AND (logicaldel = 'N'::bpchar))
 Total runtime: 262462.582 ms
(6 rows)

Every row in log has identical clientkey and premiseskey values, so if I
just remove those columns from the order by clause, I get this far
superior plan:

explain analyze SELECT * FROM log
WHERE clientkey in ('000000004500000000010000000001')
  AND premiseskey in ('000000004500000000010000000001')
  and logicaldel = 'N'
ORDER BY logtime desc, logkey desc LIMIT 20 offset 0;

                              QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.00..12.33 rows=20 width=563) (actual time=0.047..0.105 rows=20 loops=1)
   ->  Index Scan Backward using log_ak1 on log  (cost=0.00..294735.70 rows=478267 width=563) (actual time=0.044..0.076 rows=20 loops=1)
         Index Cond: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar))
         Filter: (logicaldel = 'N'::bpchar)
 Total runtime: 0.165 ms
(5 rows)

...which made me think that maybe postgres is not using log_ak1 in the
former case because two of the columns in the order by match every row.

Unfortunately, in this case it's not an option to alter the query. I'm
just trying to figure out an explanation.
On Thu, 2007-10-04 at 12:52 -0700, Ben wrote:
> The original, slow query:
>
> explain analyze SELECT * FROM log
> WHERE clientkey in ('000000004500000000010000000001')
>   AND premiseskey in ('000000004500000000010000000001')
>   and logicaldel = 'N'
> ORDER BY logtime desc, logkey desc, clientkey desc, premiseskey desc
> LIMIT 20 offset 0;
>
>  Limit  (cost=356402.58..356402.63 rows=20 width=563) (actual time=215858.481..215858.527 rows=20 loops=1)
>    ->  Sort  (cost=356402.58..357598.25 rows=478267 width=563) (actual time=215858.478..215858.498 rows=20 loops=1)
>          Sort Key: logtime, logkey, clientkey, premiseskey
>          ->  Seq Scan on log  (cost=0.00..52061.67 rows=478267 width=563) (actual time=29.340..100043.313 rows=475669 loops=1)
>                Filter: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar) AND (logicaldel = 'N'::bpchar))
>  Total runtime: 262462.582 ms
> (6 rows)
>
> Every row in log has identical clientkey and premiseskey values, so if I
> just remove those columns from the order by clause, I get this far
> superior plan:
>
> explain analyze SELECT * FROM log
> WHERE clientkey in ('000000004500000000010000000001')
>   AND premiseskey in ('000000004500000000010000000001')
>   and logicaldel = 'N'
> ORDER BY logtime desc, logkey desc LIMIT 20 offset 0;
>
>  Limit  (cost=0.00..12.33 rows=20 width=563) (actual time=0.047..0.105 rows=20 loops=1)
>    ->  Index Scan Backward using log_ak1 on log  (cost=0.00..294735.70 rows=478267 width=563) (actual time=0.044..0.076 rows=20 loops=1)
>          Index Cond: ((clientkey = '000000004500000000010000000001'::bpchar) AND (premiseskey = '000000004500000000010000000001'::bpchar))
>          Filter: (logicaldel = 'N'::bpchar)
>  Total runtime: 0.165 ms
> (5 rows)
>
> ...which made me think that maybe postgres is not using log_ak1 in the
> former case because two of the columns in the order by match every row.
>
> Unfortunately, in this case it's not an option to alter the query. I'm
> just trying to figure out an explanation.

In the first query, Postgres cannot use the index because the sort order
of the index does not match the sort order of the query. When you change
the sort order of the query so that it matches that of the index, then
the index is used.

If you define your index on (logtime, logkey, clientkey, premiseskey)
rather than on (clientkey, premiseskey, logtime, logkey) you will have a
fast query. Yes, the column order matters.

-- 
Simon Riggs
2ndQuadrant
http://www.2ndQuadrant.com
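[Simon's suggestion, sketched as DDL; the index name is hypothetical, and
building it on a table of this size will take time and a lock. Scanned
backward, such an index returns rows already in the order the original
four-column ORDER BY asks for, so no Sort step is needed:]

```sql
-- Hypothetical index whose column order matches the original ORDER BY
-- (logtime desc, logkey desc, clientkey desc, premiseskey desc); a
-- backward scan of it satisfies that ordering directly, with the
-- clientkey/premiseskey/logicaldel predicates applied as a filter:
CREATE INDEX log_order_idx ON log (logtime, logkey, clientkey, premiseskey);
```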
On 10/4/07, Ben <bench@silentmedia.com> wrote:
> If I have this:
>
>     create table foo (bar int primary key);
>
> ...then in my ideal world, Postgres would be able to use that index on bar
> to help me with this:
>
>     select bar from foo order by bar desc limit 20;
>
> But in my experience, PG8.2 is doing a full table scan on foo, then
> sorting it, then doing the limit. I have a more complex primary key, but I
> was hoping the same concept would still apply. Am I doing something wrong,
> or just expecting something that doesn't exist?

pg uses an intelligent planner. It looks at the table, the number of
rows, the distribution of values, and makes a decision whether to use a
seq scan or an index. Do you have any evidence that in your case a seq
scan is a bad choice?

Try this experiment:

    psql mydb
    =# select * from foo; -- primes the db; puts the table in memory if it fits
    =# \timing
    =# set enable_seqscan=off;
    =# select bar from foo order by bar desc limit 20;
    =# set enable_seqscan=on;
    =# select bar from foo order by bar desc limit 20;

and compare the times each takes. Run each way a few times to be sure
you're not getting random variance.

On my reporting db, with a table of somewhere around 75 million rows, a
similar query takes 0.894 ms and uses an index scan. Which is good,
because a sequential scan on that table takes about 15 to 30 minutes.
On Thu, 4 Oct 2007, Simon Riggs wrote:

> In the first query, Postgres cannot use the index because the sort order
> of the index does not match the sort order of the query. When you change
> the sort order of the query so that it matches that of the index, then
> the index is used.
>
> If you define your index on (logtime, logkey, clientkey, premiseskey)
> rather than on (clientkey, premiseskey, logtime, logkey) you will have a
> fast query. Yes, the column order matters.

I thought that might explain it, but then I'm surprised that it can still
use an index when the first two columns of the index aren't in the query.
Wouldn't that mean that it might have to walk the entire index to find
matching rows?

...unless it's smart enough to realize that the first two columns will
match everything. Which would be cool.
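[What Ben is wondering about can be seen in the fast plan earlier in the
thread; a sketch using the thread's own names. The scan does not walk the
whole index, because equality predicates on the leading columns confine it
to one contiguous slice:]

```sql
-- log_ak1 is btree (clientkey, premiseskey, logtime, logkey).
-- Equality on the two leading columns pins the scan to a single
-- contiguous range of the index; within that range, entries are already
-- sorted by (logtime, logkey), so a backward scan can stop after 20 rows:
SELECT * FROM log
WHERE clientkey = '000000004500000000010000000001'
  AND premiseskey = '000000004500000000010000000001'
ORDER BY logtime DESC, logkey DESC
LIMIT 20;
```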
On 10/4/07, Ben <bench@silentmedia.com> wrote:
> On Thu, 4 Oct 2007, Tom Lane wrote:
>
> > You're being about as clear as mud here, except that you obviously lied
> > about what you were doing in your first message. If you have a planner
> > problem, show us the *exact* query, the *exact* table definition, and
> > unfaked EXPLAIN ANALYZE output.
>
> I didn't realize that simplification was viewed as so sinister, but
> thanks, I'll remember that in the future.

It's not sinister, it's counterproductive and wastes people's time.
Scott Marlowe wrote:
> On 10/4/07, Ben <bench@silentmedia.com> wrote:
>> On Thu, 4 Oct 2007, Tom Lane wrote:
>>
>>> You're being about as clear as mud here, except that you obviously lied
>>> about what you were doing in your first message. If you have a planner
>>> problem, show us the *exact* query, the *exact* table definition, and
>>> unfaked EXPLAIN ANALYZE output.
>>
>> I didn't realize that simplification was viewed as so sinister, but
>> thanks, I'll remember that in the future.
>
> It's not sinister, it's counterproductive and wastes people's time.

Unless "Ben" is an alias for someone high up with an unnamed rival
database, seeking to distract us all...

How do you find out the IP address of a yacht? ;-)

-- 
Richard Huxton
Archonet Ltd
Ben <bench@silentmedia.com> writes:
> On Thu, 4 Oct 2007, Simon Riggs wrote:
> I thought that might explain it, but then I'm surprised that it can still
> use an index when the first two columns of the index aren't in the query.
> Wouldn't that mean that it might have to walk the entire index to find
> matching rows?
> ...unless it's smart enough to realize that the first two columns will
> match everything. Which would be cool.

There's some limited smarts in there about deciding that leading columns
of an index don't matter to the sort ordering if they're constrained to
just one value by the query. But it doesn't catch the case you need,
which is that columns of an ORDER BY request are no-ops when they're
constrained to just one value.

That whole area has been rewritten for 8.3 and I believe it will handle
this case. No time to try it right now though.

			regards, tom lane
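[A sketch of the equivalence Tom describes, not output from any release.
Because the WHERE clause pins clientkey and premiseskey to single values,
the two ORDER BY lists below request the same row ordering; 8.2 only
proves this kind of equivalence for leading index columns, while the 8.3
rewrite is expected to also discard the constant ORDER BY columns:]

```sql
-- With clientkey and premiseskey each constrained to one value, these
-- two clauses describe the same ordering over the qualifying rows:
--   ORDER BY logtime DESC, logkey DESC, clientkey DESC, premiseskey DESC
--   ORDER BY logtime DESC, logkey DESC
-- On 8.2 the first form gets an explicit Sort over a Seq Scan; stripping
-- the constant columns (second form) lets the planner use the backward
-- scan on log_ak1 instead.
```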
On Thu, 4 Oct 2007, Tom Lane wrote:

> There's some limited smarts in there about deciding that leading columns
> of an index don't matter to the sort ordering if they're constrained to
> just one value by the query. But it doesn't catch the case you need,
> which is that columns of an ORDER BY request are no-ops when they're
> constrained to just one value.

Oh, no, that explains it perfectly, because that's precisely the case I
have - I dropped the columns from the ordering, but not the where clause.
Thanks, now I understand the current behavior.