Thread: index speed and failed expectations?
sol=> \d stats; Table "public.stats" Column | Type | Modifiers --------------+--------------------------------+----------- id | integer | not null start_time | timestamp(0) without time zone | not null ... Indexes: "stats_start_time_idx" btree (start_time) "stats_id_key" btree (id) There are roughly half a million rows. This query from the console: select * from stats order by start_time; takes 8 seconds before starting its output. Am I wrong in assuming that the index on start_time should make ORDER BY orders of magnitude faster? Or is this already fast enough? Or should I max up some memory (buffer) setting to achieve greater speeds? Not that the speed is crucial, just curious. TIA.
> This query from the console: > > select * from stats order by start_time; > > takes 8 seconds before starting its output. Am I wrong in assuming that > the index on start_time should make ORDER BY orders of magnitude > faster? > Or is this already fast enough? Or should I max up some memory (buffer) > setting to achieve greater speeds? Not that the speed is crucial, just > curious. > Postgresql won't use the index for queries like this. Due to the MVCC implementation, the index does not contain all necessary information and would therefore be slower than using the table data alone. (What postgresql lacks is a first_row/all_rows hint like oracle) However, if you limit the number of rows enough, you might force it to use an index: select * from stats order by start_time limit 1000;
Adam Rich wrote: >> This query from the console: >> >> select * from stats order by start_time; >> >> takes 8 seconds before starting its output. Am I wrong in assuming that >> the index on start_time should make ORDER BY orders of magnitude >> faster? >> Or is this already fast enough? Or should I max up some memory (buffer) >> setting to achieve greater speeds? Not that the speed is crucial, just >> curious. >> > > Postgresql won't use the index for queries like this. Due to the > MVCC implementation, the index does not contain all necessary information > and would therefore be slower than using the table data alone. > > (What postgresql lacks is a first_row/all_rows hint like oracle) > > However, if you limit the number of rows enough, you might force it > to use an index: > > select * from stats order by start_time limit 1000; > Thanks! Since LIMIT/OFFSET is the typical usage pattern for a paginated data set accessed from the Web (which is my case), it immediately becomes a non-issue.
"Adam Rich" <adam.r@sbcglobal.net> writes: >> This query from the console: >> select * from stats order by start_time; >> takes 8 seconds before starting its output. Am I wrong in assuming that >> the index on start_time should make ORDER BY orders of magnitude >> faster? > Postgresql won't use the index for queries like this. "won't" -> "might not". It all depends on the relative cost estimates for indexscan vs seqscan + sort. For a large table it's quite likely that the latter will be cheaper, because it has a better-localized access pattern. > (What postgresql lacks is a first_row/all_rows hint like oracle) That's spelled "LIMIT" ;-). Also, you can bias the choice in favor of a fast-start plan if you use a cursor rather than a plain SELECT. In that case the planner makes some allowance for the idea that you might not want all the rows, or might be more interested in getting the first ones quickly than minimizing the total time to fetch all the rows. regards, tom lane
> > > > However, if you limit the number of rows enough, you > might force it > > to use an index: > > > > select * from stats order by start_time limit 1000; > > > > Thanks! Since LIMIT/OFFSET is the typical usage pattern for > a paginated > data set accessed from the Web (which is my case), it > immediately > becomes a non-issue. > We do a lot of queries with order by limit n, and from my experience setting enable_sort to off on the database also makesa massive difference. http://www.postgresql.org/docs/8.3/static/indexes-ordering.html __________________________________________________________ Not happy with your email address?. Get the one you really want - millions of new email addresses available now at Yahoo! http://uk.docs.yahoo.com/ymail/new.html
On Mon, Aug 04, 2008 at 08:35:28AM -0500, Adam Rich wrote: > > This query from the console: > > > > select * from stats order by start_time; > > > > takes 8 seconds before starting its output. Am I wrong in assuming that > > the index on start_time should make ORDER BY orders of magnitude > > faster? > > Postgresql won't use the index for queries like this. Due to the > MVCC implementation, the index does not contain all necessary information > and would therefore be slower than using the table data alone. Not necessarily true. Despite the index not having enough information, the planner might still decide that using the index would be faster than executing a sort. create table stats ( id serial primary key, start_time timestamp with time zone not null ); insert into stats (start_time) select now() - random() * '1 year'::interval from generate_series(1, 100000); create index stats_start_time_idx on stats (start_time); analyze stats; explain analyze select * from stats order by start_time; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using stats_start_time_idx on stats (cost=0.00..4767.83 rows=100000 width=12) (actual time=0.146..994.674 rows=100000loops=1) Total runtime: 1419.943 ms (2 rows) set enable_indexscan to off; explain analyze select * from stats order by start_time; QUERY PLAN --------------------------------------------------------------------------------------------------------------------- Sort (cost=9845.82..10095.82 rows=100000 width=12) (actual time=3240.976..3800.038 rows=100000 loops=1) Sort Key: start_time -> Seq Scan on stats (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.091..500.853 rows=100000 loops=1) Total runtime: 4226.870 ms (4 rows) -- Michael Fuhr