Thread: index speed and failed expectations?

index speed and failed expectations?

From
rihad
Date:
sol=> \d stats;
                  Table "public.stats"
     Column    |              Type              | Modifiers
--------------+--------------------------------+-----------
  id           | integer                        | not null
  start_time   | timestamp(0) without time zone | not null
...
Indexes:
     "stats_start_time_idx" btree (start_time)
     "stats_id_key" btree (id)

There are roughly half a million rows.

This query from the console:

select * from stats order by start_time;

takes 8 seconds before starting its output. Am I wrong in assuming that
the index on start_time should make ORDER BY orders of magnitude faster?
Or is this already fast enough? Or should I max up some memory (buffer)
setting to achieve greater speeds? Not that the speed is crucial, just
curious.

TIA.

Re: index speed and failed expectations?

From
"Adam Rich"
Date:
> This query from the console:
>
> select * from stats order by start_time;
>
> takes 8 seconds before starting its output. Am I wrong in assuming that
> the index on start_time should make ORDER BY orders of magnitude
> faster?
> Or is this already fast enough? Or should I max up some memory (buffer)
> setting to achieve greater speeds? Not that the speed is crucial, just
> curious.
>

Postgresql won't use the index for queries like this.  Due to the
MVCC implementation, the index does not contain all necessary information
and would therefore be slower than using the table data alone.

(What postgresql lacks is a first_row/all_rows hint like oracle)

However, if you limit the number of rows enough, you might force it
to use an index:

select * from stats order by start_time limit 1000;




Re: index speed and failed expectations?

From
rihad
Date:
Adam Rich wrote:
>> This query from the console:
>>
>> select * from stats order by start_time;
>>
>> takes 8 seconds before starting its output. Am I wrong in assuming that
>> the index on start_time should make ORDER BY orders of magnitude
>> faster?
>> Or is this already fast enough? Or should I max up some memory (buffer)
>> setting to achieve greater speeds? Not that the speed is crucial, just
>> curious.
>>
>
> Postgresql won't use the index for queries like this.  Due to the
> MVCC implementation, the index does not contain all necessary information
> and would therefore be slower than using the table data alone.
>
> (What postgresql lacks is a first_row/all_rows hint like oracle)
>
> However, if you limit the number of rows enough, you might force it
> to use an index:
>
> select * from stats order by start_time limit 1000;
>

Thanks! Since LIMIT/OFFSET is the typical usage pattern for a paginated
data set accessed from the Web (which is my case), it immediately
becomes a non-issue.

Re: index speed and failed expectations?

From
Tom Lane
Date:
"Adam Rich" <adam.r@sbcglobal.net> writes:
>> This query from the console:
>> select * from stats order by start_time;
>> takes 8 seconds before starting its output. Am I wrong in assuming that
>> the index on start_time should make ORDER BY orders of magnitude
>> faster?

> Postgresql won't use the index for queries like this.

"won't" -> "might not".  It all depends on the relative cost estimates
for indexscan vs seqscan + sort.  For a large table it's quite likely
that the latter will be cheaper, because it has a better-localized
access pattern.

> (What postgresql lacks is a first_row/all_rows hint like oracle)

That's spelled "LIMIT" ;-).  Also, you can bias the choice in favor
of a fast-start plan if you use a cursor rather than a plain SELECT.
In that case the planner makes some allowance for the idea that
you might not want all the rows, or might be more interested in
getting the first ones quickly than minimizing the total time to
fetch all the rows.

            regards, tom lane

Re: index speed and failed expectations?

From
Glyn Astill
Date:
> >
> > However, if you limit the number of rows enough, you
> might force it
> > to use an index:
> >
> > select * from stats order by start_time limit 1000;
> >
>
> Thanks! Since LIMIT/OFFSET is the typical usage pattern for
> a paginated
> data set accessed from the Web (which is my case), it
> immediately
> becomes a non-issue.
>

We do a lot of queries with order by limit n, and from my experience setting enable_sort to off on the database also
makesa massive difference. 

http://www.postgresql.org/docs/8.3/static/indexes-ordering.html


      __________________________________________________________
Not happy with your email address?.
Get the one you really want - millions of new email addresses available now at Yahoo!
http://uk.docs.yahoo.com/ymail/new.html

Re: index speed and failed expectations?

From
Michael Fuhr
Date:
On Mon, Aug 04, 2008 at 08:35:28AM -0500, Adam Rich wrote:
> > This query from the console:
> >
> > select * from stats order by start_time;
> >
> > takes 8 seconds before starting its output. Am I wrong in assuming that
> > the index on start_time should make ORDER BY orders of magnitude
> > faster?
>
> Postgresql won't use the index for queries like this.  Due to the
> MVCC implementation, the index does not contain all necessary information
> and would therefore be slower than using the table data alone.

Not necessarily true.  Despite the index not having enough information,
the planner might still decide that using the index would be faster
than executing a sort.

create table stats (
  id          serial primary key,
  start_time  timestamp with time zone not null
);

insert into stats (start_time)
select now() - random() * '1 year'::interval
  from generate_series(1, 100000);

create index stats_start_time_idx on stats (start_time);

analyze stats;

explain analyze select * from stats order by start_time;
                                                                 QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using stats_start_time_idx on stats  (cost=0.00..4767.83 rows=100000 width=12) (actual time=0.146..994.674
rows=100000loops=1) 
 Total runtime: 1419.943 ms
(2 rows)

set enable_indexscan to off;
explain analyze select * from stats order by start_time;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Sort  (cost=9845.82..10095.82 rows=100000 width=12) (actual time=3240.976..3800.038 rows=100000 loops=1)
   Sort Key: start_time
   ->  Seq Scan on stats  (cost=0.00..1541.00 rows=100000 width=12) (actual time=0.091..500.853 rows=100000 loops=1)
 Total runtime: 4226.870 ms
(4 rows)

--
Michael Fuhr