Thread: index on large table

index on large table

From
Kacper Nowicki
Date:
I have a large table, with >1 million rows, which I would like to see page
by page. It is being displayed in the oid order.

Without any index a cost for such query is huge, since it is sequential
scan of about 300MB of data.

"explain select * from events order by oid limit 10 offset 0"
NOTICE:  QUERY PLAN:
Limit  (cost=69.83..69.83 rows=10 width=130)
   ->  Sort  (cost=69.83..69.83 rows=1000 width=130)
         ->  Seq Scan on events  (cost=0.00..20.00 rows=1000 width=130)

The cost is really higher, but the important fact is, that sequential scan
is required. Do the obvious: create an index:

"create index events_oid on events(oid)"

Now the same query is instantaneous, since index is used:

"explain select * from events order by oid limit 10 offset 0"
NOTICE:  QUERY PLAN:
Limit  (cost=0.00..38.56 rows=10 width=130)
   ->  Index Scan using events_oid on events  (cost=0.00..3953085.63
rows=1025245 width=130)

so, let's see further down the table, the offset is shifted to 1M, we still
want to see just 10 entries.

"explain select * from events order by oid limit 10 offset 1000000"
NOTICE:  QUERY PLAN:
Limit  (cost=424863.54..424863.54 rows=10 width=130)
   ->  Sort  (cost=424863.54..424863.54 rows=1025245 width=130)
         ->  Seq Scan on events  (cost=0.00..35645.45 rows=1025245 width=130)

Bummer. This is very slow again, sequential scan again. Why the index is
not used for this query? Use of index would make it very fast!

Does anybody know what is going on?

Thanks,
Kacper


Re: index on large table

From
Stephan Szabo
Date:
On Tue, 12 Mar 2002, Kacper Nowicki wrote:

> so, let's see further down the table, the offset is shifted to 1M, we still
> want to see just 10 entries.
>
> "explain select * from events order by oid limit 10 offset 1000000"
> NOTICE:  QUERY PLAN:
> Limit  (cost=424863.54..424863.54 rows=10 width=130)
>    ->  Sort  (cost=424863.54..424863.54 rows=1025245 width=130)
>          ->  Seq Scan on events  (cost=0.00..35645.45 rows=1025245 width=130)
>
> Bummer. This is very slow again, sequential scan again. Why the index is
> not used for this query? Use of index would make it very fast!

What gets shown for explain with set enable_seqscan=off?  If you're using
7.2, try explain analyze both ways as well.

The row grabbing with the index would be slower than the sequence scan,
but most of the cost seems to be going into the sort.  Another thing to
try would be raising sort_mem I guess.


Re: index on large table

From
Tom Lane
Date:
>> "explain select * from events order by oid limit 10 offset 1000000"
>> NOTICE:  QUERY PLAN:
>> Limit  (cost=424863.54..424863.54 rows=10 width=130)
>> ->  Sort  (cost=424863.54..424863.54 rows=1025245 width=130)
>> ->  Seq Scan on events  (cost=0.00..35645.45 rows=1025245 width=130)
>>
>> Bummer. This is very slow again, sequential scan again. Why the index is
>> not used for this query? Use of index would make it very fast!

Not necessarily.  Using the index for this would require fetching
1000000+10 values in the indexscan (and throwing away all but 10).

The planner is counting on its fingers and guessing that the sort
is faster.  It might or might not be right about that (have you
compared timings?) but certainly the index method won't be
instantaneous.

            regards, tom lane

Re: index on large table

From
Bruce Momjian
Date:
Tom Lane wrote:
> >> "explain select * from events order by oid limit 10 offset 1000000"
> >> NOTICE:  QUERY PLAN:
> >> Limit  (cost=424863.54..424863.54 rows=10 width=130)
> >> ->  Sort  (cost=424863.54..424863.54 rows=1025245 width=130)
> >> ->  Seq Scan on events  (cost=0.00..35645.45 rows=1025245 width=130)
> >>
> >> Bummer. This is very slow again, sequential scan again. Why the index is
> >> not used for this query? Use of index would make it very fast!
>
> Not necessarily.  Using the index for this would require fetching
> 1000000+10 values in the indexscan (and throwing away all but 10).
>
> The planner is counting on its fingers and guessing that the sort
> is faster.  It might or might not be right about that (have you
> compared timings?) but certainly the index method won't be
> instantaneous.

This question is being asked a lot. I hope my new FAQ item 4.8 wording
helps, but it will take time for people to read the new version.  I will
add it to 7.2.X CVS.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026