Thread: index on large table
I have a large table, with >1 million rows, which I would like to see page by page. It is being displayed in the oid order. Without any index a cost for such query is huge, since it is sequential scan of about 300MB of data. "explain select * from events order by oid limit 10 offset 0" NOTICE: QUERY PLAN: Limit (cost=69.83..69.83 rows=10 width=130) -> Sort (cost=69.83..69.83 rows=1000 width=130) -> Seq Scan on events (cost=0.00..20.00 rows=1000 width=130) The cost is really higher, but the important fact is, that sequential scan is required. Do the obvious: create an index: "create index events_oid on events(oid)" Now the same query is instantaneous, since index is used: "explain select * from events order by oid limit 10 offset 0" NOTICE: QUERY PLAN: Limit (cost=0.00..38.56 rows=10 width=130) -> Index Scan using events_oid on events (cost=0.00..3953085.63 rows=1025245 width=130) so, let's see further down the table, the offset is shifted to 1M, we still want to see just 10 entries. "explain select * from events order by oid limit 10 offset 1000000" NOTICE: QUERY PLAN: Limit (cost=424863.54..424863.54 rows=10 width=130) -> Sort (cost=424863.54..424863.54 rows=1025245 width=130) -> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130) Bummer. This is very slow again, sequential scan again. Why the index is not used for this query? Use of index would make it very fast! Does anybody know what is going on? Thanks, Kacper
On Tue, 12 Mar 2002, Kacper Nowicki wrote: > so, let's see further down the table, the offset is shifted to 1M, we still > want to see just 10 entries. > > "explain select * from events order by oid limit 10 offset 1000000" > NOTICE: QUERY PLAN: > Limit (cost=424863.54..424863.54 rows=10 width=130) > -> Sort (cost=424863.54..424863.54 rows=1025245 width=130) > -> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130) > > Bummer. This is very slow again, sequential scan again. Why the index is > not used for this query? Use of index would make it very fast! What gets shown for explain with set enable_seqscan=off? If you're using 7.2, try explain analyze both ways as well. The row grabbing with the index would be slower than the sequence scan, but most of the cost seems to be going into the sort. Another thing to try would be raising sort_mem I guess.
>> "explain select * from events order by oid limit 10 offset 1000000" >> NOTICE: QUERY PLAN: >> Limit (cost=424863.54..424863.54 rows=10 width=130) >> -> Sort (cost=424863.54..424863.54 rows=1025245 width=130) >> -> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130) >> >> Bummer. This is very slow again, sequential scan again. Why the index is >> not used for this query? Use of index would make it very fast! Not necessarily. Using the index for this would require fetching 1000000+10 values in the indexscan (and throwing away all but 10). The planner is counting on its fingers and guessing that the sort is faster. It might or might not be right about that (have you compared timings?) but certainly the index method won't be instantaneous. regards, tom lane
Tom Lane wrote: > >> "explain select * from events order by oid limit 10 offset 1000000" > >> NOTICE: QUERY PLAN: > >> Limit (cost=424863.54..424863.54 rows=10 width=130) > >> -> Sort (cost=424863.54..424863.54 rows=1025245 width=130) > >> -> Seq Scan on events (cost=0.00..35645.45 rows=1025245 width=130) > >> > >> Bummer. This is very slow again, sequential scan again. Why the index is > >> not used for this query? Use of index would make it very fast! > > Not necessarily. Using the index for this would require fetching > 1000000+10 values in the indexscan (and throwing away all but 10). > > The planner is counting on its fingers and guessing that the sort > is faster. It might or might not be right about that (have you > compared timings?) but certainly the index method won't be > instantaneous. This question is being asked a lot. I hope my new FAQ item 4.8 wording helps, but it will take time for people to read the new version. I will add it to 7.2.X CVS. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026