Re: SeqScan costs - Mailing list pgsql-hackers
From | Decibel! |
---|---|
Subject | Re: SeqScan costs |
Date | |
Msg-id | EB312F7D-78CC-4034-9871-AC0A419238D5@decibel.org Whole thread Raw |
In response to | Re: SeqScan costs (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
List | pgsql-hackers |
On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote: >>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > >>> Proposal: Make the first block of a seq scan cost >>> random_page_cost, then after that every additional block costs >>> seq_page_cost. > > ?Tom> This is only going to matter for a table of 1 block (or at least > ?Tom> very few blocks), and for such a table it's highly likely that > ?Tom> it's in RAM anyway.? So I'm unconvinced that the proposed change > ?Tom> represents a better model of reality. > > Simple example which demonstrates a 10x speed improvement for index > scan over seqscan for a 1-block table (on 8.3.3): > > create table oneblock (id integer primary key, value text not null);? > insert into oneblock select i, 'row ' || i from generate_series > (1,200) i; > > test=> select pg_relation_size('oneblock'); > ?pg_relation_size? > ------------------ > ?? ? ? ? ? ? 8192 > > analyze oneblock; > > set enable_seqscan=true; > > select (select value from oneblock where id = i) > ? from generate_series(1,200) i, generate_series(1,5000) j; > Time: 25596.709 ms? (that's 25.6 us per row) > > set enable_seqscan=false; > > select (select value from oneblock where id = i) > ? from generate_series(1,200) i, generate_series(1,5000) j; > Time: 2415.691 ms ? (that's 2.4 us per row) Roughly what I get on my MBP (I'm about a factor of 2 slower). This makes me think it's an issue of having to slog through an entire page one row at a time vs just using the index. To test this I tested selecting i=200 (remember we start filling data at the back of the page, so 200 would actually be the front, and I'm assuming the first value that would be hit) vs i=1. With seqscans, I saw about a 10% difference. With index scans the difference was moot, but also note that now index scans are in-between seqscans in performance. decibel@platter.local=# explain analyze select (select value from oneblock where id = 200) from generate_series(1,200) i, generate_series(1,500000) j; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Nested Loop (cost=17.00..20029.50 rows=1000000 width=0)(actual time=270.867..65821.373 rows=100000000 loops=1) InitPlan -> Seq Scan on oneblock (cost=0.00..3.50 rows=1 width=7) (actual time=0.052..0.053 rows=1 loops=1) Filter: (id = 200) -> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual time=1.368..164.634 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50 rows=1000 width=0) (actual time=270.743..459.335 rows=500000 loops=1) Total runtime: 79055.822 ms (8 rows) decibel@platter.local=# explain analyze select (select value from oneblock where id = 1) from generate_series(1,200) i, generate_series(1,500000) j; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Nested Loop (cost=17.00..20029.50 rows=1000000 width=0)(actual time=261.941..72937.226 rows=100000000 loops=1) InitPlan -> Seq Scan on oneblock (cost=0.00..3.50 rows=1 width=7) (actual time=0.025..0.056 rows=1 loops=1) Filter: (id = 1) -> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual time=1.375..182.474 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50 rows=1000 width=0) (actual time=261.815..448.652 rows=500000 loops=1) Total runtime: 87702.315 ms (8 rows) decibel@platter.local=# set enable_seqscan =off; SET decibel@platter.local=# explain analyze select (select value from oneblock where id = 200) from generate_series(1,200) i, generate_series(1,500000) j; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Nested Loop (cost=21.77..20034.27 rows=1000000 width=0)(actual time=262.219..69851.786 rows=100000000 loops=1) InitPlan -> Index Scan using oneblock_pkey on oneblock (cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1 loops=1) Index Cond: (id = 200) -> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual time=1.325..174.314 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50 rows=1000 width=0) (actual time=262.119..449.383 rows=500000 loops=1) Total runtime: 83024.952 ms (8 rows) decibel@platter.local=# explain analyze select (select value from oneblock where id = 1) from generate_series(1,200) i, generate_series(1,500000) j; QUERY PLAN ------------------------------------------------------------------------ ----------------------------------------------------------------- Nested Loop (cost=21.77..20034.27 rows=1000000 width=0)(actual time=262.175..68943.985 rows=100000000 loops=1) InitPlan -> Index Scan using oneblock_pkey on oneblock (cost=0.00..8.27 rows=1 width=7) (actual time=0.023..0.025 rows=1 loops=1) Index Cond: (id = 1) -> Function Scan on generate_series i (cost=0.00..12.50 rows=1000 width=0) (actual time=0.062..0.339 rows=200 loops=1) -> Materialize (cost=13.50..23.50 rows=1000 width=0) (actual time=1.325..176.056 rows=500000 loops=200) -> Function Scan on generate_series j (cost=0.00..12.50 rows=1000 width=0) (actual time=262.079..454.692 rows=500000 loops=1) Total runtime: 82598.556 ms (8 rows) decibel@platter.local=# -- Decibel!, aka Jim C. Nasby, Database Architect decibel@decibel.org Give your computer some brain candy! www.distributed.net Team #1828
pgsql-hackers by date: