Re: SeqScan costs - Mailing list pgsql-hackers

From Decibel!
Subject Re: SeqScan costs
Date
Msg-id EB312F7D-78CC-4034-9871-AC0A419238D5@decibel.org
Whole thread Raw
In response to Re: SeqScan costs  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
List pgsql-hackers
On Aug 12, 2008, at 4:52 PM, Andrew Gierth wrote:
>>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>
>>> Proposal: Make the first block of a seq scan cost
>>> random_page_cost, then after that every additional block costs
>>> seq_page_cost.
>
> ?Tom> This is only going to matter for a table of 1 block (or at least
> ?Tom> very few blocks), and for such a table it's highly likely that
> ?Tom> it's in RAM anyway.? So I'm unconvinced that the proposed change
> ?Tom> represents a better model of reality.
>
> Simple example which demonstrates a 10x speed improvement for index
> scan over seqscan for a 1-block table (on 8.3.3):
>
> create table oneblock (id integer primary key, value text not null);?
> insert into oneblock select i, 'row ' || i from generate_series 
> (1,200) i;
>
> test=> select pg_relation_size('oneblock');
> ?pg_relation_size?
> ------------------
> ?? ? ? ? ? ? 8192
>
> analyze oneblock;
>
> set enable_seqscan=true;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 25596.709 ms? (that's 25.6 us per row)
>
> set enable_seqscan=false;
>
> select (select value from oneblock where id = i)
> ? from generate_series(1,200) i, generate_series(1,5000) j;
> Time: 2415.691 ms ? (that's 2.4 us per row)

Roughly what I get on my MBP (I'm about a factor of 2 slower). This  
makes me think it's an issue of having to slog through an entire page  
one row at a time vs just using the index. To test this I tested  
selecting i=200 (remember we start filling data at the back of the  
page, so 200 would actually be the front, and I'm assuming the first  
value that would be hit) vs i=1. With seqscans, I saw about a 10%  
difference. With index scans the difference was moot, but also note  
that now index scans are in-between seqscans in performance.

decibel@platter.local=# explain analyze select (select value from  
oneblock where id = 200)  from generate_series(1,200) i, generate_series(1,500000) j;
                           QUERY  
 
PLAN
------------------------------------------------------------------------ 
----------------------------------------------------------------- Nested Loop  (cost=17.00..20029.50 rows=1000000
width=0)(actual  
 
time=270.867..65821.373 rows=100000000 loops=1)   InitPlan     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1
width=7) 
 
(actual time=0.052..0.053 rows=1 loops=1)           Filter: (id = 200)   ->  Function Scan on generate_series i
(cost=0.00..12.50 
 
rows=1000 width=0) (actual time=0.062..0.351 rows=200 loops=1)   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0)
(actual 
 
time=1.368..164.634 rows=500000 loops=200)         ->  Function Scan on generate_series j  (cost=0.00..12.50  
rows=1000 width=0) (actual time=270.743..459.335 rows=500000 loops=1) Total runtime: 79055.822 ms
(8 rows)

decibel@platter.local=# explain analyze select (select value from  
oneblock where id = 1)  from generate_series(1,200) i, generate_series(1,500000) j;
                         QUERY  
 
PLAN
------------------------------------------------------------------------ 
----------------------------------------------------------------- Nested Loop  (cost=17.00..20029.50 rows=1000000
width=0)(actual  
 
time=261.941..72937.226 rows=100000000 loops=1)   InitPlan     ->  Seq Scan on oneblock  (cost=0.00..3.50 rows=1
width=7) 
 
(actual time=0.025..0.056 rows=1 loops=1)           Filter: (id = 1)   ->  Function Scan on generate_series i
(cost=0.00..12.50 
 
rows=1000 width=0) (actual time=0.060..0.346 rows=200 loops=1)   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0)
(actual 
 
time=1.375..182.474 rows=500000 loops=200)         ->  Function Scan on generate_series j  (cost=0.00..12.50  
rows=1000 width=0) (actual time=261.815..448.652 rows=500000 loops=1) Total runtime: 87702.315 ms
(8 rows)

decibel@platter.local=# set enable_seqscan =off;
SET
decibel@platter.local=# explain analyze select (select value from  
oneblock where id = 200)  from generate_series(1,200) i, generate_series(1,500000) j;
                           QUERY  
 
PLAN
------------------------------------------------------------------------ 
----------------------------------------------------------------- Nested Loop  (cost=21.77..20034.27 rows=1000000
width=0)(actual  
 
time=262.219..69851.786 rows=100000000 loops=1)   InitPlan     ->  Index Scan using oneblock_pkey on oneblock   
(cost=0.00..8.27 rows=1 width=7) (actual time=0.024..0.026 rows=1  
loops=1)           Index Cond: (id = 200)   ->  Function Scan on generate_series i  (cost=0.00..12.50  
rows=1000 width=0) (actual time=0.062..0.355 rows=200 loops=1)   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0)
(actual 
 
time=1.325..174.314 rows=500000 loops=200)         ->  Function Scan on generate_series j  (cost=0.00..12.50  
rows=1000 width=0) (actual time=262.119..449.383 rows=500000 loops=1) Total runtime: 83024.952 ms
(8 rows)

decibel@platter.local=# explain analyze select (select value from  
oneblock where id = 1)  from generate_series(1,200) i, generate_series(1,500000) j;
                         QUERY  
 
PLAN
------------------------------------------------------------------------ 
----------------------------------------------------------------- Nested Loop  (cost=21.77..20034.27 rows=1000000
width=0)(actual  
 
time=262.175..68943.985 rows=100000000 loops=1)   InitPlan     ->  Index Scan using oneblock_pkey on oneblock   
(cost=0.00..8.27 rows=1 width=7) (actual time=0.023..0.025 rows=1  
loops=1)           Index Cond: (id = 1)   ->  Function Scan on generate_series i  (cost=0.00..12.50  
rows=1000 width=0) (actual time=0.062..0.339 rows=200 loops=1)   ->  Materialize  (cost=13.50..23.50 rows=1000 width=0)
(actual 
 
time=1.325..176.056 rows=500000 loops=200)         ->  Function Scan on generate_series j  (cost=0.00..12.50  
rows=1000 width=0) (actual time=262.079..454.692 rows=500000 loops=1) Total runtime: 82598.556 ms
(8 rows)

decibel@platter.local=#
--
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828



pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: Transaction-controlled robustness for replication
Next
From: Bruce Momjian
Date:
Subject: Re: Transaction-controlled robustness for replication