Re: Startup cost of sequential scan - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: Startup cost of sequential scan |
Date | |
Msg-id | CAPpHfduHuC6ebeVXx1MeRGHszFOHooXhEvWafDtcRN5V9wOV6g@mail.gmail.com Whole thread Raw |
In response to | Re: Startup cost of sequential scan (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Thu, Aug 30, 2018 at 6:55 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alexander Korotkov <a.korotkov@postgrespro.ru> writes: > > I understand that startup cost is not "time to find the first row". > > But I think this example highlight not one but two issues. > > 1) Row count estimates for joins are wrong. > > Yup. > > > 2) Rows are assumed to be continuous while in reality they are > > discrete. > > Where do you get that from? I made a similar example, where estimates are good. It's quite artificial, because it selects small limit from enormous virtual table constructed by cross joins. But it illustrates how our model, assuming tuples to be continuous, can differ from reality. create table t1 (id int primary key, k int); create table t2 (id int); insert into t1 (select i, i from generate_series(1,1000000) i); insert into t2 (select 0 from generate_series(1,100) i); vacuum analyze t1, t2; By default, the query is processed using sequential scan for t1. # explain analyze select * from t1,t2 x1,t2 x2,t2 x3,t2 x4,t2 x5 where t1.id = 500000 limit 100; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1.26 rows=100 width=28) (actual time=32.879..32.926 rows=100 loops=1) -> Nested Loop (cost=0.00..126279562.00 rows=10000000000 width=28) (actual time=32.878..32.912 rows=100 loops=1) -> Nested Loop (cost=0.00..1279559.75 rows=100000000 width=24) (actual time=32.873..32.873 rows=1 loops=1) -> Nested Loop (cost=0.00..29557.50 rows=1000000 width=20) (actual time=32.868..32.868 rows=1 loops=1) -> Nested Loop (cost=0.00..17055.25 rows=10000 width=16) (actual time=32.864..32.864 rows=1 loops=1) -> Nested Loop (cost=0.00..16928.00 rows=100 width=12) (actual time=32.856..32.856 rows=1 loops=1) -> Seq Scan on t1 (cost=0.00..16925.00 rows=1 width=8) (actual time=32.841..32.841 rows=1 loops=1) Filter: (id = 500000) Rows Removed by Filter: 501983 -> Seq Scan on t2 x1 (cost=0.00..2.00 rows=100 width=4) (actual time=0.011..0.011 rows=1 loops=1) -> Materialize (cost=0.00..2.50 rows=100 width=4) (actual time=0.007..0.007 rows=1 loops=1) -> Seq Scan on t2 x2 (cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Materialize (cost=0.00..2.50 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Seq Scan on t2 x3 (cost=0.00..2.00 rows=100 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Materialize (cost=0.00..2.50 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Seq Scan on t2 x4 (cost=0.00..2.00 rows=100 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Materialize (cost=0.00..2.50 rows=100 width=4) (actual time=0.004..0.024 rows=100 loops=1) -> Seq Scan on t2 x5 (cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.010 rows=100 loops=1) Planning Time: 0.372 ms Execution Time: 32.987 ms (20 rows) But if we disable sequential scan and bitmap scan then it would be index scan, which is much faster. # set enable_seqscan = off; # set enable_bitmapscan = off; # explain analyze select * from t1,t2 x1,t2 x2,t2 x3,t2 x4,t2 x5 where t1.id = 500000 limit 100; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=50000000000.43..50000000001.69 rows=100 width=28) (actual time=0.041..0.092 rows=100 loops=1) -> Nested Loop (cost=50000000000.43..50126262645.44 rows=10000000000 width=28) (actual time=0.040..0.079 rows=100 loops=1) -> Nested Loop (cost=40000000000.43..40001262643.19 rows=100000000 width=24) (actual time=0.036..0.036 rows=1 loops=1) -> Nested Loop (cost=30000000000.42..30000012640.94 rows=1000000 width=20) (actual time=0.033..0.033 rows=1 loops=1) -> Nested Loop (cost=20000000000.42..20000000138.69 rows=10000 width=16) (actual time=0.029..0.029 rows=1 loops=1) -> Nested Loop (cost=10000000000.42..10000000011.44 rows=100 width=12) (actual time=0.023..0.023 rows=1 loops=1) -> Index Scan using t1_pkey on t1 (cost=0.42..8.44 rows=1 width=8) (actual time=0.015..0.015 rows=1 loops=1) Index Cond: (id = 500000) -> Seq Scan on t2 x1 (cost=10000000000.00..10000000002.00 rows=100 width=4) (actual time=0.007..0.007 rows=1 loops=1) -> Materialize (cost=10000000000.00..10000000002.50 rows=100 width=4) (actual time=0.004..0.005 rows=1 loops=1) -> Seq Scan on t2 x2 (cost=10000000000.00..10000000002.00 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Materialize (cost=10000000000.00..10000000002.50 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Seq Scan on t2 x3 (cost=10000000000.00..10000000002.00 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Materialize (cost=10000000000.00..10000000002.50 rows=100 width=4) (actual time=0.003..0.003 rows=1 loops=1) -> Seq Scan on t2 x4 (cost=10000000000.00..10000000002.00 rows=100 width=4) (actual time=0.002..0.003 rows=1 loops=1) -> Materialize (cost=10000000000.00..10000000002.50 rows=100 width=4) (actual time=0.003..0.026 rows=100 loops=1) -> Seq Scan on t2 x5 (cost=10000000000.00..10000000002.00 rows=100 width=4) (actual time=0.003..0.010 rows=100 loops=1) Planning Time: 0.274 ms Execution Time: 0.150 ms (19 rows) From this example, I get that there is a distinct issue that we assume rows to be continuous. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-hackers by date: