Thread: Choosing between seqscan and bitmap scan
Hi! There is some strange on current CVS with correct choosing of scans. Although bitmap scan is cheaper but postgresql chooses seqscan. Test suite: CREATE OR REPLACE FUNCTION genvect() RETURNS tsvector AS $$ SELECT array_to_string( ARRAY( SELECT (random()*random()*random()*1000.0)::int::text FROM generate_series(1, 10 + (100.0*random())::bigint) ), ' ' )::tsvector; $$ LANGUAGE SQL VOLATILE; SELECT t::int4 AS id, genvect() AS ts INTO foo FROM generate_series(1, 100000) AS t; CREATE INDEX foo_idx ON foo USING gin (ts); VACCUM ANALYZE foo; postgres=# explain select count(*) from foo where ts @@ '259'; QUERY PLAN --------------------------------------------------------------- Aggregate (cost=5817.27..5817.28 rows=1 width=0) -> SeqScan on foo (cost=0.00..5805.00 rows=4907 width=0) Filter: (ts @@ '''259'''::tsquery) (3 rows) Time: 6,370 ms postgres=# set enable_seqscan = off; SET Time: 2,014 ms postgres=# explain select count(*) from foo where ts @@ '259'; QUERY PLAN --------------------------------------------------------------------------------- Aggregate (cost=5767.35..5767.36 rows=1width=0) -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0) Recheck Cond: (ts @@ '''259'''::tsquery) -> Bitmap Index Scan on foo_idx (cost=0.00..941.24 rows=4907 width=0) Index Cond:(ts @@ '''259'''::tsquery) (5 rows) Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)? Changed options in postgresql.conf: shared_buffers=128MB temp_buffers=16MB work_mem=16MB maintenance_work_mem=256MB effective_cache_size=1024MB -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
2010/4/29 Teodor Sigaev <teodor@sigaev.ru>: > Hi! > > There is some strange on current CVS with correct choosing of scans. Also true with 8.4, default configuration. > Although bitmap scan is cheaper but postgresql chooses seqscan. Test suite: > > CREATE OR REPLACE FUNCTION genvect() > RETURNS tsvector AS > $$ > SELECT > > array_to_string( > ARRAY( > SELECT > (random()*random()*random()*1000.0)::int::text > FROM > generate_series(1, 10 + (100.0*random())::bigint) > ), > ' ' > )::tsvector; > $$ > LANGUAGE SQL VOLATILE; > > SELECT > t::int4 AS id, genvect() AS ts INTO foo > FROM > generate_series(1, 100000) AS t; > > CREATE INDEX foo_idx ON foo USING gin (ts); > > VACCUM ANALYZE foo; > > postgres=# explain select count(*) from foo where ts @@ '259'; > QUERY PLAN > --------------------------------------------------------------- > Aggregate (cost=5817.27..5817.28 rows=1 width=0) > -> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0) > Filter: (ts @@ '''259'''::tsquery) > (3 rows) > > Time: 6,370 ms > postgres=# set enable_seqscan = off; > SET > Time: 2,014 ms > postgres=# explain select count(*) from foo where ts @@ '259'; > QUERY PLAN > --------------------------------------------------------------------------------- > Aggregate (cost=5767.35..5767.36 rows=1 width=0) > -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0) > Recheck Cond: (ts @@ '''259'''::tsquery) > -> Bitmap Index Scan on foo_idx (cost=0.00..941.24 rows=4907 > width=0) > Index Cond: (ts @@ '''259'''::tsquery) > (5 rows) > > Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)? > > Changed options in postgresql.conf: > shared_buffers=128MB > temp_buffers=16MB > work_mem=16MB > maintenance_work_mem=256MB > effective_cache_size=1024MB > > > > -- > Teodor Sigaev E-mail: teodor@sigaev.ru > WWW: http://www.sigaev.ru/ > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Cédric Villemain
Teodor Sigaev <teodor@sigaev.ru> writes: > [ planner prefers ] > -> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0) > to > -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0) > Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)? There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's deciding that the seqscan and bitmapscan total costs are not meaningfully different; then since the startup costs *are* meaningfully different, it's making the choice on the basis of cheaper startup cost. regards, tom lane
On Thu, 29 Apr 2010, Tom Lane wrote: > Teodor Sigaev <teodor@sigaev.ru> writes: >> [ planner prefers ] >> -> Seq Scan on foo (cost=0.00..5805.00 rows=4907 width=0) >> to >> -> Bitmap Heap Scan on foo (cost=942.46..5755.08 rows=4907 width=0) > >> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)? > > There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's > deciding that the seqscan and bitmapscan total costs are not > meaningfully different; then since the startup costs *are* meaningfully > different, it's making the choice on the basis of cheaper startup cost. hmm, what if we add index scan preference inside 1% tolerance ? Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov <oleg@sai.msu.su> writes: > On Thu, 29 Apr 2010, Tom Lane wrote: >> There's a fuzz factor of (IIRC) 1% in path cost comparisons. It's >> deciding that the seqscan and bitmapscan total costs are not >> meaningfully different; then since the startup costs *are* meaningfully >> different, it's making the choice on the basis of cheaper startup cost. > hmm, what if we add index scan preference inside 1% tolerance ? Why? IMO this behavior is perfectly reasonable; in fact I've sometimes thought the fuzz threshold should be a lot more than 1%. There is no reason for the planner to believe that the bitmapscan is meaningfully superior on total cost, while it is clearly inferior on startup cost. If your problem is that the seqscan is a lot slower in reality, the answer to that is to twiddle the cost parameters so that the planner knows that, not to object to this logic. regards, tom lane