Thread: Choosing between seqscan and bitmap scan

Choosing between seqscan and bitmap scan

From

Teodor Sigaev

Date:

29 April 2010, 06:43:49

Hi!

There is some strange on current CVS with correct choosing of scans. Although 
bitmap scan is cheaper but postgresql chooses seqscan. Test suite:

CREATE OR REPLACE FUNCTION genvect()
RETURNS tsvector AS
$$    SELECT
        array_to_string(            ARRAY(                SELECT
(random()*random()*random()*1000.0)::int::text               FROM                    generate_series(1, 10 +
(100.0*random())::bigint)           ),            ' '        )::tsvector;
 
$$
LANGUAGE SQL VOLATILE;

SELECT    t::int4 AS id, genvect() AS ts INTO foo
FROM    generate_series(1, 100000) AS t;

CREATE INDEX foo_idx ON foo USING gin (ts);

VACCUM ANALYZE foo;

postgres=# explain  select count(*) from foo where ts @@ '259';                          QUERY PLAN
--------------------------------------------------------------- Aggregate  (cost=5817.27..5817.28 rows=1 width=0)   ->
SeqScan on foo  (cost=0.00..5805.00 rows=4907 width=0)         Filter: (ts @@ '''259'''::tsquery)
 
(3 rows)

Time: 6,370 ms
postgres=# set enable_seqscan = off;
SET
Time: 2,014 ms
postgres=# explain  select count(*) from foo where ts @@ '259';                                   QUERY PLAN
--------------------------------------------------------------------------------- Aggregate  (cost=5767.35..5767.36
rows=1width=0)   ->  Bitmap Heap Scan on foo  (cost=942.46..5755.08 rows=4907 width=0)         Recheck Cond: (ts @@
'''259'''::tsquery)        ->  Bitmap Index Scan on foo_idx  (cost=0.00..941.24 rows=4907 width=0)               Index
Cond:(ts @@ '''259'''::tsquery)
 
(5 rows)

Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?

Changed options in postgresql.conf:
shared_buffers=128MB
temp_buffers=16MB
work_mem=16MB
maintenance_work_mem=256MB
effective_cache_size=1024MB



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: Choosing between seqscan and bitmap scan

From

Cédric Villemain

Date:

29 April 2010, 07:02:43

2010/4/29 Teodor Sigaev <teodor@sigaev.ru>:
> Hi!
>
> There is some strange on current CVS with correct choosing of scans.

Also true with 8.4, default configuration.

> Although bitmap scan is cheaper but postgresql chooses seqscan. Test suite:
>
> CREATE OR REPLACE FUNCTION genvect()
> RETURNS tsvector AS
> $$
>    SELECT
>
>        array_to_string(
>            ARRAY(
>                SELECT
>                    (random()*random()*random()*1000.0)::int::text
>                FROM
>                    generate_series(1, 10 + (100.0*random())::bigint)
>            ),
>            ' '
>        )::tsvector;
> $$
> LANGUAGE SQL VOLATILE;
>
> SELECT
>    t::int4 AS id, genvect() AS ts INTO foo
> FROM
>    generate_series(1, 100000) AS t;
>
> CREATE INDEX foo_idx ON foo USING gin (ts);
>
> VACCUM ANALYZE foo;
>
> postgres=# explain  select count(*) from foo where ts @@ '259';
>                          QUERY PLAN
> ---------------------------------------------------------------
>  Aggregate  (cost=5817.27..5817.28 rows=1 width=0)
>   ->  Seq Scan on foo  (cost=0.00..5805.00 rows=4907 width=0)
>         Filter: (ts @@ '''259'''::tsquery)
> (3 rows)
>
> Time: 6,370 ms
> postgres=# set enable_seqscan = off;
> SET
> Time: 2,014 ms
> postgres=# explain  select count(*) from foo where ts @@ '259';
>                                   QUERY PLAN
> ---------------------------------------------------------------------------------
>  Aggregate  (cost=5767.35..5767.36 rows=1 width=0)
>   ->  Bitmap Heap Scan on foo  (cost=942.46..5755.08 rows=4907 width=0)
>         Recheck Cond: (ts @@ '''259'''::tsquery)
>         ->  Bitmap Index Scan on foo_idx  (cost=0.00..941.24 rows=4907
> width=0)
>               Index Cond: (ts @@ '''259'''::tsquery)
> (5 rows)
>
> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?
>
> Changed options in postgresql.conf:
> shared_buffers=128MB
> temp_buffers=16MB
> work_mem=16MB
> maintenance_work_mem=256MB
> effective_cache_size=1024MB
>
>
>
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                   WWW: http://www.sigaev.ru/
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>



--
Cédric Villemain

Re: Choosing between seqscan and bitmap scan

From

Tom Lane

Date:

29 April 2010, 10:38:35

Teodor Sigaev <teodor@sigaev.ru> writes:
> [ planner prefers ]
>     ->  Seq Scan on foo  (cost=0.00..5805.00 rows=4907 width=0)
> to
>     ->  Bitmap Heap Scan on foo  (cost=942.46..5755.08 rows=4907 width=0)

> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?

There's a fuzz factor of (IIRC) 1% in path cost comparisons.  It's
deciding that the seqscan and bitmapscan total costs are not
meaningfully different; then since the startup costs *are* meaningfully
different, it's making the choice on the basis of cheaper startup cost.
        regards, tom lane

Re: Choosing between seqscan and bitmap scan

From

Oleg Bartunov

Date:

29 April 2010, 11:02:55

On Thu, 29 Apr 2010, Tom Lane wrote:

> Teodor Sigaev <teodor@sigaev.ru> writes:
>> [ planner prefers ]
>>     ->  Seq Scan on foo  (cost=0.00..5805.00 rows=4907 width=0)
>> to
>>     ->  Bitmap Heap Scan on foo  (cost=942.46..5755.08 rows=4907 width=0)
>
>> Why does pgsql choose seqscan (5817.28) instead of bitmap one (5767.36)?
>
> There's a fuzz factor of (IIRC) 1% in path cost comparisons.  It's
> deciding that the seqscan and bitmapscan total costs are not
> meaningfully different; then since the startup costs *are* meaningfully
> different, it's making the choice on the basis of cheaper startup cost.

hmm, what if we add index scan preference inside 1% tolerance ?

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Choosing between seqscan and bitmap scan

From

Tom Lane

Date:

29 April 2010, 11:10:49

Oleg Bartunov <oleg@sai.msu.su> writes:
> On Thu, 29 Apr 2010, Tom Lane wrote:
>> There's a fuzz factor of (IIRC) 1% in path cost comparisons.  It's
>> deciding that the seqscan and bitmapscan total costs are not
>> meaningfully different; then since the startup costs *are* meaningfully
>> different, it's making the choice on the basis of cheaper startup cost.

> hmm, what if we add index scan preference inside 1% tolerance ?

Why?  IMO this behavior is perfectly reasonable; in fact I've sometimes
thought the fuzz threshold should be a lot more than 1%.  There is no
reason for the planner to believe that the bitmapscan is meaningfully
superior on total cost, while it is clearly inferior on startup cost.

If your problem is that the seqscan is a lot slower in reality,
the answer to that is to twiddle the cost parameters so that the
planner knows that, not to object to this logic.
        regards, tom lane