Thread: bitmap heap scan way cheaper than seq scan on the same number of tuples (fts-search).

Hi.

I'm currently trying to figure out why the tsearch performance seems to
vary a lot between different query plans. I have created a sample dataset
that sort of resembles the data I have to work on.

The script that builds the dataset is at:
http://krogh.cc/~jesper/build-test.pl
and http://krogh.cc/~jesper/words.txt is needed for it to run.

Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.

The dataset consists of randomized words, but all records contain
"commonterm", around 80% contain "commonterm80", and so on:

    my $rand = rand();
    push @doc,"commonterm" if $commonpos == $j;
    push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

The queries were run multiple times in a row, so the results should be
reproducible:

ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                   QUERY PLAN

----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual
time=0.102..1792.215 rows=40082 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1809.437 ms
(3 rows)

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                              QUERY PLAN


---------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188
width=4) (actual time=17.445..197.356 rows=40082 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09
rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 204.201 ms
(5 rows)

Given that the seq-scan has to visit 50K rows to create the result and
the bitmap heap scan only has to visit 40K (but also has to search the
index), we would expect the seq-scan to be at most 25% more expensive
than the bitmap heap scan, i.e. less than 300ms.

Jesper
--
Jesper

On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote:

> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>
> The dataset consists of randomized words, but all records contain
> "commonterm", around 80% contain "commonterm80", and so on:
>
>     my $rand = rand();
>     push @doc,"commonterm" if $commonpos == $j;
>     push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;

You should probably re-generate your random value for each call rather
than store it. Currently, every document with commonterm20 is guaranteed
to also have commonterm40, commonterm60, etc, which probably isn't very
realistic, and also makes doc size correlate with word rarity.
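Something like this, untested, reusing the variable names from your
snippet (the extra terms are implied by your "and so on") - a fresh
rand() per term makes membership in each group independent:

    push @doc, "commonterm"   if $commonpos == $j;
    push @doc, "commonterm80" if $commonpos == $j && rand() < 0.8;
    push @doc, "commonterm60" if $commonpos == $j && rand() < 0.6;
    push @doc, "commonterm40" if $commonpos == $j && rand() < 0.4;
    push @doc, "commonterm20" if $commonpos == $j && rand() < 0.2;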

> Given that the seq-scan has to visit 50K rows to create the result and
> the bitmap heap scan only has to visit 40K (but also has to search the
> index), we would expect the seq-scan to be at most 25% more expensive
> than the bitmap heap scan, i.e. less than 300ms.

I suspect table bloat. Try VACUUMing your table and trying again.

In this sort of test it's often a good idea to TRUNCATE the table before
populating it with a newly generated data set. That helps avoid any
residual effects from table bloat etc from lingering between test runs.
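Something like this between runs - table name taken from your script,
and only a sketch:

    TRUNCATE TABLE ftstest;
    -- ... repopulate the table here ...
    VACUUM ANALYZE ftstest;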

--
Craig Ringer


Craig Ringer wrote:
> On Mon, 2009-10-26 at 21:02 +0100, Jesper Krogh wrote:
>
>> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>>
>> The dataset consists of randomized words, but all records contain
>> "commonterm", around 80% contain "commonterm80", and so on:
>>
>>     my $rand = rand();
>>     push @doc,"commonterm" if $commonpos == $j;
>>     push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
>
> You should probably re-generate your random value for each call rather
> than store it. Currently, every document with commonterm20 is guaranteed
> to also have commonterm40, commonterm60, etc, which probably isn't very
> realistic, and also makes doc size correlate with word rarity.

I had that in the first version, but I wanted the guarantee that the
commonterm60 documents were indeed a subset of the commonterm80 ones;
that's why it's structured like that. I know it's not realistic, but it
gives measurable results, since I know my queries will hit the same tuples.

I fail to see how this should have any direct effect on query time?

>> Given that the seq-scan has to visit 50K rows to create the result and
>> the bitmap heap scan only has to visit 40K (but also has to search the
>> index), we would expect the seq-scan to be at most 25% more expensive
>> than the bitmap heap scan, i.e. less than 300ms.
>
> I suspect table bloat. Try VACUUMing your table and trying again.

No bloat here:
ftstest=# VACUUM FULL VERBOSE ftstest;
INFO:  vacuuming "public.ftstest"
INFO:  "ftstest": found 0 removable, 50000 nonremovable row versions in
10000 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 1352 to 1652 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 6859832 bytes.
0 pages are or will become empty, including 0 at the end of the table.
536 pages containing 456072 free bytes are potential move destinations.
CPU 0.03s/0.03u sec elapsed 0.06 sec.
INFO:  index "ftstest_id_key" now contains 50000 row versions in 139 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.13 sec.
INFO:  index "ftstest_gin_idx" now contains 50000 row versions in 35792
pages
DETAIL:  0 index pages have been deleted, 25022 are currently reusable.
CPU 0.46s/0.11u sec elapsed 11.16 sec.
INFO:  "ftstest": moved 0 row versions, truncated 10000 to 10000 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_908525"
INFO:  "pg_toast_908525": found 0 removable, 100000 nonremovable row
versions in 16710 pages
DETAIL:  0 dead row versions cannot be removed yet.
Nonremovable row versions range from 270 to 2032 bytes long.
There were 0 unused item pointers.
Total free space (including removable row versions) is 3695712 bytes.
0 pages are or will become empty, including 0 at the end of the table.
5063 pages containing 1918692 free bytes are potential move destinations.
CPU 0.38s/0.17u sec elapsed 2.64 sec.
INFO:  index "pg_toast_908525_index" now contains 100000 row versions in
276 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.28 sec.
INFO:  "pg_toast_908525": moved 0 row versions, truncated 16710 to 16710
pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM
ftstest=#


> In this sort of test it's often a good idea to TRUNCATE the table before
> populating it with a newly generated data set. That helps avoid any
> residual effects from table bloat etc from lingering between test runs.

As you can see in the script, the table is dropped just before it's
recreated and filled with data.

Did you try to re-run the test?

Jesper
--
Jesper

On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:

> > You should probably re-generate your random value for each call rather
> > than store it. Currently, every document with commonterm20 is guaranteed
> > to also have commonterm40, commonterm60, etc, which probably isn't very
> > realistic, and also makes doc size correlate with word rarity.
>
> I had that in the first version, but I wanted the guarantee that the
> commonterm60 documents were indeed a subset of the commonterm80 ones;
> that's why it's structured like that. I know it's not realistic, but it
> gives measurable results, since I know my queries will hit the same tuples.
>
> I fail to see how this should have any direct effect on query time?

Probably not, in truth, but with the statistics-based planner I'm
occasionally surprised by what can happen.

>
> > In this sort of test it's often a good idea to TRUNCATE the table before
> > populating it with a newly generated data set. That helps avoid any
> > residual effects from table bloat etc from lingering between test runs.
>
> As you can see in the script, the table is dropped just before it's
> recreated and filled with data.
>
> Did you try to re-run the test?

No, I didn't. I thought it worth checking whether bloat might be the
culprit first, though I should've read the scripts to confirm you weren't
already handling that possibility.

Anyway, I've done a run to generate your data set and run a test. After
executing the test statement twice (once with and once without
enable_seqscan) to make sure all data is in cache and not being read
from disk, here are my results:


test=> set enable_seqscan=on;
SET
test=> explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                           QUERY PLAN
        

--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=36.96..227.10 rows=50 width=4) (actual time=15.830..134.194 rows=40061 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..36.95 rows=50 width=0) (actual time=11.905..11.905 rows=40061
loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 148.477 ms
(5 rows)

test=> set enable_seqscan=off;
SET
test=> explain analyze select id from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                           QUERY PLAN
        

--------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=36.96..227.10 rows=50 width=4) (actual time=15.427..134.156 rows=40061 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..36.95 rows=50 width=0) (actual time=11.739..11.739 rows=40061
loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 148.583 ms
(5 rows)



Any chance your disk cache was cold on the first test run, so Pg was
having to read the table from disk during the seqscan, and could just
use shared_buffers when you repeated the test for the index scan?



--
Craig Ringer


Craig Ringer wrote:
> On Tue, 2009-10-27 at 06:08 +0100, Jesper Krogh wrote:
>
>>> You should probably re-generate your random value for each call rather
>>> than store it. Currently, every document with commonterm20 is guaranteed
>>> to also have commonterm40, commonterm60, etc, which probably isn't very
>>> realistic, and also makes doc size correlate with word rarity.
>> I had that in the first version, but I wanted the guarantee that the
>> commonterm60 documents were indeed a subset of the commonterm80 ones;
>> that's why it's structured like that. I know it's not realistic, but it
>> gives measurable results, since I know my queries will hit the same tuples.
>>
>> I fail to see how this should have any direct effect on query time?
>
> Probably not, in truth, but with the statistics-based planner I'm
> occasionally surprised by what can happen.
>
>>> In this sort of test it's often a good idea to TRUNCATE the table before
>>> populating it with a newly generated data set. That helps avoid any
>>> residual effects from table bloat etc from lingering between test runs.
>> As you can see in the script, the table is dropped just before it's
>> recreated and filled with data.
>>
>> Did you try to re-run the test?
>
> No, I didn't. I thought it worth checking whether bloat might be the
> culprit first, though I should've read the scripts to confirm you weren't
> already handling that possibility.
>
> Anyway, I've done a run to generate your data set and run a test. After
> executing the test statement twice (once with and once without
> enable_seqscan) to make sure all data is in cache and not being read
> from disk, here are my results:
>
>
> test=> set enable_seqscan=on;
> SET
> test=> explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');

Here you should search for "commonterm", not "commonterm80"; "commonterm"
will go into a seq-scan. You're not testing the same thing I did.

> Any chance your disk cache was cold on the first test run, so Pg was
> having to read the table from disk during the seqscan, and could just
> use shared_buffers when you repeated the test for the index scan?

They were run repeatedly.

On Tue, 2009-10-27 at 06:44 +0100, Jesper Krogh wrote:

> Here you should search for "commonterm", not "commonterm80"; "commonterm"
> will go into a seq-scan. You're not testing the same thing I did.

Point taken. I ran the same commands as you, but as the planner picked
different plans it wasn't much use. The fact that I didn't notice that
is a bit worrying, as it suggests an even worse than normal degree of
brain-fade. Sorry for the waste of time.

Anyway, testing more usefully:

On 8.4 on a different system Pg uses the seq scan by preference, with a
runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when
searching for `commonterm' even when enable_seqscan is set to `off'. A
search for `commonterm80' also uses a seq scan (1067ms),  but if
enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ...
odd. If I force it not to use a bitmap heap scan it'll use an index
scan.  Preventing that too results in a seq scan with a runtime of
1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like
a pretty strange result on face value.


So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT
longer than the bitmap heap scan, though similar numbers of tuples are
being read by both.

I see the same pattern when actually reading the rows rather than
just doing an `explain analyze'. With psql set to send output
to /dev/null and with \timing enabled:

test=> \o /dev/null
test=> set enable_seqscan = on;
Time: 0.282 ms
test=> select id from ftstest where body_fts @@
to_tsquery('commonterm80');
Time: 988.880 ms
test=> set enable_seqscan = off;
Time: 0.286 ms
test=> select id from ftstest where body_fts @@
to_tsquery('commonterm80');
Time: 159.167 ms

So - nearly 1s vs 0.15s is a big difference between what I previously
confirmed to be the seq scan and the bitmap heap scan respectively for
the same query. The same number of records is being returned in both
cases.

If I "select *" rather than just reading the `id' field, the runtimes
are much more similar - 4130ms seq scan, and 3285 bitmap heap scan (both
reading data to /dev/null), a difference of ~800. `EXPLAIN ANALYZE'
results are still quite different, though, at 1020ms seq scan vs 233ms
bitmap heap, suggesting that the similarity is created only by the time
taken to actually transfer the data to the client. The time difference
between the two is much the same.

So - for some reason the seq scan takes 800ms or so longer than the
bitmap heap scan. I can see why you're puzzled. I can reproduce it on
two different machines with two different Pg versions, and using two
slightly different methods for loading the data as well.  So, I can
confirm your test results now that I'm actually testing properly.


test=> explain analyze select * from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                              QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753
width=54) (actual time=27.452..175.481 rows=39852 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72
rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 233.473 ms
(5 rows)

test=> set enable_seqscan = on;
SET
test=> explain analyze select * from ftstest where body_fts @@
to_tsquery('commonterm80');
                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual
time=0.141..956.496 rows=39852 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
 Total runtime: 1020.936 ms
(3 rows)




By the way, for the 8.4 test I modified the loader script so it wouldn't
take quite so painfully long to run the second time 'round. I turned
autocommit off, wrapped all the inserts up in a single transaction, and
moved the fts index creation to after all the data has been inserted.
It's a *LOT* faster, and the test results match yours.
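In outline, the changes look something like this (a DBI-style sketch,
not the actual script - @docs and the column names are stand-ins):

    use DBI;
    my $dbh = DBI->connect("dbi:Pg:dbname=test", "", "",
                           { AutoCommit => 0, RaiseError => 1 });
    my $ins = $dbh->prepare(
        "INSERT INTO ftstest (id, body, body_fts)
         VALUES (?, ?, to_tsvector('english', ?))");
    # one big transaction instead of a commit per row
    $ins->execute($_->{id}, $_->{body}, $_->{body}) for @docs;
    $dbh->commit;
    # build the GIN index only after the bulk load
    $dbh->do("CREATE INDEX ftstest_gin_idx ON ftstest USING gin (body_fts)");
    $dbh->commit;
    $dbh->disconnect;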

> They were run repeatedly.

Yeah, just saw that in your original mail. Sorry.

--
Craig Ringer


Craig Ringer wrote:
> On 8.4 on a different system Pg uses the seq scan by preference, with a
> runtime of 1148ms. It doesn't seem to want to do a bitmap heap scan when
> searching for `commonterm' even when enable_seqscan is set to `off'. A
> search for `commonterm80' also uses a seq scan (1067ms),  but if
> enable_seqscan is set to off it'll use a bitmap heap scan at 237ms.

OK, that's exactly like my numbers.

> On my 8.3 Pg isn't using a seqscan even for `commonterm', which is ...
> odd. If I force it not to use a bitmap heap scan it'll use an index
> scan.  Preventing that too results in a seq scan with a runtime of
> 1500ms vs the 161ms of the bitmap heap scan. I agree that it seems like
> a pretty strange result on face value.

PG 8.3 doesn't have statistics data available for gin indexes, so that
may be why the query planner behaves differently on 8.3. It also means
this is a regression, since in these cases 8.4 will perform worse than
8.3 did (although the statistics make a lot of other cases way better).

> So, on both 8.3 and 8.4 the sequential scan is indeed taking a LOT
> longer than the bitmap heap scan, though similar numbers of tuples are
> being read by both.
>
> I see the same pattern when actually reading the rows rather than
> just doing an `explain analyze'. With psql set to send output
> to /dev/null and with \timing enabled:
>
> test=> \o /dev/null
> test=> set enable_seqscan = on;
> Time: 0.282 ms
> test=> select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
> Time: 988.880 ms
> test=> set enable_seqscan = off;
> Time: 0.286 ms
> test=> select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
> Time: 159.167 ms
>
> So - nearly 1s vs 0.15s is a big difference between what I previously
> confirmed to be the seq scan and the bitmap heap scan respectively for
> the same query. The same number of records is being returned in both
> cases.
>
> If I "select *" rather than just reading the `id' field, the runtimes
> are much more similar - 4130ms seq scan, and 3285 bitmap heap scan (both
> reading data to /dev/null), a difference of ~800. `EXPLAIN ANALYZE'
> results are still quite different, though, at 1020ms seq scan vs 233ms
> bitmap heap, suggesting that the similarity is created only by the time
> taken to actually transfer the data to the client. The time difference
> between the two is much the same.
>
> So - for some reason the seq scan takes 800ms or so longer than the
> bitmap heap scan. I can see why you're puzzled. I can reproduce it on
> two different machines with two different Pg versions, and using two
> slightly different methods for loading the data as well.  So, I can
> confirm your test results now that I'm actually testing properly.

Thanks a lot.

> test=> explain analyze select * from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                               QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on ftstest  (cost=25836.66..36432.95 rows=39753
> width=54) (actual time=27.452..175.481 rows=39852 loops=1)
>    Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>    ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..25826.72
> rows=39753 width=0) (actual time=25.186..25.186 rows=39852 loops=1)
>          Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 233.473 ms
> (5 rows)
>
> test=> set enable_seqscan = on;
> SET
> test=> explain analyze select * from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                    QUERY PLAN
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on ftstest  (cost=0.00..10750.00 rows=39753 width=54) (actual
> time=0.141..956.496 rows=39852 loops=1)
>    Filter: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 1020.936 ms
> (3 rows)

My system seems to prefer bitmap scans a bit more, but given the actual
numbers that seems preferable. That's about query planning, though; my
main reason for posting was the actual run time.

> By the way, for the 8.4 test I modified the loader script so it wouldn't
> take quite so painfully long to run the second time 'round. I turned
> autocommit off, wrapped all the inserts up in a single transaction, and
> moved the fts index creation to after all the data has been inserted.
> It's a *LOT* faster, and the test results match yours.

I'll make that change if I have to work a bit more with it.

Thanks for spending time confirming my findings. (Then I know it's not
just me going blind staring at the problem.)

Jesper
--
Jesper

On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh <jesper@krogh.cc> wrote:
> Hi.
>
> I'm currently trying to figure out why the tsearch performance seems to
> vary a lot between different query plans. I have created a sample dataset
> that sort of resembles the data I have to work on.
>
> The script that builds the dataset is at:
> http://krogh.cc/~jesper/build-test.pl
> and http://krogh.cc/~jesper/words.txt is needed for it to run.
>
> Test system.. average desktop, 1 SATA drive and 1.5GB memory with pg 8.4.1.
>
> The dataset consists of randomized words, but all records contain
> "commonterm", around 80% contain "commonterm80", and so on:
>
>        my $rand = rand();
>        push @doc,"commonterm" if $commonpos == $j;
>        push @doc,"commonterm80" if $commonpos == $j && $rand < 0.8;
>
> The queries were run multiple times in a row, so the results should be
> reproducible:
>
> ftstest=# explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                   QUERY PLAN
>
> ----------------------------------------------------------------------------------------------------------------
>  Seq Scan on ftstest  (cost=0.00..10750.00 rows=40188 width=4) (actual
> time=0.102..1792.215 rows=40082 loops=1)
>   Filter: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 1809.437 ms
> (3 rows)
>
> ftstest=# set enable_seqscan=off;
> SET
> ftstest=# explain analyze select id from ftstest where body_fts @@
> to_tsquery('commonterm80');
>                                                              QUERY PLAN
>
>
---------------------------------------------------------------------------------------------------------------------------------------
>  Bitmap Heap Scan on ftstest  (cost=115389.14..125991.96 rows=40188
> width=4) (actual time=17.445..197.356 rows=40082 loops=1)
>   Recheck Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..115379.09
> rows=40188 width=0) (actual time=13.370..13.370 rows=40082 loops=1)
>         Index Cond: (body_fts @@ to_tsquery('commonterm80'::text))
>  Total runtime: 204.201 ms
> (5 rows)
>
> Given that the seq-scan has to visit 50K rows to create the result and
> the bitmap heap scan only has to visit 40K (but also has to search the
> index), we would expect the seq-scan to be at most 25% more expensive
> than the bitmap heap scan, i.e. less than 300ms.

I've seen behavior similar to this in the past with a plain old B-tree
index.  As in your case, a bitmap index scan was significantly faster
than a sequential scan even though essentially all the heap pages had
to be scanned, but the planner expected the opposite to be true.  The
planner's expectation is that the dominant cost will be fetching the
pages, and it furthermore thinks that fetching things in sequential
order is much better than skipping around randomly.  However, if all
the pages are memory-resident - possibly even in L2 or L3 CPU cache -
fetching the pages is nearly free, so the dominant cost becomes the
CPU time to process the tuples.

My best guess is that in cases like this the index cond is cheaper to
evaluate than the recheck cond/filter, so the index scan wins not by
reading fewer pages but by avoiding the need to examine some of the
tuples on each page.  I might be all wet, though.

If your whole database fits in RAM, you could try changing your
seq_page_cost and random_page_cost variables from the default values
of 1 and 4 to something around 0.05, or maybe even 0.01, and see
whether that helps.  But if it's just this query that is in cache and
you have lots of other things that are going to disk, that's harder to
tune.  You can probably still lower the default values somewhat, but
if you go crazy with it you'll start to have problems in the other
direction.
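For experimenting at the session level (so nothing sticks), something
like this, with the values from the paragraph above:

    SET seq_page_cost = 0.05;
    SET random_page_cost = 0.05;
    EXPLAIN ANALYZE SELECT id FROM ftstest
        WHERE body_fts @@ to_tsquery('commonterm80');
    RESET seq_page_cost;
    RESET random_page_cost;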

...Robert

> On Mon, Oct 26, 2009 at 4:02 PM, Jesper Krogh <jesper@krogh.cc> wrote:
>> Given that the seq-scan has to visit 50K rows to create the result and
>> the bitmap heap scan only has to visit 40K (but also has to search the
>> index), we would expect the seq-scan to be at most 25% more expensive
>> than the bitmap heap scan, i.e. less than 300ms.
>
> I've seen behavior similar to this in the past with a plain old B-tree
> index.  As in your case, a bitmap index scan was significantly faster
> than a sequential scan even though essentially all the heap pages had
> to be scanned, but the planner expected the opposite to be true.  The
> planner's expectation is that the dominant cost will be fetching the
> pages, and it furthermore thinks that fetching things in sequential
> order is much better than skipping around randomly.  However, if all
> the pages are memory-resident - possibly even in L2 or L3 CPU cache -
> fetching the pages is nearly free, so the dominant cost becomes the
> CPU time to process the tuples.

Well, no. This topic is not at all about the query planner. It is about
the actual run time of the two almost identical queries. It may be that
we're seeing these results because one fits better into the L2 or L3
cache, but the complete dataset is memory resident and the queries were
run multiple times in a row to eliminate disk access.

> My best guess is that in cases like this the index cond is cheaper to
> evaluate than the recheck cond/filter, so the index scan wins not by
> reading fewer pages but by avoiding the need to examine some of the
> tuples on each page.  I might be all wet, though.

In my example the seq-scan evaluates 50K tuples and the bitmap heap scan
40K. The question is why the "per-tuple" evaluation becomes that much
more expensive (x7.5)[1] on the seq-scan than on the index scan, when
the complete dataset is indeed in memory.

> If your whole database fits in RAM, you could try changing your
> seq_page_cost and random_page_cost variables from the default values
> of 1 and 4 to something around 0.05, or maybe even 0.01, and see
> whether that helps.

This is about planning the query. We're talking actual runtimes here.

[1] 50K tuples in 1800ms vs. 40K tuples in 200ms

--
Jesper


On Tue, Oct 27, 2009 at 11:08 AM,  <jesper@krogh.cc> wrote:
> In my example the seq-scan evaluates 50K tuples and the bitmap heap scan
> 40K. The question is why the "per-tuple" evaluation becomes that much
> more expensive (x7.5)[1] on the seq-scan than on the index scan, when
> the complete dataset is indeed in memory.

[ ... thinks a little more ... ]

The bitmap index scan returns a TID bitmap.  From a quick look at
nodeBitmapHeapScan.c, it appears that the recheck cond only gets
evaluated for those portions of the TID bitmap that are lossy.  So I'm
guessing what may be happening here is that although the bitmap heap
scan is returning 40K rows, it's doing very few (possibly no) qual
evaluations, and mostly just checking tuple visibility.
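One way to test that guess, if you're curious: shrink work_mem far
enough that the TID bitmap is forced to go lossy, so the recheck cond
actually has to run, and see whether the bitmap heap scan's runtime
moves toward the seq scan's. Untested sketch:

    SET enable_seqscan = off;
    SET work_mem = '64kB';  -- small enough to lossify the bitmap (a guess)
    EXPLAIN ANALYZE SELECT id FROM ftstest
        WHERE body_fts @@ to_tsquery('commonterm80');
    RESET work_mem;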

>> If your whole database fits in RAM, you could try changing your
>> seq_page_cost and random_page_cost variables from the default values
>> of 1 and 4 to something around 0.05, or maybe even 0.01, and see
>> whether that helps.
>
> This is about planning the query. We're talking actual runtimes here.

Sorry, I assumed you were trying to get the planner to pick the faster
plan.  If not, never mind.

...Robert

Jesper Krogh <jesper@krogh.cc> writes:
> I'm currently trying to figure out why the tsearch performance seems to
> vary a lot between different query plans. I have created a sample dataset
> that sort of resembles the data I have to work on.

> The script that builds the dataset is at:
> http://krogh.cc/~jesper/build-test.pl
> and http://krogh.cc/~jesper/words.txt is needed for it to run.

I got around to looking at this example finally, and I can reproduce
your results pretty closely.  I think there are two things going on:

1. The cost estimates for to_tsquery and ts_match_vq don't reflect the
actually-rather-high costs of those functions.  Since the seqscan plan
executes these functions many more times than the indexscan plan, that
results in a relative cost error.  There's already been some discussion
of changing the default costs for the tsearch functions, but nothing's
been done yet.  However, that seems to be a relatively small problem
compared to...
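(If you want to experiment with that part of it, you can override the
defaults yourself - the COST numbers below are just guesses:

    ALTER FUNCTION to_tsquery(text) COST 100;
    ALTER FUNCTION ts_match_vq(tsvector, tsquery) COST 100;

That at least shows how much of the misestimate is down to the function
costs.)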

2. The planner is estimating that most of the GIN index has to be
examined --- specifically, it estimates (pretty accurately) that
40188 out of 50000 table rows will match, and the default assumption
is that that means 40188/50000 of the index blocks will have to be
read.  On my machine the index amounts to 39076 blocks, so we
estimate 31407 index blocks have to be read, and that's why the cost
estimate for the indexscan is so huge.  The *actual* number of index
blocks read for the query, according to the stats collector, is 66.
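(That 66 comes from the stats collector - something along the lines of

    SELECT idx_blks_read, idx_blks_hit
      FROM pg_statio_user_indexes
     WHERE indexrelname = 'ftstest_gin_idx';

run before and after the query, taking the difference.)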

So it appears that genericcostestimate() is just completely
inappropriate for GIN indexes, at least when considering common terms.
I guess that's not so astonishing when you remember that GIN isn't built
around per-heap-tuple entries the way the other index types are.
Oleg, Teodor, can you suggest any better cost metric to use for GIN?

            regards, tom lane