Thread: The cost of visibillity testing? (gin-search)

The cost of visibillity testing? (gin-search)

From
Jesper Krogh
Date:
Hi Hackers.

I have a feeling that GIN is "cheating" on the visibillity checks:

test=# set enable_seqscan = off;
SET
Time: 0.129 ms
test=# select count(id) from fts_test where fts @@ to_tsquery('core'); count
-------- 158827
(1 row)

Time: 95.530 ms
test=# explain select count(id) from fts_test where fts @@ 
to_tsquery('core');                                          QUERY PLAN
---------------------------------------------------------------------------------------------- Aggregate
(cost=211571.52..211571.53rows=1 width=4)   ->  Bitmap Heap Scan on fts_test  (cost=134925.95..211174.01 
 
rows=159004 width=4)         Recheck Cond: (fts @@ to_tsquery('core'::text))         ->  Bitmap Index Scan on fts_idx
(cost=0.00..134886.20
 
rows=159004 width=0)               Index Cond: (fts @@ to_tsquery('core'::text))
(5 rows)

Time: 0.609 ms

test=# select count(id) from fts_test; count
-------- 168556
(1 row)

Time: 164.655 ms

test=# explain select count(id) from fts_test;                                           QUERY PLAN
------------------------------------------------------------------------------------------------ Aggregate
(cost=10000075969.95..10000075969.96rows=1 width=4)   ->  Seq Scan on fts_test  (cost=10000000000.00..10000075548.56 
 
rows=168556 width=4)
(2 rows)

Time: 0.338 ms

This is run multiple times for both queries and the seqscan of the table
is consistently about 1.8 times more expensive than the fts-scan.
This is all on a fully memory cached dataset.

The first query should have the cost of the GIN-search + 
visibillity-test of 158K tuples,
the latter should have the cost of visibillity-testing 168K tuples. If 
we set the cost
of actually searching GIN to 0 then the gin-search - visibillity costs: 
95/158000 0.000373ms/tuple
where the seq-scan case costs close to 0.001ms/tuple (close to 3 times 
as much).

So I have a strong feeling that GIN is cheating on the visibillity tests
otherwise I have problems imagining how it ever can become faster to execute
than the seq_scan of the table.

Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for 
visibillity-testing?

What have I missed in the logic?

Thanks.

-- 
Jesper



Re: The cost of visibillity testing? (gin-search)

From
Heikki Linnakangas
Date:
On 21.12.2010 21:25, Jesper Krogh wrote:
> The first query should have the cost of the GIN-search +
> visibillity-test of 158K tuples,
> the latter should have the cost of visibillity-testing 168K tuples. If
> we set the cost
> of actually searching GIN to 0 then the gin-search - visibillity costs:
> 95/158000 0.000373ms/tuple
> where the seq-scan case costs close to 0.001ms/tuple (close to 3 times
> as much).
>
> So I have a strong feeling that GIN is cheating on the visibillity tests
> otherwise I have problems imagining how it ever can become faster to
> execute
> than the seq_scan of the table.
>
> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
> visibillity-testing?

It certainly shouldn't be.

> What have I missed in the logic?

Perhaps you have a lot of empty space or dead tuples that don't match 
the query in the table, which the sequential scan has to grovel through, 
but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: The cost of visibillity testing? (gin-search)

From
Andres Freund
Date:
On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
> What have I missed in the logic?
A reproducible testcase ;-)

Andres


Re: The cost of visibillity testing? (gin-search)

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> On 21.12.2010 21:25, Jesper Krogh wrote:
>> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for
>> visibillity-testing?

> It certainly shouldn't be.

>> What have I missed in the logic?

> Perhaps you have a lot of empty space or dead tuples that don't match 
> the query in the table, which the sequential scan has to grovel through, 
> but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say?

Another possibility is that the seqscan is slowed by trying to operate
in a limited number of buffers (the buffer strategy stuff).
        regards, tom lane


Re: The cost of visibillity testing? (gin-search)

From
Jesper Krogh
Date:
On 2010-12-21 21:28, Andres Freund wrote:
> On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote:
>    
>> What have I missed in the logic?
>>      
> A reproducible testcase ;-)
>    
Yes, I did a  complete dump/restore of the dataset and the numbers
looked like expected. So table bloat seems to be the problem/challenge.

I must have hit a strange sitauation where my table-bloat proportionally
was significantly higher than my gin-index-bloat.

Jesper

-- 
Jesper