Thread: The cost of visibillity testing? (gin-search)
Hi Hackers. I have a feeling that GIN is "cheating" on the visibillity checks: test=# set enable_seqscan = off; SET Time: 0.129 ms test=# select count(id) from fts_test where fts @@ to_tsquery('core'); count -------- 158827 (1 row) Time: 95.530 ms test=# explain select count(id) from fts_test where fts @@ to_tsquery('core'); QUERY PLAN ---------------------------------------------------------------------------------------------- Aggregate (cost=211571.52..211571.53rows=1 width=4) -> Bitmap Heap Scan on fts_test (cost=134925.95..211174.01 rows=159004 width=4) Recheck Cond: (fts @@ to_tsquery('core'::text)) -> Bitmap Index Scan on fts_idx (cost=0.00..134886.20 rows=159004 width=0) Index Cond: (fts @@ to_tsquery('core'::text)) (5 rows) Time: 0.609 ms test=# select count(id) from fts_test; count -------- 168556 (1 row) Time: 164.655 ms test=# explain select count(id) from fts_test; QUERY PLAN ------------------------------------------------------------------------------------------------ Aggregate (cost=10000075969.95..10000075969.96rows=1 width=4) -> Seq Scan on fts_test (cost=10000000000.00..10000075548.56 rows=168556 width=4) (2 rows) Time: 0.338 ms This is run multiple times for both queries and the seqscan of the table is consistently about 1.8 times more expensive than the fts-scan. This is all on a fully memory cached dataset. The first query should have the cost of the GIN-search + visibillity-test of 158K tuples, the latter should have the cost of visibillity-testing 168K tuples. If we set the cost of actually searching GIN to 0 then the gin-search - visibillity costs: 95/158000 0.000373ms/tuple where the seq-scan case costs close to 0.001ms/tuple (close to 3 times as much). So I have a strong feeling that GIN is cheating on the visibillity tests otherwise I have problems imagining how it ever can become faster to execute than the seq_scan of the table. Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for visibillity-testing? What have I missed in the logic? Thanks. -- Jesper
On 21.12.2010 21:25, Jesper Krogh wrote: > The first query should have the cost of the GIN-search + > visibillity-test of 158K tuples, > the latter should have the cost of visibillity-testing 168K tuples. If > we set the cost > of actually searching GIN to 0 then the gin-search - visibillity costs: > 95/158000 0.000373ms/tuple > where the seq-scan case costs close to 0.001ms/tuple (close to 3 times > as much). > > So I have a strong feeling that GIN is cheating on the visibillity tests > otherwise I have problems imagining how it ever can become faster to > execute > than the seq_scan of the table. > > Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for > visibillity-testing? It certainly shouldn't be. > What have I missed in the logic? Perhaps you have a lot of empty space or dead tuples that don't match the query in the table, which the sequential scan has to grovel through, but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote: > What have I missed in the logic? A reproducible testcase ;-) Andres
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes: > On 21.12.2010 21:25, Jesper Krogh wrote: >> Or is a Bitmap Heap Scan simply 3 times faster than a Seq-scan for >> visibillity-testing? > It certainly shouldn't be. >> What have I missed in the logic? > Perhaps you have a lot of empty space or dead tuples that don't match > the query in the table, which the sequential scan has to grovel through, > but the bitmap scan skips? What does EXPLAIN ANALYZE of both queries say? Another possibility is that the seqscan is slowed by trying to operate in a limited number of buffers (the buffer strategy stuff). regards, tom lane
On 2010-12-21 21:28, Andres Freund wrote: > On Tuesday 21 December 2010 20:25:16 Jesper Krogh wrote: > >> What have I missed in the logic? >> > A reproducible testcase ;-) > Yes, I did a complete dump/restore of the dataset and the numbers looked like expected. So table bloat seems to be the problem/challenge. I must have hit a strange sitauation where my table-bloat proportionally was significantly higher than my gin-index-bloat. Jesper -- Jesper