Re: Fixing GIN for empty/null/full-scan cases - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Fixing GIN for empty/null/full-scan cases
Date
Msg-id 3234.1295387895@sss.pgh.pa.us
Whole thread Raw
In response to Re: Fixing GIN for empty/null/full-scan cases  (David E. Wheeler <david@kineticode.com>)
Responses Re: Fixing GIN for empty/null/full-scan cases  ("David E. Wheeler" <david@kineticode.com>)
List pgsql-hackers
"David E. Wheeler" <david@kineticode.com> writes:
> These numbers are a bit crazy-making, but the upshot is that Gist is
> slow out of the gate, but with data cached, it's pretty speedy. With
> indexscan and bitmapscan disabled, these queries all took 300-400
> ms. So GIN was never better performing than a table scan.

I could not replicate that here at all --- GIN indexscans were
consistently better than seqscans for me, eg

regression=# set enable_bitmapscan TO 1;
SET
Time: 0.673 ms
regression=# explain analyze SELECT count(*) FROM listings     WHERE features @@ '(1368799&1368800&1369043)'::query_int
     AND deleted_at IS NULL AND status = 1;                                                             QUERY PLAN
                                   
 

--------------------------------------------------------------------------------------------------------------------------------------Aggregate
(cost=1159.20..1159.21 rows=1 width=0) (actual time=23.964..23.964 rows=1 loops=1)  ->  Bitmap Heap Scan on listings
(cost=31.15..1158.18rows=406 width=0) (actual time=23.014..23.876 rows=772 loops=1)        Recheck Cond: ((features @@
'1368799& 1368800 & 1369043'::query_int) AND (deleted_at IS NULL) AND (status = 1))        ->  Bitmap Index Scan on
idx_gin_features (cost=0.00..31.05 rows=406 width=0) (actual time=22.913..22.913 rows=772 loops=1)              Index
Cond:(features @@ '1368799 & 1368800 & 1369043'::query_int)Total runtime: 24.040 ms
 
(6 rows)

Time: 24.968 ms
regression=# set enable_bitmapscan TO 0;
SET
Time: 0.458 ms
regression=# explain analyze SELECT count(*) FROM listings     WHERE features @@ '(1368799&1368800&1369043)'::query_int
     AND deleted_at IS NULL AND status = 1;                                                    QUERY PLAN
                                   
 

--------------------------------------------------------------------------------------------------------------------Aggregate
(cost=9158.24..9158.25 rows=1 width=0) (actual time=145.121..145.121 rows=1 loops=1)  ->  Seq Scan on listings
(cost=0.00..9157.22rows=406 width=0) (actual time=0.025..144.982 rows=772 loops=1)        Filter: ((deleted_at IS NULL)
AND(features @@ '1368799 & 1368800 & 1369043'::query_int) AND (status = 1))Total runtime: 145.177 ms
 
(4 rows)

Time: 146.228 ms

I'm noticing also that I get different rowcounts than you do, although
possibly that has something to do with the partial-index conditions,
which I'm not trying to duplicate here (all rows in my table pass those
two tests).

> * Why does it take 3-4x longer to create the GIN than the GiST index
> on tsvector?

Perhaps more maintenance_work_mem would help with that; although the
fine manual says specifically that GIN text search indexes take about
three times longer to build than equivalent GiST indexes, so maybe that
behavior is as designed.
        regards, tom lane


pgsql-hackers by date:

Previous
From: "A.M."
Date:
Subject: Re: test_fsync label adjustments
Next
From: Tom Lane
Date:
Subject: Re: Re: patch: fix performance problems with repated decomprimation of varlena values in plpgsql