Thread: GIST versus GIN indexes for intarrays
Hi Guys,
I'm a bit confused when the proper way to use GIST versus GIN indexes with integer arrays.
The documentation states:
The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere. As a rule of thumb, a GIN index is faster to search than a GiST index, but slower to build or update; so GIN is better suited for static data and GiST for often-updated data.
Since 100% of my queries are for retrieval, I should use GIN but it never appears to be used unlike how GIST indexes are:
gearbuyer_ig=# select version();
version
----------------------------------------------------------------------------------------------------
PostgreSQL 8.3.6 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 20070925 (Red Hat 4.1.2-33)
(1 row)
With just a GIN index I get this plan (no use of GIN):
gearbuyer_ig=# explain select count(*) from items where items.fast_colors @> ARRAY[0];
QUERY PLAN
-----------------------------------------------------------------
Aggregate (cost=21194.27..21194.28 rows=1 width=0)
-> Seq Scan on items (cost=0.00..21193.64 rows=251 width=0)
Filter: (fast_colors @> '{0}'::integer[])
(3 rows)
With a GIST index created like:
gearbuyer_ig=# CREATE INDEX items_fast_colors_rdtree2_idx ON items USING gist (fast_colors gist__int_ops);
gearbuyer_ig=# explain select count(*) from items where items.fast_colors @> ARRAY[0];
QUERY PLAN
-----------------------------------------------------------------------------------------------------
Aggregate (cost=929.81..929.82 rows=1 width=0)
-> Bitmap Heap Scan on items (cost=14.30..929.18 rows=251 width=0)
Recheck Cond: (fast_colors @> '{0}'::integer[])
-> Bitmap Index Scan on items_fast_colors_rdtree2_idx (cost=0.00..14.24 rows=251 width=0)
Index Cond: (fast_colors @> '{0}'::integer[])
(5 rows)
Any insight is greatly appreciated. Could this be a regression from 8.3.5 and 8.3.6?
Thanks,
Rusty
--
Rusty Conover
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
Rusty Conover <rconover@infogears.com> writes: > Since 100% of my queries are for retrieval, I should use GIN but it > never appears to be used unlike how GIST indexes are: You haven't shown us either the table or the index declaration, so it's a bit tough to comment on that. It's worth noting though that your GIST example appears to rely on a nonstandard operator class. regards, tom lane
On Feb 12, 2009, at 1:54 PM, Tom Lane wrote: > Rusty Conover <rconover@infogears.com> writes: >> Since 100% of my queries are for retrieval, I should use GIN but it >> never appears to be used unlike how GIST indexes are: > > You haven't shown us either the table or the index declaration, > so it's a bit tough to comment on that. It's worth noting though > that your GIST example appears to rely on a nonstandard operator > class. > > regards, tom lane > Hi Tom, My apologies, below is the table definition, and the GIN index creation. The gist__int_ops is the default operator class for integer[] arrays, as shown at: http://www.postgresql.org/docs/current/static/intarray.html gearbuyer_ig=# \d items Table "public.items" Column | Type | Modifiers -------------------------+----------- +--------------------------------------------------- item_id | integer | not null default nextval('generic_seq'::regclass) gb_product_url | text | not null group_id | integer | category_id | integer | product_name | text | not null gender | text | not null description_extract | text | not null sort_price | real | not null price_range | text | not null brand_id | integer | not null xapian_doc_id | integer | average_rating | uint1 | reviews_count | smallint | store_count | uint1 | default_image_id | integer | available_sizes | integer[] | fast_colors | integer[] | has_coupons | boolean | not null default false age_low | uint1 | sale_percentage_low | uint1 | store_count_low | uint1 | price_range_low | smallint | offering_stores | integer[] | subclassification_ids | integer[] | popularity_rank | integer | default_similarity_type | uint1 | default_similarity_id | integer | gc_lookup_id | integer | The GIN index was created via: CREATE INDEX items_fast_colors_rdtree_idx ON items USING gin (fast_colors); Cheers, Rusty -- Rusty Conover rconover@infogears.com InfoGears Inc / GearBuyer.com / FootwearBuyer.com http://www.infogears.com http://www.gearbuyer.com http://www.footwearbuyer.com
Rusty Conover <rconover@infogears.com> writes: > The gist__int_ops is the default operator class for integer[] arrays, > as shown at: > http://www.postgresql.org/docs/current/static/intarray.html Ah, so you have contrib/intarray installed. [ pokes at it... ] Seems like what we have here is another iteration of this ancient bug: http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php to wit, contrib/intarray is defining its own @> and <@ operators that conflict with those since added to the core. In the case Rusty is showing, the @> gets resolved as intarray's @> (because that's an exact match, where the core provides anyarray @> anyarray) and then this operator is NOT a member of the core-provided GIN opclass for integer arrays. The short-term workaround for Rusty is probably to create his GIN index using the intarray-provided gin__int_ops opclass. But it seems to me that we ought to get rid of intarray's @> and <@ operators and have the module depend on the core anyarray operators, just as we have already done for = and <>. Comments? regards, tom lane
On Feb 12, 2009, at 2:29 PM, Tom Lane wrote:
Rusty Conover <rconover@infogears.com> writes:The gist__int_ops is the default operator class for integer[] arrays,as shown at:http://www.postgresql.org/docs/current/static/intarray.html
Ah, so you have contrib/intarray installed.
[ pokes at it... ] Seems like what we have here is another iteration
of this ancient bug:
http://archives.postgresql.org/pgsql-committers/2004-01/msg00073.php
to wit, contrib/intarray is defining its own @> and <@ operators that
conflict with those since added to the core. In the case Rusty is
showing, the @> gets resolved as intarray's @> (because that's an
exact match, where the core provides anyarray @> anyarray) and then
this operator is NOT a member of the core-provided GIN opclass for
integer arrays.
The short-term workaround for Rusty is probably to create his GIN
index using the intarray-provided gin__int_ops opclass. But it
seems to me that we ought to get rid of intarray's @> and <@ operators
and have the module depend on the core anyarray operators, just as we
have already done for = and <>. Comments?
Hi Tom,
For the record using the GIN opclass does resolve the problem for me. The indexes are now seeing usage.
Thanks for the help,
Rusty
--
Rusty Conover
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
On Fri, Feb 13, 2009 at 04:12:53PM +0300, Teodor Sigaev wrote: >> The short-term workaround for Rusty is probably to create his GIN >> index using the intarray-provided gin__int_ops opclass. But it > Right >> seems to me that we ought to get rid of intarray's @> and <@ operators >> and have the module depend on the core anyarray operators, just as we >> have already done for = and <>. Comments? > Agree, will do. Although built-in anyarray operators have ~N^2 behaviour > while intarray's version - only N*log(N) Is there a way to have the buily-in anyarray opeators be N*log(N)? Ken
Teodor Sigaev <teodor@sigaev.ru> writes: >> seems to me that we ought to get rid of intarray's @> and <@ operators >> and have the module depend on the core anyarray operators, just as we >> have already done for = and <>. Comments? > Agree, will do. Although built-in anyarray operators have ~N^2 behaviour while > intarray's version - only N*log(N) Really? isort() looks like a bubble sort to me. But in any case, a pre-sort is probably actually *slower* for small numbers of array elements. I wonder where the crossover is. In principle we could make the core implementation do a sort when working with a sortable datatype, but I'm unsure it's worth the trouble. regards, tom lane