Thread: Query doesn't use index on hstore column
Hi,
Apologies if this is the wrong list for this time of query (first time posting).
I'm currently experimenting with hstore on Posgtres 9.4rc1. I've created a table with an hstore column, with and index on that column (tried both gin and btree indexes) and the explain plan says that the index is never used for the lookup and falls to a sequential scan every time (table has 1 000 000 rows). The query plans and execution time for btree index, gin index and unindexed are the same. Is there something I'm doing wrong or missing in order to get indexes to work on hstore columns?
Details:
0) Postgres version:
barkerm=# select version();
version
---------------------------------------------------------------------------------------------------------------
PostgreSQL 9.4rc1 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7), 64-bit
(1 row)
1) Created table with hstore column and btree index.
barkerm=# \d audit
Table "public.audit"
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('audit_id_seq'::regclass)
principal_id | integer |
created_at | timestamp without time zone |
root | character varying(255) |
template_code | character(3) |
attributes | hstore |
args | character varying(255)[] |
Indexes:
"audit_pkey" PRIMARY KEY, btree (id)
"audit_attributes_idx" btree (attributes)
2) Insert 1 000 000 rows
barkerm=# select count(*) from audit;
count
---------
1000000
(1 row)
3) Run analyse.
4) Pick a row somewhere in the middle:
barkerm=# select id, attributes from audit where id = 500000;
id | attributes
--------+---------------------------------------------------------
500000 | "accountId"=>"1879355460", "instrumentId"=>"1625557725"
(1 row)
5) Explain query using the attributes column in the where clause (uses Seq Scan).
barkerm=# explain analyse select * from audit where attributes->'accountId' = '1879355460';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on audit (cost=0.00..35409.00 rows=5000 width=133) (actual time=114.314..218.821 rows=1 loops=1)
Filter: ((attributes -> 'accountId'::text) = '1879355460'::text)
Rows Removed by Filter: 999999
Planning time: 0.074 ms
Execution time: 218.843 ms
(5 rows)
6) Rebuild the data using a gin index.
barkerm=# \d audit
Table "public.audit"
Column | Type | Modifiers
---------------+-----------------------------+----------------------------------------------------
id | integer | not null default nextval('audit_id_seq'::regclass)
principal_id | integer |
created_at | timestamp without time zone |
root | character varying(255) |
template_code | character(3) |
attributes | hstore |
args | character varying(255)[] |
Indexes:
"audit_pkey" PRIMARY KEY, btree (id)
"audit_attributes_idx" gin (attributes)
7) Again explain the selection of a single row using a constraint that references the hstore column. Seq Scan is still used.
barkerm=# explain analyse select * from audit where attributes->'accountId' = '1238334838';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on audit (cost=0.00..35409.00 rows=5000 width=133) (actual time=122.173..226.363 rows=1 loops=1)
Filter: ((attributes -> 'accountId'::text) = '1238334838'::text)
Rows Removed by Filter: 999999
Planning time: 0.164 ms
Execution time: 226.392 ms
(5 rows)
8) Drop index an query as a baseline.
barkerm=# explain analyse select * from audit where attributes->'accountId' = '1238334838';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on audit (cost=0.00..35409.00 rows=5000 width=133) (actual time=109.115..212.666 rows=1 loops=1)
Filter: ((attributes -> 'accountId'::text) = '1238334838'::text)
Rows Removed by Filter: 999999
Planning time: 0.113 ms
Execution time: 212.701 ms
(5 rows)
Regards,
Michael Barker.
On Fri, Dec 05, 2014 at 09:42:20AM +1300, Michael Barker wrote: > 1) Created table with hstore column and btree index. > > barkerm=# \d audit > Table "public.audit" > Column | Type | > Modifiers > ---------------+-----------------------------+---------------------------------------------------- > id | integer | not null default > nextval('audit_id_seq'::regclass) > principal_id | integer | > created_at | timestamp without time zone | > root | character varying(255) | > template_code | character(3) | > attributes | hstore | > args | character varying(255)[] | > Indexes: > "audit_pkey" PRIMARY KEY, btree (id) > "audit_attributes_idx" btree (attributes) > > ... > 5) Explain query using the attributes column in the where clause (uses Seq > Scan). > > barkerm=# explain analyse select * from audit where attributes->'accountId' > = '1879355460'; > QUERY PLAN > > ------------------------------------------------------------------------------------------------------------ > Seq Scan on audit (cost=0.00..35409.00 rows=5000 width=133) (actual > time=114.314..218.821 rows=1 loops=1) > Filter: ((attributes -> 'accountId'::text) = '1879355460'::text) > Rows Removed by Filter: 999999 > Planning time: 0.074 ms > Execution time: 218.843 ms > (5 rows) > Hi Michael, I think your index definitions need to be on the particular attribute from attributes and not attributes itself. That works but it does not apply to the query you show above. I think that the binary json type in 9.4 will do what you want. I have not worked with it myself, just looked at the docs. Regards, Ken
Michael Barker <mikeb01@gmail.com> writes: > I'm currently experimenting with hstore on Posgtres 9.4rc1. I've created a > table with an hstore column, with and index on that column (tried both gin > and btree indexes) and the explain plan says that the index is never used > for the lookup and falls to a sequential scan every time (table has 1 000 > 000 rows). The query plans and execution time for btree index, gin index > and unindexed are the same. Is there something I'm doing wrong or missing > in order to get indexes to work on hstore columns? Well, first off, a btree index is fairly useless for this query, because btree has no concept that the hstore has any sub-structure. A GIN index or GIST index could work though. Secondly, you have to remember that indexable WHERE conditions in Postgres are *always* of the form "WHERE indexed_column indexable_operator some_comparison_value". So the trick is to recast the condition you have into something that looks like that. Instead of WHERE attributes->'accountId' = '1879355460' you could do WHERE attributes @> 'accountId=>1879355460' (@> being the hstore containment operator, ie "does attributes contain a pair that looks like this?") or equivalently but possibly easier to generate, WHERE attributes @> hstore('accountId', '1879355460') Another possibility if you're only concerned about indexing searches for one or a few specific keys is to use expression indexes: CREATE INDEX ON audit ((attributes->'accountId')); whereupon your original query works, since the left-hand side of the '=' operator is now the indexed expression. (Here, since you are testing plain equality on the indexed value, a btree works fine.) You might care to read http://www.postgresql.org/docs/9.4/static/indexes.html to get a better handle on what Postgres indexes can and can't do. regards, tom lane
Well, first off, a btree index is fairly useless for this query,
because btree has no concept that the hstore has any sub-structure.
A GIN index or GIST index could work though. Secondly, you have to
remember that indexable WHERE conditions in Postgres are *always* of
the form "WHERE indexed_column indexable_operator some_comparison_value".
And the student was enlightened....
Cheers, seeing sensible explain plans now.
You might care to read
http://www.postgresql.org/docs/9.4/static/indexes.html
to get a better handle on what Postgres indexes can and can't do.
Will do, thanks again.
Mike.