Thread: Query with rightmost function does not use index
SELECT ... FROM ... WHERE col = MD5('')
doesn't seem to use an index but
SELECT ... FROM ... WHERE col = 'd41d8cd98f00b204e9800998ecf8427e'
does.
Is this a gotcha?
I'm using PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit
Hello use a functional index http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html Regards Pavel Stehule 2011/8/8 - - <loh.law@hotmail.com>: > For a table where column col has an index, the query: > SELECT ... FROM ... WHERE col = MD5('') > doesn't seem to use an index but > SELECT ... FROM ... WHERE col = 'd41d8cd98f00b204e9800998ecf8427e' > does. > > Is this a gotcha? > I'm using PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC > gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit > > >
> Hello
> use a functional index
>
> http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html
>
> Regards
>
> Pavel Stehule
>
> 2011/8/8 - - <loh.law@hotmail.com>:
> > For a table where column col has an index, the query:
> > SELECT ... FROM ... WHERE col = MD5('')
> > doesn't seem to use an index but
> > SELECT ... FROM ... WHERE col = 'd41d8cd98f00b204e9800998ecf8427e'
> > does.
> >
> > Is this a gotcha?
> > I'm using PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC
> > gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit
> >
> >
> >
Here may be other problem. Show us the result of EXPLAIN Analyze. 2011/8/8, Pavel Stehule <pavel.stehule@gmail.com>: > Hello > > use a functional index > > http://www.postgresql.org/docs/8.4/interactive/indexes-expressional.html > > Regards > > Pavel Stehule > > 2011/8/8 - - <loh.law@hotmail.com>: >> For a table where column col has an index, the query: >> SELECT ... FROM ... WHERE col = MD5('') >> doesn't seem to use an index but >> SELECT ... FROM ... WHERE col = 'd41d8cd98f00b204e9800998ecf8427e' >> does. >> >> Is this a gotcha? >> I'm using PostgreSQL 8.4.8 on i686-pc-linux-gnu, compiled by GCC >> gcc-4.5.real (Ubuntu/Linaro 4.5.2-8ubuntu4) 4.5.2, 32-bit >> >> >> > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > -- ------------ pasman
explain analyze select * from filter_item where filter_hash = MD5('');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on filter_item (cost=0.00..424644.96 rows=86108 width=49) (ac tual time=8177.807..12421.921 rows=77 loops=1)
Filter: ((filter_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e'::text)
Total runtime: 12421.959 ms
(3 rows)
explain analyze select * from filter_item where filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on filter_item (cost=77.92..6609.02 rows=3534 width=49) (actual time=0.055..0.100 rows=77 loops=1)
Recheck Cond: (filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'::bpchar)
-> Bitmap Index Scan on filter_item__filter_hash (cost=0.00..77.04 rows=3534 width=0) (actual time=0.049..0.049 rows=77 loops=1)
Index Cond: (filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'::bpchar)
Total runtime: 0.130 ms
(5 rows)
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of - -
Sent: Monday, August 08, 2011 3:40 AM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Query with rightmost function does not use index
Here are the EXPLAIN ANALYZE outputs:
explain analyze select * from filter_item where filter_hash = MD5('');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on filter_item (cost=0.00..424644.96 rows=86108 width=49) (ac tual time=8177.807..12421.921 rows=77 loops=1)
Filter: ((filter_hash)::text = 'd41d8cd98f00b204e9800998ecf8427e'::text)
Total runtime: 12421.959 ms
(3 rows)
explain analyze select * from filter_item where filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on filter_item (cost=77.92..6609.02 rows=3534 width=49) (actual time=0.055..0.100 rows=77 loops=1)
Recheck Cond: (filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'::bpchar)
-> Bitmap Index Scan on filter_item__filter_hash (cost=0.00..77.04 rows=3534 width=0) (actual time=0.049..0.049 rows=77 loops=1)
Index Cond: (filter_hash = 'd41d8cd98f00b204e9800998ecf8427e'::bpchar)
Total runtime: 0.130 ms
(5 rows)
The filter_hash index uses a “character(n)” data type – the ::bpchar. The second query is of unknown type and thus is converted to “character” and then used in the index. The first query use a function that outputs a “text”. Since the output type is known the left-side of the equals is casted to that known type. Since the index is one the “character” version of the filter_hash but the comparison requires a “text” version the index cannot be used. You would need to manually cast the result of the md5 function call to “character” in order to get the index usage; or convert the filter_hash column to text, the latter option probably being preferred.
It is not a bug, in cases of uncertainty the types of the value and the indexed field must be the same, but it could possibly be more user-friendly.
I’ll leave it to other to comment on whether this is different in more recent versions. Text-character are binary compatible and so it is not be unreasonable to assume, like you did, that indexes of one should be usable by the other.
David J.
Hi, On Monday, August 08, 2011 15:40:20 - - wrote: > explain analyze select * from filter_item where filter_hash = MD5(''); > > QUERY PLAN > ---------------------------------------------------------------------------- > ------------------------------------------- Seq Scan on filter_item > (cost=0.00..424644.96 rows=86108 width=49) (actual time=8177.807..12421.921 > rows=77 loops=1) Filter: ((filter_hash)::text = > 'd41d8cd98f00b204e9800998ecf8427e'::text) Total runtime: 12421.959 ms > (3 rows) > > > explain analyze select * from filter_item where filter_hash = > 'd41d8cd98f00b204e9800998ecf8427e' > > QUERY PLAN > ---------------------------------------------------------------------------- > ---------------------------------------------------------- Bitmap Heap Scan > on filter_item (cost=77.92..6609.02 rows=3534 width=49) (actual > time=0.055..0.100 rows=77 loops=1) Recheck Cond: (filter_hash = > 'd41d8cd98f00b204e9800998ecf8427e'::bpchar) -> Bitmap Index Scan on > filter_item__filter_hash (cost=0.00..77.04 rows=3534 width=0) (actual > time=0.049..0.049 rows=77 loops=1) Index Cond: (filter_hash = > 'd41d8cd98f00b204e9800998ecf8427e'::bpchar) Total runtime: 0.130 ms > (5 rows) The problem is that your filter_hash columns seems to be of type char(n). Thats not directly compatible with text (which is the type returned by the md5 function). So either change the column type or cast the return type of md5 to char(n). I do have to admit that this is somewhat strange. Greetings, Andres