Thread: Functional Indexes

Functional Indexes

From
"David Olbersen"
Date:
Hello all,

I have a function (urlhost) which finds the 'host' portion of a URL. In the case of
http://www.foobar.com/really/long/path/to/a/fileit returns "www.foobar.com". 

I also have a function (urltld) which returns the TLD of a URL. In the case of
http://www.foobar.com/really/long/path/to/a/fileit returns ".com" (the leading dot is OK). 

urltld uses urlhost to do it's job (how should be apparent).

Now the question: is there a single index I can create that will be used when my  WHERE clause contains either urlhost
orurltld? I could create two functional indexes, but that seems a bit silly to me. 

Any thoughts?

--------------------------
David Olbersen
iGuard Engineer
St. Bernard Software
11415 West Bernardo Court
San Diego, CA 92127
1-858-676-2277 x2152


Re: Functional Indexes

From
Frank Bax
Date:
The two functions do not group data the same way, so a common index is not 
possible.
urlhost would put pgsql.org and pgsql.com close together.
urltld would but pgsql.com and xyz.com close together.

Frank


At 01:36 PM 7/15/03, David Olbersen wrote:

>Hello all,
>
>I have a function (urlhost) which finds the 'host' portion of a URL. In 
>the case of http://www.foobar.com/really/long/path/to/a/file it returns 
>"www.foobar.com".
>
>I also have a function (urltld) which returns the TLD of a URL. In the 
>case of http://www.foobar.com/really/long/path/to/a/file it returns ".com" 
>(the leading dot is OK).
>
>urltld uses urlhost to do it's job (how should be apparent).
>
>Now the question: is there a single index I can create that will be used 
>when my  WHERE clause contains either urlhost or urltld? I could create 
>two functional indexes, but that seems a bit silly to me.
>
>Any thoughts?
>
>--------------------------
>David Olbersen
>iGuard Engineer
>St. Bernard Software
>11415 West Bernardo Court
>San Diego, CA 92127
>1-858-676-2277 x2152
>
>---------------------------(end of broadcast)---------------------------
>TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org



Re: Functional Indexes

From
Tom Lane
Date:
"David Olbersen" <DOlbersen@stbernard.com> writes:
> I have a function (urlhost) which finds the 'host' portion of a URL. In the case of
http://www.foobar.com/really/long/path/to/a/fileit returns "www.foobar.com".
 
> I also have a function (urltld) which returns the TLD of a URL. In the case of
http://www.foobar.com/really/long/path/to/a/fileit returns ".com" (the leading dot is OK).
 
> urltld uses urlhost to do it's job (how should be apparent).

> Now the question: is there a single index I can create that will be
> used when my  WHERE clause contains either urlhost or urltld?

I do not see any way with functions declared like that.  Quite aside
from implementation limitations, the portion of the 'host' string that
urltld is interested in would be the low-order part of the indexed
strings, and you can't usefully use an index to search for low-order
digits of the key.

Could you instead define an index over the reversed host name (eg,
com.foobar.www)?  This would seem to provide about the same
functionality for searches on urlhost, and you could exploit the index
for TLD searching via prefixes.  For example:

regression=# create table t1 (f1 text);
CREATE TABLE
regression=# create index t1i on t1 (lower(f1));
CREATE INDEX
regression=# explain select * from t1 where lower(f1) like 'com.%';                                QUERY PLAN
----------------------------------------------------------------------------Index Scan using t1i on t1
(cost=0.00..17.08rows=5 width=32)  Index Cond: ((lower(f1) >= 'com.'::text) AND (lower(f1) < 'com/'::text))  Filter:
(lower(f1)~~ 'com.%'::text)
 
(3 rows)

        regards, tom lane


Re: Functional Indexes

From
Joe Conway
Date:
David Olbersen wrote:
> Now the question: is there a single index I can create that will be
> used when my  WHERE clause contains either urlhost or urltld? I could
> create two functional indexes, but that seems a bit silly to me.
> 

I can't think of how to do only one index in 7.3.x and earlier, but 
FWIW, this works in 7.4devel (which should be in beta next Monday):

create or replace function tld(text) returns text as '
select split_part(split_part(substr($1,8,length($1)),''/'',1),''.'',3)
' language 'sql' STRICT IMMUTABLE;

regression=# select tld('http://www.foobar.com/really/long/path/to/a/file'); tld
----- com
(1 row)

create or replace function sld(text) returns text as '
select split_part(split_part(substr($1,8,length($1)),''/'',1),''.'',2)
' language 'sql' STRICT IMMUTABLE;

regression=# select sld('http://www.foobar.com/really/long/path/to/a/file');  sld
-------- foobar
(1 row)

create table urls(f1 text);
insert into urls values('http://www.foobar.com/really/long/path/to/a/file');
create index urls_idx1 on urls(tld(f1),sld(f1));

-- just to see index usage on toy table
set enable_seqscan to off;

regression=# explain analyze select * from urls where tld(f1) = 'com';
QUERY PLAN
---------------------------------------------------------------------- Index Scan using urls_idx1 on urls
(cost=0.00..4.69rows=1 width=32) 
 
(actual time=0.07..0.07 rows=1 loops=1)   Index Cond: (split_part(split_part(substr(f1, 8, length(f1)), 
'/'::text, 1), '.'::text, 3) = 'com'::text) Total runtime: 0.18 msec
(3 rows)

regression=# explain analyze select * from urls where tld(f1) = 'com' 
and sld(f1) = 'foobar';                              QUERY PLAN
--------------------------------------------------------------------- Index Scan using urls_idx1 on urls
(cost=0.00..4.70rows=1 width=32) 
 
(actual time=0.08..0.09 rows=1 loops=1)   Index Cond: ((split_part(split_part(substr(f1, 8, length(f1)), 
'/'::text, 1), '.'::text, 3) = 'com'::text) AND 
(split_part(split_part(substr(f1, 8, length(f1)), '/'::text, 1), 
'.'::text, 2) = 'foobar'::text)) Total runtime: 0.21 msec
(3 rows)


Joe