Thread: Referencing uninitialized variables in plpgsql

Referencing uninitialized variables in plpgsql

From
"Karl O. Pinc"
Date:
Hi,

I've a plpgsql procedure I'm pretty sure is referencing
variables, array elements really, that have not been
initialized.  Is this a well defined operation?
If so, what is the result? (NULL?) If not, shouldn't I be
getting some sort of error or warning?

I've
SET client_min_messages='debug';
and don't seem to get any messages.  AFICT there is no
run time configuration that would affect this, right?

The documentation seems silent on uninitialized plpgsql
variables.  Be nice if something was written.

I don't care for code that references unititialized variables.
It'd be nice to be able to get a warning even if the result
was well defined, just for those cases where you don't intend
to reference uninitialized variables.

Is this example telling me I get NULL for unitialized references?
I don't believe I should count on this behavior unless it's
documented, should I?

=> create or replace function foo() returns int language plpgsql
as 'declare a int; b int; begin a := b; return a; end; ';
CREATE FUNCTION
=> select foo();
 foo
-----

(1 row)


 PostgreSQL 7.4.6 on i686-redhat-linux-gnu, compiled by GCC gcc
(GCC) 3.2.3 20030502 (Red Hat Linux 3.2.3-42)


Thanks for all the postgresql work.  I don't mean to sound
grumpy, I'm tired.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                 -- Robert A. Heinlein



Re: Referencing uninitialized variables in plpgsql

From
John DeSoi
Date:
On Feb 5, 2005, at 11:20 PM, Karl O. Pinc wrote:

> Is this example telling me I get NULL for unitialized references?
> I don't believe I should count on this behavior unless it's
> documented, should I?
>
> => create or replace function foo() returns int language plpgsql
> as 'declare a int; b int; begin a := b; return a; end; ';
> CREATE FUNCTION
> => select foo();


Yes, exactly. If you don't assign a value to a declared pspgsql
variable, it is NULL. Operations on NULL variables are no different
than operations on NULL values in the database. If you are concerned
about this, then always assign a value when you declare it.

Also, you can specify NOT NULL in your declaration to ensure a runtime
error is generated if the variable is null. See:

http://www.postgresql.org/docs/8.0/interactive/plpgsql-declarations.html

  The general syntax of a variable declaration is:
name [ CONSTANT ] type [ NOT NULL ] [ { DEFAULT | := } expression ];

  The DEFAULT clause, if given, specifies the initial value assigned to
the variable when the block is entered. If the DEFAULT clause is not
given then the variable is initialized to the  SQL null value. The
CONSTANT option prevents the variable from being assigned to, so that
its value remains constant for the duration of the block. If NOT NULL
is specified, an assignment of a null value results in a run-time
error. All variables declared as NOT NULL must have a nonnull default
value specified.


John DeSoi, Ph.D.
http://pgedit.com/
Power Tools for PostgreSQL


Re: Referencing uninitialized variables in plpgsql

From
"Karl O. Pinc"
Date:
On 02/05/2005 10:57:45 PM, John DeSoi wrote:
>
> Yes, exactly. If you don't assign a value to a declared pspgsql
> variable, it
> is NULL.

Thanks, just what I needed.

> If you are concerned
> about this, then always assign a value when you declare it.

This does not really address my concern.  See below.

> Also, you can specify NOT NULL in your declaration to ensure a
> runtime error
> is generated if the variable is null. See:
>
> http://www.postgresql.org/docs/8.0/interactive/plpgsql-declarations.html

I don't know why I sometimes can't find this stuff when I need it.
I kept scanning for 'variable'.

Some quibbles.  (Where the devils lurk.  ;)

AFICT, the docs arn't clear when it comes to referencing unitialized
array elements.  I assume you get NULL values here as well even
though other array elements may have been given non-NULL values.
Can I count on this?

Initializing all varaibles, whether in declarations, automatically,
or by assignment, is not a substiute for throwing an exception at
runtime when an unitialized variable is referenced.  When
a program is written so that varaibles are given values before
those values are expected to be used, and not given values otherwise,
then runtime exceptions thrown when unitialized variables are
referenced are alerts that the program is operating in an
unexpected manner.  Initializing all variables regardless of
whether the initial values are expected to be used does nothing
more than assure these sorts of alerts will not be raised,
increasing the likelyhood that unexpected program behavior
will go unnoticed and uncorrected.

So, it would be cool of plpgsql declarations could declare
a variable NOT NULL without having to assign a default
value, and then raise an 'illegal NULL' exception should
the variable be referenced before a value is assigned to it.
This sounds like it could be complicated to impliment,
except that something similar must already be happening
with unitialized array elements to produce NULL values
when these are referenced.

Karl <kop@meme.com>
Free Software:  "You don't pay back, you pay forward."
                -- Robert A. Heinlein






Re: Referencing uninitialized variables in plpgsql

From
Tom Lane
Date:
"Karl O. Pinc" <kop@meme.com> writes:
> AFICT, the docs arn't clear when it comes to referencing unitialized
> array elements.  I assume you get NULL values here as well even
> though other array elements may have been given non-NULL values.

There is no such thing as an uninitialized array element.

Looking at array_ref, we do return a NULL if you try to fetch from a
position outside the current array subscript range.  But that's not
"uninitialized" in any normal sense of the word, it's more like
"nonexistent".  (I think you're right that it's undocumented behavior,
too.)

> Can I count on this?

Until someone makes an argument to change it ;-).

            regards, tom lane

Creating an index-type for LIKE '%value%'

From
CG
Date:
Once upon a time there was an FTI contrib module that split up a varchar field
into little bits and placed them into an FTI table to facilitate a full text
index search. It was like being able to do a "SELECT * FROM table WHERE field
LIKE '%value%';" and have it search an index!

It was a great idea! What a pain it was to implement!

You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
field into little pieces. On DELETE you'd have to clear out the rows from the
FTI table. And when you wanted to use the FTI table in a SELECT you had to
write your SQL to join up that FTI table and dig through it.

As I was exploring ways to optimize my application's use of the database, which
has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
places, I thought this solution could be built upon to allow for an easier
deployment.

AFAICT, the "right" way to do this would be to create an index type which would
take care of splitting the varchar field, and to have the query planner use the
index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

Tsearch2 is fantastic, but it works best for fields that contain words. I have
to sift through alphanumeric identification numbers.

Is the split-field FTI the best way to tackle my problem?

What can I do to get better performance on "SELECT * FROM table WHERE field
LIKE '%value%';" ??

CG



__________________________________
Do you Yahoo!?
Yahoo! Mail - Easier than ever with enhanced search. Learn more.
http://info.mail.yahoo.com/mail_250

Re: Creating an index-type for LIKE '%value%'

From
Martijn van Oosterhout
Date:
On Mon, Feb 07, 2005 at 09:28:24AM -0800, CG wrote:
> As I was exploring ways to optimize my application's use of the database, which
> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
> places, I thought this solution could be built upon to allow for an easier
> deployment.

<snip>

> AFAICT, the "right" way to do this would be to create an index type which would
> take care of splitting the varchar field, and to have the query planner use the
> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.
>
> Tsearch2 is fantastic, but it works best for fields that contain words. I have
> to sift through alphanumeric identification numbers.

Seems to me to depends quite a bit or your problem domain. How big are
the string's you're searching. If you're not searching on word
boundaries like tsearch, you'd need to split on every char. Say you
split on three character blocks. So the string "Hello World" would need
entries for:

"Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"

For N character strings you'd need N-2 entries. If you're storing
entire documents it's not practical. But if all your strings are maybe
15 characters long (maybe serial numbers), it might be practical.

I havn't looked at tsearch but maybe you can customise it to your
needs. If you can redefine the split function you could make it work
appropriately. Then you can define the ~~ operator (which is LIKE) to
call tsearch.

This in just off the top of my head, but maybe it can work.

Hope this helps,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Creating an index-type for LIKE '%value%'

From
Tom Lane
Date:
CG <cgg007@yahoo.com> writes:
> Tsearch2 is fantastic, but it works best for fields that contain
> words. I have to sift through alphanumeric identification numbers.

Can't you adjust tsearch2's notion of what is a word?  Sure seems like
that would be easier than reinventing this wheel ...

            regards, tom lane

Re: Creating an index-type for LIKE '%value%'

From
Yury Don
Date:
Hello CG,

Monday, February 7, 2005, 10:28:24 PM, you wrote:

C> Return-Path: <pgsql-general-owner+M73162@postgresql.org>
C> Delivered-To: yura@vpcit.ru
C> Received: (qmail 15486 invoked from network); 7 Feb 2005 17:36:10 -0000
C> Received: from svr4.postgresql.org (66.98.251.159)
C>   by ns.vpcit.ru with SMTP; 7 Feb 2005 17:36:09 -0000
C> Received: from postgresql.org (svr1.postgresql.org [200.46.204.71])
C>         by svr4.postgresql.org (Postfix) with ESMTP id 91E355AFB96;
C>         Mon,  7 Feb 2005 17:35:38 +0000 (GMT)
C> X-Original-To:
C> pgsql-general-postgresql.org@localhost.postgresql.org
C> Received: from localhost (unknown [200.46.204.144])
C>         by svr1.postgresql.org (Postfix) with ESMTP id AB6DB8B9C8E
C>         for
C> <pgsql-general-postgresql.org@localhost.postgresql.org>; Mon,  7
C> Feb 2005 17:28:41 +0000 (GMT)
C> Received: from svr1.postgresql.org ([200.46.204.71])
C>  by localhost (av.hub.org [200.46.204.144]) (amavisd-new, port 10024)
C>  with ESMTP id 86703-06
C>  for <pgsql-general-postgresql.org@localhost.postgresql.org>;
C>  Mon,  7 Feb 2005 17:28:27 +0000 (GMT)
C> Received: from web13811.mail.yahoo.com (web13811.mail.yahoo.com [216.136.175.219])
C>         by svr1.postgresql.org (Postfix) with SMTP id 282268B9B41
C>         for <pgsql-general@postgresql.org>; Mon,  7 Feb 2005 17:28:26 +0000 (GMT)
C> Received: (qmail 27996 invoked by uid 60001); 7 Feb 2005 17:28:25 -0000
C> Comment: DomainKeys? See http://antispam.yahoo.com/domainkeys
C> DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws;
C>   s=s1024; d=yahoo.com;
C>
C>
b=RGAPPVsUjH1PXAVx5YgEkzrYoDPXlWw1QqdzAqR8VkgnmIBdcEWfH8poGpOiBJZd3dDCXObCkh9PoSlR0m1B5BaGO1hPPVDY5Ypl3NLL3lwhAhLaEGhHT25sPztygaIZUyNbYalfrQZLjOl7P3ZSTu7uqsiaqrI56gAntgyCZIQ=
C> Message-ID: <20050207172824.27994.qmail@web13811.mail.yahoo.com>
C> Received: from [216.173.173.66] by web13811.mail.yahoo.com via
C> HTTP; Mon, 07 Feb 2005 09:28:24 PST
C> Date: Mon, 7 Feb 2005 09:28:24 -0800 (PST)
C> From: CG <cgg007@yahoo.com>
C> Reply-To: cgg007@yahoo.com
C> Subject: [GENERAL] Creating an index-type for LIKE '%value%'
C> To: pgsql-general@postgresql.org
C> In-Reply-To: <5349.1107713905@sss.pgh.pa.us>
C> MIME-Version: 1.0
C> Content-Type: text/plain; charset=us-ascii
C> X-Virus-Scanned: by amavisd-new at hub.org
C> X-Spam-Status: No, hits=0.89 tagged_above=0 required=5
C>  tests=DNS_FROM_RFC_ABUSE, FROM_ENDS_IN_NUMS
C> X-Spam-Level:
C> X-Mailing-List: pgsql-general
C> Precedence: bulk
C> Sender: pgsql-general-owner@postgresql.org

C> Once upon a time there was an FTI contrib module that split up a varchar field
C> into little bits and placed them into an FTI table to facilitate a full text
C> index search. It was like being able to do a "SELECT * FROM table WHERE field
C> LIKE '%value%';" and have it search an index!

C> It was a great idea! What a pain it was to implement!

C> You see, a trigger had to be run on INSERT and UPDATE to split up the varchar
C> field into little pieces. On DELETE you'd have to clear out the rows from the
C> FTI table. And when you wanted to use the FTI table in a SELECT you had to
C> write your SQL to join up that FTI table and dig through it.

C> As I was exploring ways to optimize my application's use of the database, which
C> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
C> places, I thought this solution could be built upon to allow for an easier
C> deployment.

C> AFAICT, the "right" way to do this would be to create an index type which would
C> take care of splitting the varchar field, and to have the query planner use the
C> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.

C> Tsearch2 is fantastic, but it works best for fields that contain words. I have
C> to sift through alphanumeric identification numbers.

C> Is the split-field FTI the best way to tackle my problem?

C> What can I do to get better performance on "SELECT * FROM table WHERE field
C> LIKE '%value%';" ??

C> CG

We use for this type ltree from contrib.
For example you have a table with column named f1. Add a column
f1_ltree of type ltree and fill it in trigger taking value of f1 and
cracting tree with every character as node. Create index for table on
f2 using gist. For example, for f1='abcde' f2 will be 'a.b.c.d.e'. Below
is example of function for transforming text to ltree. And then you
can search "f2 ~ '*.b.c.d.*'" instead of "f1 like '%bcd%'" and it will be
index search. It's possible also not to create additional column and
create index on charsltree(f1) and search as "charsltree(f1) ~
'*.b.c.d.*'"

create or replace function charsltree(text) returns ltree as '
declare
    str alias for $1;
    res text;
    i   smallint;
begin
    if $1 is null or $1 = '''' then
        return null;
    end if;
    res = '''';
    for i in 1 .. length(str) loop
        res = res || substr(str, i, 1) || ''.'';
    end loop;
    return text2ltree(btrim(res, ''.''));
end;
' immutable language plpgsql;


--
Best regards,
 Yury                            mailto:yura@vpcit.ru


Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
Read http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm

     Oleg
On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:

> On Mon, Feb 07, 2005 at 09:28:24AM -0800, CG wrote:
>> As I was exploring ways to optimize my application's use of the database, which
>> has to run the horrid "SELECT * FROM table WHERE field LIKE '%value%';" in
>> places, I thought this solution could be built upon to allow for an easier
>> deployment.
>
> <snip>
>
>> AFAICT, the "right" way to do this would be to create an index type which would
>> take care of splitting the varchar field, and to have the query planner use the
>> index in a SELECT when a "LIKE '%value%'" is used in the WHERE clause.
>>
>> Tsearch2 is fantastic, but it works best for fields that contain words. I have
>> to sift through alphanumeric identification numbers.
>
> Seems to me to depends quite a bit or your problem domain. How big are
> the string's you're searching. If you're not searching on word
> boundaries like tsearch, you'd need to split on every char. Say you
> split on three character blocks. So the string "Hello World" would need
> entries for:
>
> "Hel", "ell", "llo", "lo ", "o W", " Wo", "Wor", "orl", "rld"
>
> For N character strings you'd need N-2 entries. If you're storing
> entire documents it's not practical. But if all your strings are maybe
> 15 characters long (maybe serial numbers), it might be practical.
>
> I havn't looked at tsearch but maybe you can customise it to your
> needs. If you can redefine the split function you could make it work
> appropriately. Then you can define the ~~ operator (which is LIKE) to
> call tsearch.
>
> This in just off the top of my head, but maybe it can work.
>
> Hope this helps,
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Creating an index-type for LIKE '%value%'

From
"Larry Rosenman"
Date:
Oleg Bartunov wrote:
> Read
> http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
>
>      Oleg
> On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
Would you have a suggestion to index the following query:

SELECT domain,message,'1' as truth FROM blacklist
WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')

The somedomain is actually a constant passed in from Exim (it's the sender's
righthand
Side of an E-Mail address).

I'm looking to see if the domain name is in my blacklist.

I may just be SOL, but I figured I'd ask.

The blacklist table is:
exim=# \d blacklist
                       Table "public.blacklist"
   Column    |            Type             |        Modifiers
-------------+-----------------------------+--------------------------
 insert_when | timestamp(0) with time zone | default now()
 insert_who  | text                        | default "current_user"()
 domain      | text                        |
 message     | text                        |
Indexes:
    "blacklist_dom_idx" btree ("domain")

exim=#

And contains records like:

exim=# select * from blacklist limit 1;
      insert_when       | insert_who |  domain  |             message
------------------------+------------+----------+---------------------------
------
 2003-12-22 21:02:49-06 | ler        | 008\.net | 127.0.0.1 MX, SPAMMER
(008.net)
(1 row)

exim=#

Thanks!

LER

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: Creating an index-type for LIKE '%value%'

From
Martijn van Oosterhout
Date:
On Mon, Feb 07, 2005 at 03:16:51PM -0600, Larry Rosenman wrote:
> Oleg Bartunov wrote:
> > Read
> > http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
> >
> >      Oleg
> > On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
> Would you have a suggestion to index the following query:
>
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the
> sender's righthand Side of an E-Mail address).

I'm not sure, but this might be what ltree was designed for. After all,
they're just stems and you want to match any domain ending in that
stem...

Hope this helps,

--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Attachment

Re: Creating an index-type for LIKE '%value%'

From
Steve Atkins
Date:
On Mon, Feb 07, 2005 at 03:16:51PM -0600, Larry Rosenman wrote:
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the sender's
> righthand
> Side of an E-Mail address).
>
> I'm looking to see if the domain name is in my blacklist.
>
> I may just be SOL, but I figured I'd ask.
>
> The blacklist table is:
> exim=# \d blacklist
>                        Table "public.blacklist"
>    Column    |            Type             |        Modifiers
> -------------+-----------------------------+--------------------------
>  insert_when | timestamp(0) with time zone | default now()
>  insert_who  | text                        | default "current_user"()
>  domain      | text                        |
>  message     | text                        |
> Indexes:
>     "blacklist_dom_idx" btree ("domain")
>
> exim=#
>
> And contains records like:
>
> exim=# select * from blacklist limit 1;
>       insert_when       | insert_who |  domain  |             message
> ------------------------+------------+----------+---------------------------
> ------
>  2003-12-22 21:02:49-06 | ler        | 008\.net | 127.0.0.1 MX, SPAMMER
> (008.net)

A functional btree index on reverse(domain) might get you what you're
looking for.

<digs in the Abacus source code...>

  CREATE OR REPLACE FUNCTION reverse(text) RETURNS text AS '
  DECLARE
         original alias for $1;
         reverse_str text;
         i int4;
  BEGIN
   reverse_str = '''';
   FOR i IN REVERSE LENGTH(original)..1 LOOP
    reverse_str = reverse_str || substr(original,i,1);
   END LOOP;
   return reverse_str;
  END;'
  LANGUAGE 'plpgsql' IMMUTABLE;

Then do

CREATE INDEX foo_idx ON blacklist(reverse(domain));

SELECT * FROM blacklist WHERE reverse(domain) LIKE reverse(bar) || '%';

Cheers,
  Steve

Re: Creating an index-type for LIKE '%value%'

From
Larry Rosenman
Date:
On Mon, 7 Feb 2005, Steve Atkins wrote:
> A functional btree index on reverse(domain) might get you what you're
> looking for.

[snip]

I wound up doing the following:

--
-- Name: reverse(text); Type: FUNCTION; Schema: public; Owner: ler
--

CREATE FUNCTION reverse(text) RETURNS text
     AS $_$
DECLARE
original alias for $1;
      reverse_str text;
      i int4;
BEGIN
    reverse_str := '';
    FOR i IN REVERSE LENGTH(original)..1 LOOP
      reverse_str := reverse_str || substr(original,i,1);
    END LOOP;
RETURN reverse_str;
END;$_$
     LANGUAGE plpgsql IMMUTABLE;


ALTER FUNCTION public.reverse(text) OWNER TO ler;

--
-- Name: update_new_domain2(); Type: FUNCTION; Schema: public; Owner: ler
--

CREATE FUNCTION update_new_domain2() RETURNS "trigger"
     AS $$
BEGIN
IF TG_OP = 'DELETE'
THEN RETURN OLD;
END IF;
IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE'
THEN NEW.new_domain2 :=  (reverse(lower('%' || NEW.domain)) );
      RETURN NEW;
END IF;
END;
$$
     LANGUAGE plpgsql IMMUTABLE;


ALTER FUNCTION public.update_new_domain2() OWNER TO ler;

--
-- Name: blacklist; Type: TABLE; Schema: public; Owner: ler; Tablespace:
--

CREATE TABLE blacklist (
     insert_when timestamp(0) with time zone DEFAULT now(),
     insert_who text DEFAULT "current_user"(),
     message text NOT NULL,
     "domain" text NOT NULL,
     new_domain2 text NOT NULL
);
ALTER TABLE ONLY blacklist ALTER COLUMN "domain" SET STATISTICS 100;
ALTER TABLE ONLY blacklist ALTER COLUMN new_domain2 SET STATISTICS 100;


ALTER TABLE public.blacklist OWNER TO ler;
--
-- Name: blk_new_idx3; Type: INDEX; Schema: public; Owner: ler; Tablespace:
--

CREATE INDEX blk_new_idx3 ON blacklist USING btree (new_domain2);

ALTER TABLE blacklist CLUSTER ON blk_new_idx3;


ALTER INDEX public.blk_new_idx3 OWNER TO ler;

--
-- Name: blacklist_domain; Type: TRIGGER; Schema: public; Owner: ler
--

CREATE TRIGGER blacklist_domain
     BEFORE INSERT OR DELETE OR UPDATE ON blacklist
     FOR EACH ROW
     EXECUTE PROCEDURE update_new_domain2();

It doesn't yet use the index with the 254 domains I have in my fecal roster, but
it's also about 5x as fast as the other REGEX lookup.

Thanks for the ideas!

LER

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
On Mon, 7 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> Read
>> http://www.sai.msu.su/~megera/postgres/gist/pg_trgm/README.pg_trgm
>>
>>      Oleg
>> On Mon, 7 Feb 2005, Martijn van Oosterhout wrote:
> Would you have a suggestion to index the following query:
>
> SELECT domain,message,'1' as truth FROM blacklist
> WHERE somedomain ~* '(?:.+\.|)' || domain || '\$')
>
> The somedomain is actually a constant passed in from Exim (it's the sender's
> righthand
> Side of an E-Mail address).
>
> I'm looking to see if the domain name is in my blacklist.
>
> I may just be SOL, but I figured I'd ask.

Larry, I pointed you to pg_trgm module mostly following Martijn's
suggestions. Now, I see you need another our module - ltree,
see http://www.sai.msu.su/~megera/postgres/gist/ltree/
for details.


     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Creating an index-type for LIKE '%value%'

From
Larry Rosenman
Date:
On Tue, 8 Feb 2005, Oleg Bartunov wrote:

> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>
>> Oleg Bartunov wrote:
>
> Larry, I pointed you to pg_trgm module mostly following Martijn's
> suggestions. Now, I see you need another our module - ltree,
> see http://www.sai.msu.su/~megera/postgres/gist/ltree/
> for details.

I maybe dense, but could you give me an example?

I'm not seeing it for some reason :).

Thanks,
LER

--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749

Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
On Tue, 8 Feb 2005, Larry Rosenman wrote:

> On Tue, 8 Feb 2005, Oleg Bartunov wrote:
>
>> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>>
>>> Oleg Bartunov wrote:
>>
>> Larry, I pointed you to pg_trgm module mostly following Martijn's
>> suggestions. Now, I see you need another our module - ltree,
>> see http://www.sai.msu.su/~megera/postgres/gist/ltree/
>> for details.
>
> I maybe dense, but could you give me an example?

test=# \d tt
      Table "public.tt"
  Column | Type  | Modifiers
--------+-------+-----------
  domain | ltree |
Indexes:
     "ltree_idx" gist ("domain")

test=# select * from tt where domain ~ '*.ru'::lquery;
    domain
-------------
  astronet.ru
  mail.ru
  pgsql.ru
(3 rows)


>
> I'm not seeing it for some reason :).
>
> Thanks,
> LER
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Creating an index-type for LIKE '%value%'

From
"Larry Rosenman"
Date:
Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>> On Tue, 8 Feb 2005, Oleg Bartunov wrote:
>>
>>> On Mon, 7 Feb 2005, Larry Rosenman wrote:
>>>
>>>> Oleg Bartunov wrote:
>>>
>>> Larry, I pointed you to pg_trgm module mostly following Martijn's
>>> suggestions. Now, I see you need another our module - ltree, see
>>> http://www.sai.msu.su/~megera/postgres/gist/ltree/
>>> for details.
>>
>> I maybe dense, but could you give me an example?
>
> test=# \d tt
>       Table "public.tt"
>   Column | Type  | Modifiers
> --------+-------+-----------
>   domain | ltree |
> Indexes:
>      "ltree_idx" gist ("domain")
>
> test=# select * from tt where domain ~ '*.ru'::lquery;
>     domain
> -------------
>   astronet.ru
>   mail.ru
>   pgsql.ru
> (3 rows)
>
>
>>
>> I'm not seeing it for some reason :).
>>
>> Thanks,
>> LER
>>
>>
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> Sternberg Astronomical Institute, Moscow University (Russia)
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

It doesn't seem to like pieces with hyphens ('-') in the name, when I try
To update blacklist set new_domain_lt=text2ltree(domain) I get a
Syntax error (apparently for the hyphens).



--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
On Tue, 8 Feb 2005, Larry Rosenman wrote:

>
> It doesn't seem to like pieces with hyphens ('-') in the name, when I try
> To update blacklist set new_domain_lt=text2ltree(domain) I get a
> Syntax error (apparently for the hyphens).
>

Try change definition of ISALNUM on ltree.h

#define ISALNUM(x)      ( isalnum((unsigned char)(x)) || (x) == '_' )

this was already discussed
http://www.pgsql.ru/db/mw/msg.html?mid=2034299


>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Creating an index-type for LIKE '%value%'

From
"Larry Rosenman"
Date:
Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>>
>> It doesn't seem to like pieces with hyphens ('-') in the name, when I
>> try To update blacklist set new_domain_lt=text2ltree(domain) I get a
>> Syntax error (apparently for the hyphens).
>>
>
> Try change definition of ISALNUM on ltree.h
>
> #define ISALNUM(x)      ( isalnum((unsigned char)(x)) || (x) == '_' )
>
> this was already discussed
> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>
Thanks!

Now, how can I make it always case-insensitive?



--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
On Tue, 8 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>
>>>
>>> It doesn't seem to like pieces with hyphens ('-') in the name, when I
>>> try To update blacklist set new_domain_lt=text2ltree(domain) I get a
>>> Syntax error (apparently for the hyphens).
>>>
>>
>> Try change definition of ISALNUM on ltree.h
>>
>> #define ISALNUM(x)      ( isalnum((unsigned char)(x)) || (x) == '_' )
>>
>> this was already discussed
>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>
> Thanks!
>
> Now, how can I make it always case-insensitive?
>

from http://www.sai.msu.su/~megera/postgres/gist/ltree/

  It is possible to use several modifiers at the end of a label:


            @     Do case-insensitive label matching
            *     Do prefix matching for a label
            %     Don't account word separator '_' in label matching, that is
                  'Russian%' would match 'Russian_nations', but not 'Russian'



>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83

Re: Creating an index-type for LIKE '%value%'

From
"Larry Rosenman"
Date:
Oleg Bartunov wrote:
> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>
>> Oleg Bartunov wrote:
>>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>>
>>>>
>>>> It doesn't seem to like pieces with hyphens ('-') in the name, when
>>>> I try To update blacklist set new_domain_lt=text2ltree(domain) I
>>>> get a Syntax error (apparently for the hyphens).
>>>>
>>>
>>> Try change definition of ISALNUM on ltree.h
>>>
>>> #define ISALNUM(x)      ( isalnum((unsigned char)(x)) || (x) == '_'
>>> )
>>>
>>> this was already discussed
>>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>>
>> Thanks!
>>
>> Now, how can I make it always case-insensitive?
>>
>
> from http://www.sai.msu.su/~megera/postgres/gist/ltree/
>
>   It is possible to use several modifiers at the end of a label:
>
>
>             @     Do case-insensitive label matching
>             *     Do prefix matching for a label
>             %     Don't account word separator '_' in label matching,
>                   that is 'Russian%' would match 'Russian_nations',
> but not 'Russian'
>
>
>
>>
>>
>>
>
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg
> Astronomical Institute, Moscow University (Russia) Internet:
> oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(095)939-16-83, +007(095)939-23-83

Does that apply to each node, or the entire string?

I'd like to not have to parse the lquery string and make each node following
it with an @.

LER


--
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



Re: Creating an index-type for LIKE '%value%'

From
Oleg Bartunov
Date:
On Tue, 8 Feb 2005, Larry Rosenman wrote:

> Oleg Bartunov wrote:
>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>
>>> Oleg Bartunov wrote:
>>>> On Tue, 8 Feb 2005, Larry Rosenman wrote:
>>>>
>>>>>
>>>>> It doesn't seem to like pieces with hyphens ('-') in the name, when
>>>>> I try To update blacklist set new_domain_lt=text2ltree(domain) I
>>>>> get a Syntax error (apparently for the hyphens).
>>>>>
>>>>
>>>> Try change definition of ISALNUM on ltree.h
>>>>
>>>> #define ISALNUM(x)      ( isalnum((unsigned char)(x)) || (x) == '_'
>>>> )
>>>>
>>>> this was already discussed
>>>> http://www.pgsql.ru/db/mw/msg.html?mid=2034299
>>>>
>>> Thanks!
>>>
>>> Now, how can I make it always case-insensitive?
>>>
>>
>> from http://www.sai.msu.su/~megera/postgres/gist/ltree/
>>
>>   It is possible to use several modifiers at the end of a label:
>>
>>
>>             @     Do case-insensitive label matching
>>             *     Do prefix matching for a label
>>             %     Don't account word separator '_' in label matching,
>>                   that is 'Russian%' would match 'Russian_nations',
>> but not 'Russian'
>>
>>
>>
>>>
>>>
>>>
>>
>>      Regards,
>>          Oleg
>> _____________________________________________________________
>> Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg
>> Astronomical Institute, Moscow University (Russia) Internet:
>> oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(095)939-16-83, +007(095)939-23-83
>
> Does that apply to each node, or the entire string?
>
> I'd like to not have to parse the lquery string and make each node following
> it with an @.


I'm a little bit tired :), if you want case insenstive for the whole node,
you could use built-in fuinction 'lower(text)' !

use text2ltree(lower(text))

>
> LER
>
>
>

     Regards,
         Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83