Thread: Work Around For Simple Oracle interMedia?

Work Around For Simple Oracle interMedia?

From
"David Griffiths"
Date:
One of our two databases uses context indexes on a column.

If your not familiar, this is part of the Oracle interMedia option. The idea
here is to be able to search for words within a text block. The text block
is in a row, in one of the columns. It can also be HTML, a Word document,
etc. When you index the column, it speeds up finding matching words.

Once that's done, you can search like this:

SELECT SCORE(1) title FROM news_items
           WHERE CONTAINS(text, 'Bolivia', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" in the "title" column.

SELECT SCORE(1) title FROM news_items
           WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;

This would find all rows in the table "news_item" that has the text
"Bolivia" or "Peru" in the "title" column in any order.

This is the most basic usage of the interMedia (you can get it to do a
soundex to match a sound, or get a "near" match or fuzzy match rather than
an exact match, etc.)

But I'm looking for the basic "return all rows where the text in this column
returns this word or these words".

My two thoughts are, write my own function in PERL, or store the text in a
file outside the database and use some OS-features to do the search

Any suggestions?

David




Re: Work Around For Simple Oracle interMedia?

From
Philip Hallstrom
Date:
I'll be the first to say I'm not a text/search expert, but in src/contrib
there's the following:

fulltextindex - Full text indexing using triggers

and there's this: http://openfts.sourceforge.net/

I haven't used either so can't comment at all...  there's also htdig and
mngosearch (external apps).

-philip

On Tue, 12 Feb 2002, David Griffiths wrote:

> One of our two databases uses context indexes on a column.
>
> If your not familiar, this is part of the Oracle interMedia option. The idea
> here is to be able to search for words within a text block. The text block
> is in a row, in one of the columns. It can also be HTML, a Word document,
> etc. When you index the column, it speeds up finding matching words.
>
> Once that's done, you can search like this:
>
> SELECT SCORE(1) title FROM news_items
>            WHERE CONTAINS(text, 'Bolivia', 1) > 0;
>
> This would find all rows in the table "news_item" that has the text
> "Bolivia" in the "title" column.
>
> SELECT SCORE(1) title FROM news_items
>            WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;
>
> This would find all rows in the table "news_item" that has the text
> "Bolivia" or "Peru" in the "title" column in any order.
>
> This is the most basic usage of the interMedia (you can get it to do a
> soundex to match a sound, or get a "near" match or fuzzy match rather than
> an exact match, etc.)
>
> But I'm looking for the basic "return all rows where the text in this column
> returns this word or these words".
>
> My two thoughts are, write my own function in PERL, or store the text in a
> file outside the database and use some OS-features to do the search
>
> Any suggestions?
>
> David
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/users-lounge/docs/faq.html
>


Re: Work Around For Simple Oracle interMedia?

From
Holger Marzen
Date:
On Tue, 12 Feb 2002, David Griffiths wrote:

> One of our two databases uses context indexes on a column.
>
> If your not familiar, this is part of the Oracle interMedia option. The idea
> here is to be able to search for words within a text block. The text block
> is in a row, in one of the columns. It can also be HTML, a Word document,
> etc. When you index the column, it speeds up finding matching words.

I am afraid you have to convert the documents to raw text or RTF first.

> Once that's done, you can search like this:
>
> SELECT SCORE(1) title FROM news_items
>            WHERE CONTAINS(text, 'Bolivia', 1) > 0;
>
> This would find all rows in the table "news_item" that has the text
> "Bolivia" in the "title" column.

I don't see any problem here. "ILIKE" or "~*" seem to do the job.

> SELECT SCORE(1) title FROM news_items
>            WHERE CONTAINS(text, 'Bolivia OR Peru', 1) > 0;
>
> This would find all rows in the table "news_item" that has the text
> "Bolivia" or "Peru" in the "title" column in any order.
>
> This is the most basic usage of the interMedia (you can get it to do a
> soundex to match a sound, or get a "near" match or fuzzy match rather than
> an exact match, etc.)

For fuzzy match you could write a levenshtein (sp?) function. But then
you have to extract the words from the field. That might get tricky.


Re: Work Around For Simple Oracle interMedia?

From
Bruce Momjian
Date:
> > This is the most basic usage of the interMedia (you can get it to do a
> > soundex to match a sound, or get a "near" match or fuzzy match rather than
> > an exact match, etc.)
>
> For fuzzy match you could write a levenshtein (sp?) function. But then
> you have to extract the words from the field. That might get tricky.

We have levenshtein in /contrib/fuzzystrmatch.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026