Thread: tsearch thoughts

tsearch thoughts

From
"Christopher Kings-Lynne"
Date:
Is there any reason why the tsearch indexes couldn't be modified to just work
on TEXT fields and not TXTIDX fields.  Is there really a reason to have the
TXTIDX type?

I mean, when the index is created over the text column, instead of just
indexing the text as-is, index the txt2txtidx'd version...?

That would vastly reduce the complexity of tsearch, and would make the
indexed text invisible, as it is in most other fti implementations...?
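Here is a minimal Python sketch of the idea above. It only illustrates the concept. The table keeps just the raw text, and the index is built from a derived value computed when the index is created. `to_lexemes` is a hypothetical stand-in for txt2txtidx: it produces lowercased whitespace tokens and does no dictionary or stemming work. The plain inverted index is also an assumption chosen for simplicity. It is not how tsearch's GiST index is organized.

```python
from collections import defaultdict


def to_lexemes(text):
    """Hypothetical stand-in for txt2txtidx(): lowercased whitespace tokens."""
    return frozenset(text.lower().split())


def build_expression_index(rows, fn):
    """Index the derived value fn(text); the rows themselves stay raw text."""
    index = defaultdict(set)
    for rid, text in enumerate(rows):
        for lexeme in fn(text):
            index[lexeme].add(rid)
    return index


rows = ["Apple pie", "banana split", "apple juice"]  # the "table": text only
idx = build_expression_index(rows, to_lexemes)
print(sorted(idx["apple"]))  # [0, 2]
```

The derived lexemes live only in the index, so the table (and a dump of it) holds nothing but the original text.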

I tried to simulate this myself, although ideally it would be invisible to
the user:

test=# create table test (a text);
CREATE
test=# CREATE INDEX my_idx ON test USING gist(txt2txtidx(a));
ERROR:  DefineIndex: index function must be marked iscachable

So txt2txtidx isn't marked iscachable - why's that?

Say it was marked iscachable, then I'd be able to query like this:

SELECT * FROM test WHERE txt2txtidx(a) ## 'apple';

This would mean that the index on-disk file would be large, but the table
file would stay small.  It would also vastly reduce the size of pg_dumps...

Could we move towards something like:

CREATE FULLTEXT INDEX my_idx ON test (a);

Or something?

Chris



Re: tsearch thoughts

From
Oleg Bartunov
Date:
On Sat, 30 Nov 2002, Christopher Kings-Lynne wrote:

> Is there any reason why the tsearch indexes couldn't be modified to just work
> on TEXT fields and not TXTIDX fields.  Is there really a reason to have the
> TXTIDX type?
>
> I mean, when the index is created over the text column, instead of just
> indexing the text as-is, index the txt2txtidx'd version...?
>
> That would vastly reduce the complexity of tsearch, and would make the
> indexed text invisible, as it is in most other fti implementations...?

Chris,

This is roughly what we had been thinking about for full text searching in
postgres, and what should happen as tsearch matures. We began with a contrib
module just to gain some experience, and we still need to do some research on
the underlying algorithms. Also, remember that the current GiST is not
concurrent, and we plan to work on this issue. We're very busy and need
somebody to help us with the interface (dictionaries, parser, postgresql
internal interface).


>
> I tried to simulate this myself, although ideally it would be invisible to
> the user:
>
> test=# create table test (a text);
> CREATE
> test=# CREATE INDEX my_idx ON test USING gist(txt2txtidx(a));
> ERROR:  DefineIndex: index function must be marked iscachable
>
> So txt2txtidx isn't marked iscachable - why's that?

I don't remember the reason, but you could try marking it 'iscachable'
in tsearch.sql.

>
> Say it was marked iscachable, then I'd be able to query like this:
>
> SELECT * FROM test WHERE txt2txtidx(a) ## 'apple';
>
> This would mean that the index on-disk file would be large, but the table
> file would stay small.  It would also vastly reduce the size of pg_dumps...
>
> Could we move towards something like:
>
> CREATE FULLTEXT INDEX my_idx ON test (a);
>
> Or something?
>
> Chris
Regards,    Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83



Re: tsearch thoughts

From
"Christopher Kings-Lynne"
Date:
> This is roughly what we had been thinking about for full text searching in
> postgres, and what should happen as tsearch matures. We began with a contrib
> module just to gain some experience, and we still need to do some research on
> the underlying algorithms. Also, remember that the current GiST is not
> concurrent, and we plan to work on this issue. We're very busy and need
> somebody to help us with the interface (dictionaries, parser, postgresql
> internal interface).

Hi Oleg,

I'm busy too :)

Is there, for instance, a specific thing that needs work?

Chris



Re: tsearch thoughts

From
Teodor Sigaev
Date:
> I mean, when the index is created over the text column, instead of just
> indexing the text as-is, index the txt2txtidx'd version...?

For two reasons:
1. gist_txtidx_ops is lossy: it discards information to keep the index small,
so every match found through the index must be rechecked against the original
txtidx value. With "CREATE INDEX my_idx ON test USING gist(txt2txtidx(a))"
there is no stored txtidx value, so the recheck would have to recompute
txt2txtidx(a) for every candidate row, which may decrease performance :(
2. OpenFTS. We wanted txtidx to work with OpenFTS. Adding dictionaries,
txt2txtidx, the trigger, the mquery_txt type, etc. was an experiment.
-- 
Teodor Sigaev
teodor@stack.net
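The toy Python model below illustrates the first reason. It assumes a signature scheme loosely in the spirit of a lossy GiST key. Each lexeme sets one bit, so a signature hit is only a candidate and has to be rechecked. The `Parser` stand-in for txt2txtidx, `SIG_BITS`, and the CRC-based choice of bit are all hypothetical. They are not gist_txtidx_ops internals. The model compares two setups. In the first, the recheck reads a stored txtidx-like column. In the second, a functional index is used, and each candidate's text must be parsed again.

```python
import zlib

SIG_BITS = 8  # hypothetical width, kept tiny so unrelated words can share a bit


class Parser:
    """Hypothetical stand-in for txt2txtidx(); counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return frozenset(text.lower().split())


def signature(lexemes):
    """Set one bit per lexeme. Distinct words may map to the same bit, so
    a signature alone cannot prove a match (the key is lossy)."""
    sig = 0
    for lexeme in lexemes:
        sig |= 1 << (zlib.crc32(lexeme.encode()) % SIG_BITS)
    return sig


def search(sigs, word, exact_lexemes):
    """Scan signatures, then recheck every candidate against exact lexemes."""
    qsig = signature({word})
    candidates = [rid for rid, sig in enumerate(sigs) if (sig & qsig) == qsig]
    hits = [rid for rid in candidates if word in exact_lexemes(rid)]
    return hits, candidates


def run(rows, word, store_column):
    """Return (hits, candidates, parses spent during the recheck)."""
    parse = Parser()
    lexemes = [parse(t) for t in rows]  # computed once, at insert/index time
    sigs = [signature(lx) for lx in lexemes]
    if store_column:
        def exact(rid):
            # txtidx column: the recheck reads the value stored in the table
            return lexemes[rid]
    else:
        def exact(rid):
            # functional index: only text is stored, so the recheck re-parses
            return parse(rows[rid])
    before = parse.calls
    hits, candidates = search(sigs, word, exact)
    return hits, candidates, parse.calls - before


rows = ["apple pie", "banana split", "cherry tart", "apple juice", "plum jam"]
print(run(rows, "apple", store_column=True))
print(run(rows, "apple", store_column=False))
```

Both setups return the same rows. With the stored column, the recheck costs no parsing. With the functional index, it costs one parse per candidate, and that includes false positives caused by shared bits.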