Thread: tsearch_core for inclusion

tsearch_core for inclusion

From
Teodor Sigaev
Date:
Last try there was a fight about syntax of introduced commands. And we (Oleg and 
me) developed variant of patch with another syntax. We will not change docs 
until agreement will be reached, current version 
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/

Following demonstrates subset of FTS syntax using example from 
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.

Syntax which was proposed:
begin;
CREATE FULLTEXT CONFIGURATION public.pg ( LOCALE = 'ru_RU.UTF-8' ) LIKE english 
WITH MAP;
CREATE FULLTEXT DICTIONARY pg_dict ( OPT = 'pg_dict.txt' ) LIKE synonym;
CREATE FULLTEXT DICTIONARY en_ispell
( OPT = 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
StopFile="/usr/local/share/dicts/ispell/english-utf8.stop"'
) LIKE ispell_template;
ALTER FULLTEXT DICTIONARY en_stem SET 
OPT='/usr/local/share/dicts/ispell/english-utf8.stop';
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword                             WITH pg_dict,en_ispell,en_stem;
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
end;
Patch is http://www.sigaev.ru/misc/tsearch_core-0.38.gz

Making above FTS syntax as Peter suggested, we removed unnecessary parenthesis, 
equal sign, OPT becomes OPTION.
begin;
CREATE FULLTEXT CONFIGURATION public.pg  LOCALE  'ru_RU.UTF-8' LIKE english WITH MAP;
CREATE FULLTEXT DICTIONARY pg_dict OPTION 'pg_dict.txt'  LIKE synonym;
CREATE FULLTEXT DICTIONARY en_ispell
OPTION 'DictFile="/usr/local/share/dicts/ispell/english-utf8.dict",
AffFile="/usr/local/share/dicts/ispell/english-utf8.aff",
StopFile="/usr/local/share/dicts/ispell/english-utf8.stop"'
LIKE ispell_template;
ALTER FULLTEXT DICTIONARY en_stem SET OPTION '/usr/local/share/dicts/ispell/engl
ish-utf8.stop';
ALTER FULLTEXT MAPPING ON pg FOR lword,lhword,lpart_hword                             WITH pg_dict,en_ispell,en_stem;
DROP FULLTEXT MAPPING ON pg FOR email, url, sfloat, uri, float;
end;
Patch is http://www.sigaev.ru/misc/tsearch_core-0.38.1.gz

Comparing that syntaxes with current tsearch2 is placed at 
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html

So, which is syntax more attractive? And is there some another objections?


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsearch_core for inclusion

From
Robert Treat
Date:
On Thursday 15 March 2007 12:17, Teodor Sigaev wrote:
> Last try there was a fight about syntax of introduced commands. And we
> (Oleg and me) developed variant of patch with another syntax. We will not
> change docs until agreement will be reached, current version
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/
>
> Following demonstrates subset of FTS syntax using example from
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.
>

This is nice. 

<snip>
> Comparing that syntaxes with current tsearch2 is placed at
> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
>
> So, which is syntax more attractive? 

Honestly I don't find any of these syntax's to be head and shoulders above the 
others, but would probably lean toward the original syntax, since it has some 
level of familiarity among the existing user base. 

> And is there some another objections?

Most people whom I talk to about tsearch who want the syntax changed to make 
it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING 
FULLTEXT" and then be done with it.  This patch isn't going to give people 
that. 

I'm also concerned about the stability of the tsearch api in general wrt 
including it in core.  Currently the recommended upgrade practice is to 
dump/reload without tsearch, installing the new servers version of tsearch 
instead.  IMHO this is not an acceptable solution for core-included features. 
So is this actually going to be improved in a core tsearch?  system tables 
are not dumped by default, so that seems easier, until you consider that your 
custom tsearch install will then be lost on upgrade... oops! 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: tsearch_core for inclusion

From
"Joshua D. Drake"
Date:
>> And is there some another objections?
> 
> Most people whom I talk to about tsearch who want the syntax changed to make 
> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING 
> FULLTEXT" and then be done with it.  This patch isn't going to give people 
> that. 

+1 (again) Although I would see something like this:

CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Joshua D. Drake


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: tsearch_core for inclusion

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> +1 (again) Although I would see something like this:
> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Surely the CREATE INDEX syntax has got enough warts on it already.
Can't we express this as a particular operator class, or something
that doesn't add any new syntax?
        regards, tom lane


Re: tsearch_core for inclusion

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> +1 (again) Although I would see something like this:
>> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN
> 
> Surely the CREATE INDEX syntax has got enough warts on it already.
> Can't we express this as a particular operator class, or something
> that doesn't add any new syntax?

Do you mean something like:

CREATE INDEX fti1 ON t1 USING GIST|GIN(c1) WITH (FULLTEXT)

The above I got from the idea of FILLFACTOR so you could have something
like WITH (FILLFACTOR = 70, FULLTEXT) (not even sure if that is relevant)

OR

CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);

Where FULLTEXT is like VARCHAR OPS?

I could live with that.

Sincerely,

Joshua D. Drake




> 
>             regards, tom lane
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: tsearch_core for inclusion

From
Tom Lane
Date:
"Joshua D. Drake" <jd@commandprompt.com> writes:
> Tom Lane wrote:
>> Surely the CREATE INDEX syntax has got enough warts on it already.

> Do you mean something like:
> CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);
> Where FULLTEXT is like VARCHAR OPS?

Yeah, that one.  It might be more consistent to spell it as "fulltext_ops"
but I wouldn't insist on it.

Of course the issue not addressed here is where you specify all the
secondary configuration data (the stuff currently handled by config
tables in the contrib implementation).  Perhaps the WITH clause would
work for that, though in the current code WITH is targeted at the index
AM not individual opclasses.
        regards, tom lane


Re: tsearch_core for inclusion

From
"Joshua D. Drake"
Date:
Tom Lane wrote:
> "Joshua D. Drake" <jd@commandprompt.com> writes:
>> Tom Lane wrote:
>>> Surely the CREATE INDEX syntax has got enough warts on it already.
> 
>> Do you mean something like:
>> CREATE INDEX ftil ON t1 USING GIST|GIN(C1 FULLTEXT);
>> Where FULLTEXT is like VARCHAR OPS?
> 
> Yeah, that one.  It might be more consistent to spell it as "fulltext_ops"
> but I wouldn't insist on it.

*shrug* fulltext_ops is probably more accurate but FULLTEXT is more
friendly :). I find you normally can't have both, my vote would probably
be consistency.

> 
> Of course the issue not addressed here is where you specify all the
> secondary configuration data (the stuff currently handled by config
> tables in the contrib implementation).  Perhaps the WITH clause would
> work for that, though in the current code WITH is targeted at the index
> AM not individual opclasses.

Not sure what to say here. WITH seems logical and I don't think we want
to add yet another keyword but I certainly see your point.

Sincerely,

Joshua D. Drake



> 
>             regards, tom lane
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: tsearch_core for inclusion

From
Oleg Bartunov
Date:
On Thu, 15 Mar 2007, Robert Treat wrote:

> On Thursday 15 March 2007 12:17, Teodor Sigaev wrote:
>> Last try there was a fight about syntax of introduced commands. And we
>> (Oleg and me) developed variant of patch with another syntax. We will not
>> change docs until agreement will be reached, current version
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/
>>
>> Following demonstrates subset of FTS syntax using example from
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-complete-tut.html.
>>
>
> This is nice.
>
> <snip>
>> Comparing that syntaxes with current tsearch2 is placed at
>> http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
>>
>> So, which is syntax more attractive?
>
> Honestly I don't find any of these syntax's to be head and shoulders above the
> others, but would probably lean toward the original syntax, since it has some
> level of familiarity among the existing user base.
>
>> And is there some another objections?
>
> Most people whom I talk to about tsearch who want the syntax changed to make
> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
> FULLTEXT" and then be done with it.  This patch isn't going to give people
> that.

Since we use standard postgresql-ish CREATE INDEX command, I assume 
people want to skip creation of tsvector column ? How they could manage
complex document indexing, when document is a combination (with different weights)
of many text attributes from several tables, for example ? There are several
other issues with that approach, for example, we need to store positional
information somewhere for ranking information. It's awkward to parse document
every time to get this information. 
>
> I'm also concerned about the stability of the tsearch api in general wrt
> including it in core.  Currently the recommended upgrade practice is to
> dump/reload without tsearch, installing the new servers version of tsearch
> instead.  IMHO this is not an acceptable solution for core-included features.
> So is this actually going to be improved in a core tsearch?  system tables
> are not dumped by default, so that seems easier, until you consider that your
> custom tsearch install will then be lost on upgrade... oops!

This is exact reason why we want to include tsearch into core, it was discussed
several times.
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch_core for inclusion

From
Oleg Bartunov
Date:
On Thu, 15 Mar 2007, Joshua D. Drake wrote:

>
>>> And is there some another objections?
>>
>> Most people whom I talk to about tsearch who want the syntax changed to make
>> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
>> FULLTEXT" and then be done with it.  This patch isn't going to give people
>> that.
>
> +1 (again) Although I would see something like this:
>
> CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN

Yet another syntax addition :) 
What's wrong with the standard CREATE INDEX ?
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch_core for inclusion

From
"Magnus Hagander"
Date:
> > Most people whom I talk to about tsearch who want the syntax changed to make
> > it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
> > FULLTEXT" and then be done with it.  This patch isn't going to give people
> > that.
> 
> Since we use standard postgresql-ish CREATE INDEX command, I assume 
> people want to skip creation of tsvector column ?

That would be great.

> How they could manage
> complex document indexing, when document is a combination (with different weights)
> of many text attributes from several tables, for example ?

Just to give you some more work, could I have both, please.

Seriously, if the current powerful functionality could be combined with a dead simple solution for new users and those
whodon't need it, that would be very 
 
good.

This could be an auto generated hidden column or something, as long as the user doesn't need to see or care about it in
thesimple case. 
 

/Magnus



Re: tsearch_core for inclusion

From
Oleg Bartunov
Date:
On Fri, 16 Mar 2007, Magnus Hagander wrote:

>>> Most people whom I talk to about tsearch who want the syntax changed to make
>>> it easier want something akin to just "CREATE INDEX fti1 on t1(c1) USING
>>> FULLTEXT" and then be done with it.  This patch isn't going to give people
>>> that.
>>
>> Since we use standard postgresql-ish CREATE INDEX command, I assume
>> people want to skip creation of tsvector column ?
>
> That would be great.
>
>> How they could manage
>> complex document indexing, when document is a combination (with different weights)
>> of many text attributes from several tables, for example ?
>
> Just to give you some more work, could I have both, please.
>
> Seriously, if the current powerful functionality could be combined with a dead simple solution for new users and
thosewho don't need it, that would be very
 
> good.
>
> This could be an auto generated hidden column or something, as long as the user doesn't need to see or care about it
inthe simple case.
 

hmm, then we should think about very generic fts configuration behind such
simple solution. We need to know what requirements and features should
be supported. for example, "CREATE INDEX fti1 on t1(c1) USING FULLTEXT"
syntax assume default configuration only.

Joshua, Tome proposed something like
CREATE INDEX fti1 on t1(c1) USING FULLTEXT [WITH] GIST | GIN,
so is't worth to extend it to specify fts configuration like
CREATE INDEX fti1 on t1(c1) USING FULLTEXT [public.pg]  [WITH] GIST | GIN 
Normally, fts configuration is used when creating tsvector, so 
CREATE INDEX doesn't need to know it. Hmm, looks rather messy.


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch_core for inclusion

From
Teodor Sigaev
Date:
> Yeah, that one.  It might be more consistent to spell it as "fulltext_ops"
> but I wouldn't insist on it.

Hmm, you are prompting an idea to me how to simplify usage of full text index in  simple cases.

CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);

Fulltext_ops opclass parses the document similarly to_tsvector nad stores 
lexemes in gin index. It's a full equalent of
CREATE INDEX ... ( to_tsvector( textcolumn ) )

And, let we define operation text @ text, which is equivalent of text @@ 
plainto_tsquery(text), so, queries will look like
SELECT * FROM tblname WHERE textcolumn @ textquery;

Fulltext_ops can speedup both operation, text @@ tsquery and text @ text. 
Because gin API has extractQuery method which calls once per index scan and it 
can parse query to lexemes.

Some disadvantage: with that way it isn't possible make fast ranking - there is 
no stored parsed text. And, fulltext_ops may be done for GiST index too, but 
fulltext opclass will be lossy which means slow search due to reparse texts for 
each index match.

BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done:
CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn)

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsearch_core for inclusion

From
Teodor Sigaev
Date:
> I'm also concerned about the stability of the tsearch api in general wrt 
> including it in core.  Currently the recommended upgrade practice is to 
> dump/reload without tsearch, installing the new servers version of tsearch 
That is because pg_ts* tables changes, function names and internal API. Putting 
tsearch in core discards a lot of such problem. For example, who notices changes 
in pg_am table from release to release? Really it was a developers/hackers, not 
a usual users

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsearch_core for inclusion

From
Robert Treat
Date:
On Friday 16 March 2007 04:44, Teodor Sigaev wrote:
> > I'm also concerned about the stability of the tsearch api in general wrt
> > including it in core.  Currently the recommended upgrade practice is to
> > dump/reload without tsearch, installing the new servers version of
> > tsearch
>
> That is because pg_ts* tables changes, function names and internal API.
> Putting tsearch in core discards a lot of such problem. For example, who
> notices changes in pg_am table from release to release? Really it was a
> developers/hackers, not a usual users

I've ran into problems on very vanilla setups that I am sure other users are 
going to run across... see the following for details: 
http://people.planetpostgresql.org/xzilla/index.php?/archives/291-The-pain-that-is-tsearch2-8.1-8.2-upgrading.html

I don't see how the proposal is going to solve that type of problem, but maybe 
I am overlooking something?

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: tsearch_core for inclusion

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> Hmm, you are prompting an idea to me how to simplify usage of full text index in 
>   simple cases.

> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);

+1 ... makes the easy cases easy, doesn't make the hard cases any
harder.

> BTW, simple syntax sugar for CREATE INDEX (fulltext_ops) may be done:
> CREATE INDEX idxname ON tblname USING FULLTEXT (textcolumn)

Not in favor of this, because FULLTEXT isn't an AM ... how would you
know whether to use GIST or GIN?  Actually, if you wanted to simplify
life a bit, you could mark fulltext_ops as being the default opclass
for text (and varchar I guess) under GIST and GIN.  Then it reduces
to just

CREATE INDEX idxname ON tblname USING gin (textcolumn);
        regards, tom lane


Re: tsearch_core for inclusion

From
Teodor Sigaev
Date:
> I don't see how the proposal is going to solve that type of problem, but maybe 
> I am overlooking something?

The same way as other system tables objects, they don't dump, they don't 
restore. In 8.3, seems, API to index AM will be changed - will anybody except 
pghackers see that? New opclass layout, new opfamily table - users don't that 
changes at all.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 


Re: tsearch_core for inclusion

From
"Florian G. Pflug"
Date:
Teodor Sigaev wrote:
> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);
> 
> Fulltext_ops opclass parses the document similarly to_tsvector nad 
> stores lexemes in gin index. It's a full equalent of
> CREATE INDEX ... ( to_tsvector( textcolumn ) )
> 
> And, let we define operation text @ text, which is equivalent of text @@ 
> plainto_tsquery(text), so, queries will look like
> SELECT * FROM tblname WHERE textcolumn @ textquery;
> 
> Fulltext_ops can speedup both operation, text @@ tsquery and text @ 
> text. Because gin API has extractQuery method which calls once per index 
> scan and it can parse query to lexemes.
> 
> Some disadvantage: with that way it isn't possible make fast ranking - 
> there is no stored parsed text. And, fulltext_ops may be done for GiST 
> index too, but fulltext opclass will be lossy which means slow search 
> due to reparse texts for each index match.
Just a thought:

If the patch that implements the "GENERATED ALWAYS" syntax is accepted,
than creating a seperate field that hold the parsed text and an index
on that column becomes as easy as:
alter table t1 add column text_parsed generated always as to_tsvector(text);
create index idx on t1 using gin (text_parsed fulltext_ops);

I know that there is a trigger function in tsearch that support something
similar, but I really like the simplicity of the statements above.

One a related note - will to_tsvector and to_tsquery be renamed to
something like ft_parse_text() and ft_parse_query() if tsearch2 goes
into core? It seems like the "ts" part of those names would be the only
referenced left to the name "tsearch" if they are not, which could be
somewhat confusing for users.

greetings, Florian Pflug


Re: tsearch_core for inclusion

From
"Joshua D. Drake"
Date:
> One a related note - will to_tsvector and to_tsquery be renamed to
> something like ft_parse_text() and ft_parse_query() if tsearch2 goes

Furthering this... perhaps even:

ft_search()
ft_query()

Joshua D. Drake



-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: tsearch_core for inclusion

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Actually, if you wanted to simplify
> life a bit, you could mark fulltext_ops as being the default opclass
> for text (and varchar I guess) under GIST and GIN.  Then it reduces
> to just
>
> CREATE INDEX idxname ON tblname USING gin (textcolumn);
>
>   

Nice. This gets my vote.

cheers

andrew


Re: tsearch_core for inclusion

From
Oleg Bartunov
Date:
On Fri, 16 Mar 2007, Joshua D. Drake wrote:

>
>> One a related note - will to_tsvector and to_tsquery be renamed to
>> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
>
> Furthering this... perhaps even:
>
> ft_search()
> ft_query()

ts_ means Text Search, I don't think ft_ (Full Text) is better.
Going further it should be fts_ (Full Text Search), but we have 
many concerns about compatibility and stability of api, so I'd prefer
to stay with ts_.

The more important is what syntax we should accept for the final patch ?
Original proposed or modified version ?
On
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/fts-syntax-compare.html
they are D.1.2 and D.1.3

As I understood Teodor's proposal about CREATE INDEX command is ok for all.


    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: tsearch_core for inclusion

From
Stefan Kaltenbrunner
Date:
Florian G. Pflug wrote:
> Teodor Sigaev wrote:
>> CREATE INDEX idxname ON tblname USING gin (textcolumn fulltext_ops);
>>
>> Fulltext_ops opclass parses the document similarly to_tsvector nad
>> stores lexemes in gin index. It's a full equalent of
>> CREATE INDEX ... ( to_tsvector( textcolumn ) )
>>
>> And, let we define operation text @ text, which is equivalent of text
>> @@ plainto_tsquery(text), so, queries will look like
>> SELECT * FROM tblname WHERE textcolumn @ textquery;
>>
>> Fulltext_ops can speedup both operation, text @@ tsquery and text @
>> text. Because gin API has extractQuery method which calls once per
>> index scan and it can parse query to lexemes.
>>
>> Some disadvantage: with that way it isn't possible make fast ranking -
>> there is no stored parsed text. And, fulltext_ops may be done for GiST
>> index too, but fulltext opclass will be lossy which means slow search
>> due to reparse texts for each index match.
> Just a thought:
> 
> If the patch that implements the "GENERATED ALWAYS" syntax is accepted,
> than creating a seperate field that hold the parsed text and an index
> on that column becomes as easy as:
> alter table t1 add column text_parsed generated always as
> to_tsvector(text);
> create index idx on t1 using gin (text_parsed fulltext_ops);

or to take tom's idea into consideration:

ALTER TABLE t1 ADD COLUMN text_parsed GENERATED ALWAYS AS to_tsvector(text);
CREATE INDEX idxname ON t1 USING gin (text_parsed);

which looks pretty nice and simple to me


> 
> I know that there is a trigger function in tsearch that support something
> similar, but I really like the simplicity of the statements above.
> 
> One a related note - will to_tsvector and to_tsquery be renamed to
> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
> into core? It seems like the "ts" part of those names would be the only
> referenced left to the name "tsearch" if they are not, which could be
> somewhat confusing for users.

well either renaming those functions (and completely destroy the upgrade
path for any current users) or  just refer to it as "text search" in the
docs (so that the prefix makes sense).


Stefan


Re: tsearch_core for inclusion

From
"Florian G. Pflug"
Date:
Oleg Bartunov wrote:
> On Fri, 16 Mar 2007, Joshua D. Drake wrote:
> 
>>
>>> One a related note - will to_tsvector and to_tsquery be renamed to
>>> something like ft_parse_text() and ft_parse_query() if tsearch2 goes
>>
>> Furthering this... perhaps even:
>>
>> ft_search()
>> ft_query()
> 
> ts_ means Text Search, I don't think ft_ (Full Text) is better.
> Going further it should be fts_ (Full Text Search), but we have many 
> concerns about compatibility and stability of api, so I'd prefer
> to stay with ts_.

Hm, so it could be fts_parse_query() and fts_parse_text()
You could alias it to to_tsvector() and to_tsquery() to
archive api compatibility.

I agree that the names of these functions are really a minor
issue, and api compatibility is more important. But confusing
names can be the source of a lot of errors for new users, so
there *is* a point is naming things consistenly. And if the
cost is basically an entry in pg_proc, why not do it?

greetings, Florian Pflug




Re: tsearch_core for inclusion

From
Robert Treat
Date:
On Friday 16 March 2007 10:45, Teodor Sigaev wrote:
> > I don't see how the proposal is going to solve that type of problem, but
> > maybe I am overlooking something?
>
> The same way as other system tables objects, they don't dump, they don't
> restore. In 8.3, seems, API to index AM will be changed - will anybody
> except pghackers see that? New opclass layout, new opfamily table - users
> don't that changes at all.

If I have configured my tsearch install for spanish (spanish 
dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the 
next database upgrade? 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL


Re: tsearch_core for inclusion

From
"Florian G. Pflug"
Date:
Robert Treat wrote:
> On Friday 16 March 2007 10:45, Teodor Sigaev wrote:
>>> I don't see how the proposal is going to solve that type of problem, but
>>> maybe I am overlooking something?
>> The same way as other system tables objects, they don't dump, they don't
>> restore. In 8.3, seems, API to index AM will be changed - will anybody
>> except pghackers see that? New opclass layout, new opfamily table - users
>> don't that changes at all.
> 
> If I have configured my tsearch install for spanish (spanish 
> dictionary/stemmers/synonyms/etc...) aren't I going to lose all that on the 
> next database upgrade? 

I believe what teodor meant is that the *tables* won't be dumped as such
by pg_dump (the same as all other tables in pg_catalog). But pg_dump would
of course need to dump the information stored in the tables - it would
put some "CREATE FULLTEXT ... " statements into your dump. Its really
the same as for any other database object like a function, a type, ...

greetings, Florian Pflug