Thread: A couple of tsearch loose ends

A couple of tsearch loose ends

From
Tom Lane
Date:
There are a couple of naming issues that I left untouched while
reviewing the tsearch patch, but wanted to bring up for discussion.

One thing that had me confused for awhile is that the patch uses
the word "template" in two different ways.  The main use is that a
"template" is an object encapsulating the superuser-only aspects of
defining a dictionary.  When you do CREATE TEXT SEARCH DICTIONARY
you have to specify a template to base it on.  So in this context
a dictionary and its template are different kinds of objects, and
there's a persistent connection between them.

On the other hand, CREATE TEXT SEARCH CONFIGURATION also uses the
word "template", but in this case it's an optional specification
of an existing configuration that gets copied.  So here, the config
and the template are the same kind of object, and there's no
connection between them after the copy is made.

This seems a bit confusing, and I wonder whether we ought not
change the terminology for one thing or the other.  I don't
particularly want to rename text search templates ... that would
be quite a bit of work at this point ... so what I'd suggest is
that the option to CREATE TEXT SEARCH CONFIGURATION be renamed
"COPY" instead of "TEMPLATE".  Another thought here is that I'm
inclined to drop the "with map" option and just always copy the
source configuration exactly.  If you don't want the map, the
only other information the source can provide is a parser name,
which you might as well just give directly.

The other thing that was bugging me was that a lot of the dictionary
types have init options that are named things like DictFile, AffFile,
etc.  As I mentioned before, I dislike the fact that these things are
out in the filesystem rather than inside the database, and hope that
that will change eventually.  So I think that these names are not
future-proof and should be altered to not use the word "file";
especially so in view of the fact that as committed, the patch doesn't
let you specify a path name for them.  I already did that to StopFile,
which is now StopWords, but did not touch the other dictionary options.
I'm not sure what to do with DictFile, because that doesn't seem to have
any special meaning at all once you take out "file" ...

Comments?
        regards, tom lane


Re: A couple of tsearch loose ends

From
"Pavel Stehule"
Date:
Hello

>
> The other thing that was bugging me was that a lot of the dictionary
> types have init options that are named things like DictFile, AffFile,
> etc.  As I mentioned before, I dislike the fact that these things are
> out in the filesystem rather than inside the database, and hope that
> that will change eventually.  So I think that these names are not
> future-proof and should be altered to not use the word "file";
> especially so in view of the fact that as committed, the patch doesn't
> let you specify a path name for them.  I already did that to StopFile,
> which is now StopWords, but did not touch the other dictionary options.
> I'm not sure what to do with DictFile, because that doesn't seem to have
> any special meaning at all once you take out "file" ...
>

And what about dictionary-based languages?

Regards
Pavel Stehule


Re: A couple of tsearch loose ends

From
Teodor Sigaev
Date:
> "COPY" instead of "TEMPLATE".  Another thought here is that I'm
> inclined to drop the "with map" option and just always copy the
> source configuration exactly.  If you don't want the map, the
> only other information the source can provide is a parser name,
> which you might as well just give directly.

I haven't any objections. "with map" was introduced when other options existed: locale and the default flag.

> 
> The other thing that was bugging me was that a lot of the dictionary
> types have init options that are named things like DictFile, AffFile,
> etc.  As I mentioned before, I dislike the fact that these things are
> out in the filesystem rather than inside the database, and hope that
> that will change eventually.  So I think that these names are not

DictFile and AffFile are the files of ispell (or ispell-derived)
dictionaries. We don't maintain those files: they require a lot of
linguistic knowledge which we don't have, and I doubt that there is
such a person in the pgsql community. So we just use them.

Managing stop words is much simpler, so the list may be stored in the
database, not in a file.


Re: A couple of tsearch loose ends

From
Tom Lane
Date:
Teodor Sigaev <teodor@sigaev.ru> writes:
> I haven't any objections. "with map" was introduced when other options
> existed: locale and the default flag.

OK, I'll make that happen.

>> The other thing that was bugging me was that a lot of the dictionary
>> types have init options that are named things like DictFile, AffFile,
>> etc.

> DictFile and AffFile are the files of ispell (or ispell-derived)
> dictionaries. We don't maintain those files: they require a lot of
> linguistic knowledge which we don't have, and I doubt that there is
> such a person in the pgsql community. So we just use them.

Hmm ... I suppose, but I'd still prefer that the option names didn't
include the word "file".

Also, while revising the reference pages for the syntax changes I made,
I realized that there's further simplification possible for the
dictionary commands.  I changed these commands to use the same
"definition list" construct that's used by CREATE OPERATOR and such.
It has the nice property that the option "keywords" aren't actually
keywords in the eyes of the grammar, they're just any identifiers.
So what we have got as of CVS HEAD is

CREATE TEXT SEARCH DICTIONARY name (
    TEMPLATE = template
    [, OPTION = init_options ]
)

ALTER TEXT SEARCH DICTIONARY name (
    OPTION = init_options
)

where "init_options" is supposed to be a string literal containing stuff
like

    'Language=swedish, StopWords=swedish'

When you look at it, this is downright silly.  Why don't we flatten
the two levels together and write something like

CREATE TEXT SEARCH DICTIONARY swedish (
    TEMPLATE = snowball,
    LANGUAGE = swedish,
    STOPWORDS = swedish
);

The original implementation couldn't do that but it's easy in the
definition-list grammar.  This is even more useful for ALTER, because
it'd be possible to change the value of one option without having to
write out the values of all the others.  What I'd suggest is that
we adopt the convention that an option is dropped if its name appears
with no value, otherwise it's kept unless overridden with a new value.
So after

ALTER TEXT SEARCH DICTIONARY swedish (
    STOPWORDS
);

this dictionary would have LANGUAGE = swedish and no stopwords option.
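
For illustration only, the convention proposed above can be modelled outside the database as a small merge over option lists. This is a minimal Python sketch, not the backend implementation: the function name apply_alter_options, the use of None to stand for "name written with no value", and the case-insensitive handling of option names are all assumptions made for the example.

```python
from typing import Optional


def apply_alter_options(
    current: dict[str, str],
    changes: list[tuple[str, Optional[str]]],
) -> dict[str, str]:
    """Hypothetical model of the proposed ALTER option-list rule.

    A change of (name, None) stands for an option name given with no value.
    """
    # Options that are not mentioned in the ALTER are kept unchanged.
    # Assumption: option names are compared case-insensitively.
    result = {name.lower(): value for name, value in current.items()}
    for name, value in changes:
        key = name.lower()
        if value is None:
            # Name with no value: the option is dropped.
            result.pop(key, None)
        else:
            # Name with a value: the option is added or overridden.
            result[key] = value
    return result


# Mirrors the example in the text: ALTER ... swedish ( STOPWORDS );
swedish = {"language": "swedish", "stopwords": "swedish"}
after_alter = apply_alter_options(swedish, [("STOPWORDS", None)])
print(after_alter)  # {'language': 'swedish'}
```

Under this model, changing a single option means listing only that option, which is the convenience the parenthesized list is meant to give over restating every value.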

Any objections to changing it like that?
        regards, tom lane


Re: A couple of tsearch loose ends

From
Bruce Momjian
Date:
Tom Lane wrote:
> There are a couple of naming issues that I left untouched while
> reviewing the tsearch patch, but wanted to bring up for discussion.
> 
> One thing that had me confused for awhile is that the patch uses
> the word "template" in two different ways.  The main use is that a
> "template" is an object encapsulating the superuser-only aspects of
> defining a dictionary.  When you do CREATE TEXT SEARCH DICTIONARY
> you have to specify a template to base it on.  So in this context
> a dictionary and its template are different kinds of objects, and
> there's a persistent connection between them.

What has me concerned is the idea of database templates being different
from text search dictionary templates.  Why can't they function the same
way?

> On the other hand, CREATE TEXT SEARCH CONFIGURATION also uses the
> word "template", but in this case it's an optional specification
> of an existing configuration that gets copied.  So here, the config
> and the template are the same kind of object, and there's no
> connection between them after the copy is made.
> 
> This seems a bit confusing, and I wonder whether we ought not
> change the terminology for one thing or the other.  I don't
> particularly want to rename text search templates ... that would
> be quite a bit of work at this point ... so what I'd suggest is
> that the option to CREATE TEXT SEARCH CONFIGURATION be renamed
> "COPY" instead of "TEMPLATE".  Another thought here is that I'm
> inclined to drop the "with map" option and just always copy the
> source configuration exactly.  If you don't want the map, the
> only other information the source can provide is a parser name,
> which you might as well just give directly.

Agreed on the use of COPY.  I already pointed out this confusion in a
previous email.

--
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: A couple of tsearch loose ends

From
Oleg Bartunov
Date:
On Tue, 21 Aug 2007, Tom Lane wrote:

> When you look at it, this is downright silly.  Why don't we flatten
> the two levels together and write something like
>
> CREATE TEXT SEARCH DICTIONARY swedish (
>    TEMPLATE = snowball,
>    LANGUAGE = swedish,
>    STOPWORDS = swedish
> );


Dictionary is a program with its own options, so we can't know in advance
what actual options it uses. We can reserve some options, though.
This is a very useful feature.
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83


Re: A couple of tsearch loose ends

From
Dimitri Fontaine
Date:
Hi list,

On Tuesday 21 August 2007, Tom Lane wrote:
> CREATE TEXT SEARCH DICTIONARY swedish (
>     TEMPLATE = snowball,
>     LANGUAGE = swedish,
>     STOPWORDS = swedish
> );
>
> ALTER TEXT SEARCH DICTIONARY swedish (
>     STOPWORDS
> );
>
> this dictionary would have LANGUAGE = swedish and no stopwords option.
>
> Any objections to changing it like that?

I don't understand why this ALTER variation is so different from existing
ones, but maybe the following syntax can't work:

    ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;

For dropping an option, could one of these commands do?

    ALTER TEXT SEARCH DICTIONARY swedish DROP STOPWORDS;
    ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET NULL;

Not sure if it's doable or if it really looks more like other ALTER commands,
but I think I'd like it more this way :)

Hope this helps,
--
dim

Re: A couple of tsearch loose ends

From
Tom Lane
Date:
Dimitri Fontaine <dfontaine@hi-media.com> writes:
> I don't understand why this ALTER variation is so different from existing
> ones, but maybe the following syntax can't work:
>   ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;

You'd have to repeat the whole command for each option to be changed,
which given the amount of typing involved seems a bit unpleasant.

There are also historical differences between what is allowed by
the SET var = value syntax and what is allowed in the
parenthesized-option-list syntax.  Introducing an inconsistency between
ALTER and CREATE doesn't seem appetizing.

(BTW, does anyone want to teach psql's tab-completion about the new
text search statements?)
        regards, tom lane


Re: A couple of tsearch loose ends

From
Stefan Kaltenbrunner
Date:
Tom Lane wrote:
> Dimitri Fontaine <dfontaine@hi-media.com> writes:
>> I don't understand why this ALTER variation is so different from existing
>> ones, but maybe the following syntax can't work:
>>   ALTER TEXT SEARCH DICTIONARY swedish ALTER STOPWORDS SET swedish;
> 
> You'd have to repeat the whole command for each option to be changed,
> which given the amount of typing involved seems a bit unpleasant.
> 
> There are also historical differences between what is allowed by
> the SET var = value syntax and what is allowed in the
> parenthesized-option-list syntax.  Introducing an inconsistency between
> ALTER and CREATE doesn't seem appetizing.
> 
> (BTW, does anyone want to teach psql's tab-completion about the new
> text search statements?)

I will take a stab at doing that ...


Stefan