Thread: Naming of the prefab snowball stemmer dictionaries

Naming of the prefab snowball stemmer dictionaries

From
Tom Lane
Date:
I notice that the existing tsearch documentation that we've imported
fairly consistently refers to Snowball dictionaries with names like
"en_stem", "ru_stem", etc.  However, CVS HEAD is set up to create them
with names "english", "russian", etc.  As I've been absorbing more of
the docs I'm starting to wonder whether this is a good idea.  ISTM
that these names encourage a novice to think that the one dictionary
is all you could need for a given language; and there are enough
examples of more-complex setups in the docs to make it clear that
in fact Snowball is not the be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something
like it --- maybe "english_stem", "russian_stem", etc) would be better.
It'd help to give the right impression, namely that these dictionaries
are a component of a solution but not necessarily all you need.

Thoughts?
        regards, tom lane


Re: Naming of the prefab snowball stemmer dictionaries

From
"A.M."
Date:
On Aug 22, 2007, at 11:10 , Tom Lane wrote:

> I notice that the existing tsearch documentation that we've imported
> fairly consistently refers to Snowball dictionaries with names like
> "en_stem", "ru_stem", etc.  However, CVS HEAD is set up to create them
> with names "english", "russian", etc.  As I've been absorbing more of
> the docs I'm starting to wonder whether this is a good idea.  ISTM
> that these names encourage a novice to think that the one dictionary
> is all you could need for a given language; and there are enough
> examples of more-complex setups in the docs to make it clear that
> in fact Snowball is not the be-all and end-all of dictionaries.
>
> I'm thinking that going back to the old naming convention (or something
> like it --- maybe "english_stem", "russian_stem", etc) would be better.
> It'd help to give the right impression, namely that these dictionaries
> are a component of a solution but not necessarily all you need.

Please use ISO 639 codes plus any qualifiers to reduce confusion.
http://en.wikipedia.org/wiki/List_of_ISO_639-1_codes

-M


Re: Naming of the prefab snowball stemmer dictionaries

From
"Zeugswetter Andreas ADI SD"
Date:
Sounds reasonable, but why exactly did we spell out "english" instead of "en"?
It seems the abbreviation is much easier to extract from LANG or browser prefs ...

Andreas

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Wednesday, August 22, 2007 17:11
To: Oleg Bartunov; Teodor Sigaev
Cc: pgsql-hackers@postgreSQL.org
Subject: [HACKERS] Naming of the prefab snowball stemmer dictionaries [bayes][heur]
Importance: Low

I notice that the existing tsearch documentation that we've imported fairly consistently refers to Snowball
dictionaries with names like "en_stem", "ru_stem", etc.  However, CVS HEAD is set up to create them with names
"english", "russian", etc.  As I've been absorbing more of the docs I'm starting to wonder whether this is a good idea.
ISTM that these names encourage a novice to think that the one dictionary is all you could need for a given language;
and there are enough examples of more-complex setups in the docs to make it clear that in fact Snowball is not the
be-all and end-all of dictionaries.

I'm thinking that going back to the old naming convention (or something like it --- maybe "english_stem",
"russian_stem", etc) would be better.
It'd help to give the right impression, namely that these dictionaries are a component of a solution but not
necessarily all you need.

Thoughts?
        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
               http://www.postgresql.org/about/donate


Re: Naming of the prefab snowball stemmer dictionaries

From
Tom Lane
Date:
"Zeugswetter Andreas ADI SD" <Andreas.Zeugswetter@s-itsolutions.at> writes:
> Sounds reasonable, but why exactly did we spell out "english" instead of "en"?
> It seems the abbreviation is much easier to extract from LANG or browser prefs ...

Mainly because we're following the upstream snowball project on the
naming.

I don't think that LANG is relevant to this.  If you had an application
that wanted to make a selection based on that, what it'd be trying to
set is a configuration name, not a dictionary name.
        regards, tom lane
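
As a rough illustration of the point above: an application that keys off LANG would end up choosing a text search configuration name, not a dictionary name. The sketch below is hypothetical. The function name, the mapping table, the configuration names in it, and the fallback to "simple" are all assumptions made for illustration; none of them is taken from this thread.

```python
import re

# Hypothetical mapping from ISO 639-1 language codes to text search
# configuration names. The names on the right are assumptions; an
# application would list whatever configurations it has actually set up.
CONFIG_BY_LANGUAGE = {
    "en": "english",
    "ru": "russian",
}


def config_from_lang(lang, default="simple"):
    """Pick a configuration name for a POSIX-style LANG value.

    Accepts values such as 'en_US.UTF-8' and looks only at the leading
    language code. Anything unrecognized (including 'C' or an empty
    value) falls back to the assumed default configuration.
    """
    match = re.match(r"[A-Za-z]{2,3}", lang or "")
    if not match:
        return default
    return CONFIG_BY_LANGUAGE.get(match.group(0).lower(), default)
```

The value returned here would be used wherever the application selects a configuration; which dictionaries that configuration uses (a Snowball stemmer alone, or a stack of several) stays a separate decision made when the configuration is defined.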


Re: Naming of the prefab snowball stemmer dictionaries

From
Oleg Bartunov
Date:
On Wed, 22 Aug 2007, Tom Lane wrote:

> I notice that the existing tsearch documentation that we've imported
> fairly consistently refers to Snowball dictionaries with names like
> "en_stem", "ru_stem", etc.  However, CVS HEAD is set up to create them
> with names "english", "russian", etc.  As I've been absorbing more of
> the docs I'm starting to wonder whether this is a good idea.  ISTM
> that these names encourage a novice to think that the one dictionary
> is all you could need for a given language; and there are enough
> examples of more-complex setups in the docs to make it clear that
> in fact Snowball is not the be-all and end-all of dictionaries.
>
> I'm thinking that going back to the old naming convention (or something
> like it --- maybe "english_stem", "russian_stem", etc) would be better.
> It'd help to give the right impression, namely that these dictionaries
> are a component of a solution but not necessarily all you need.
>
> Thoughts?

I agree with you; the old naming was more informative.

    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83