Thread: Proposal: syntax of operation with tsearch's configuration

Proposal: syntax of operation with tsearch's configuration

From

Teodor Sigaev

Date:

17 November 2006, 07:55:24

Hi!

Oleg and I are now working on moving tsearch into core.

Please review the suggested syntax. Comments, suggestions, and objections will be appreciated.

1) parser operations (pg_ts_parser table)
CREATE PARSER prsname (
    START = funcname,
    GETTOKEN = funcname,
    END = funcname,
    LEXTYPES = funcname
    [ , HEADLINE = funcname ]
);

DROP PARSER [IF EXISTS] prsname [ CASCADE | RESTRICT ];
ALTER PARSER prsname RENAME TO newprsname;
COMMENT ON PARSER prsname IS text;

2) dictionaries (pg_ts_dict)

CREATE DICTIONARY dictname (
    INIT = funcname,
    LEXIZE = funcname,
    OPT = text
);

-- create a new dictionary like an existing one, but with
-- different options, for example
CREATE DICTIONARY dictname [ (
    [ INIT = funcname ]
    [ , LEXIZE = funcname ]
    [ , OPT = text ]
) ] LIKE template_dictname;

DROP DICTIONARY [IF EXISTS] dictname [ CASCADE | RESTRICT ];
ALTER DICTIONARY dictname RENAME TO newdictname;
ALTER DICTIONARY dictname SET OPT=text;
COMMENT ON DICTIONARY dictname IS text;

3) configuration (pg_ts_cfg [,pg_ts_cfgmap])
CREATE TSEARCH CONFIGURATION cfgname (
    PARSER = prsname
    [ , LOCALE = localename ]
);

-- create a new configuration and optionally copy
-- the map of lexeme types to dictionaries
CREATE TSEARCH CONFIGURATION cfgname [ (
    LOCALE = localename
) ] LIKE template_cfg [WITH MAP];

DROP TSEARCH CONFIGURATION [IF EXISTS] cfgname [ CASCADE | RESTRICT ];
ALTER TSEARCH CONFIGURATION cfgname RENAME TO newcfgname;
ALTER TSEARCH CONFIGURATION cfgname SET LOCALE=localename;
ALTER TSEARCH CONFIGURATION cfgname SET PARSER=prsname;
COMMENT ON TSEARCH CONFIGURATION cfgname IS text;

4) mapping of lexeme types to lists of dictionaries
CREATE TSEARCH MAPPING ON cfgname FOR lexemetypename USE dictname1 [, dictname2 [, ...] ];
DROP TSEARCH MAPPING [IF EXISTS] ON cfgname FOR lexemetypename;
ALTER TSEARCH MAPPING ON cfgname FOR lexemetypename USE dictname1 [, dictname2 [, ...] ];
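
To make the purpose of the mapping in 4) concrete, here is a minimal Python sketch of how an ordered dictionary list might be consulted for one token. It assumes the usual tsearch2 behaviour: dictionaries are tried in the listed order, the first one that recognizes the token wins, and an empty result means the token is dropped as a stop word. The names stop_words, simple, mapping and lexize are invented for illustration and are not part of the proposal.

```python
# Hypothetical model: token type -> ordered list of dictionaries.
# A "dictionary" is a function that returns a list of lexemes, or None
# when it does not recognize the token.

def stop_words(token):
    # Recognizes stop words and maps them to an empty lexeme list (dropped).
    return [] if token.lower() in {"the", "a", "of"} else None

def simple(token):
    # Fallback dictionary: accepts any token, lowercased.
    return [token.lower()]

# Similar in spirit to the proposed:
#   CREATE TSEARCH MAPPING ON mycfg FOR word USE stop_words, simple;
mapping = {"word": [stop_words, simple]}

def lexize(token_type, token):
    """Try each mapped dictionary in order; the first non-None answer wins."""
    for dictionary in mapping.get(token_type, []):
        lexemes = dictionary(token)
        if lexemes is not None:
            return lexemes
    return []  # unmapped token types and unrecognized tokens yield nothing
```

The order of the list matters in this model: putting simple first would make it swallow every token, so stop_words would never be reached.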


Next, tsearch's configuration will be readable through psql backslash commands
(F means fulltext):
\dF             - list of configurations
\dF PATTERN     - describe a configuration, with its parser and lexeme mapping
\dFd            - list of dictionaries
\dFd PATTERN    - describe a dictionary
\dFp            - list of parsers
\dFp PATTERN    - describe a parser



-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Proposal: syntax of operation with tsearch's configuration

From

Tom Lane

Date:

17 November 2006, 13:12:51

Teodor Sigaev <teodor@sigaev.ru> writes:
> Now we (Oleg and me) are working on moving tsearch into core.
> Pls, review suggested syntax. Comments, suggestions, objections will be appreciated.

Is it really necessary to invent a batch of special-purpose commands?
Seems like this will add some thousands of lines of code and no actual
new functionality; not to mention loss of backwards compatibility for
existing tsearch2 users.
        regards, tom lane

Re: Proposal: syntax of operation with tsearch's configuration

From

Teodor Sigaev

Date:

17 November 2006, 14:09:42

Hmm, IMHO, it's needed for a consistent interface: nobody adds a new column to a
table by editing pg_class & pg_attribute, and nobody looks up the description of
a table by selecting values from the system tables.


Tom Lane wrote:
> Teodor Sigaev <teodor@sigaev.ru> writes:
>> Now we (Oleg and me) are working on moving tsearch into core.
>> Pls, review suggested syntax. Comments, suggestions, objections will be appreciated.
> 
> Is it really necessary to invent a batch of special-purpose commands?
> Seems like this will add some thousands of lines of code and no actual
> new functionality; not to mention loss of backwards compatibility for
> existing tsearch2 users.
> 
>             regards, tom lane

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/

Re: Proposal: syntax of operation with tsearch's configuration

From

Andrew Dunstan

Date:

17 November 2006, 14:43:17

Teodor Sigaev wrote:
> Hmm, IMHO, it's needed for consistent interface: nobody adds new 
> column to table by editing pg_class & pg_attribute, nobody looks for 
> description of table by selection values from system table.
>
>
> Tom Lane wrote:
>> Teodor Sigaev <teodor@sigaev.ru> writes:
>>> Now we (Oleg and me) are working on moving tsearch into core.
>>> Pls, review suggested syntax. Comments, suggestions, objections will 
>>> be appreciated.
>>
>> Is it really necessary to invent a batch of special-purpose commands?
>> Seems like this will add some thousands of lines of code and no actual
>> new functionality; not to mention loss of backwards compatibility for
>> existing tsearch2 users.
>>
>>          
>

Thousands of lines seems a high estimate, but maybe I'm wrong. I guess 
an alternative would be to do this in some builtin functions, but that 
seems a tad unclean.

I am also a bit concerned that the names of the proposed objects 
(parser, dictionary) don't convey their purpose adequately. Maybe 
TS_DICTIONARY and TS_PARSER might be better if we in fact need to name them.

cheers

andrew

Re: Proposal: syntax of operation with tsearch's configuration

From

Oleg Bartunov

Date:

17 November 2006, 15:29:12

On Fri, 17 Nov 2006, Andrew Dunstan wrote:

> Teodor Sigaev wrote:
>> Hmm, IMHO, it's needed for consistent interface: nobody adds new column to 
>> table by editing pg_class & pg_attribute, nobody looks for description of 
>> table by selection values from system table.
>> 
>> 
>> Tom Lane wrote:
>>> Teodor Sigaev <teodor@sigaev.ru> writes:
>>>> Now we (Oleg and me) are working on moving tsearch into core.
>>>> Pls, review suggested syntax. Comments, suggestions, objections will be 
>>>> appreciated.
>>> 
>>> Is it really necessary to invent a batch of special-purpose commands?
>>> Seems like this will add some thousands of lines of code and no actual
>>> new functionality; not to mention loss of backwards compatibility for
>>> existing tsearch2 users.
>>>
>>> 
>> 
>
> Thousands of lines seems a high estimate, but maybe I'm wrong. I guess an 
> alternative would be to do this in some builtin functions, but that seems a 
> tad unclean.

As Teodor already wrote, we want to be consistent with the current interface to
the system catalogs, given that full text search is going into the pg core.
We aren't inventing anything new; we are just extending the current user
interface to support full text search.

>
> I am also a bit concerned that the names of the proposed objects (parser, 
> dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and 
> TS_PARSER might be better if we in fact need to name them.

this looks reasonable to me.

>
> cheers
>
> andrew
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Proposal: syntax of operation with tsearch's configuration

From

Alvaro Herrera

Date:

17 November 2006, 16:03:17

Oleg Bartunov wrote:
> On Fri, 17 Nov 2006, Andrew Dunstan wrote:

> >I am also a bit concerned that the names of the proposed objects (parser, 
> >dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and 
> >TS_PARSER might be better if we in fact need to name them.
> 
> this looks reasonable to me.

Huh, but we don't use keywords with ugly abbreviations and underscores.
How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"?  (Using
"FULLTEXT" instead of "FULL TEXT" means you don't create common
reserved words, and furthermore you don't collide with an existing type
name.)

I also think the "thousands of lines" is an exaggeration :-)  The
grammar should take a couple dozen at most.  The rest of the code would
go to their own files.

We should also take the opportunity to discuss new keywords for the XML
support -- will we use new grammar, or functions?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Proposal: syntax of operation with tsearch's configuration

From

Peter Eisentraut

Date:

17 November 2006, 16:09:59

Alvaro Herrera wrote:
> We should also take the opportunity to discuss new keywords for the
> XML support -- will we use new grammar, or functions?

The XML stuff is defined in the SQL standard and there are existing 
implementations, so any nonstandard syntax is going to be significantly 
less useful.  (The other problem is that you can't implement most of 
the stuff in functions anyway.)

I don't see any comparable arguments about this full-text search stuff.  
In particular I don't see any arguments why a change would be necessary at
all, including why moving to core would be necessary in the first 
place.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Proposal: syntax of operation with tsearch's configuration

From

Andrew Dunstan

Date:

17 November 2006, 16:13:10

Alvaro Herrera wrote:
> Oleg Bartunov wrote:
>   
>> On Fri, 17 Nov 2006, Andrew Dunstan wrote:
>>     
>
>   
>>> I am also a bit concerned that the names of the proposed objects (parser, 
>>> dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and 
>>> TS_PARSER might be better if we in fact need to name them.
>>>       
>> this looks reasonable to me.
>>     
>
> Huh, but we don't use keywords with ugly abbreviations and underscores.
> How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"?  (Using
> "FULLTEXT" instead of "FULL TEXT" means you don't create common
> reserved words, and furthermore you don't collide with an existing type
> name.)
>   

good point. this works for me.

>
> We should also take the opportunity to discuss new keywords for the XML
> support -- will we use new grammar, or functions?
>
>   

Well, it will have to be keywords if we want to be able to do anything 
like the spec, IIRC.

cheers

andrew

Re: Proposal: syntax of operation with tsearch's configuration

From

Tom Lane

Date:

17 November 2006, 16:18:22

Peter Eisentraut <peter_e@gmx.net> writes:
> I don't see any comparable arguments about this full-text search stuff.  
> In particular I don't see any arguments why a change would be necessary at
> all, including why moving to core would be necessary in the first 
> place.

AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core.  I don't find this argument especially compelling myself.
        regards, tom lane

Re: Proposal: syntax of operation with tsearch's configuration

From

Jeremy Drake

Date:

17 November 2006, 16:30:52

On Fri, 17 Nov 2006, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
> > I don't see any comparable arguments about this full-text search stuff.
> > In particular I don't see any arguments why a change would be necessary at
> > all, including why moving to core would be necessary in the first
> > place.
>
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core.  I don't find this argument especially compelling myself.

I am currently in the position that my hosting provider is apprehensive
about installing modules in contrib because they believe they are less
secure.  They cited (real or imagined) "security holes" as the reason they
would not install tsearch2, or any other contrib module.  This leaves me
without any fulltext indexing option, as it requires a superuser to
install.  I have currently worked around this by running my own postgres
instance from my home directory, as they provide shell access and allow
running background processes, but I was really happy when I heard that
tsearch2 was going to be integrated into core in 8.3.

I think I would settle for some sort of assurance somewhere by someone who
sounds authoritative that the contrib modules are not less secure than
postgres core, and are fully supported by the developers.  I think if  I
could point them at that, I may be able to convince them that it is safe.

>
>             regards, tom lane
>

Re: Proposal: syntax of operation with tsearch's configuration

From

Tom Lane

Date:

17 November 2006, 16:34:55

Alvaro Herrera <alvherre@commandprompt.com> writes:
> I also think the "thousands of lines" is an exaggeration :-)

I think a reasonable comparison point is the operator-class commands,
which are at least in the same general ballpark of complexity.
opclasscmds.c is currently 1075 lines, and that's not counting the
grammar additions, nor miscellaneous bits of support in places like
backend/nodes/, dependency.c if you expect to be able to DROP the
objects, namespace.c if they live in schemas, aclchk.c if they have
owners or permissions, comment.c, etc.  Teodor is proposing to add not
one but four new kinds of system objects.  In round numbers I would
bet that such a patch will add a lot closer to 10000 lines than 1000.

It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
huge improvement over the previous ways of doing it --- but don't
underestimate the size of what we're talking about.
        regards, tom lane

Re: Proposal: syntax of operation with tsearch's configuration

From

Alvaro Herrera

Date:

17 November 2006, 16:42:46

Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I also think the "thousands of lines" is an exaggeration :-)
> 
> I think a reasonable comparison point is the operator-class commands,
> which are at least in the same general ballpark of complexity.
> opclasscmds.c is currently 1075 lines, and that's not counting the
> grammar additions, nor miscellaneous bits of support in places like
> backend/nodes/, dependency.c if you expect to be able to DROP the
> objects, namespace.c if they live in schemas, aclchk.c if they have
> owners or permissions, comment.c, etc.  Teodor is proposing to add not
> one but four new kinds of system objects.  In round numbers I would
> bet that such a patch will add a lot closer to 10000 lines than 1000.
> 
> It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
> huge improvement over the previous ways of doing it --- but don't
> underestimate the size of what we're talking about.

Hmm, actually the tsearch2 directory contains 16500 lines of code
(generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
it's a big piece of code as a whole -- but I thought what was being
discussed was the size of the grammar changes, which is why I mentioned
the "a couple dozen" figure.

Having the supporting code in core does not make much of a difference
otherwise from having it in contrib, does it?

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Proposal: syntax of operation with tsearch's configuration

From

Oleg Bartunov

Date:

17 November 2006, 16:43:14

On Fri, 17 Nov 2006, Tom Lane wrote:

> Peter Eisentraut <peter_e@gmx.net> writes:
>> I don't see any comparable arguments about this full-text search stuff.
>> In particular I don't see any arguments why a change would be necessary at
>> all, including why moving to core would be necessary in the first
>> place.
>
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core.  I don't find this argument especially compelling myself.

Marketing is not always a "swear word" :) We live in the real world, and there
are many situations where marketing is the deciding vote. Not everyone is
Tom Lane, who could convince a customer that there is no difference
between a contrib module and a core feature, or that PostgreSQL is a mature
database with an fts add-on that can be installed separately (with
superuser rights).

I think this is a good question for the next poll on postgresql.org.
    Regards,        Oleg

Re: Proposal: syntax of operation with tsearch's configuration

From

Tom Lane

Date:

17 November 2006, 16:53:44

Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
>> huge improvement over the previous ways of doing it --- but don't
>> underestimate the size of what we're talking about.

> Hmm, actually the tsearch2 directory contains 16500 lines of code
> (generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
> it's a big piece of code as a whole -- but I thought what was being
> discussed was the size of the grammar changes, which is why I mentioned
> the "a couple dozen" figure.

No, what I was on about was the cost of inventing custom-SQL-statement
manipulation of the catalog entries that drive tsearch2.  The analogy to
operator classes is fairly exact, because before 7.3 you had to
manipulate those using direct insertions of catalog entries.  The
manipulation commands are just about independent of the actual use of
the catalog entries --- my count of "support" didn't include any of the
planner or index AM code that actually uses operator classes, and in the
same way the existing tsearch2 code doesn't have any particular
relationship to this new code that'd have to be written to support the 
manipulation commands.

> Having the supporting code in core does not make much of a difference
> otherwise from having it in contrib, does it?

Given the nonextensibility of gram.y and keywords.c, it has to be in
core to even think about having special syntax :-(
        regards, tom lane

Re: Proposal: syntax of operation with tsearch's configuration

From

Ron Mayer

Date:

17 November 2006, 18:21:38

Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I don't see any comparable arguments about this full-text search stuff.  
>> In particular I don't see any arguments why a change would be necessary at
>> all, including why moving to core would be necessary in the first 
>> place.
> 
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core.  I don't find this argument especially compelling myself.

On the flip side of that argument: the more non-SQL-standard pieces
are in core, the more "non-real" the pieces not in core appear.
People seem to have few doubts regarding CPAN or Ruby Gems.
I believe that is in large part because a lot of very
important and well supported functionality exists outside of their
core distributions.  The less that's pre-baked into core, I think
the more people will be aware of the rich set of extensions postgresql
enables.

From a marketing point of view (should I have moved this to .advocacy?),
it seems to me the biggest problem is the name "contrib".  If it were
called "optional" or "advanced" or "extra" I think it'd be seen less
suspiciously by hosting companies (who seem to have the biggest problem
with contrib) and we wouldn't need as many discussions of which contribs
to move into core.
 Ron M

Re: Proposal: syntax of operation with tsearch's configuration

From

"Nikolay Samokhvalov"

Date:

17 November 2006, 18:35:33

On 11/17/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Alvaro Herrera wrote:
> > We should also take the opportunity to discuss new keywords for the
> > XML support -- will we use new grammar, or functions?
>
> The XML stuff is defined in the SQL standard and there are existing
> implementations, so any nonstandard syntax is going to be significantly
> less useful.  (The other problem is that you can't implement most of
> the stuff in functions anyway.)

Yes, it's better not to mix the XML syntax discussion and the Tsearch2
configuration syntax discussion in one place, and not only because these
are different things. Here we are discussing syntax for catalog
manipulation commands, whereas the XML stuff (at least what I was working on
during the summer and am going to continue) is about the functionality itself.
And in the case of XML we have some things to stick to: the standard
papers and existing implementations...

However, Alvaro made me recall my old thoughts. When I first
started to use Tsearch2, I wondered why I had to explicitly create
a column for the index; in other databases I don't have to do this. Indeed,
this is an index and, ideally, all I should have to do is write "CREATE
INDEX ...", maybe with some custom (fulltext-specific) additions
(and something like "fulltext" instead of "gist").

So, is it possible to let people avoid the explicit "ALTER TABLE .. ADD
COLUMN ... tsvector"? Maybe it would be "syntactic sugar" too, but I
suppose that (especially for postgres novices) it would simplify the
overall use of Tsearch. For me such changes are more important than
syntax for manipulating the catalog (i.e., I could live with "insert
into ts_cfg ..." for one or two more years :-) ). However, I'm sure that
Oleg and Teodor have already considered this feature, and there must be
some things that prevent letting users write only "CREATE INDEX"
w/o ALTERing tables...

>
> I don't see any comparable arguments about this full-text search stuff.
> In particular I don't see any arguments why a change would be necessary at
> all, including why moving to core would be necessary in the first
> place.

Many hosting providers with PostgreSQL support (e.g. goDaddy, one of the
biggest) don't provide any contrib modules, so people have to
live w/o fulltext search. Also, many sysadmins are afraid of the word
"contrib"... So I have no doubt that adding it to core is a
really good thing :-)

-- 
Best regards,
Nikolay

Re: Proposal: syntax of operation with tsearch's configuration

From

Martijn van Oosterhout

Date:

17 November 2006, 19:14:11

On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
> > Having the supporting code in core does not make much of a difference
> > otherwise from having it in contrib, does it?
>
> Given the nonextensibility of gram.y and keywords.c, it has to be in
> core to even think about having special syntax :-(

Has anyone ever heard of extensible grammars? Just thinking wildly, you
could decree that commands beginning with @ are extensions and are parsed
by the module listed next. Then your command set becomes:

@tsearch CREATE PARSER ....

Then contrib modules can add their own parser. You'd have the overhead
of multiple lex/yacc parsers, but you wouldn't have to change the main
parser for every extension.
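
As a rough illustration of the "@module" idea, the Python sketch below routes a command either to the core parser or to a parser registered by an extension. Everything in it is hypothetical: the registry, the function names, and the stand-in parsers, which only tag the text instead of running a real lex/yacc grammar.

```python
# Hypothetical registry: module name -> parse function for that module's commands.
extension_parsers = {}

def register(module, parse):
    """Called by an extension to claim commands prefixed with '@module'."""
    extension_parsers[module] = parse

def core_parse(command):
    # Stand-in for the main grammar.
    return ("core", command)

def dispatch(command):
    """Route '@module rest...' to the module's parser, everything else to core."""
    if not command.startswith("@"):
        return core_parse(command)
    module, _, rest = command[1:].partition(" ")
    if module not in extension_parsers:
        raise ValueError(f"no extension parser registered for {module!r}")
    return extension_parsers[module](rest.strip())

# An extension registers its own parser; here it just tags the command text.
register("tsearch", lambda text: ("tsearch", text))
```

In this sketch the extension parser sees only its own command text, so it has no way to reuse pieces of the core grammar.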

Has anyone ever heard of something like this?

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: Proposal: syntax of operation with tsearch's configuration

From

Tom Lane

Date:

17 November 2006, 19:37:30

Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
>> Given the nonextensibility of gram.y and keywords.c, it has to be in
>> core to even think about having special syntax :-(

> Has anyone ever heard of extensible grammars?

Yeah, I worked with systems that could do that at Hewlett-Packard, nigh
thirty years ago ... but they were much less pleasant to use than bison,
and if memory serves, slower and more limited in what they could parse
(something narrower than LALR(1), IIRC, which would make certain parts
of SQL even hairier to parse than they are now).  I'm not in a big hurry
to go there, even though it would certainly take some of the steam out
of "I want this in core" arguments.

> ... decree that commands beginning with @ are extensions and are parsed
> by the module listed next. Then your command set becomes:

> @tsearch CREATE PARSER ....

This'd only work well for trivial standalone commands; as a counterexample
consider CREATE INDEX, which requires access to the core sub-grammars
for typename and expression.  The SQL2003 XML additions couldn't be
handled this way either.
        regards, tom lane

Re: Proposal: syntax of operation with tsearch's configuration

From

Peter Eisentraut

Date:

18 November 2006, 03:30:59

Jeremy Drake wrote:
> I am currently in the position that my hosting provider is
> apprehensive about installing modules in contrib because they believe
> they are less secure.

Using irrational and unfounded statements one can of course make 
arguments for just about anything, but that won't help us.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Proposal: syntax of operation with tsearch's configuration

From

Peter Eisentraut

Date:

18 November 2006, 03:37:48

Oleg Bartunov wrote:
> marketing is not always "swear-word" :) We live in real world and
> there are many situations where marketing is the deciding vote.

I don't know about you, but I market PostgreSQL partially using

1. sane design, not driven by random demands
2. extensibility

which would be completely contradicted by moving any module into core 
for "marketing" reasons.

> Not 
> all are Tom Lane, who could convince customer saying there is no
> difference between contrib module and core feature, or that
> PostgreSQL is a mature database with fts add-on, which could be
> installed separately (with superuser rights).

It's not like PostgreSQL is the first software product in the world to 
provide a module or plugin mechanism.  (It is incidentally the first 
DBMS to do so.)  People who refuse to understand that are idiots, and 
we don't design for idiots.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Proposal: syntax of operation with tsearch's configuration

From

"Simon Riggs"

Date:

18 November 2006, 06:36:33

On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
> > > Having the supporting code in core does not make much of a difference
> > > otherwise from having it in contrib, does it?
> > 
> > Given the nonextensibility of gram.y and keywords.c, it has to be in
> > core to even think about having special syntax :-(
> 
> Has anyone ever heard of extensible grammars?

(not specifically answering Martijn...)

The main thought for me on this thread is: Why do we need to invent
*any* grammar to make this work? Why not just use functions?

For PITR we have pg_start_backup() rather than BACKUP START. For
advisory locks we have pg_advisory_lock()

What's wrong with having pg_tsearch_ functions to do everything? There's
nothing wrong with additional catalog tables/columns that are
manipulated by function calls only. We have that already - look at
pg_stat_reset() - no grammar stuff there.
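
A minimal sketch of what "catalog rows manipulated by function calls only" could look like, written in Python with an in-memory SQLite table standing in for the catalog so that it runs on its own. The table ts_cfgmap, its columns, and the function tsearch_set_mapping are assumptions made up for this illustration; they are not the real pg_ts_cfgmap layout or any existing PostgreSQL function.

```python
import sqlite3

def tsearch_set_mapping(conn, cfgname, toktype, dictnames):
    """Replace the ordered dictionary list for one (configuration, token type)."""
    with conn:  # one transaction: drop the old rows, then insert the new ones
        conn.execute(
            "DELETE FROM ts_cfgmap WHERE cfgname = ? AND toktype = ?",
            (cfgname, toktype),
        )
        conn.executemany(
            "INSERT INTO ts_cfgmap (cfgname, toktype, seqno, dictname) "
            "VALUES (?, ?, ?, ?)",
            [(cfgname, toktype, i, d) for i, d in enumerate(dictnames, start=1)],
        )

# SQLite is only a stand-in for the system catalog in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ts_cfgmap (cfgname TEXT, toktype TEXT, seqno INTEGER, dictname TEXT)"
)
tsearch_set_mapping(conn, "mycfg", "word", ["en_ispell", "en_stem"])
```

The function plays the role that CREATE/ALTER TSEARCH MAPPING plays in the grammar-based proposal: callers never write to the table directly.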

Anybody with an Oracle or SQLServer background is used to seeing system
functions available as function calls; as I've observed above, so are
we. We should keep the grammar clean to allow a very close adherence to
SQL standards, IMHO.

I would like to see Oleg and Teodor's good work come into core, but I
don't want to see bucketfuls of new grammar issues.

--  Simon Riggs              EnterpriseDB   http://www.enterprisedb.com

Re: Proposal: syntax of operation with tsearch's configuration

From

Oleg Bartunov

Date:

18 November 2006, 07:13:09

On Sat, 18 Nov 2006, Simon Riggs wrote:

> On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:
>> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
>>>> Having the supporting code in core does not make much of a difference
>>>> otherwise from having it in contrib, does it?
>>>
>>> Given the nonextensibility of gram.y and keywords.c, it has to be in
>>> core to even think about having special syntax :-(
>>
>> Has anyone ever heard of extensible grammars?
>
> (not specifically answering Martijn...)
>
> The main thought for me on this thread is: Why do we need to invent
> *any* grammar to make this work? Why not just use functions?
>
> For PITR we have pg_start_backup() rather than BACKUP START. For
> advisory locks we have pg_advisory_lock()
>
> What's wrong with having pg_tsearch_ functions to do everything? There's
> nothing wrong with additional catalog tables/columns that are
> manipulated by function calls only. We have that already - look at
> pg_stat_reset() - no grammar stuff there.
>
> Anybody with an Oracle or SQLServer background is used to seeing system
> functions available as function calls; as I've observed above, so are
> we. We should keep the grammar clean to allow a very close adherence to
> SQL standards, IMHO.
>
> I would like to see Oleg and Teodor's good work come into core, but I
> don't want to see bucketfuls of new grammar issues.

Summarizing, we have two questions:

1. Will tsearch come into core?
2. Do we need grammar changes?

I hope we have consensus on 1: we need fts as a core feature.
The second question is not a matter of principle, which is why we asked -hackers.
So, if we don't touch the grammar, are there any issues with tsearch2 in core?
    Regards,        Oleg

Re: Proposal: syntax of operation with tsearch's configuration

From

Peter Eisentraut

Date:

18 November 2006, 08:48:05

Oleg Bartunov wrote:
> So, if we'll not touch grammar, are there any issues with tsearch2 in
> core ?

Are there any issues with tsearch2 not in core?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

Re: Proposal: syntax of operation with tsearch's configuration

From

Markus Schiltknecht

Date:

18 November 2006, 13:28:23

Hi,

Peter Eisentraut wrote:
> Are there any issues with tsearch2 not in core?

I have run into trouble when restoring a dump, especially across
different versions of PostgreSQL and tsearch2, mainly because the pg_ts_*
tables are not system tables and thus need to be restored or installed separately.

And there is still the packaging issue, which needs to be addressed. It's
not complicated, but it is a PITA to compile stemmers and set up custom
dictionaries.

What's really needed IMO is clever packaging, including stemmers and
dictionaries for as many languages as we can come up with. So on a
debian system, it should become as simple as:

apt-get install postgresql-contrib-8.3
apt-get install postgresql-language-pack-english-8.3
apt-get install postgresql-language-pack-german-8.3
apt-get install postgresql-language-pack-russian-8.3

Inclusion into core surely does not help with that.

Relabeling contrib to modules or extras or something would probably give 
some people a warm fuzzy feeling when installing. OTOH, these are 
probably the very same people who get excited about tsearch2 in core, so 
if we want to satisfy them, we better put it right into core... I dunno.

Regards

Markus

Re: Proposal: syntax of operation with

From

"Andrew Dunstan"

Date:

18 November 2006, 14:38:02

Peter Eisentraut wrote:
> Oleg Bartunov wrote:
>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>> core ?
>
> Are there any issues with tsearch2 not in core?
>

Quite apart from anything else, it really needs documentation of the
standard we give other core features.

I think if a feature will be of sufficiently general use it should be a
candidate for inclusion, and text search certainly comes within that
category in my mind.

cheers

andrew

Re: Proposal: syntax of operation with tsearch's configuration

From

Oleg Bartunov

Date:

18 November 2006, 14:48:12

On Sat, 18 Nov 2006, Andrew Dunstan wrote:

> Peter Eisentraut wrote:
>> Oleg Bartunov wrote:
>>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>>> core ?
>>
>> Are there any issues with tsearch2 not in core?
>>
>
>
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.


Sure. I just learned how to build (successfully) the pg documentation and
am researching what the documentation standard is. Do we need to write a
separate full text search chapter and/or add descriptions to the relevant
chapters?

>
> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

It could help us in Pg-MySQL discussions, at least, since we beat
mysql's fts :)
    Regards,        Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

Re: Proposal: syntax of operation with

From

Stefan Kaltenbrunner

Date:

18 November 2006, 16:03:31

Andrew Dunstan wrote:
> Peter Eisentraut wrote:
>> Oleg Bartunov wrote:
>>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>>> core ?
>> Are there any issues with tsearch2 not in core?
>>
> 
> 
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.
> 
> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

I agree here - full text search is a generally useful (and very often
requested) feature - including it in core will both help us in marketing
postgresql (which should not be seen as a "bad" thing at all) and, more to
the point, provide an in-core user and showcase for two very powerful
and innovative technologies - GIST and GIN.

Stefan

Re: Proposal: syntax of operation with tsearch's configuration

From

Stefan Kaltenbrunner

Date:

18 November 2006, 16:14:06

Alvaro Herrera wrote:
> Oleg Bartunov wrote:
>> On Fri, 17 Nov 2006, Andrew Dunstan wrote:
> 
>>> I am also a bit concerned that the names of the proposed objects (parser, 
>>> dictionary) don't convey their purpose adequately. Maybe TS_DICTIONARY and 
>>> TS_PARSER might be better if we in fact need to name them.
>> this looks reasonable to me.
> 
> Huh, but we don't use keywords with ugly abbreviations and underscores.
> How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"?  (Using
> "FULLTEXT" instead of "FULL TEXT" means you don't create common
> reserved words, and furthermore you don't collide with an existing type
> name.)

sounds fine

> 
> I also think the "thousands of lines" is an exaggeration :-)  The
> grammar should take a couple dozen at most.  The rest of the code would
> go to their own files.
> 
> We should also take the opportunity to discuss new keywords for the XML
> support -- will we use new grammar, or functions?
> 

that is a good question and we should decide on a direction for that -
we already have a feature in shipping code that causes quite some
confusion in that regard (autovacuum).
What I see from supporting/consulting people is that more and more
people are adopting autovacuum for their databases, but those with
complex ones want to override its settings on a per-table basis.
We already provide a rather crude interface for that - namely manually
inserting some rows into a system table, which is confusing the heck out
of people (they are either responding with "why is there no ALTER
AUTOVACUUM SET ..." or asking for an equivalent pg_* function for that).
I'm not too sure what the most suitable interface for that would be, but
finding a consistent solution for that might be good nevertheless.

Stefan

Re: Proposal: syntax of operation with

From

Bruce Momjian

Date:

20 November 2006, 18:18:24

Peter Eisentraut wrote:
> Oleg Bartunov wrote:
> > So, if we'll not touch grammar, are there any issues with tsearch2 in
> > core ?
> 
> Are there any issues with tsearch2 not in core?

No, but many think the idea of moving well-established code from
/contrib into the backend applies to tsearch2 once it works for
multi-byte encodings.

-- 
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +