Thread: Proposal: syntax of operation with tsearch's configuration
Hi!

Oleg and I are now working on moving tsearch into core. Please review
the suggested syntax. Comments, suggestions, and objections will be
appreciated.

1) Parser operations (pg_ts_parser table)

    CREATE PARSER prsname (
        START = funcname,
        GETTOKEN = funcname,
        END = funcname,
        LEXTYPES = funcname
        [ , HEADLINE = funcname ]
    );

    DROP PARSER [IF EXISTS] prsname [ CASCADE | RESTRICT ];
    ALTER PARSER prsname RENAME TO newprsname;
    COMMENT ON PARSER prsname IS text;

2) Dictionaries (pg_ts_dict)

    CREATE DICTIONARY dictname (
        INIT = funcname,
        LEXIZE = funcname,
        OPT = text
    );

    -- create a new dictionary like an existing one, but with
    -- different options, for example
    CREATE DICTIONARY dictname
        [ ( [ INIT = funcname ] [ , LEXIZE = funcname ] [ , OPT = text ] ) ]
        LIKE template_dictname;

    DROP DICTIONARY [IF EXISTS] dictname [ CASCADE | RESTRICT ];
    ALTER DICTIONARY dictname RENAME TO newdictname;
    ALTER DICTIONARY dictname SET OPT=text;
    COMMENT ON DICTIONARY dictname IS text;

3) Configurations (pg_ts_cfg [, pg_ts_cfgmap])

    CREATE TSEARCH CONFIGURATION cfgname (
        PARSER = prsname
        [ , LOCALE = localename ]
    );

    -- create a new configuration and optionally copy the
    -- map of lexeme types to dictionaries
    CREATE TSEARCH CONFIGURATION cfgname
        [ ( LOCALE = localename ) ]
        LIKE template_cfg [ WITH MAP ];

    DROP TSEARCH CONFIGURATION [IF EXISTS] cfgname [ CASCADE | RESTRICT ];
    ALTER TSEARCH CONFIGURATION cfgname RENAME TO newcfgname;
    ALTER TSEARCH CONFIGURATION cfgname SET LOCALE=localename;
    ALTER TSEARCH CONFIGURATION cfgname SET PARSER=prsname;
    COMMENT ON TSEARCH CONFIGURATION cfgname IS text;

4) Mapping of lexeme types to lists of dictionaries

    CREATE TSEARCH MAPPING ON cfgname FOR lexemetypename
        USE dictname1 [ , dictname2 [..] ];
    DROP TSEARCH MAPPING [IF EXISTS] ON cfgname FOR lexemetypename;
    ALTER TSEARCH MAPPING ON cfgname FOR lexemetypename
        USE dictname1 [ , dictname2 [..] ];

Next, tsearch's configuration will be readable through psql backslash
commands (F means fulltext):

    \dF           - list of configurations
    \dF PATTERN   - describe a configuration, with the parser it uses
                    and its lexeme-type mapping
    \dFd          - list of dictionaries
    \dFd PATTERN  - describe a dictionary
    \dFp          - list of parsers
    \dFp PATTERN  - describe a parser

--
Teodor Sigaev   E-mail: teodor@sigaev.ru   WWW: http://www.sigaev.ru/
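The objects in the proposal relate roughly as follows: a configuration names one parser, and a mapping ties each lexeme type the parser emits to a list of dictionaries. Below is a minimal Python sketch of that relationship, under stated assumptions. The class and function names are invented for illustration, the toy parser and dictionaries are not tsearch2's, and the rule that the first dictionary returning a result wins is an assumption about how the list is consulted, not something the proposal states.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

# A dictionary takes a token and returns lexemes, or None if it does not know it.
Dictionary = Callable[[str], Optional[List[str]]]
# A parser splits text into (lexeme_type, token) pairs.
Parser = Callable[[str], List[Tuple[str, str]]]


@dataclass
class Configuration:
    """Hypothetical stand-in for one pg_ts_cfg row plus its pg_ts_cfgmap rows."""
    parser: Parser
    mapping: Dict[str, List[Dictionary]] = field(default_factory=dict)

    def lexize(self, text: str) -> List[str]:
        lexemes: List[str] = []
        for lexeme_type, token in self.parser(text):
            # Assumption: dictionaries are tried in order; first non-None wins.
            # Lexeme types with no mapping are silently skipped.
            for dictionary in self.mapping.get(lexeme_type, []):
                result = dictionary(token)
                if result is not None:
                    lexemes.extend(result)
                    break
        return lexemes


def toy_parser(text: str) -> List[Tuple[str, str]]:
    """Toy parser: digit-only tokens become 'int', everything else 'word'."""
    return [("int" if tok.isdigit() else "word", tok) for tok in text.split()]


def stopword_dict(token: str) -> Optional[List[str]]:
    """Recognizes stop words and drops them by returning an empty list."""
    return [] if token.lower() in {"the", "a", "of"} else None


def simple_dict(token: str) -> Optional[List[str]]:
    """Accepts any token and lowercases it."""
    return [token.lower()]


# Only the 'word' lexeme type is mapped; 'int' tokens produce nothing.
cfg = Configuration(parser=toy_parser,
                    mapping={"word": [stopword_dict, simple_dict]})
print(cfg.lexize("The Syntax of 42 Parsers"))
```

In this sketch, the proposed CREATE TSEARCH MAPPING ... FOR lexemetypename USE dictname1, dictname2 would correspond to adding one key to `mapping`, with the dictionaries in the order listed.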
Teodor Sigaev <teodor@sigaev.ru> writes:
> Now we (Oleg and me) are working on moving tsearch into core.
> Pls, review suggested syntax. Comments, suggestions, objections will
> be appreciated.

Is it really necessary to invent a batch of special-purpose commands?
Seems like this will add some thousands of lines of code and no actual
new functionality; not to mention loss of backwards compatibility for
existing tsearch2 users.

regards, tom lane
Hmm, IMHO, it's needed for a consistent interface: nobody adds a new
column to a table by editing pg_class and pg_attribute, and nobody looks
up the description of a table by selecting values from the system
tables.

Tom Lane wrote:
> Is it really necessary to invent a batch of special-purpose commands?
> Seems like this will add some thousands of lines of code and no actual
> new functionality; not to mention loss of backwards compatibility for
> existing tsearch2 users.

--
Teodor Sigaev   E-mail: teodor@sigaev.ru   WWW: http://www.sigaev.ru/
Teodor Sigaev wrote:
> Hmm, IMHO, it's needed for consistent interface: nobody adds new
> column to table by editing pg_class & pg_attribute, nobody looks for
> description of table by selection values from system table.
>
> Tom Lane wrote:
>> Is it really necessary to invent a batch of special-purpose commands?
>> Seems like this will add some thousands of lines of code and no actual
>> new functionality; not to mention loss of backwards compatibility for
>> existing tsearch2 users.

Thousands of lines seems a high estimate, but maybe I'm wrong. I guess
an alternative would be to do this in some builtin functions, but that
seems a tad unclean.

I am also a bit concerned that the names of the proposed objects
(parser, dictionary) don't convey their purpose adequately. Maybe
TS_DICTIONARY and TS_PARSER might be better if we in fact need to name
them.

cheers

andrew
On Fri, 17 Nov 2006, Andrew Dunstan wrote:
> Thousands of lines seems a high estimate, but maybe I'm wrong. I guess
> an alternative would be to do this in some builtin functions, but that
> seems a tad unclean.

As Teodor already wrote, we want to be consistent with the current
interface to the system catalogs, since full text search is going into
the pg core. We aren't inventing anything new; we are just extending the
current user interface to support full text search.

> I am also a bit concerned that the names of the proposed objects
> (parser, dictionary) don't convey their purpose adequately. Maybe
> TS_DICTIONARY and TS_PARSER might be better if we in fact need to
> name them.

This looks reasonable to me.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Oleg Bartunov wrote:
> On Fri, 17 Nov 2006, Andrew Dunstan wrote:
> > I am also a bit concerned that the names of the proposed objects
> > (parser, dictionary) don't convey their purpose adequately. Maybe
> > TS_DICTIONARY and TS_PARSER might be better if we in fact need to
> > name them.
>
> this looks reasonable to me.

Huh, but we don't use keywords with ugly abbreviations and underscores.
How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"? (Using
"FULLTEXT" instead of "FULL TEXT" means you don't create common
reserved words, and furthermore you don't collide with an existing type
name.)

I also think the "thousands of lines" is an exaggeration :-) The
grammar should take a couple dozen at most. The rest of the code would
go to their own files.

We should also take the opportunity to discuss new keywords for the
XML support -- will we use new grammar, or functions?

--
Alvaro Herrera                        http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera wrote:
> We should also take the opportunity to discuss new keywords for the
> XML support -- will we use new grammar, or functions?

The XML stuff is defined in the SQL standard and there are existing
implementations, so any nonstandard syntax is going to be significantly
less useful. (The other problem is that you can't implement most of
the stuff in functions anyway.)

I don't see any comparable arguments about this full-text search stuff.
In particular I don't see any arguments why a change would be necessary
at all, including why moving to core would be necessary in the first
place.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Alvaro Herrera wrote:
> Oleg Bartunov wrote:
>> On Fri, 17 Nov 2006, Andrew Dunstan wrote:
>>> I am also a bit concerned that the names of the proposed objects
>>> (parser, dictionary) don't convey their purpose adequately. Maybe
>>> TS_DICTIONARY and TS_PARSER might be better if we in fact need to
>>> name them.
>>
>> this looks reasonable to me.
>
> Huh, but we don't use keywords with ugly abbreviations and underscores.
> How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"? (Using
> "FULLTEXT" instead of "FULL TEXT" means you don't create common
> reserved words, and furthermore you don't collide with an existing type
> name.)

Good point. This works for me.

> We should also take the opportunity to discuss new keywords for the XML
> support -- will we use new grammar, or functions?

Well, it will have to be keywords if we want to be able to do anything
like the spec, IIRC.

cheers

andrew
Peter Eisentraut <peter_e@gmx.net> writes:
> I don't see any comparable arguments about this full-text search stuff.
> In particular I don't see any arguments why a change would be necessary
> at all, including why moving to core would be necessary in the first
> place.

AFAIR the only argument in favor of that is basically a marketing one:
users perceive a feature as more real, or more supported, if it's in
core. I don't find this argument especially compelling myself.

regards, tom lane
On Fri, 17 Nov 2006, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I don't see any comparable arguments about this full-text search stuff.
> > In particular I don't see any arguments why a change would necessary at
> > all, including why moving to core would be necessary in the first
> > place.
>
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core. I don't find this argument especially compelling myself.

I am currently in the position that my hosting provider is apprehensive
about installing modules in contrib because they believe they are less
secure. They cited (real or imagined) "security holes" as the reason
they would not install tsearch2, or any other contrib module. This
leaves me without any fulltext indexing option, as it requires a
superuser to install. I have currently worked around this by running my
own postgres instance from my home directory, as they provide shell
access and allow running background processes, but I was really happy
when I heard that tsearch2 was going to be integrated into core in 8.3.

I think I would settle for some sort of assurance somewhere by someone
who sounds authoritative that the contrib modules are not less secure
than postgres core, and are fully supported by the developers. I think
if I could point them at that, I may be able to convince them that it is
safe.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> I also think the "thousands of lines" is an exaggeration :-)

I think a reasonable comparison point is the operator-class commands,
which are at least in the same general ballpark of complexity.
opclasscmds.c is currently 1075 lines, and that's not counting the
grammar additions, nor miscellaneous bits of support in places like
backend/nodes/, dependency.c if you expect to be able to DROP the
objects, namespace.c if they live in schemas, aclchk.c if they have
owners or permissions, comment.c, etc. Teodor is proposing to add not
one but four new kinds of system objects. In round numbers I would
bet that such a patch will add a lot closer to 10000 lines than 1000.

It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
huge improvement over the previous ways of doing it --- but don't
underestimate the size of what we're talking about.

regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > I also think the "thousands of lines" is an exaggeration :-)
>
> I think a reasonable comparison point is the operator-class commands,
> which are at least in the same general ballpark of complexity.
> opclasscmds.c is currently 1075 lines, and that's not counting the
> grammar additions, nor miscellaneous bits of support in places like
> backend/nodes/, dependency.c if you expect to be able to DROP the
> objects, namespace.c if they live in schemas, aclchk.c if they have
> owners or permissions, comment.c, etc. Teodor is proposing to add not
> one but four new kinds of system objects. In round numbers I would
> bet that such a patch will add a lot closer to 10000 lines than 1000.
>
> It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
> huge improvement over the previous ways of doing it --- but don't
> underestimate the size of what we're talking about.

Hmm, actually the tsearch2 directory contains 16500 lines of code
(generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
it's a big piece of code as a whole -- but I thought what was being
discussed was the size of the grammar changes, which is why I mentioned
the "a couple dozen" figure.

Having the supporting code in core does not make much of a difference
otherwise from having it in contrib, does it?

--
Alvaro Herrera                        http://www.CommandPrompt.com/
On Fri, 17 Nov 2006, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I don't see any comparable arguments about this full-text search stuff.
>> In particular I don't see any arguments why a change would be necessary
>> at all, including why moving to core would be necessary in the first
>> place.
>
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core. I don't find this argument especially compelling myself.

Marketing is not always a "swear word" :) We live in the real world, and
there are many situations where marketing has the deciding vote. Not
everyone is Tom Lane, who could convince a customer by saying that there
is no difference between a contrib module and a core feature, or that
PostgreSQL is a mature database with an fts add-on which can be
installed separately (with superuser rights).

I think this is a good question for the next poll on postgresql.org.

Regards,
Oleg
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Tom Lane wrote:
>> It may be worth doing anyway --- certainly CREATE OPERATOR CLASS was a
>> huge improvement over the previous ways of doing it --- but don't
>> underestimate the size of what we're talking about.

> Hmm, actually the tsearch2 directory contains 16500 lines of code
> (generated using David A. Wheeler's 'SLOCCount'), so I didn't doubt that
> it's a big piece of code as a whole -- but I thought what was being
> discussed was the size of the grammar changes, which is why I mentioned
> the "a couple dozen" figure.

No, what I was on about was the cost of inventing custom-SQL-statement
manipulation of the catalog entries that drive tsearch2. The analogy to
operator classes is fairly exact, because before 7.3 you had to
manipulate those using direct insertions of catalog entries. The
manipulation commands are just about independent of the actual use of
the catalog entries --- my count of "support" didn't include any of the
planner or index AM code that actually uses operator classes, and in the
same way the existing tsearch2 code doesn't have any particular
relationship to this new code that'd have to be written to support the
manipulation commands.

> Having the supporting code in core does not make much of a difference
> otherwise from having it in contrib, does it?

Given the nonextensibility of gram.y and keywords.c, it has to be in
core to even think about having special syntax :-(

regards, tom lane
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
>> I don't see any comparable arguments about this full-text search stuff.
>> In particular I don't see any arguments why a change would be necessary
>> at all, including why moving to core would be necessary in the first
>> place.
>
> AFAIR the only argument in favor of that is basically a marketing one:
> users perceive a feature as more real, or more supported, if it's in
> core. I don't find this argument especially compelling myself.

On the flip side of that argument: the more non-SQL-standard pieces are
in core, the more "non-real" the other pieces that are not in core
appear.

People seem to have few doubts about CPAN or Ruby Gems. I believe
that's in large part because a lot of very important and well supported
functionality exists outside of their core distributions. The less
that's pre-baked into core, I think the more people will be aware of the
rich set of extensions postgresql enables.

From a marketing point of view (should I have moved this to
.advocacy?), it seems to me the biggest problem is the name "contrib".
If it were called "optional" or "advanced" or "extra" I think it'd be
seen less suspiciously by hosting companies (who seem to have the
biggest problem with contrib) and we wouldn't need as many discussions
of which contribs to move into core.

Ron M
On 11/17/06, Peter Eisentraut <peter_e@gmx.net> wrote:
> Alvaro Herrera wrote:
> > We should also take the opportunity to discuss new keywords for the
> > XML support -- will we use new grammar, or functions?
>
> The XML stuff is defined in the SQL standard and there are existing
> implementations, so any nonstandard syntax is going to be significantly
> less useful. (The other problem is that you can't implement most of
> the stuff in functions anyway.)

Yes, it's better not to mix the XML syntax discussion and the Tsearch2
configuration syntax discussion in one place, and not only because these
are different things: here we are discussing syntax for catalog
manipulation commands, whereas the XML stuff (at least what I was
working on during the summer and am going to continue) is about the
functionality itself. And in the case of XML we have some things to
stick to: the standard papers and the existing implementations...

However, Alvaro made me recall my old thoughts. When I first started to
use Tsearch2, I wondered why I had to explicitly create a column for the
index; in other databases I don't have to do this. Indeed, this is an
index and, ideally, all I should have to do is write "CREATE INDEX ...",
maybe with some custom (fulltext-specific) additions (and something like
"fulltext" instead of "gist"). So, is it possible to let people avoid
the explicit "ALTER TABLE .. ADD COLUMN ... tsvector"? Maybe that would
be "syntactic sugar" too, but I suppose that (especially for postgres
novices) it would simplify the overall use of Tsearch. For me such
changes are more important than syntax for manipulating the catalog
(i.e., I could live with "insert into ts_cfg ..." for one or two years
more :-) ).

However, I'm sure that Oleg and Teodor have already considered this
feature, and there must be some things that prevent letting users write
only "CREATE INDEX" without ALTERing tables...

> I don't see any comparable arguments about this full-text search stuff.
> In particular I don't see any arguments why a change would be necessary
> at all, including why moving to core would be necessary in the first
> place.

Many hosters with PostgreSQL support (e.g. goDaddy, one of the biggest
hosters) don't provide any contrib modules, so people have to live
without fulltext search. Also, many sysadmins are afraid of the word
"contrib"...

So, there is no doubt for me that adding it to core is a really good
thing :-)

--
Best regards,
Nikolay
On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
> > Having the supporting code in core does not make much of a difference
> > otherwise from having it in contrib, does it?
>
> Given the nonextensibility of gram.y and keywords.c, it has to be in
> core to even think about having special syntax :-(

Has anyone ever heard of extensible grammars? Just thinking wildly, you
could decree that commands beginning with @ are extensions and are
parsed by the module listed next. Then your command set becomes:

    @tsearch CREATE PARSER ....

Then contrib modules can add their own parser. You'd have the overhead
of multiple lex/yacc parsers, but you wouldn't have to change the main
parser for every extension.

Has anyone ever heard of something like this?

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability
> to litigate.
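The "@module" idea in the message above can be sketched as a small prefix dispatcher. This is an illustration in Python rather than the lex/yacc the backend uses, and everything in it is hypothetical: `register_extension`, `dispatch`, `core_parse`, and the toy tsearch handler are invented names, not anything that exists in PostgreSQL or tsearch2.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry: extension name -> parser callable for that extension.
ExtensionParser = Callable[[str], Tuple[str, str]]
_extensions: Dict[str, ExtensionParser] = {}


def register_extension(name: str, parser: ExtensionParser) -> None:
    """Let a (hypothetical) contrib module install its own command parser."""
    _extensions[name] = parser


def core_parse(command: str) -> Tuple[str, str]:
    """Stand-in for the main grammar; it only tags the command as core."""
    return ("core", command.strip())


def dispatch(command: str) -> Tuple[str, str]:
    """Route '@module rest...' to the module's parser, everything else to core."""
    text = command.strip()
    if not text.startswith("@"):
        return core_parse(text)
    # The word right after '@' names the module; the remainder is its input.
    module, _, rest = text[1:].partition(" ")
    if module not in _extensions:
        raise ValueError(f"no extension parser registered for {module!r}")
    return _extensions[module](rest.strip())


def tsearch_parse(rest: str) -> Tuple[str, str]:
    """Toy extension parser: accepts only CREATE PARSER <name> ..."""
    words = rest.split()
    if [w.upper() for w in words[:2]] != ["CREATE", "PARSER"] or len(words) < 3:
        raise ValueError("tsearch sketch only understands CREATE PARSER <name>")
    return ("tsearch.create_parser", words[2])


register_extension("tsearch", tsearch_parse)

print(dispatch("@tsearch CREATE PARSER myparser (START = f)"))
print(dispatch("SELECT 1"))
```

Note that the dispatcher hands the extension only the raw remainder of the command; in this sketch the extension parser has no way to reuse rules from the core grammar.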
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
>> Given the nonextensibility of gram.y and keywords.c, it has to be in
>> core to even think about having special syntax :-(

> Has anyone ever heard of extensible grammars?

Yeah, I worked with systems that could do that at Hewlett-Packard, nigh
thirty years ago ... but they were much less pleasant to use than bison,
and if memory serves, slower and more limited in what they could parse
(something narrower than LALR(1), IIRC, which would make certain parts
of SQL even hairier to parse than they are now). I'm not in a big hurry
to go there, even though it would certainly take some of the steam out
of "I want this in core" arguments.

> ... decree that commands beginning with @ are extensions and are parsed
> by the module listed next. Then your command set becomes:
> @tsearch CREATE PARSER ....

This'd only work well for trivial standalone commands; as a
counterexample consider CREATE INDEX, which requires access to the core
sub-grammars for typename and expression. The SQL2003 XML additions
couldn't be handled this way either.

regards, tom lane
Jeremy Drake wrote:
> I am currently in the position that my hosting provider is
> apprehensive about installing modules in contrib because they believe
> they are less secure.

Using irrational and unfounded statements one can of course make
arguments for just about anything, but that won't help us.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Oleg Bartunov wrote:
> marketing is not always "swear-word" :) We live in real world and
> there are many situations where marketing is the deciding vote.

I don't know about you, but I market PostgreSQL partially using

1. sane design, not driven by random demands
2. extensibility

which would be completely contradicted by moving any module into core
for "marketing" reasons.

> Not
> all are Tom Lane, who could convince customer saying there is no
> difference between contrib module and core feature, or that
> PostgreSQL is a mature database with fts add-on, which could be
> installed separately (with superuser rights).

It's not like PostgreSQL is the first software product in the world to
provide a module or plugin mechanism. (It is incidentally the first
DBMS to do so.) People who refuse to understand that are idiots, and we
don't design for idiots.

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
On Sat, 2006-11-18 at 00:13 +0100, Martijn van Oosterhout wrote:
> On Fri, Nov 17, 2006 at 03:53:35PM -0500, Tom Lane wrote:
> > > Having the supporting code in core does not make much of a difference
> > > otherwise from having it in contrib, does it?
> >
> > Given the nonextensibility of gram.y and keywords.c, it has to be in
> > core to even think about having special syntax :-(
>
> Has anyone ever heard of extensible grammers?

(not specifically answering Martijn...)

The main thought for me on this thread is: Why do we need to invent
*any* grammar to make this work? Why not just use functions?

For PITR we have pg_start_backup() rather than BACKUP START. For
advisory locks we have pg_advisory_lock().

What's wrong with having pg_tsearch_ functions to do everything? There's
nothing wrong with additional catalog tables/columns that are
manipulated by function calls only. We have that already - look at
pg_stat_reset() - no grammar stuff there.

Anybody with an Oracle or SQLServer background is used to seeing system
functions available as function calls; as I've observed above, so are
we. We should keep the grammar clean to allow a very close adherence to
SQL standards, IMHO.

I would like to see Oleg and Teodor's good work come into core, but I
don't want to see bucketfuls of new grammar issues.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com
On Sat, 18 Nov 2006, Simon Riggs wrote:
> The main thought for me on this thread is: Why do we need to invent
> *any* grammar to make this work? Why not just use functions?
>
> [...]
>
> I would like to see Oleg and Teodor's good work come into core, but I
> don't want to see bucketfuls of new grammar issues.

Summarizing, we have two questions:

1. Will tsearch come into the core?
2. Do we need grammar changes?

I hope we have consensus about 1: we need fts as a core feature. The
second question is less fundamental, which is why we asked -hackers.

So, if we don't touch the grammar, are there any issues with tsearch2 in
core?

Regards,
Oleg
Oleg Bartunov wrote:
> So, if we'll not touch grammar, are there any issues with tsearch2 in
> core ?

Are there any issues with tsearch2 not in core?

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Hi,

Peter Eisentraut wrote:
> Are there any issues with tsearch2 not in core?

I have run into trouble when restoring a dump, especially across
different versions of PostgreSQL and tsearch2, mainly because the
pg_ts_* tables are not system tables and thus need to be restored or
installed separately.

And there is still the packaging issue, which needs to be addressed.
It's not complicated, but it's a PITA to compile stemmers and set up
custom dictionaries. What's really needed IMO is clever packaging,
including stemmers and dictionaries for as many languages as we can come
up with. So on a debian system, it should become as simple as:

    apt-get install postgresql-contrib-8.3
    apt-get install postgresql-language-pack-english-8.3
    apt-get install postgresql-language-pack-german-8.3
    apt-get install postgresql-language-pack-russian-8.3

Inclusion into core surely does not help with that.

Relabeling contrib to modules or extras or something would probably give
some people a warm fuzzy feeling when installing. OTOH, these are
probably the very same people who get excited about tsearch2 in core, so
if we want to satisfy them, we better put it right into core... I dunno.

Regards

Markus
Peter Eisentraut wrote:
> Oleg Bartunov wrote:
>> So, if we'll not touch grammar, are there any issues with tsearch2 in
>> core ?
>
> Are there any issues with tsearch2 not in core?

Quite apart from anything else, it really needs documentation of the
standard we give other core features.

I think if a feature will be of sufficiently general use it should be a
candidate for inclusion, and text search certainly comes within that
category in my mind.

cheers

andrew
On Sat, 18 Nov 2006, Andrew Dunstan wrote:
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.

Sure. I have just learned how to build the pg documentation
(successfully) and am researching what the documentation standard is. Do
we need to write a separate full text search chapter and/or add
descriptions to the relevant chapters?

> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

It could help us in Pg-MySQL discussions, at least, since we beat
mysql's fts :)

Regards,
Oleg
Andrew Dunstan wrote:
> Quite apart from anything else, it really needs documentation of the
> standard we give other core features.
>
> I think if a feature will be of sufficiently general use it should be a
> candidate for inclusion, and text search certainly comes within that
> category in my mind.

I agree here - full text search is a feature of general use (and very
often requested). Including it in core will both help us in marketing
postgresql (which should not be seen as a "bad" thing at all) and, more
to the point, provide an in-core user and showcase for two very
powerful and innovative technologies - GIST and GIN.

Stefan
Alvaro Herrera wrote:
> Huh, but we don't use keywords with ugly abbreviations and underscores.
> How about "FULLTEXT DICTIONARY" and "FULLTEXT PARSER"? (Using
> "FULLTEXT" instead of "FULL TEXT" means you don't create common
> reserved words, and furthermore you don't collide with an existing type
> name.)

sounds fine

> I also think the "thousands of lines" is an exaggeration :-) The
> grammar should take a couple dozen at most. The rest of the code would
> go to their own files.
>
> We should also take the opportunity to discuss new keywords for the XML
> support -- will we use new grammar, or functions?

That is a good question and we should decide on a direction for that -
we already have a feature in shipping code that causes quite some
confusion in that regard (autovacuum). What I see from supporting and
consulting for people is that more and more of them are adopting
autovacuum for their databases, but those with complex ones want to
override its settings on a per-table basis. We already provide a rather
crude interface for that - namely manually inserting some rows into a
system table - which is confusing the heck out of people (they respond
either with "why is there no ALTER AUTOVACUUM SET ..." or by asking for
an equivalent pg_* function).

I'm not too sure what the most suitable interface for that would be, but
finding a consistent solution might be good nevertheless.

Stefan
Peter Eisentraut wrote:
> Oleg Bartunov wrote:
> > So, if we'll not touch grammar, are there any issues with tsearch2 in
> > core ?
>
> Are there any issues with tsearch2 not in core?

No, but many think the idea of moving well-established code from
/contrib into the backend applies to tsearch2, once it works with
multi-byte encodings.

--
Bruce Momjian   bruce@momjian.us
EnterpriseDB    http://www.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +