Re: Updated tsearch documentation - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Updated tsearch documentation |
Date | |
Msg-id | Pine.LNX.4.64.0706172026340.1881@sn.sai.msu.ru |
In response to | Updated tsearch documentation (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Updated tsearch documentation |
List | pgsql-hackers |
On Sun, 17 Jun 2007, Bruce Momjian wrote:

> I have completed my first pass over the tsearch documentation:
>
> http://momjian.us/expire/fulltext/HTML/sql.html
>
> They are from section 14 and following.
>
> I have come up with a number of questions that I placed in SGML comments
> in these files:
>
> http://momjian.us/expire/fulltext/SGML/
>
> Teodor/Oleg, let me know when you want to go over my questions.

Below are my answers (marked as )

Comments to editorial work of Bruce Momjian.

fulltext-intro.sgml:

it is useful to have a predefined list of lexemes.

>Bruce, here should be list of types of lexemes !

</para></listitem>

<!-- SEEMS UNNECESSARY
It useless to attempt normalize <type>email address</type> using
morphological dictionary of russian language, but looks reasonable to pick
out <type>domain name</type> and be able to search for
<type>domain name</type>.
-->

>I dont' understand where did you get this para :)

fulltext-opfunc.sgml:

All of the following functions that accept a configuration argument can
use either an integer <!-- why an integer --> or a textual configuration
name to select a configuration.

> originally it was integer id, probably better use <type>oid</type>

This returns the query used for searching an index. It can be used to test
for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
<!-- lowercase? --> which corresponds to an empty query since GIN indexes
do not support negate queries (a full index scan is inefficient):

> capital case. This looks cumbersome, probably querytree() should
> just return NULL.

The integer option controls several behaviors which is done using bit-wise
fields and <literal>|</literal> (for example, <literal>2|4</literal>):
<!-- why so complex? -->

> to avoid 2 arguments

its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
if none is specified that the current configuration is used.

> I don't understand this question

<para>
<!-- why?
-->
Note that the cascade dropping of the <function>headline</function> function
cause dropping of the <literal>parser</literal> used in fulltext configuration
<replaceable>tsname</replaceable>.
</para>

> hmm, probably it should be reversed - cascade dropping of the parser cause
> dropping of the headline function.

In example below, <literal>fulltext_idx</literal> is a GIN index:<!-- why isn't this automatic -->

> It's explained above. The problem is that current index api doesn't allow
> to say if search was lossy or exact, so to preserve performance of
> GIN index we had to introduce @@@ operator, which is the same as @@, but
> lossy.

nly the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
definition like ' one 1:11' will not work since lexeme type
<token>digit</token> is not assigned to the <acronym>TZ</acronym>.
<!-- what do these numbers mean? -->
</para>

> nothing special, just numbers for example.

<function>ts_debug</> displays information about every token of
<replaceable class="PARAMETER">document</replaceable> as produced by the
parser and processed by the configured dictionaries using the configuration
specified by <replaceable class="PARAMETER">cfgname</replaceable> or
<replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid

> don't understand this comment. ts_debug accepts cfgname or its oid

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83