Re: Updated tsearch documentation - Mailing list pgsql-hackers

From Nicolas Barbier
Subject Re: Updated tsearch documentation
Msg-id b0f3f5a10707071602m6662ebb4yc4c145dedf5f8601@mail.gmail.com
In response to Re: Updated tsearch documentation  (Bruce Momjian <bruce@momjian.us>)
Responses Re: Updated tsearch documentation  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
2007/7/7, Bruce Momjian <bruce@momjian.us>:

> FYI, I have massively reorganized the text search documentation and it
> is getting closer to something I am happy with:
>
>         http://momjian.us/expire/fulltext/HTML/textsearch.html

The following is the result of my proofreading, mainly looking for
small mistakes such as spelling and grammatical errors (so no comments
on document structure, etc.).

All corrections are relative to the version of the text at the above
URL at the time I read it :-).

General

It seems to be a recurring problem that commas are not put between the
brackets when an argument is optional. For example:
"to_tsvector([conf_name], document TEXT)" -> I guess this should be
"to_tsvector([conf_name,] document TEXT)"
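
A minimal sketch of how such synopsis lines could be corrected
mechanically, assuming the only pattern of interest is a single optional
argument whose comma ended up outside the brackets. The function name
fix_optional_commas is made up for illustration and is not part of any
existing tool:

```python
import re

# An optional argument whose closing bracket is followed directly by a comma,
# e.g. "[conf_name], document". The comma belongs inside the brackets.
MISPLACED_COMMA = re.compile(r"\[([^\[\],]+)\]\s*,")


def fix_optional_commas(synopsis: str) -> str:
    """Move the comma that follows an optional argument inside its brackets."""
    return MISPLACED_COMMA.sub(
        lambda m: "[" + m.group(1).strip() + ",]", synopsis
    )


print(fix_optional_commas("to_tsvector([conf_name], document TEXT)"))
# prints: to_tsvector([conf_name,] document TEXT)
```

A synopsis that already has the comma inside the brackets is left
unchanged, so the function can be run over all synopsis lines.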

Full-text vs. full text and stop-word vs. stop word are not used
consistently. Also, capitalization of full text searching is not used
consistently.
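
One possible way to find these inconsistencies is to count every
spelling exactly as it is written. This is only a sketch: the two
variant groups and the sample sentence are my own assumptions, not taken
from the documentation:

```python
import re
from collections import Counter

# Illustrative variant groups (an assumption); extend as needed.
VARIANTS = {
    "full text": re.compile(r"\bfull[- ]text\b", re.IGNORECASE),
    "stop word": re.compile(r"\bstop[- ]words?\b", re.IGNORECASE),
}


def count_variants(text: str) -> dict:
    """Count each spelling as written (hyphenation and case), per term."""
    return {term: Counter(p.findall(text)) for term, p in VARIANTS.items()}


# Made-up sample text, used only to show the shape of the result.
sample = (
    "Full text search uses stop-words. "
    "A stop word is skipped by full-text indexing."
)
for term, counts in count_variants(sample).items():
    print(term, dict(counts))
```

Any term that reports more than one spelling is a candidate for
normalization.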

14.1. Introduction

* "indexinging" -> "indexing"
* "There is no linguistic support, even in English" -> "for" instead of "in"?
* "e.g.satisfies" -> add a space before "satisfies"
* "have several thousands derivatives" -> should this not use the
singular form thousand?
* "infinitive form" -> is this the right term? I think it only applies
to verbs (also occurs in 14.4 and probably others)
* "over how lexemes creation" -> not sure what this should be. "are
created" maybe?
* "Map synonyms to a single word. ispell." -> why is ispell a standalone word?
* "so it is natural to introduce a new data type" -> this does not
sound like documentation
* "Also, full-text search operator @@" -> add "the" before "full-text"
* "A document is any text file that can be opened, read, and modified"
-> "file" sounds as if it should be a file on a filesystem.
* "However, the document file must be uniquely identified in the
database." -> why?
* "COALESCE" -> should be a link
* "during calculation of document rank" -> add "the" before
"calculation" and before "document"
* "which supports boolean operators, & (AND)" -> remove the ",". maybe
add "the" before boolean
* "parenthesis" -> "parentheses"
* "Tsquery consists of" -> maybe add "A" before Tsquery

14.2. Operators And Functions
                ^^^ -> a non-capital "a" in "and" seems to be more
consistent with the rest of the manual

* "TSVECTOR, otherwise false:" -> "and false if not" or "and false
otherwise" (occurs 3 times in this section)
* "The text should be formatted to match the way a vector is displayed
by SELECT." -> what a strange definition, I think something like
"input format" or so should be used (and defined somewhere, didn't see
it yet) (used twice in this section)
* "tsearch([vector_column_name], my_filter_name | text_column_name1
[...], text_column_nameN)" -> I do not understand the notation
* "The following rule is used: a function is applied to all subsequent
TEXT columns until next matching column occurs." -> I don't get it
* "stat([sqlquery text ], [weight text ]) returns SETOF statinfo" -> I
guess that not both of the arguments are optional?
* "stop-words candidates" -> stop-word candidates
* "tsvectors are compared with each other using lexicographical
ordering." -> of the output representation or something else?
* "Accepts querytext, which should be single tokens separated by" ->
replace "be" with "consist of"
* "& and | or, and ! not" -> putting parentheses around the "and" "or"
and "not" would be more readable. also, a comma is missing before the
"|" sign
* "break it onto tokens" -> into instead of onto
* "since GIN indexes do not support negate queries" -> something like:
"queries with negation" or "negated queries" (depending on what the
correct rule is)
* "Arguments to rewrite() function" -> "the .. function" or "to .."
(without the "function")
* "can be column names of type tsquery" -> "names of columns of type
tsquery" (the names are not of type tsquery, the columns are)
* "we can change rewriting rule online" -> add "the", possibly use
another word for "online" (it is not clear to me what that means)

14.3. Additional Controls

* "Full text searching in PostgreSQL provides function" -> add "the"
* "we see the resulting" -> maybe "we see that the resulting"
* "does not contain a, on, or it, word rats became rat, and the
punctuation sign - was ignored" -> "does not contain the words" (or
lexemes, or tokens), add "the" before "word rats", add quotes around
the "-"
* "on words" -> "into words"
* "they are too frequent" -> "they occur too frequently" (I think a
word cannot "be" frequent)
* "The Punctuation sign -" -> "The punctuation sign -" + put quotes
around the "-"
* "which shows all details of full text machinery" -> add "the" before "full"
* "is to mark out the different parts of document" -> add "a" before "document"
* "by the 1 + logarithm" -> "by 1 + the logarithm"
* "i.e., ordering of search results will not change" -> add "the"
before "ordering", maybe also before "search"
* "note that second example" -> add "the" before "second"
* "than ones with labeled with D" -> "than ones labeled with D" or
"than ones that are labeled with D"
* "Unfortunately, it is almost impossible to avoid since full text
indexing in a database should work without indexes" -> I don't get it
* "to show part of each document" -> add "a" before "part"
* "provides the function headline" -> add something, such as "to
accomplish this" or "that implements such functionality" or something.
* "ellipse-separated" -> "ellipsis-separated"
* "the cascade dropping of the parser function cause dropping of the
headling" -> I don't get the meaning of the sentence. I guess that
"cause" should be "causes" and "headling" should be "heading"

14.4. Dictionaries

* "to use any word form in a query" -> "to use any derived form of a
word in a query"
* "infinitive" -> is this the right term? I think it only applies to
verbs (used twice in this section)
* "colour" -> is the manual supposed to be UK or US English? I cannot
remember ever having read any UK-isms before
* "substituted to their" -> replace "to" with "by" or "with" (native
English speakers, help me here)
* "see dictionary for integers Section 14.11 as an example" -> strange
way of referring, I would put parentheses around the section number,
or alternatively put the section number before the title
* "Lexemes come through a stack" -> replace "come through" with "are
processed by" or something
* "appears as a stop-word" -> "turns out to be a stop-word", also
"stop word" is used elsewhere (without the "-") (this inconsistency
occurs a lot in this section)
* "Also, the ts_debug function ( Section 14.10 ) is very useful for
this." -> the spaces around the section reference look strange. maybe
replace "is very useful" by "can be used"
* "and appear in almost every document" -> two times "and" sounds bad,
replace this "and" by a comma
* "discrimination value so they can be ignored in" -> cut this in two
sentences: "discrimination value. Therefore, they can be ignored in
the context of"
* "word like a and it is useless to have them in an index" -> replace
"word" with "words", make "a" somehow stand out (quotes?), replace
"and" with "although" and "have" with "store"
* "However stop words" -> "However, stop words"
* "does affect ranking" -> "do affect ranking" (I think both can be
considered correct, but I like this one better)
* "Relative paths in OPTION resolve relative to share/" -> and
"share/" is relative to what? such references occur elsewhere in this
section
* "Synonym dictionary can be used" -> replace "dictionary" with
"dictionaries", or alternatively, put "A" before "synonym"
* "thesynonym" -> add a space
* "en_stemm" -> "en_stem"
* "abbeviated" -> "abbreviated"
* "preferred terms, non-preferred, related terms" -> add "terms" after
"non-preferred", or alternatively, remove all "terms" references apart
from the last one
* "in the thesaurus requires reindexing" -> replace "requires" with "require"
* "It is possible to define only one dictionary." -> I guess that
sentence wants to express that only one dictionary is allowed? In that
case, change to "It is only possible to define one dictionary."
* "Use asterisk" -> add "an" before "asterisk"
* "thesubdictionary" -> "the subdictionary"
* "It is still required that sample words should be known" -> don't
use "required" and "should" together: "sample words are still required
to be known"
* "Since thesaurus dictionary" -> add "a" before "thesaurus"
* "with parser" -> add "the" before "parser"
* "but we can use plainto_tsquery and to_tsvector functions" -> add
"the" before the name of the first function, or remove the "functions"
part
* "not a lexemes" -> "not lexemes"
* "on OpenOffice Wiki" -> add "the" before "OpenOffice"
* "does not supports" -> "does not support"
* "support of" -> "support for"
* "At present, Full text" -> I guess that "full" should not be capitalized
* "see Snowball site" -> add "the" before "Snowball"
* "which accepts a snowball stemmer" -> "that is accepted by a snowball stemmer"

14.5. Indexes

* "speedup" -> "speed up"
* "GiST(The Generalized Search Tree)-based" -> "GiST (Generalized
Search Tree)-based"
* "GIN(The Generalized Inverted Index)-based" -> "GIN (Generalized
Inverted Index)-based"
* "necessary consult the" -> add "to" before "consult"
* "and could be result" -> remove the "be"
* "transitive containment relation is realized" -> add "the" before "transitive"
* "Knuth,1973" -> add a space after the comma
* "i.e. parent is 'OR'-ed bit-strings" -> "i.e., a parent is the
result of 'OR'-ing the bit-strings"
* "of its limited" -> "of the limited"
* "The likelihood of false drops" -> what are "drops"? maybe this
needs to be "hits"?
* "while longer one are" -> replace "one" with "ones"
* "or the result" -> add "whether" before "the"
* "currently is currently" -> remove the first "currently"
* "but its performance" -> replace "its" with "their"
* "heap, so" -> "heap. Therefore, "
* "In example below" -> add "the" before "example"
* "constraint_exclusion" -> why the underscore? should be a link

14.6. Configuration

* "all of the options" -> maybe remove "of the"
* "objects a set" -> add a comma before "a"

14.7. Limitations

* "Length of" -> "The length of" (twice)
* "less then" -> "less than"

None of the numbers use commas to separate the thousands, except for one.

14.8. psql Support

14.9. Application Tutorial

* "searchs" -> "searches"
* "is last-modified date" -> add "the" after "is"

14.10. Debugging

* "Word supernovaes" -> "The word supernovaes"
* "end the dictionary stack" -> add "the" before "dictionary"
* "specifies maximum length" -> add "the" before "maximum"

14.12. Example of Creating a Parser

* "Note it should" -> insert "that" after "Note"
* "The void function" -> replace "The" with "This"

Nicolas

-- 
Nicolas Barbier
http://www.gnu.org/philosophy/no-word-attachments.html

