tsearch patch and namespace pollution - Mailing list pgsql-hackers

From Tom Lane
Subject tsearch patch and namespace pollution
Date
Msg-id 25419.1187312966@sss.pgh.pa.us
Responses Re: tsearch patch and namespace pollution  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
I find the following additions to pg_proc in the current tsearch2 patch:
                   proc                   | prorettype
------------------------------------------+------------
 pg_ts_parser_is_visible(oid)             | boolean
 pg_ts_dict_is_visible(oid)               | boolean
 pg_ts_template_is_visible(oid)           | boolean
 pg_ts_config_is_visible(oid)             | boolean
 tsvectorin(cstring)                      | tsvector
 tsvectorout(tsvector)                    | cstring
 tsvectorsend(tsvector)                   | bytea
 tsqueryin(cstring)                       | tsquery
 tsqueryout(tsquery)                      | cstring
 tsquerysend(tsquery)                     | bytea
 gtsvectorin(cstring)                     | gtsvector
 gtsvectorout(gtsvector)                  | cstring
 tsvector_lt(tsvector,tsvector)           | boolean
 tsvector_le(tsvector,tsvector)           | boolean
 tsvector_eq(tsvector,tsvector)           | boolean
 tsvector_ne(tsvector,tsvector)           | boolean
 tsvector_ge(tsvector,tsvector)           | boolean
 tsvector_gt(tsvector,tsvector)           | boolean
 tsvector_cmp(tsvector,tsvector)          | integer
 length(tsvector)                         | integer
 strip(tsvector)                          | tsvector
 setweight(tsvector,"char")               | tsvector
 tsvector_concat(tsvector,tsvector)       | tsvector
 vq_exec(tsvector,tsquery)                | boolean
 qv_exec(tsquery,tsvector)                | boolean
 tt_exec(text,text)                       | boolean
 ct_exec(character varying,text)          | boolean
 tq_exec(text,tsquery)                    | boolean
 cq_exec(character varying,tsquery)       | boolean
 tsquery_lt(tsquery,tsquery)              | boolean
 tsquery_le(tsquery,tsquery)              | boolean
 tsquery_eq(tsquery,tsquery)              | boolean
 tsquery_ne(tsquery,tsquery)              | boolean
 tsquery_ge(tsquery,tsquery)              | boolean
 tsquery_gt(tsquery,tsquery)              | boolean
 tsquery_cmp(tsquery,tsquery)             | integer
 tsquery_and(tsquery,tsquery)             | tsquery
 tsquery_or(tsquery,tsquery)              | tsquery
 tsquery_not(tsquery)                     | tsquery
 tsq_mcontains(tsquery,tsquery)           | boolean
 tsq_mcontained(tsquery,tsquery)          | boolean
 numnode(tsquery)                         | integer
 querytree(tsquery)                       | text
 rewrite(tsquery,tsquery,tsquery)         | tsquery
 rewrite(tsquery,text)                    | tsquery
 rewrite_accum(tsquery,tsquery[])         | tsquery
 rewrite_finish(tsquery)                  | tsquery
 rewrite(tsquery[])                       | tsquery
 stat(text)                               | record
 stat(text,text)                          | record
 rank(real[],tsvector,tsquery,integer)    | real
 rank(real[],tsvector,tsquery)            | real
 rank(tsvector,tsquery,integer)           | real
 rank(tsvector,tsquery)                   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery)         | real
 rank_cd(tsvector,tsquery,integer)        | real
 rank_cd(tsvector,tsquery)                | real
 token_type(oid)                          | record
 token_type(text)                         | record
 parse(oid,text)                          | record
 parse(text,text)                         | record
 lexize(oid,text)                         | text[]
 lexize(text,text)                        | text[]
 headline(oid,text,tsquery,text)          | text
 headline(oid,text,tsquery)               | text
 headline(text,text,tsquery,text)         | text
 headline(text,text,tsquery)              | text
 headline(text,tsquery,text)              | text
 headline(text,tsquery)                   | text
 to_tsvector(oid,text)                    | tsvector
 to_tsvector(text,text)                   | tsvector
 to_tsquery(oid,text)                     | tsquery
 to_tsquery(text,text)                    | tsquery
 plainto_tsquery(oid,text)                | tsquery
 plainto_tsquery(text,text)               | tsquery
 to_tsvector(text)                        | tsvector
 to_tsquery(text)                         | tsquery
 plainto_tsquery(text)                    | tsquery
 tsvector_update_trigger()                | trigger
 get_ts_config_oid(text)                  | oid
 get_current_ts_config()                  | oid
(82 rows)

(This list omits functions with INTERNAL arguments, as those are of
no particular concern to users.)

While most of these are probably OK, I'm disturbed by the prospect
that we are commandeering names as generic as "parse" or "stat"
with argument types as generic as "text".  I think we need to put
a "ts_" prefix on some of these.  Specifically, I find these names
totally unacceptable without a ts_ prefix:
 stat(text)                               | record
 stat(text,text)                          | record
 token_type(oid)                          | record
 token_type(text)                         | record
 parse(oid,text)                          | record
 parse(text,text)                         | record
 lexize(oid,text)                         | text[]
 lexize(text,text)                        | text[]
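
To make the collision concrete, here is a minimal sketch. It assumes a hypothetical application that already defines its own public.parse(text, text); that function and the argument values are illustrative only and do not come from the patch.

```sql
-- Hypothetical user function that exists before the patch is applied.
CREATE FUNCTION public.parse(text, text) RETURNS text
    AS $$ SELECT $1 || ':' || $2 $$ LANGUAGE sql;

-- With the patch applied, pg_catalog.parse(text,text) exists as well.
-- pg_catalog is implicitly searched ahead of the schemas in search_path
-- (unless it is listed there explicitly), and both functions have the
-- same argument types, so this unqualified call would resolve to the
-- system function and no longer to the user's:
SELECT parse('default', 'some text');

-- The user's function stays reachable only when schema-qualified:
SELECT public.parse('default', 'some text');
```

A ts_ prefix on the system function avoids the clash, since ts_parse is much less likely to be in use already than parse.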

These guys might be all right given that some of their arguments are
tsvector or tsquery, but it's not completely convincing --- think about
the case where an argument is given as an undecorated literal string.
It's also not all that clear that they are related to text searching.
I'm for putting a ts_ prefix on them too:
 rank(real[],tsvector,tsquery,integer)    | real
 rank(real[],tsvector,tsquery)            | real
 rank(tsvector,tsquery,integer)           | real
 rank(tsvector,tsquery)                   | real
 rank_cd(real[],tsvector,tsquery,integer) | real
 rank_cd(real[],tsvector,tsquery)         | real
 rank_cd(tsvector,tsquery,integer)        | real
 rank_cd(tsvector,tsquery)                | real

 rewrite(tsquery,tsquery,tsquery)         | tsquery
 rewrite(tsquery,text)                    | tsquery
 rewrite_accum(tsquery,tsquery[])         | tsquery
 rewrite_finish(tsquery)                  | tsquery
 rewrite(tsquery[])                       | tsquery

 headline(oid,text,tsquery,text)          | text
 headline(oid,text,tsquery)               | text
 headline(text,text,tsquery,text)         | text
 headline(text,text,tsquery)              | text
 headline(text,tsquery,text)              | text
 headline(text,tsquery)                   | text
 

These guys are just plain badly named, as it's completely unobvious that
they have anything to do with tsearch (or what they do at all, actually).
Furthermore the "varchar" variants seem entirely redundant with the
"text" ones:
 vq_exec(tsvector,tsquery)                | boolean
 qv_exec(tsquery,tsvector)                | boolean
 tt_exec(text,text)                       | boolean
 ct_exec(character varying,text)          | boolean
 tq_exec(text,tsquery)                    | boolean
 cq_exec(character varying,tsquery)       | boolean
 

Comments, suggestions?
        regards, tom lane

