Re: tsearch patch and namespace pollution - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: tsearch patch and namespace pollution
Date
Msg-id 200708170218.l7H2Io715461@momjian.us
Whole thread Raw
In response to tsearch patch and namespace pollution  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: tsearch patch and namespace pollution  (Michael Paesold <mpaesold@gmx.at>)
List pgsql-hackers
Tom Lane wrote:
> I find the following additions to pg_proc in the current tsearch2 patch:

It seems a lot of these are useless and just bloat.  I will mark a few:

>                    proc                   | prorettype 
> ------------------------------------------+------------
>  pg_ts_parser_is_visible(oid)             | boolean
>  pg_ts_dict_is_visible(oid)               | boolean
>  pg_ts_template_is_visible(oid)           | boolean
>  pg_ts_config_is_visible(oid)             | boolean

Why would anyone look these up via OID rather than name?

>  tsvectorin(cstring)                      | tsvector
>  tsvectorout(tsvector)                    | cstring
>  tsvectorsend(tsvector)                   | bytea
>  tsqueryin(cstring)                       | tsquery
>  tsqueryout(tsquery)                      | cstring
>  tsquerysend(tsquery)                     | bytea
>  gtsvectorin(cstring)                     | gtsvector
>  gtsvectorout(gtsvector)                  | cstring
>  tsvector_lt(tsvector,tsvector)           | boolean
>  tsvector_le(tsvector,tsvector)           | boolean
>  tsvector_eq(tsvector,tsvector)           | boolean
>  tsvector_ne(tsvector,tsvector)           | boolean
>  tsvector_ge(tsvector,tsvector)           | boolean
>  tsvector_gt(tsvector,tsvector)           | boolean
>  tsvector_cmp(tsvector,tsvector)          | integer
>  length(tsvector)                         | integer
>  strip(tsvector)                          | tsvector
>  setweight(tsvector,"char")               | tsvector
>  tsvector_concat(tsvector,tsvector)       | tsvector
>  vq_exec(tsvector,tsquery)                | boolean
>  qv_exec(tsquery,tsvector)                | boolean
>  tt_exec(text,text)                       | boolean
>  ct_exec(character varying,text)          | boolean
>  tq_exec(text,tsquery)                    | boolean
>  cq_exec(character varying,tsquery)       | boolean
>  tsquery_lt(tsquery,tsquery)              | boolean
>  tsquery_le(tsquery,tsquery)              | boolean
>  tsquery_eq(tsquery,tsquery)              | boolean
>  tsquery_ne(tsquery,tsquery)              | boolean
>  tsquery_ge(tsquery,tsquery)              | boolean
>  tsquery_gt(tsquery,tsquery)              | boolean
>  tsquery_cmp(tsquery,tsquery)             | integer
>  tsquery_and(tsquery,tsquery)             | tsquery
>  tsquery_or(tsquery,tsquery)              | tsquery
>  tsquery_not(tsquery)                     | tsquery
>  tsq_mcontains(tsquery,tsquery)           | boolean
>  tsq_mcontained(tsquery,tsquery)          | boolean
>  numnode(tsquery)                         | integer
>  querytree(tsquery)                       | text
>  rewrite(tsquery,tsquery,tsquery)         | tsquery
>  rewrite(tsquery,text)                    | tsquery
>  rewrite_accum(tsquery,tsquery[])         | tsquery
>  rewrite_finish(tsquery)                  | tsquery
>  rewrite(tsquery[])                       | tsquery
>  stat(text)                               | record
>  stat(text,text)                          | record

>  rank(real[],tsvector,tsquery,integer)    | real
>  rank(real[],tsvector,tsquery)            | real
>  rank(tsvector,tsquery,integer)           | real
>  rank(tsvector,tsquery)                   | real
>  rank_cd(real[],tsvector,tsquery,integer) | real
>  rank_cd(real[],tsvector,tsquery)         | real
>  rank_cd(tsvector,tsquery,integer)        | real
>  rank_cd(tsvector,tsquery)                | real

Do we realy need this many ranking functions?

>  token_type(oid)                          | record

Again, why by OID?

>  token_type(text)                         | record
>  parse(oid,text)                          | record
>  parse(text,text)                         | record
>  lexize(oid,text)                         | text[]
>  lexize(text,text)                        | text[]
>  headline(oid,text,tsquery,text)          | text
>  headline(oid,text,tsquery)               | text
>  headline(text,text,tsquery,text)         | text
>  headline(text,text,tsquery)              | text
>  headline(text,tsquery,text)              | text
>  headline(text,tsquery)                   | text
>  to_tsvector(oid,text)                    | tsvector
>  to_tsvector(text,text)                   | tsvector
>  to_tsquery(oid,text)                     | tsquery

Why OID again for the configuration?  I just don't see the use case and
it is bloat and causes confusion.

>  to_tsquery(text,text)                    | tsquery
>  plainto_tsquery(oid,text)                | tsquery
>  plainto_tsquery(text,text)               | tsquery

Again, OID.  I asked Oleg about this and he said:

> Bruce, just remove oid argument specification from documentation.

so I think we can go ahead and remove cases where the configuration name
or object is specified by oid.  I have already removed them from the
documentation and I though the patch had them removed too, but I guess
not.  Admittedly this API has been in flux.

>  to_tsvector(text)                        | tsvector
>  to_tsquery(text)                         | tsquery
>  plainto_tsquery(text)                    | tsquery
>  tsvector_update_trigger()                | trigger
>  get_ts_config_oid(text)                  | oid
>  get_current_ts_config()                  | oid
> (82 rows)
> 
> (This list omits functions with INTERNAL arguments, as those are of
> no particular concern to users.)

Also @@ accepts TEXT @@ TEXT, at least according to the documentation. 
Is it clear to anyone which is tsquery and which tsvector?  Is this
something we want to support?

> While most of these are probably OK, I'm disturbed by the prospect
> that we are commandeering names as generic as "parse" or "stat"
> with argument types as generic as "text".  I think we need to put
> a "ts_" prefix on some of these.  Specifically, I find these names
> totally unacceptable without a ts_ prefix:
> 
>  stat(text)                               | record
>  stat(text,text)                          | record
> 
>  token_type(oid)                          | record
>  token_type(text)                         | record
> 
>  parse(oid,text)                          | record
>  parse(text,text)                         | record
> 
>  lexize(oid,text)                         | text[]
>  lexize(text,text)                        | text[]
> 
> These guys might be all right given that some of their arguments are
> tsvector or tsquery, but it's not completely convincing --- think about
> the case where an argument is given as an undecorated literal string.
> It's also not all that clear that they are related to text searching.
> I'm for putting a ts_ prefix on them too:
> 
>  rank(real[],tsvector,tsquery,integer)    | real
>  rank(real[],tsvector,tsquery)            | real
>  rank(tsvector,tsquery,integer)           | real
>  rank(tsvector,tsquery)                   | real
>  rank_cd(real[],tsvector,tsquery,integer) | real
>  rank_cd(real[],tsvector,tsquery)         | real
>  rank_cd(tsvector,tsquery,integer)        | real
>  rank_cd(tsvector,tsquery)                | real
> 
>  rewrite(tsquery,tsquery,tsquery)         | tsquery
>  rewrite(tsquery,text)                    | tsquery
>  rewrite_accum(tsquery,tsquery[])         | tsquery
>  rewrite_finish(tsquery)                  | tsquery
>  rewrite(tsquery[])                       | tsquery
> 
>  headline(oid,text,tsquery,text)          | text
>  headline(oid,text,tsquery)               | text
>  headline(text,text,tsquery,text)         | text
>  headline(text,text,tsquery)              | text
>  headline(text,tsquery,text)              | text
>  headline(text,tsquery)                   | text
> 
> These guys are just plain badly named, as it's completely unobvious that
> they have anything to do with tsearch (or what they do at all, actually).
> Furthermore the "varchar" variants seem entirely redundant with the
> "text" ones:
> 
>  vq_exec(tsvector,tsquery)                | boolean
>  qv_exec(tsquery,tsvector)                | boolean
>  tt_exec(text,text)                       | boolean
>  ct_exec(character varying,text)          | boolean
>  tq_exec(text,tsquery)                    | boolean
>  cq_exec(character varying,tsquery)       | boolean
> 
> Comments, suggestions?

I would be happy if all text search functions began with 'ts', 'ts_', or
'to_ts', and if we don't clean this up now, it is going to be harder in
the future.  I think users can expect some migration for text search in
8.3 as a benefit of getting into core and be dump-able.

--  Bruce Momjian  <bruce@momjian.us>          http://momjian.us EnterpriseDB
http://www.enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


pgsql-hackers by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: GIT patch
Next
From: Bruce Momjian
Date:
Subject: Re: GIT patch