Re: FTS uses "tsquery" directly in the query - Mailing list pgsql-general

From xu fei
Subject Re: FTS uses "tsquery" directly in the query
Date
Msg-id 362774.80233.qm@web45404.mail.sp1.yahoo.com
In response to Re: FTS uses "tsquery" directly in the query  (Ivan Sergio Borgonovo <mail@webthatworks.it>)
List pgsql-general
Hi, Ivan:

I agree with you and would also like to 'hack' on the code. The current FTS is the best among database systems and a great building block for supporting more functions. Here are some I can think of:
  • an optional parameter to choose "|" or "&" for to_tsquery, to_tsvector.
  • an option to turn normalization on or off for to_tsquery, to_tsvector.
  • the two current ranking functions are not enough: I have not figured out the algorithm behind the default ts_rank, and for ts_rank_cd we have the paper, but it is designed for short queries of 2 or 3 tokens.
  • normalization could be similar to Apache Lucene, where it is really easy to modify things and build your own tokenizer. I still feel confused after reading the manual.
I am not sure whether there is currently a team helping Oleg Bartunov. If needed, I can try to contribute something rather than just hacking on it. I am sure Ivan will join in too. :)
Xu
--- On Mon, 1/25/10, Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

From: Ivan Sergio Borgonovo <mail@webthatworks.it>
Subject: Re: [GENERAL] FTS uses "tsquery" directly in the query
To: pgsql-general@postgresql.org
Date: Monday, January 25, 2010, 4:33 PM

On Mon, 25 Jan 2010 23:35:12 +0300 (MSK)
Oleg Bartunov <oleg@sai.msu.su> wrote:

> Do you guys want something like:
>
> arxiv=# select and2or(to_tsquery('1 & 2 & 3'));
>         and2or
> ---------------------
>   ( '1' | '2' ) | '3'
> (1 row)

Nearly. I'm starting from a weighted tsvector, not from text/tsquery.
I would like to:
- keep the weights in the query
- avoid parsing the text to extract lexemes twice (I already have a
  tsvector)

For me extending pg in C is a new science, but I'm actually trying
to write at least a couple of functions that:
- will return a tsvector as a set of (weight int, pos int[], lexeme text)
  records
- will turn a tsvector + operator into a tsquery
  'orange':A1,2,3 'banana':B4,5 'tomato':C6,7 ->
  'orange':A | 'banana':B | 'tomato':C
  or eventually
  'orange':A & 'banana':B & 'tomato':C
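
The two functions described above can be sketched outside the database as plain string handling. This is only an illustration in Python, not the C extension discussed here: the names parse_tsvector and to_tsquery_text are hypothetical, the parser accepts the notation used in this message (weight letter before the positions), which is an assumption and may differ from what PostgreSQL itself prints, weights are kept as letters rather than the int mentioned above, and lexemes containing quotes are not handled.

```python
import re
from typing import List, NamedTuple


class Entry(NamedTuple):
    """One tsvector entry, mirroring the (weight, pos, lexeme) record idea."""
    weight: str           # weight letter A-D (kept as a letter, not an int)
    positions: List[int]  # positions of the lexeme in the document
    lexeme: str


# Matches one entry written as 'lexeme':W1,2,3 (notation from this message).
_ENTRY = re.compile(r"'([^']*)':([A-D])(\d+(?:,\d+)*)")


def parse_tsvector(text: str) -> List[Entry]:
    """Split the textual tsvector into one Entry per lexeme."""
    return [
        Entry(weight, [int(p) for p in positions.split(",")], lexeme)
        for lexeme, weight, positions in _ENTRY.findall(text)
    ]


def to_tsquery_text(entries: List[Entry], op: str = "|") -> str:
    """Join the lexemes with the chosen operator, keeping weights, dropping positions."""
    if op not in ("|", "&"):
        raise ValueError("op must be '|' or '&'")
    return f" {op} ".join(f"'{e.lexeme}':{e.weight}" for e in entries)


vec = "'orange':A1,2,3 'banana':B4,5 'tomato':C6,7"
entries = parse_tsvector(vec)
print(to_tsquery_text(entries, "|"))  # 'orange':A | 'banana':B | 'tomato':C
print(to_tsquery_text(entries, "&"))  # 'orange':A & 'banana':B & 'tomato':C
```

Inside the server the same idea would work on the tsvector's internal representation, so the text is never parsed a second time; the sketch only shows the intended input and output.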

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it


