Re: fts, compond words? - Mailing list pgsql-general

From Mike Rylander
Subject Re: fts, compond words?
Msg-id b918cf3d0512071020n2877e80kfebdc9f7533ed956@mail.gmail.com
In response to Re: fts, compond words?  (Teodor Sigaev <teodor@sigaev.ru>)
List pgsql-general
On 12/7/05, Teodor Sigaev <teodor@sigaev.ru> wrote:
> That is a long discussed thing. We can't formulate unconflicting rules... For
> example:
> 1) a  &[dist<=2]  ( b &[dist<=3] c )
> 2) a  &[dist<=2]  ( b |[dist<=3] c )
> 3) a  &[dist<=2] !c
> 4) a  &[dist<=2]  ( b |[dist<=3] !c )
> 5) a  &[dist<=2] ( b & c )
> What exactly do they mean? Which tsvectors should be matched by those
> queries?

1, 2, 4, and 5 are obviously ambiguous, but 3 seems straightforward to
me, though perhaps more difficult to implement.  Would it not be
acceptable to say that proximity modifiers are only valid between two
simple lexemes and cannot be placed next to any compound expression?
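
To make the restricted case concrete, here is a minimal sketch in Python,
not tsearch2 code. It assumes an unstripped tsvector can be modelled as a
dict of lexeme -> positions, and it assumes the "follows" reading of the
proximity modifier. The names within() and within_not() are hypothetical,
and within_not() is only one possible reading of query 3.

```python
from typing import Dict, List

# Hypothetical stand-in for an unstripped tsvector: lexeme -> positions.
Positions = Dict[str, List[int]]


def within(doc: Positions, a: str, b: str, max_dist: int) -> bool:
    """a &[dist<=n] b: some b follows some a by 1..max_dist positions."""
    return any(
        0 < pb - pa <= max_dist
        for pa in doc.get(a, [])
        for pb in doc.get(b, [])
    )


def within_not(doc: Positions, a: str, c: str, max_dist: int) -> bool:
    """a &[dist<=n] !c (assumed reading): some occurrence of a has
    no c in the max_dist positions that follow it."""
    c_positions = doc.get(c, [])
    return any(
        not any(0 < pc - pa <= max_dist for pc in c_positions)
        for pa in doc.get(a, [])
    )


# Toy document: a at positions 1 and 7, b at 3, c at 8.
doc = {"a": [1, 7], "b": [3], "c": [8]}
print(within(doc, "a", "b", 2))
print(within_not(doc, "a", "c", 2))
```

With both operands restricted to simple lexemes, each check is just a
comparison of two position lists, which is why this case avoids the
ambiguity of 1, 2, 4, and 5.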

>
> The simple solution is: under a 'phrase search' operation (ok, it will be '+'
> below) there must be only 'phrase search' operations. I.e.:
> a | b ( c + ( d + e ) )      - good
> a | ( c + ( d & g ) )        -  bad.
>

Same as above.  And, while '+' would be a very good shortcut for
"&[follows;dist=1]" (or some such), I think the user should be able to
specify the proximity more explicitly as well.

> For example, we have word 'foonish' and after lexize we got two lexemes: 'foo1'
> and 'foo2'. So a good query 'a + foonish' becomes 'a + ( foo1 | foo2 )'...
>

hrm... that is a problem, though I think it comes down to how the
compiled expression is built from user input.  Unless I'm mistaken,

  a + ( foo1 | foo2 )

is exactly equivalent to

  (a + foo1) | (a + foo2)


Ahhh... but then there is the more complex example of

  a + foonish + bar

becoming

  a + (foo1 | foo2) + bar

... but I guess that could be

  (a + foo1 + bar) | (a + foo2 + bar)
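
The rewrite above can be sketched as a small Python example. It is an
illustration under assumptions, not how tsearch2 compiles queries: a
phrase is modelled as a list of slots, each slot holding the alternative
lexemes produced for one word, and '+' is taken to mean "immediately
follows". The names expand(), phrase_match(), and match() are
hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence

# Hypothetical stand-in for an unstripped tsvector: lexeme -> positions.
Positions = Dict[str, List[int]]


def expand(slots: Sequence[Sequence[str]]) -> List[List[str]]:
    """Distribute '+' over '|': one plain phrase per combination of
    alternatives, e.g. a + (foo1 | foo2) + bar gives two phrases."""
    return [list(combo) for combo in product(*slots)]


def phrase_match(doc: Positions, phrase: Sequence[str]) -> bool:
    """True if the lexemes occur at consecutive positions, in order."""
    return any(
        all(start + i in doc.get(lex, []) for i, lex in enumerate(phrase))
        for start in doc.get(phrase[0], [])
    )


def match(doc: Positions, slots: Sequence[Sequence[str]]) -> bool:
    """OR together the plain phrases obtained from the expansion."""
    return any(phrase_match(doc, p) for p in expand(slots))


slots = [["a"], ["foo1", "foo2"], ["bar"]]
for phrase in expand(slots):
    print(" + ".join(phrase))
```

The number of plain phrases is the product of the slot sizes, so a query
with several multi-lexeme words expands quickly; that cost is the
trade-off for keeping only phrase operations under '+'.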



>
> Mike Rylander wrote:
> > On 12/6/05, Marcus Engene <mengpg@engene.se> wrote:
> >
> > [snip]
> >
> >
> >>  A & (B | (New OperatorTheNextWordMustFollow York))
> >>
> >
> >
> > Actually, I love that idea.  Oleg, would it be possible to create a
> > tsquery operator that understands proximity?  Or, how about allowing
> > a predicate on the current '&' op, as in '&[dist<=1]' meaning "next
> > token follows with a max distance of 1".  I imagine that it would
> > only be useful on unstripped tsvectors, but if the lexeme position is
> > already stored ...
> >
> > --
> > Mike Rylander
> > mrylander@gmail.com
> > GPLS -- PINES Development
> > Database Developer
> > http://open-ils.org
> >
>
> --
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
>                                                     WWW: http://www.sigaev.ru/
>


--
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org
