Re: Why not keeping positions in GIN? - Mailing list pgsql-hackers

From Hitoshi Harada
Subject Re: Why not keeping positions in GIN?
Date
Msg-id 000001c79f99$ff6106f0$5f01a8c0@daraha
In response to Re: Why not keeping positions in GIN?  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: Why not keeping positions in GIN?
List pgsql-hackers
> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.
Yeah, I know, but the tsearch2 sample for Japanese requires external
morphological analysis libraries to separate words. That is powerful, but I
need a more "lightweight" approach. Also, when you search non-document
data (such as titles, names, or patterns in a genome), that approach is
not so useful.
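
To make the point about positions concrete, here is a minimal sketch in
Python (not GIN code, and not how tsearch2 works). It assumes a bigram
index kept in plain dictionaries; the function names and the sample strings
are made up for illustration. It compares a lookup that uses posting lists
only with one that also checks that the query's bigrams are adjacent.

```python
from collections import defaultdict

def bigrams(text):
    """Yield (position, bigram) for every 2-character window of text."""
    for i in range(len(text) - 1):
        yield i, text[i:i + 2]

def build_index(docs):
    """Hypothetical index layout: entry -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in bigrams(text):
            index[gram][doc_id].append(pos)
    return index

def search_without_positions(index, query):
    """Posting lists only: docs that contain every query bigram somewhere."""
    grams = [g for _, g in bigrams(query)]
    if not grams:
        return set()
    result = set(index.get(grams[0], {}))
    for g in grams[1:]:
        result &= set(index.get(g, {}))
    return result

def search_with_positions(index, query):
    """Also require the query bigrams to appear at consecutive offsets."""
    keys = list(bigrams(query))
    if not keys:
        return set()
    hits = set()
    for doc_id in search_without_positions(index, query):
        _, first_gram = keys[0]
        for start in index[first_gram][doc_id]:
            # Every later bigram must sit at start + its offset in the query.
            if all(start + off in index[g][doc_id] for off, g in keys[1:]):
                hits.add(doc_id)
                break
    return hits

# Illustrative data: both strings contain the bigrams AT, TT and TA,
# but only the first one contains the substring "ATTA".
docs = {1: "GATTACA", 2: "TATTGAT"}
index = build_index(docs)
print(search_without_positions(index, "ATTA"))  # {1, 2}
print(search_with_positions(index, "ATTA"))     # {1}
```

Without positions, the second string is a false match that would have to
be rechecked against the original data; with positions it can be rejected
from the index alone.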

As I mentioned, GIN is also powerful for searching array data types, so I
very much hope it will be able to store additional information.

Anyway, thanks a lot for all the information. I will try to read it.

Regards, 

Hitoshi Harada

> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Saturday, May 26, 2007 10:12 PM
> To: Hitoshi Harada
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Why not keeping positions in GIN?
> 
> On Fri, 25 May 2007, Hitoshi Harada wrote:
> 
> > Hi,
> >
> > I was walking through GIN am source code these days, and found that it
> > has only posting lists but no positions related those.
> >
> > The reason I was doing that is, to try to implement n-gram text search
> > index on GIN for myself. As you know Japanese is not like English or other
> > European languages. If you write Japanese (or other 'not separated') text
> > index by n-gram, it should have entry positions on the entry as well as
> > the posting lists, because you must know if each split query key are
> > joined with each other in the data. To know this, position must be there.
> 
> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.
> 
> >
> > It's not only about Japanese. When you search "phrase" for text in
> > English, the same logic above will be needed. I don't research about
> > tsearch2 but is there any problem?? Also, in some case int-array inverted
> > index needs the entry positions as well, I guess. Obtaining positions with
> > posting lists is "general" enough for GIN, isn't it?
> >
> > Is there any future plan around it?
> 
> Yes, we do have plans. See our todo,
> http://www.sai.msu.su/~megera/wiki/todo
> You may read also FTSBOOK, http://www.sai.msu.su/~megera/postgres/fts/doc
> and slides from PGCon2007,
> http://www.sai.msu.su/~megera/postgres/talks/fts-pgcon2007.pdf
> >
> >
> > Regards,
> >
> > Hitoshi Harada
> >
> 
>      Regards,
>          Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83


