Re: Why not keeping positions in GIN? - Mailing list pgsql-hackers
| From | Hitoshi Harada | 
|---|---|
| Subject | Re: Why not keeping positions in GIN? | 
| Date | |
| Msg-id | 000001c79f99$ff6106f0$5f01a8c0@daraha | 
| In response to | Re: Why not keeping positions in GIN? (Oleg Bartunov <oleg@sai.msu.su>) | 
| Responses | Re: Why not keeping positions in GIN? | 
| List | pgsql-hackers | 
> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.

Yeah, I know, but the tsearch2 sample for Japanese requires external morphological analysis libraries to separate words. That is powerful, but I need a more "lightweight" approach. Also, especially when you search non-document data (such as titles, names, or patterns in a genome), that approach is not so useful.

As I mentioned, GIN is also powerful for array data type search, so I very much hope it will carry this additional information.

Anyway, thanks a lot for all the information. I will try to read it.

Regards,

Hitoshi Harada

> -----Original Message-----
> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
> Sent: Saturday, May 26, 2007 10:12 PM
> To: Hitoshi Harada
> Cc: pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] Why not keeping positions in GIN?
>
> On Fri, 25 May 2007, Hitoshi Harada wrote:
>
> > Hi,
> >
> > I have been walking through the GIN AM source code these days, and found
> > that it has only posting lists, with no positions attached to them.
> >
> > The reason I was doing that is to try to implement an n-gram text search
> > index on GIN myself. As you know, Japanese is not like English or other
> > European languages. If you index Japanese (or other 'not separated') text
> > by n-gram, the index should keep entry positions as well as the posting
> > lists, because you must know whether the split query keys are adjacent to
> > each other in the data. To know this, the positions must be there.
>
> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
> n-gram index would be more universal for asian languages.
>
> > It's not only about Japanese. When you search for a "phrase" in English
> > text, the same logic is needed. I haven't researched tsearch2, but is
> > there any problem there? Also, in some cases an int-array inverted index
> > needs the entry positions as well, I guess. Keeping positions with the
> > posting lists is "general" enough for GIN, isn't it?
> >
> > Is there any future plan around it?
>
> Yes, we do have plans. See our todo, http://www.sai.msu.su/~megera/wiki/todo
> You may read also FTSBOOK, http://www.sai.msu.su/~megera/postgres/fts/doc
> and slides from PGCon2007,
> http://www.sai.msu.su/~megera/postgres/talks/fts-pgcon2007.pdf
>
> > Regards,
> >
> > Hitoshi Harada
>
> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
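The point in the quoted message about n-gram keys needing positions can be illustrated with a small sketch. It is a toy in-memory inverted index in Python, not GIN code; the function names (`ngrams`, `build_index`, `search_without_positions`, `search_with_positions`) and the sample documents are hypothetical and only assume the bigram scheme described above.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Return (position, gram) pairs for every n-gram in text."""
    return [(i, text[i:i + n]) for i in range(len(text) - n + 1)]


def build_index(docs, n=2):
    """Map gram -> {doc_id: [positions]}: a posting list plus positions."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, gram in ngrams(text, n):
            index[gram][doc_id].append(pos)
    return index


def search_without_positions(index, query, n=2):
    """Docs containing every query gram somewhere (posting lists only)."""
    grams = [g for _, g in ngrams(query, n)]
    if not grams:
        return set()
    result = set(index.get(grams[0], {}))
    for g in grams[1:]:
        result &= set(index.get(g, {}))
    return result


def search_with_positions(index, query, n=2):
    """Docs where the query grams occur at consecutive offsets."""
    grams = ngrams(query, n)
    if not grams:
        return set()
    hits = set()
    for doc_id in search_without_positions(index, query, n):
        # Candidate start offsets of the whole query inside this document.
        starts = set(index[grams[0][1]][doc_id])
        for offset, g in grams[1:]:
            starts &= {p - offset for p in index[g][doc_id]}
        if starts:
            hits.add(doc_id)
    return hits


# Hypothetical sample data: doc 2 contains both bigrams of the query,
# but not next to each other.
docs = {1: "東京都", 2: "京都東京"}
index = build_index(docs)
candidates = search_without_positions(index, "東京都")
matches = search_with_positions(index, "東京都")
```

With posting lists alone, both documents come back as candidates and each one would have to be rechecked against the stored text; with positions, the index itself can rule out document 2.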