Re: Why not keeping positions in GIN? - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Why not keeping positions in GIN? |
Date | |
Msg-id | Pine.LNX.4.64.0705281722520.12152@sn.sai.msu.ru |
In response to | Re: Why not keeping positions in GIN? ("Hitoshi Harada" <hitoshi_harada@forcia.com>) |
List | pgsql-hackers |
Hitoshi,

there is no problem to write n-gram dictionary for tsearch2 ! The problem is in how to define word boundary.

Oleg
On Sat, 26 May 2007, Hitoshi Harada wrote:

>> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
>> n-gram index would be more universal for asian languages.
> Yeah, I know, but in tsearch2 for japanese sample you must use external
> morphological analysis libraries to separate words. It is powerful but I
> need more "lightweight" approach. Also especially when you search for
> non-document(such like titles, names, or pattern in the genome), the
> approach above is not so useful.
>
> As I mentioned, GIN is also powerful for array data type search, so I am
> very expecting it will have additional information.
>
> Anyway, thanks a lot for much information. I try to read it.
>
> Regards,
>
> Hitoshi Harada
>
>> -----Original Message-----
>> From: Oleg Bartunov [mailto:oleg@sai.msu.su]
>> Sent: Saturday, May 26, 2007 10:12 PM
>> To: Hitoshi Harada
>> Cc: pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] Why not keeping positions in GIN?
>>
>> On Fri, 25 May 2007, Hitoshi Harada wrote:
>>
>>> Hi,
>>>
>>> I was walking through GIN am source code these days, and found that it has
>>> only posting lists but no positions related those.
>>>
>>> The reason I was doing that is, to try to implement n-gram text search index
>>> on GIN for myself. As you know Japanese is not like English or other
>>> European languages. If you write Japanese (or other 'not separated') text
>>> index by n-gram, it should have entry positions on the entry as well as the
>>> posting lists, because you must know if each split query key are joined with
>>> each other in the data. To know this, position must be there.
>>
>> FYI, Tatsuo uses tsearch2 for indexing japanese documents. But I agree,
>> n-gram index would be more universal for asian languages.
>>
>>>
>>> It's not only about Japanese. When you search "phrase" for text in English,
>>> the same logic above will be needed. I don't research about tsearch2 but is
>>> there any problem?? Also, in some case int-array inverted index needs the
>>> entry positions as well, I guess. Obtaining positions with posting lists is
>>> "general" enough for GIN, isn't it?
>>>
>>> Is there any future plan around it?
>>
>> Yes, we do have plans. See our todo, http://www.sai.msu.su/~megera/wiki/todo
>> You may read also FTSBOOK, http://www.sai.msu.su/~megera/postgres/fts/doc
>> and slides from PGCon2007,
>> http://www.sai.msu.su/~megera/postgres/talks/fts-pgcon2007.pdf
>>>
>>>
>>> Regards,
>>>
>>> Hitoshi Harada
>>>
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 4: Have you searched our list archives?
>>>
>>> http://archives.postgresql.org
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
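Hitoshi's argument for storing positions alongside posting lists can be illustrated with a toy model. The following is a minimal Python sketch under stated assumptions. It uses an in-memory index over overlapping character bigrams, and the class `PositionalNgramIndex` and its methods are hypothetical names chosen for illustration. It does not reflect GIN's on-disk format or any tsearch2 API, and it sidesteps Oleg's word-boundary question entirely by indexing raw characters.

```python
from collections import defaultdict


def ngrams(text, n=2):
    """Yield (offset, gram) for every overlapping character n-gram of text."""
    for i in range(len(text) - n + 1):
        yield i, text[i:i + n]


class PositionalNgramIndex:
    """Hypothetical toy inverted index: gram -> {doc_id: [offsets]}.

    Illustration only; this is NOT how GIN stores its entries.
    """

    def __init__(self, n=2):
        self.n = n
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        # Record the offset of every gram, not just the fact that it occurs.
        for pos, gram in ngrams(text, self.n):
            self.postings[gram][doc_id].append(pos)

    def candidates(self, query):
        """Docs containing every query gram: posting lists only, positions ignored."""
        grams = [g for _, g in ngrams(query, self.n)]
        if not grams:
            return set()
        docs = set(self.postings.get(grams[0], {}))
        for g in grams[1:]:
            docs &= set(self.postings.get(g, {}))
        return docs

    def search(self, query):
        """Docs where the query grams occur at consecutive offsets."""
        grams = list(ngrams(query, self.n))
        result = set()
        for doc_id in self.candidates(query):
            # Possible start offsets of the whole query in this document.
            starts = set(self.postings[grams[0][1]][doc_id])
            for k, g in grams[1:]:
                # The gram at query offset k must begin at start + k.
                starts &= {p - k for p in self.postings[g][doc_id]}
            if starts:
                result.add(doc_id)
        return result


if __name__ == "__main__":
    idx = PositionalNgramIndex(n=2)
    idx.add(1, "東京都庁")    # "Tokyo Metropolitan Government"
    idx.add(2, "京都と東京")  # "Kyoto and Tokyo"
    print(idx.candidates("東京都"))  # {1, 2}: both docs contain both bigrams
    print(idx.search("東京都"))      # {1}: only doc 1 has them adjacent
```

With posting lists alone, the query 東京都 ("Tokyo Metropolis") matches both documents, because 京都と東京 ("Kyoto and Tokyo") also contains the bigrams 東京 and 京都, just not next to each other. The positional check discards that false positive. Applying the same check to word positions instead of character offsets is what the English phrase search mentioned in the thread would need.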