Re: Partial match in GIN - Mailing list pgsql-patches
From | Teodor Sigaev |
---|---|
Subject | Re: Partial match in GIN |
Date | |
Msg-id | 47FDFFBC.4060302@sigaev.ru |
In response to | Re: Partial match in GIN (Heikki Linnakangas <heikki@enterprisedb.com>) |
List | pgsql-patches |
> How about forcing the use of a bitmap index scan, and modifying the
> indexam API so that GIN could return a lossy bitmap, and letting the
> bitmap heap scan do the rechecking?

Partial match might be used for only one search entry out of many. In the
text search example 'a:* & qwertyuiop', the second lexeme has only a few
matching tuples. But GIN itself doesn't know the semantic meaning of the
operation and cannot distinguish the following tsqueries:
  'a:* & qwertyuiop'
  '!a:* & qwertyuiop'
  'a:* & !qwertyuiop'
So your suggestion is equivalent to marking every operation with the
RECHECK flag and OR-ing all posting lists. That would give a lot of false
matches and be too slow.

>
>>> I don't think the storage size of tsquery matters much, so whatever
>>> is the best solution in terms of code readability etc.
>> That was about the tsquerysend/recv format, not storage on disk. We
>> don't require compatibility of the binary format of database files, but
>> I have some doubts about binary dumps.
>
> We generally don't make any promises about cross-version compatibility
> of binary dumps, though it would be nice not to break it if it's not too
> much effort.
>
>>> Hmm. match_special_index_operator() already checks that the index's
>>> opfamily is pattern_ops, or text_ops with C-locale. Are you reusing
>>> the same operator families for wildspeed? Doesn't it then also get
>>> confused if you do a "WHERE textcol > 'foo'" query by hand?
>> No, wildspeed uses the same operator, ~~.
>> match_special_index_operator() isn't called at all: in
>> match_clause_to_indexcol(), is_indexable_operator() is called before
>> match_special_index_operator() and returns true.
>>
>> expand_indexqual_opclause() sees that the operator is OID_TEXT_LIKE_OP
>> and calls prefix_quals(), which fails because it expects only a few
>> B-tree opfamilies.
>
> Oh, I see. So this assumption mentioned in the comment there:
>
> /*
>  * LIKE and regex operators are not members of any index opfamily,
>  * so if we find one in an indexqual list we can assume that it
>  * was accepted by match_special_index_operator().
>  */
>
> is no longer true with wildspeed. So we do need to check that in
> expand_indexqual_opclause() then.
>
>>>> NOTICE 2: it seems to me that a similar technique could be
>>>> implemented for ordinary B-tree to eliminate the hack around LIKE
>>>> support.
>>> LIKE expression. I wonder what the size and performance of that would
>>> be like, in comparison to the proposed GIN solution?
>>
>> GIN speeds up '%foo%' too, which is impossible for B-tree. But I don't
>> like the hack around LIKE support in B-tree: that support takes a
>> roundabout path that bypasses the regular mechanism.
>
> You could satisfy '%foo%' using a regular and a reverse B-tree index,
> and a bitmap AND, which is interestingly similar to the way you proposed
> to use a TIDBitmap within GIN.

-- 
Teodor Sigaev
E-mail: teodor@sigaev.ru
WWW: http://www.sigaev.ru/