<div class="gmail_quote">On Mon, Apr 30, 2012 at 10:07 PM, Robert Haas <span dir="ltr">&lt;<a
href="mailto:robertmhaas@gmail.com" target="_blank">robertmhaas@gmail.com</a>&gt;</span> wrote:<br /><blockquote
class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Sun, Apr
29, 2012 at 8:12 AM, Erik Rijkers &lt;<a href="mailto:er@xs4all.nl">er@xs4all.nl</a>&gt; wrote:<br /> > Perhaps I'm
too early with these tests, but FWIW I reran my earlier test program against three<br /> > instances. (the patches
compiled fine, and make check was without problem).<br /><br /></div>These tests results seem to be more about the
pg_trgm changes than the<br /> patch actually on this thread, unless I'm missing something. But the<br /> executive
summary seems to be that pg_trgm might need to be a bit<br /> smarter about costing the trigram-based search, because
when the<br /> number of trigrams is really big, using the index is<br /> counterproductive. Hopefully that's not too
hard to fix; the basic<br /> approach seems quite promising.</blockquote><div class="gmail_quote"><br /></div><div
class="gmail_quote">Right. When the number of trigrams is big, it is slow to scan the posting lists of all of them. The solution
in this case is to exclude the most frequent trigrams from the index scan. But that requires some kind of statistics on trigram
frequencies, which we don't have. We could estimate frequencies using some hard-coded assumptions about natural
languages. Or we could exclude arbitrary trigrams. But I don't like either of these ideas. This problem is also relevant for
LIKE/ILIKE search using trigram indexes.</div><div class="gmail_quote"><br /></div><div class="gmail_quote">Something
similar could occur in tsearch when we search for "frequent_term & rare_term". In some situations (depending on
term frequencies) it's better to exclude frequent_term from the index scan and do a recheck. We have the relevant statistics to
make such a decision, but it doesn't seem feasible to get it using the current GIN interface.</div><div
class="gmail_quote"><br/></div><div class="gmail_quote">Do you perhaps have any comments on the idea of conversion from
pg_wchar to multibyte? Is it acceptable at all?</div><br />------<br />With best regards,<br />Alexander Korotkov.</div>