Re: wildcard search support for pg_trgm - Mailing list pgsql-hackers

From Alexander Korotkov
Subject Re: wildcard search support for pg_trgm
Date
Msg-id AANLkTik7sL1jWt-6Dg0C2s6ozj=QDXW9BcKzUtcoQwcg@mail.gmail.com
Whole thread Raw
In response to wildcard search support for pg_trgm  (Jan Urbański <wulczer@wulczer.org>)
Responses Re: wildcard search support for pg_trgm  (Jesper Krogh <jesper@krogh.cc>)
Re: wildcard search support for pg_trgm  (Alexander Korotkov <aekorotkov@gmail.com>)
List pgsql-hackers
Hi!

On Mon, Jan 24, 2011 at 3:07 AM, Jan Urbański <wulczer@wulczer.org> wrote:
I see two issues with this patch. First of them is the resulting index
size. I created a table with 5 copies of
/usr/share/dict/american-english in it and a gin index on it, using
gin_trgm_ops. The results were:

 * relation size: 18MB
 * index size: 109 MB

while without the patch the GIN index was 43 MB. I'm not really sure
*why* this happens, as it's not obvious from reading the patch what
exactly is this extra data that gets stored in the index, making it more
than double its size.
Do you sure that you did comparison correctly? The sequence of index building and data insertion does matter. I tried to build gin index on  5 copies of /usr/share/dict/american-english with patch and got 43 MB index size.
 
That leads me to the second issue. The pg_trgm code is already woefully
uncommented, and after spending quite some time reading it back and
forth I have to admit that I don't really understand what the code does
in the first place, and so I don't understand what does that patch
change. I read all the changes in detail and I could't find any obvious
mistakes like reading over array boundaries or dereferencing
uninitialized pointers, but I can't tell if the patch is correct
semantically. All test cases I threw at it work, though.
I'll try to write sufficient comment and send new revision of patch.
 
This patch changes the names and signatures of some support functions
for GIN, and I'm not sure how that affects binary compatibility and
pg_upgrade. I tried to create an index with the vanilla source, and then
recompile pg_trgm and reindex the table, but it still was not using the
index. I think it's because it's missing entries in the catalogs about
the index supporting the like strategy. How should this be handled?
This patch don't alters structure of index. It only adds strategies for index scan. In order update this index one should recreate operator class (it will require to drop index). It can be done by sequential uninstall_pg_trgm.sql and pg_trgm.sql. After that new index can be created and it will support like strategy. Although actually there is no need of index recreation, I don't see easier way to do this.
 
----
With best regards,
Alexander Korotkov.

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: Include WAL in base backup
Next
From: Jesper Krogh
Date:
Subject: Re: wildcard search support for pg_trgm