pg_trgm version 1.2 - Mailing list pgsql-hackers

From Jeff Janes
Subject pg_trgm version 1.2
Msg-id CAMkU=1woR_Pdmie6d-zj6sDOPiHd_iUe3vZSXFGe_i4-AQYsJQ@mail.gmail.com
Responses Re: pg_trgm version 1.2  (Merlin Moncure <mmoncure@gmail.com>)
Re: pg_trgm version 1.2  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List pgsql-hackers
This patch implements version 1.2 of contrib module pg_trgm.

It adds support for the triConsistent (ternary consistent) function, introduced in version 9.4 of the server, which makes indexed queries faster to execute when some keys are common and some are rare.
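As a loose analogy only (this is SQL's own three-valued logic, not the pg_trgm code), the ternary idea can be seen with NULL standing in for "maybe". For a LIKE match every extracted trigram must be present, so a single trigram known to be absent settles the answer even while other keys are still unchecked:

```sql
-- NULL plays the role of "maybe": a key whose presence has not been checked yet.
-- One key known to be absent: the AND is false whatever the unknowns turn out to be.
SELECT (false AND NULL AND NULL) AS verdict;   -- false

-- All checked keys present, the rest unknown: the result stays unknown.
SELECT (true AND NULL AND NULL) AS verdict;    -- NULL
```

In that picture the rare keys are the ones worth checking first, since they are the most likely to produce the decisive false.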

I've included the paths to both upgrade and downgrade between 1.1 and 1.2, although after switching versions you must close and reopen the session before you can be sure the change has taken effect. There is no change to the on-disk index structure.
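A minimal sketch of how one might check and switch the installed version, using the standard extension catalog and commands; the downgrade step assumes the 1.2-to-1.1 script from this patch is installed:

```sql
-- Which version of the extension is installed in this database?
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_trgm';

-- Which update paths do the installed scripts provide?
SELECT * FROM pg_extension_update_paths('pg_trgm');

-- Going back to 1.1 (relies on the downgrade script included in the patch).
ALTER EXTENSION pg_trgm UPDATE TO '1.1';
```

After the ALTER EXTENSION, start a new session before running further queries; in psql, \c with no arguments opens a fresh connection to the same database.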

This shows the difference it can make in some cases:

create extension pg_trgm version "1.1";

create table foo as select
  md5(random()::text) || case when random() < 0.000005 then 'lmnop' else '123' end ||
  md5(random()::text) as bar
from generate_series(1,10000000);

create index on foo using gin (bar gin_trgm_ops);

--some queries

alter extension pg_trgm update to "1.2";

--close, reopen, more queries

select count(*) from foo where bar like '%12344321lmnabcddd%';

 

V1.1: Time: 1743.691 ms  --- after repeated execution to warm the cache
V1.2: Time:    2.839 ms  --- after repeated execution to warm the cache
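To see where the time goes rather than only the total, something like the following could be run under each version. This is a sketch, not part of the original test; show_trgm() only approximates the keys the index search uses:

```sql
-- Plan and actual run time for the test query; compare under 1.1 and 1.2.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM foo WHERE bar LIKE '%12344321lmnabcddd%';

-- Rough view of the trigrams involved. show_trgm() pads its input with blanks,
-- so it lists a few boundary trigrams that a '%...%' pattern would not generate.
SELECT show_trgm('12344321lmnabcddd');
```

Because md5() output is hexadecimal, trigrams containing l, m or n can only come from the rare 'lmnop' rows, while trigrams such as 123 or abc appear in a large share of the table. That mix of rare and common keys is what the test is built to exercise.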


You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in core) from 4 to some higher value. It probably should be higher anyway, but there will always be a case where it needs to be higher than you can afford, so a real solution is needed.


I wasn't sure whether this should be a new version of pg_trgm, because there is no user-visible change other than to performance. But there may be some cases where it results in a performance regression, so it is nice to provide options. Also, I'd like to use it in a back-branch, so versioning seems to be the right way to go there.


There is a lot of code duplication between the binary consistent function and the ternary one. I thought the duplication was necessary in order to support both 1.1 and 1.2 from the same code base.


There may also be some gains in the similarity and regex cases, but I didn't really analyze those for performance.


I've thought about how to document this change. Looking at examples of other contrib modules with multiple versions, I decided that we don't document them, other than in the release notes.


The same patch applies to 9.4 code with a minor conflict in the Makefile, and gives benefits there as well.


Cheers,


Jeff
