Re: pg_trgm version 1.2 - Mailing list pgsql-hackers
From | Alexander Korotkov |
---|---|
Subject | Re: pg_trgm version 1.2 |
Date | |
Msg-id | CAPpHfdtUfkY7Hv+7dLQPYFWrW1xrUb1jvPrdwL84eyGN6y1E5w@mail.gmail.com Whole thread Raw |
In response to | pg_trgm version 1.2 (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: pg_trgm version 1.2
|
List | pgsql-hackers |
This patch implements version 1.2 of contrib module pg_trgm.This supports the triconsistent function, introduced in version 9.4 of the server, to make it faster to implement indexed queries where some keys are common and some are rare.
I've included the paths to both upgrade and downgrade between 1.1 and 1.2, although after doing so you must close and restart the session before you can be sure the change has taken effect. There is no change to the on-disk index structure
create extension pg_trgm version "1.1";create table foo as select
md5(random()::text)|| case when random()<0.000005 then 'lmnop' else '123' end ||
md5(random()::text) as bar
from generate_series(1,10000000);
create index on foo using gin (bar gin_trgm_ops);
--some queries
alter extension pg_trgm update to "1.2";
--close, reopen, more queries
select count(*) from foo where bar like '%12344321lmnabcddd%';
V1.1: Time: 1743.691 ms --- after repeated execution to warm the cache
V1.2: Time: 2.839 ms --- after repeated execution to warm the cache
You could get the same benefit just by increasing MAX_MAYBE_ENTRIES (in core) from 4 to some higher value (which it probably should be anyway, but there will always be a case where it needs to be higher than you can afford it to be, so a real solution is needed).
I wasn't sure if this should be a new version of pg_trgm or not, because there is no user visible change other than to performance. But there may be some cases where it results in performance reduction and so it is nice to provide options. Also, I'd like to use it in a back-branch, so versions seems to be the right way to go there.
There is a lot of code duplication between the binary consistent function and the ternary one. I thought it the duplication was necessary in order to support both 1.1 and 1.2 from the same code base.
There may also be some gains in the similarity and regex cases, but I didn't really analyze those for performance.
I've thought about how to document this change. Looking to other example of other contrib modules with multiple versions, I decided that we don't document them, other than in the release notes.
The same patch applies to 9.4 code with a minor conflict in the Makefile, and gives benefits there as well.
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
pgsql-hackers by date: