Re: pg_trgm - Mailing list pgsql-hackers

From Greg Stark
Subject Re: pg_trgm
Msg-id AANLkTikQHXUZF35Ukq6WEVGU3au_uzHltPnShX9YeltE@mail.gmail.com
In response to Re: pg_trgm  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_trgm
List pgsql-hackers
On Sun, May 30, 2010 at 3:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I don't think it's unreasonable to insist that behavioral changes be
> made in an upward compatible fashion ... especially ones that seem at
> least as likely to break some current usages as to enable new usages.

Fwiw I don't think we've traditionally been so tense about contrib
modules. With the advent of extensions that users can easily install
with a single command, that might be about to change, though.

There seem to be three behaviours on the table here:

1) Status quo -- only alpha and digit characters for the current
locale are considered word elements

2) All characters aside from space characters for the current locale
are considered word elements

3) Alpha and digit characters for the current locale, and for C locale
any non-ascii (high bit set) character is considered a word element
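
As a rough sketch of how the three options differ, assuming single-byte
characters and ignoring multibyte encodings entirely. The function names
and the locale_is_c flag are hypothetical, made up for illustration, and
are not pg_trgm's actual macros:

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical helpers for illustration only; not pg_trgm's real code. */

/* Behaviour 1 (status quo): alpha and digit characters in the current locale. */
static bool is_word_char_v1(unsigned char c)
{
    return isalnum(c) != 0;
}

/* Behaviour 2: everything except space characters in the current locale. */
static bool is_word_char_v2(unsigned char c)
{
    return isspace(c) == 0;
}

/*
 * Behaviour 3: alpha and digit characters in the current locale, plus,
 * when the locale is C, any byte with the high bit set.
 * locale_is_c is an assumed flag the caller would have to supply.
 */
static bool is_word_char_v3(unsigned char c, bool locale_is_c)
{
    if (isalnum(c) != 0)
        return true;
    return locale_is_c && (c & 0x80) != 0;
}
```

For plain ASCII input, 1 and 3 agree everywhere. They only differ on
high-bit bytes in C locale, whereas 2 differs from both on punctuation
such as '-'.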

1 -> 3 seems like a pretty safe change considering that anyone using
non-ascii characters in C locale probably isn't using pg_trgm, or they
would be complaining about it already. How big a user-base do we think
pg_trgm has anyways? How many of those are using it on non-ascii
characters in C locale? And of those, how many expect the non-ascii
characters to be considered non-word characters? It doesn't sound like
terribly useful behaviour to me.

Behaviour 2 seems like it would be useful too, so providing it as well
is a perfectly reasonable option. But I agree that 1->2 would be
a user-visible change for basically all users, so it would have to be
an additional option.

-- 
greg

