Thread: LIKE and Locale
I'm a little frustrated select * from mytable where mystring = 'foo'; Uses an index select * from mytable where mystring like 'foo'; Does not use an index. I know Tom is not to excited about this, but I think it is a serious problem. What really brings me to this is that I just installed 7.4.2. It is my first real deployment of PostgreSQL in about a year and a half. Unknown to me, the default for my latest DB was not type 'C' but "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE' to use an index without surrounding the index and query with some function, like lower(). This "upgrade" seriously broke a working installation. In the foggy recesses of my mind, I vaguely recalled locale issues with various non-english languages. This shouldn't have been a problem as I was, I thought, just using the default. Surprisingly, SHOW ALL, showed differently. I recreate the database with --no-locale, then it works, obviously. Yea, this amounts to an RTFM issue, granted, but shouldn't various locales be able to work with LIKE? Shouldn't "en_US.iso885915" work with "LIKE?" Shouldn't database creation with anything but 'C' issue a warning? The real issue here is that one has to know that the behavior of "LIKE" is dependent on the locale to understand the problem. Yes it is briefly mentioned in the FAQ, but it is not obvious as a common problem in the UNIX world. As far as I can tell it is a PostgreSQL only issue that the locale setting in the system seriously affects functionality. It is further compounded by the fact that this setting can not be changed without recreating the database. Given a non-trivally sized database, this is no small issue. (Don't get me wrong, these RTFM landmines are great for the consultant and support industry, keep up the good work ;-))
> > On Wed, 31 Mar 2004 pgsql@mohawksoft.com wrote: > >> I'm a little frustrated >> >> select * from mytable where mystring = 'foo'; >> >> Uses an index >> >> select * from mytable where mystring like 'foo'; >> >> Does not use an index. >> >> I know Tom is not to excited about this, but I think it is a serious >> problem. What really brings me to this is that I just installed 7.4.2. >> It > > I agree with Tom mostly. It'd be nice for cases to be better optimized in > general, but optimizing basically degenerate cases seems futile especially > when there's a generally better workaround (see below) I'm not convinced that one optimization must de-optimize something else. Also, I am suspicious of "work arounds" being suggested as norms. > >> is my first real deployment of PostgreSQL in about a year and a half. >> Unknown to me, the default for my latest DB was not type 'C' but >> "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE' >> to use an index without surrounding the index and query with some > > What about making an index with the <whatever>_pattern_ops opclass which > IIRC is supposed to allow index use on LIKE even for anchored searches > in non-C locales. At issue, would this require a change of the SQL query? If it requires changing the query, then PostgreSQL places too much of a burden on the application writer when it comes to supporting multiple databases.
On Wed, 31 Mar 2004 pgsql@mohawksoft.com wrote: > I'm a little frustrated > > select * from mytable where mystring = 'foo'; > > Uses an index > > select * from mytable where mystring like 'foo'; > > Does not use an index. > > I know Tom is not to excited about this, but I think it is a serious > problem. What really brings me to this is that I just installed 7.4.2. It I agree with Tom mostly. It'd be nice for cases to be better optimized in general, but optimizing basically degenerate cases seems futile especially when there's a generally better workaround (see below) > is my first real deployment of PostgreSQL in about a year and a half. > Unknown to me, the default for my latest DB was not type 'C' but > "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE' > to use an index without surrounding the index and query with some What about making an index with the <whatever>_pattern_ops opclass which IIRC is supposed to allow index use on LIKE even for anchored searches in non-C locales.
On Wed, 31 Mar 2004 pgsql@mohawksoft.com wrote: > > On Wed, 31 Mar 2004 pgsql@mohawksoft.com wrote: > > > >> I'm a little frustrated > >> > >> select * from mytable where mystring = 'foo'; > >> > >> Uses an index > >> > >> select * from mytable where mystring like 'foo'; > >> > >> Does not use an index. > >> > >> I know Tom is not to excited about this, but I think it is a serious > >> problem. What really brings me to this is that I just installed 7.4.2. > >> It > > > > I agree with Tom mostly. It'd be nice for cases to be better optimized in > > general, but optimizing basically degenerate cases seems futile especially > > when there's a generally better workaround (see below) > > I'm not convinced that one optimization must de-optimize something else. But, given limited developer resources, optimizing degenerate sql is probably not the best use unless someone feels strongly enough about it to do it themselves. > Also, I am suspicious of "work arounds" being suggested as norms. The workaround in this case is to make an index that works with LIKE even in non "C" locales. I qualified it as a workaround because potentially you might need two indexes on the field. However, given that it's not limited to non-wildcard containing strings, it's also more generally useful. > >> is my first real deployment of PostgreSQL in about a year and a half. > >> Unknown to me, the default for my latest DB was not type 'C' but > >> "en_US.iso885915" and thus no amount of work would have allowed a 'LIKE' > >> to use an index without surrounding the index and query with some > > > > What about making an index with the <whatever>_pattern_ops opclass which > > IIRC is supposed to allow index use on LIKE even for anchored searches > > in non-C locales. > > At issue, would this require a change of the SQL query? If it requires > changing the query, then PostgreSQL places too much of a burden on the > application writer when it comes to supporting multiple databases. No, it involves making an index using the built-in <whatever>_pattern_ops operator class (which is mentioned in the operator class part of the index documentation I think, but probably needs better mention) Something like:CREATE INDEX indblah on tab(col text_pattern_ops)