Thread: locales and encodings Oh MY!
Hi all, So today while I investigating some slow queries I found out that some time ago some of our databases (due, innocently I suspect, to upgrades or hardware migrations) are now not using C as lc_ctype. And why some 'things' got a lot slower... To correct this problem it seems like I am going to have to do a dump, intidb, then restore. (or add some new indexes and eat the extra overhead). Unless someone knows something I don't about changing ctype and collation with some sleight of hand (if you do I am very interested) I have listed what I think I will be doing with regards to initdb. if anyone sees problems with the following mixture during my dump -> init-> restore I would be most keen in hearing about it. initdb /path/to/data/dir --lc_ctype=C --lc_collation=C --lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8 --lc_time=en_US.UTF8 -E UTF8 Thank you, Mark
Hi Mark, Il 12/11/10 03:31, mark ha scritto: > I have listed what I think I will be doing with regards to initdb. if anyone > sees problems with the following mixture during my dump -> init-> restore I > would be most keen in hearing about it. FYI, PostgreSQL 8.4 introduced database level collation. Before that, yes, you need to re-init your data dir. > initdb /path/to/data/dir --lc_ctype=C --lc_collation=C > --lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8 > --lc_time=en_US.UTF8 -E UTF8 Maybe you meant --lc_collate ? Cheers, Gabriele -- Gabriele Bartolini - 2ndQuadrant Italia PostgreSQL Training, Services and Support gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it
On Fri, Nov 12, 2010 at 12:45 AM, Gabriele Bartolini <gabriele.bartolini@2ndquadrant.it> wrote: > Hi Mark, > > Il 12/11/10 03:31, mark ha scritto: >> >> I have listed what I think I will be doing with regards to initdb. if >> anyone >> sees problems with the following mixture during my dump -> init-> >> restore I >> would be most keen in hearing about it. > > FYI, PostgreSQL 8.4 introduced database level collation. Before that, yes, > you need to re-init your data dir. Guess I missed that, I have 8.3.X boxes in production and 9.0.1 boxes in dev so I guess only someone of them will require a re-init. >> >> initdb /path/to/data/dir --lc_ctype=C --lc_collation=C >> --lc_message=en_US.UTF8 --lc_monetary=en_US.UTF8 --lc_numeric=en_US.UTF8 >> --lc_time=en_US.UTF8 -E UTF8 > > Maybe you meant --lc_collate ? > Yes I did mean lc_collate - thanks > Cheers, > Gabriele > With how similar straight C and en_US.UTF8 are it was suggested to me, by persons who are far more C knowledgeable then I in my office, that this is something the PG community could "fix" . A "fix" being so that "col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8 (and probably some others). @hackers -> is the request unreasonable ? anyone got any idea of the price tag to make that happen ? > -- > Gabriele Bartolini - 2ndQuadrant Italia > PostgreSQL Training, Services and Support > gabriele.bartolini@2ndQuadrant.it | www.2ndQuadrant.it > >
mark wrote: > A "fix" being so that "col LIKE 'foo%' " could use btree indexes in > locales like en_US.UTF8 (and probably some others). How about specifying an opclass?: http://www.postgresql.org/docs/current/interactive/indexes-opclass.html -Kevin
> With how similar straight C and en_US.UTF8 are it was suggested to me, > by persons who are far more C knowledgeable then I in my office, that > this is something the PG community could "fix" . A "fix" being so that > "col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8 > (and probably some others). > > > @hackers -> > is the request unreasonable ? anyone got any idea of the price tag to > make that happen ? I thought it already did that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: >> With how similar straight C and en_US.UTF8 are it was suggested to me, >> by persons who are far more C knowledgeable then I in my office, that >> this is something the PG community could "fix" . A "fix" being so that >> "col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8 >> (and probably some others). >> is the request unreasonable ? anyone got any idea of the price tag to >> make that happen ? > I thought it already did that. No, and the odds of it ever happening are insignificant. The sort order associated with en_US (and other "dictionary order" locales) is just too randomly different from what you need to optimize a LIKE search. (Whoever told you en_US sorts similarly to C is nuts.) The solution if you want the database's prevailing sort order to be en_US is to put an extra text_pattern_ops index on the column you want to do LIKE searches on. We might eventually have the ability to spell that "put a C-locale index on the column", but text_pattern_ops is the way to do it today. regards, tom lane
On Fri, Nov 12, 2010 at 12:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >>> With how similar straight C and en_US.UTF8 are it was suggested to me, >>> by persons who are far more C knowledgeable then I in my office, that >>> this is something the PG community could "fix" . A "fix" being so that >>> "col LIKE 'foo%' " could use btree indexes in locales like en_US.UTF8 >>> (and probably some others). > >>> is the request unreasonable ? anyone got any idea of the price tag to >>> make that happen ? > >> I thought it already did that. > > No, and the odds of it ever happening are insignificant. The sort order > associated with en_US (and other "dictionary order" locales) is just too > randomly different from what you need to optimize a LIKE search. > (Whoever told you en_US sorts similarly to C is nuts.) > > The solution if you want the database's prevailing sort order to be en_US > is to put an extra text_pattern_ops index on the column you want to do > LIKE searches on. We might eventually have the ability to spell that > "put a C-locale index on the column", but text_pattern_ops is the way to > do it today. > > regards, tom lane > Ok I hear you loud and clear. I am going to eat the overhead until I get to 9.0.1, currently on 8.3.X in some places. I will either take an outage and do a dump - re-init-restore or inplace upgrade and then do some locking, copy, drop old, rename new db path. thanks all. ..: Mark