Thread: ISO8859_1 vs UTF-8 Performance?
I am not a PostgresSQL user but would like to hear from users who have configured their database to use the UTF-8 charset, especially any users using the case and accent insensitive collation.
I am using a open source SQL server at this time using ISO8859_1 and performance is excellent but I recreated the structure using UTF-8 and imported the data in a test environment and the performance was totally unacceptable. Example:An unoptimized (no index) query under ISO8859_1 runs in about 15ms and took almost 6.5 seconds under UTF-8.
Creating an index did help but performance was still unacceptable. Using the same query under ISO8859_1 now took a reported 0ms while taking 109ms under UTF-8.
On 10/22/2013 4:15 PM, Caroline Beltran wrote: > I am not a PostgresSQL user but would like to hear from users who have > configured their database to use the UTF-8 charset, especially any > users using the case and accent insensitive collation. > > I am using a open source SQL server at this time using ISO8859_1 and > performance is excellent but I recreated the structure using UTF-8 and > imported the data in a test environment and the performance was > totally unacceptable. Example: > > An unoptimized (no index) query under ISO8859_1 runs in about 15ms and > took almost 6.5 seconds under UTF-8. > > Creating an index did help but performance was still unacceptable. > Using the same query under ISO8859_1 now took a reported 0ms while > taking 109ms under UTF-8. I rarely see more than a 5% difference between SQL_ASCII (eg, no char encoding at all) and en_US.utf8 with postgresql. postgres doesn't have any sort of 'case insensitive collation' built in, there's citext as an addon, and you can do things with functional indexes of lower(fieldname). -- john r pierce 37N 122W somewhere on the middle of the left coast
On 10/22/2013 6:07 PM, Caroline Beltran wrote: > John, thank you for taking the time to respond. Performance wise, a 5% > difference is much better than I thought possible. > > The only concern in regards to my post would be in regards to accent > character collation, i.e.: > > Angel Smith > Ángel Smith > Angel Williams > ... > > Additionally, accent insensitive searching is also useful because > people may not type in the accented character when searching. Is it > possible to create a user defined function can be created and then > used during index creation? If so, can that index be used to sort > your data display as well as for your SELECT statements? the built in lower() (or upper() functions work with the database's defined LC_CTYPE setting, and properly convert upper/lower case. but, yes, it IS possible to use user defined functions for a functional index, just not required in this case as the required functionality is built in. And, as I said, there is also the contributed "CITEXT" extension, distributed with the postgres core, http://www.nytimes.com/interactive/2012/02/12/us/entitlement-map.html?ref=us which further simplifies this. -- john r pierce 37N 122W somewhere on the middle of the left coast
On 10/22/2013 7:37 PM, John R Pierce wrote: > And, as I said, there is also the contributed "CITEXT" extension, > distributed with the postgres core, > http://www.nytimes.com/intera... > which further simplifies this. wow. I have NO idea how that link crept in there. I thought I pasted http://www.postgresql.org/docs/current/static/citext.html -- john r pierce 37N 122W somewhere on the middle of the left coast