Thread: ISO8859_1 vs UTF-8 Performance?

ISO8859_1 vs UTF-8 Performance?

From

Caroline Beltran

Date:

23 October 2013, 02:16:02

I am not a PostgresSQL user but would like to hear from users who have configured their database to use the UTF-8 charset, especially any users using the case and accent insensitive collation.

I am using a open source SQL server at this time using ISO8859_1 and performance is excellent but I recreated the structure using UTF-8 and imported the data in a test environment and the performance was totally unacceptable. Example:

An unoptimized (no index) query under ISO8859_1 runs in about 15ms and took almost 6.5 seconds under UTF-8.

Creating an index did help but performance was still unacceptable. Using the same query under ISO8859_1 now took a reported 0ms while taking 109ms under UTF-8.

Thank you for your time

Re: ISO8859_1 vs UTF-8 Performance?

From

John R Pierce

Date:

23 October 2013, 02:39:39

On 10/22/2013 4:15 PM, Caroline Beltran wrote:
> I am not a PostgresSQL user but would like to hear from users who have
> configured their database to use the UTF-8 charset, especially any
> users using the case and accent insensitive collation.
>
> I am using a open source SQL server at this time using ISO8859_1 and
> performance is excellent but I recreated the structure using UTF-8 and
> imported the data in a test environment and the performance was
> totally unacceptable.  Example:
>
> An unoptimized (no index) query under ISO8859_1 runs in about 15ms and
> took almost 6.5 seconds under UTF-8.
>
> Creating an index did help but performance was still unacceptable.
> Using the same query under ISO8859_1 now took a reported 0ms while
> taking 109ms under UTF-8.

I rarely see more than a 5% difference between SQL_ASCII (eg, no char
encoding at all) and en_US.utf8 with postgresql.

postgres doesn't have any sort of 'case insensitive collation' built in,
there's citext as an addon, and you can do things with functional
indexes of lower(fieldname).



--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: ISO8859_1 vs UTF-8 Performance?

From

John R Pierce

Date:

23 October 2013, 05:37:52

On 10/22/2013 6:07 PM, Caroline Beltran wrote:
> John, thank you for taking the time to respond. Performance wise, a 5%
> difference is much better than I thought possible.
>
> The only concern in regards to my post would be in regards to accent
> character collation, i.e.:
>
> Angel Smith
> Ángel Smith
> Angel Williams
> ...
>
> Additionally, accent insensitive searching is also useful because
> people may not type in the accented character when searching.  Is it
> possible to create a user defined function can be created and then
> used during index creation?  If so, can that index be used to sort
> your data display as well as for your SELECT statements?

the built in lower() (or upper() functions work with the database's
defined LC_CTYPE setting, and properly convert upper/lower case.

but, yes, it IS possible to use user defined functions for a functional
index, just not required in this case as the required functionality is
built in.

And, as I said, there is also the contributed "CITEXT" extension,
distributed with the postgres core,
http://www.nytimes.com/interactive/2012/02/12/us/entitlement-map.html?ref=us
which further simplifies this.

--
john r pierce                                      37N 122W
somewhere on the middle of the left coast

Re: ISO8859_1 vs UTF-8 Performance?

From

John R Pierce

Date:

23 October 2013, 06:41:06

On 10/22/2013 7:37 PM, John R Pierce wrote:
> And, as I said, there is also the contributed "CITEXT" extension,
> distributed with the postgres core,
> http://www.nytimes.com/intera...
> which further simplifies this.

wow.  I have NO idea how that link crept in there.
I thought I pasted
http://www.postgresql.org/docs/current/static/citext.html




--
john r pierce                                      37N 122W
somewhere on the middle of the left coast