Re: bytea encode performance issues - Mailing list pgsql-general

From Merlin Moncure
Subject Re: bytea encode performance issues
Date
Msg-id b42b73150808070641p4a45a67bicca80e7227f13687@mail.gmail.com
In response to Re: bytea encode performance issues  ("Merlin Moncure" <mmoncure@gmail.com>)
Responses Re: bytea encode performance issues  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: bytea encode performance issues  (Sim Zacks <sim@compulab.co.il>)
List pgsql-general
On Thu, Aug 7, 2008 at 9:38 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Aug 7, 2008 at 1:16 AM, Sim Zacks <sim@compulab.co.il> wrote:
>>
>>> I don't quite follow that...the whole point of utf8 encoded database
>>> is so that you can use text functions and operators without the bytea
>>> treatment.  As long as your client encoding is set up properly (so
>>> that data coming in and out is converted to utf8), then you should be
>>> ok.  Dropping to ascii is usually not the solution.  Your data
>>> inputting application should set the client encoding properly and
>>> coerce data into the unicode text type...it's really the only
>>> solution.
>>>
>> Email does not always follow a specific character set. I have tried
>> converting the data that comes in to utf-8 and it does not always work.
>> We receive Hebrew emails which come in mostly 2 flavors, UTF-8 and
>> windows-1255. Unfortunately, they are not compatible with one another.
>> SQL-ASCII and ASCII are different as someone on the list pointed out to
>> me. According to the documentation, SQL-ASCII makes no assumption about
>> encoding, so you can throw in any encoding you want.
>
> no, you can't! SQL-ASCII means that the database treats everything
> like ascii.  This means that any operation that deals with text could
> (and in the case of Hebrew, almost certainly will) be broken.  Simple
> things like getting the length of a string will be wrong.  If you are
> accepting unicode input, you absolutely must be using a unicode
> encoded backend.
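
To illustrate the string-length point quoted above, here is a minimal Python sketch (not from the original thread). The sample word is a hypothetical example. It assumes that a backend treating text as single-byte data effectively counts bytes, so the byte counts below stand in for what such a length function would report.

```python
# Hypothetical sample: the Hebrew word "shalom" (shin, lamed, vav, final mem).
text = "\u05e9\u05dc\u05d5\u05dd"

# The same four characters in the two encodings mentioned in the thread.
utf8_bytes = text.encode("utf-8")
cp1255_bytes = text.encode("cp1255")  # Python's codec name for windows-1255

char_count = len(text)          # number of characters
utf8_len = len(utf8_bytes)      # number of bytes when stored as UTF-8
cp1255_len = len(cp1255_bytes)  # number of bytes when stored as windows-1255

print(char_count, utf8_len, cp1255_len)
```

Each Hebrew letter takes two bytes in UTF-8 and one byte in windows-1255, so a byte-oriented length only matches the character count by accident of the encoding that happened to be used.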

er, I see the problem (single piece of text with multiple encodings
inside) :-).  ok, it's more complicated than I thought.  still, you
need to convert the email to utf8.  There simply must be a way,
otherwise your emails are not well defined.  This is a client side
problem...if you push it to the server in ascii, you can't use any
server side text operations reliably.
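
As a rough sketch of what the client-side conversion could look like (an illustration only, not something from the original thread): the function name `to_utf8`, the `declared` parameter, and the fallback order are all assumptions. The idea is to try the charset the message part declares, then UTF-8, then windows-1255, and hand only UTF-8 to the server.

```python
from typing import Optional


def to_utf8(raw: bytes, declared: Optional[str] = None) -> bytes:
    """Return raw re-encoded as UTF-8 (hypothetical helper).

    declared is the charset named by the message part, if any.
    The fallback list covers the two flavors mentioned in the thread.
    """
    candidates = []
    if declared:
        candidates.append(declared)
    candidates += ["utf-8", "cp1255"]  # cp1255 is windows-1255

    for enc in candidates:
        try:
            return raw.decode(enc).encode("utf-8")
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset: try the next candidate.
            continue

    # Last resort: keep what decodes and mark the rest as U+FFFD.
    return raw.decode("utf-8", errors="replace").encode("utf-8")


# Hypothetical sample: the same Hebrew word arriving in both flavors.
word = "\u05e9\u05dc\u05d5\u05dd"
from_windows = to_utf8(word.encode("cp1255"))
from_utf8 = to_utf8(word.encode("utf-8"))
print(from_windows == from_utf8)
```

For a message whose parts use different charsets, a helper like this would presumably be applied to each part separately, using that part's declared charset, before the pieces are joined and sent to the server.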

merlin

