Thread: get rid of SQL_ASCII?

get rid of SQL_ASCII?

From: Peter Eisentraut

Can we consider getting rid of the SQL_ASCII server-side "encoding"?  I
don't see any good use for it, and it's often a support annoyance, and
it leaves warts all over the code.  This would presumably be a
multi-release effort.

As a first step in accommodating users who have existing SQL_ASCII
databases, we could change SQL_ASCII into a real encoding with
conversion routines to all other encodings that only convert 7-bit ASCII
characters.  That way, users who use SQL_ASCII as real ASCII or don't
care could continue to use it.  Others would be forced to either set
SQL_ASCII as the client encoding or adjust the encoding on the server.
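
A minimal Python sketch of the 7-bit-only conversion rule proposed above. The function name `convert_ascii_only` is hypothetical and this is not PostgreSQL's conversion code; it only illustrates the idea that pure ASCII passes through unchanged while any byte with the high bit set is rejected.

```python
def convert_ascii_only(data: bytes) -> bytes:
    """Pass 7-bit ASCII through unchanged; reject any high-bit byte.

    Hypothetical helper for illustration only.
    """
    for offset, byte in enumerate(data):
        if byte >= 0x80:
            raise ValueError(
                f"byte 0x{byte:02x} at offset {offset} is not 7-bit ASCII"
            )
    # ASCII-compatible target encodings (UTF8, LATIN1, ...) represent
    # these bytes identically, so no rewriting is needed.
    return data
```

Under this rule a value such as `b"plain text"` converts to any ASCII-compatible encoding as-is, while `b"caf\xe9"` raises an error instead of being passed along uninterpreted.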

On the client side, the default libpq client "encoding" SQL_ASCII would
be renamed to something like SAME or whatever, so the behavior would
stay the same.

Other ideas?  Are there legitimate uses for SQL_ASCII?



Re: get rid of SQL_ASCII?

From: Merlin Moncure

On Thu, Sep 5, 2013 at 7:47 AM, Peter Eisentraut <peter_e@gmx.net> wrote:
> Can we consider getting rid of the SQL_ASCII server-side "encoding"?  I
> don't see any good use for it, and it's often a support annoyance, and
> it leaves warts all over the code.  This would presumably be a
> multi-release effort.
>
> As a first step in accommodating users who have existing SQL_ASCII
> databases, we could change SQL_ASCII into a real encoding with
> conversion routines to all other encodings that only convert 7-bit ASCII
> characters.  That way, users who use SQL_ASCII as real ASCII or don't
> care could continue to use it.  Others would be forced to either set
> SQL_ASCII as the client encoding or adjust the encoding on the server.
>
> On the client side, the default libpq client "encoding" SQL_ASCII would
> be renamed to something like SAME or whatever, so the behavior would
> stay the same.
>
> Other ideas?  Are there legitimate uses for SQL_ASCII?

performance?

merlin



Re: get rid of SQL_ASCII?

From: Heikki Linnakangas

On 05.09.2013 15:47, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side "encoding"?  I
> don't see any good use for it, and it's often a support annoyance, and
> it leaves warts all over the code.  This would presumably be a
> multi-release effort.

I think "warts all over the code" is an overstatement. There aren't that 
many places in the code that care about SQL_ASCII, and they're all 
related to encoding conversions.

> As a first step in accommodating users who have existing SQL_ASCII
> databases, we could change SQL_ASCII into a real encoding with
> conversion routines to all other encodings that only convert 7-bit ASCII
> characters.  That way, users who use SQL_ASCII as real ASCII or don't
> care could continue to use it.  Others would be forced to either set
> SQL_ASCII as the client encoding or adjust the encoding on the server.
>
> On the client side, the default libpq client "encoding" SQL_ASCII would
> be renamed to something like SAME or whatever, so the behavior would
> stay the same.
>
> Other ideas?  Are there legitimate uses for SQL_ASCII?

One use is if you want to use some special encoding that's not supported 
by PostgreSQL, and you want PostgreSQL to just regurgitate any strings 
as is. It's not common, but it would be strange to remove that capability 
altogether, IMHO.

I agree it would be nice to have a "real" ASCII encoding, which only 
accepts 7-bit ASCII characters. And it would be nice if "SQL_ASCII" was 
called something else, like "UNDEFINED" or "BYTE_PER_CHAR", to make the 
meaning more clear. But I'm not in favor of deprecating it altogether.

Also, during backend initialization there is a phase where 
client_encoding has not been set yet, and we don't do any conversions 
yet. That's exactly what SQL_ASCII means, so even if we get rid of 
SQL_ASCII, we'd still need to have some encoding value in the backend to 
mean that intermediate state.

- Heikki



Re: get rid of SQL_ASCII?

From: "ktm@rice.edu"

On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side "encoding"?  I
> don't see any good use for it, and it's often a support annoyance, and
> it leaves warts all over the code.  This would presumably be a
> multi-release effort.
> 
> As a first step in accommodating users who have existing SQL_ASCII
> databases, we could change SQL_ASCII into a real encoding with
> conversion routines to all other encodings that only convert 7-bit ASCII
> characters.  That way, users who use SQL_ASCII as real ASCII or don't
> care could continue to use it.  Others would be forced to either set
> SQL_ASCII as the client encoding or adjust the encoding on the server.
> 
> On the client side, the default libpq client "encoding" SQL_ASCII would
> be renamed to something like SAME or whatever, so the behavior would
> stay the same.
> 
> Other ideas?  Are there legitimate uses for SQL_ASCII?
> 
Hi Peter,

Yes, we have processes that insert data from a large number of locales
into the same database and we need to process the information in a
locale-agnostic way, just as a range of bytes. Not to mention how much
faster it can be.

Regards,
Ken



Re: get rid of SQL_ASCII?

From: Josh Berkus

Peter,

> Other ideas?  Are there legitimate uses for SQL_ASCII?

Migrating from MySQL.  We've had some projects where we couldn't fix
MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
receiving side.  If it hadn't been available, the user would have given
up on Postgres.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: get rid of SQL_ASCII?

From: "Joshua D. Drake"

On 09/05/2013 09:42 AM, Josh Berkus wrote:
>
> Peter,
>
>> Other ideas?  Are there legitimate uses for SQL_ASCII?
>
> Migrating from MySQL.  We've had some projects where we couldn't fix
> MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
> receiving side.  If it hadn't been available, the user would have given
> up on Postgres.

iconv?

>


-- 
Command Prompt, Inc. - http://www.commandprompt.com/  509-416-6579
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc
For my dreams of your image that blossoms   a rose in the deeps of my heart. - W.B. Yeats



Re: get rid of SQL_ASCII?

From: Alvaro Herrera

Joshua D. Drake wrote:
> 
> On 09/05/2013 09:42 AM, Josh Berkus wrote:

> >>Other ideas?  Are there legitimate uses for SQL_ASCII?
> >
> >Migrating from MySQL.  We've had some projects where we couldn't fix
> >MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
> >receiving side.  If it hadn't been available, the user would have given
> >up on Postgres.
> 
> iconv?

Command Prompt helped a customer normalize encodings in their data,
which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
two stages; the first run in UTF8 saved the rejected data to a file
which was loaded in the second run as Latin1.  This worked like a charm.
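
The two-stage idea described above can be sketched in plain Python. This is an illustration under assumptions, not PGLoader's implementation: the function name `normalize_to_utf8` and the in-memory list of raw rows are hypothetical stand-ins for the actual load and reject files.

```python
def normalize_to_utf8(rows):
    """Return (decoded_rows, fallback_count) for a list of raw byte rows.

    First pass: accept rows that are already valid UTF-8.
    Second pass: reinterpret the rejected rows as Latin-1.
    """
    accepted = []
    rejected = []
    for index, raw in enumerate(rows):
        try:
            accepted.append((index, raw.decode("utf-8")))
        except UnicodeDecodeError:
            rejected.append((index, raw))

    # Latin-1 maps every byte to a code point, so this pass cannot fail;
    # it can still mis-decode data that was in some third encoding.
    for index, raw in rejected:
        accepted.append((index, raw.decode("latin-1")))

    # Restore the original row order.
    accepted.sort(key=lambda pair: pair[0])
    return [text for _, text in accepted], len(rejected)
```

Trying UTF-8 first matters: Latin-1 accepts any byte sequence, so running it first would silently mis-decode the rows that were valid UTF-8.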

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: get rid of SQL_ASCII?

From: "ktm@rice.edu"

On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote:
> Peter,
> 
> > Other ideas?  Are there legitimate uses for SQL_ASCII?
> 
> Migrating from MySQL.  We've had some projects where we couldn't fix
> MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
> receiving side.  If it hadn't been available, the user would have given
> up on Postgres.
> 
+++1  :)

Ken



Re: get rid of SQL_ASCII?

From: Josh Berkus

On 09/05/2013 10:02 AM, Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> iconv?
> 
> Command Prompt helped a customer normalize encodings in their data,
> which was a mixture of Latin1 and UTF8.  PGLoader was used for this, in
> two stages; the first run in UTF8 saved the rejected data to a file
> which was loaded in the second run as Latin1.  This worked like a charm.

There are certainly alternatives.  But all of the alternatives increase
the cost of the migration (either in staff time or in downtime), which
increases the likelihood that the organization will abandon the migration.

Anyway, I think we've established that there are enough "legitimate"
uses for SQL_ASCII that we can't casually discard it.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: get rid of SQL_ASCII?

From: "ktm@rice.edu"

On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote:
> 
> On 09/05/2013 09:42 AM, Josh Berkus wrote:
> >
> >Peter,
> >
> >>Other ideas?  Are there legitimate uses for SQL_ASCII?
> >
> >Migrating from MySQL.  We've had some projects where we couldn't fix
> >MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the
> >receiving side.  If it hadn't been available, the user would have given
> >up on Postgres.
> 
> iconv?
> 

Yes, you can use iconv, but then you have to check that it generated
values that do not break your system, including the application logic.
That can prove a major stumbling block to changing DBs.

Regards,
Ken



Re: get rid of SQL_ASCII?

From: Craig Ringer

On 09/05/2013 08:47 PM, Peter Eisentraut wrote:
> Other ideas?  Are there legitimate uses for SQL_ASCII?

IMO people who want SQL_ASCII should actually be storing everything in
`bytea`; that's a truer reflection of what they're actually storing,
retrieving, and working with and how they're doing it.

Unfortunately there'll be enough users of it around that I don't think
we can drop it.

What we SHOULD be doing is making it an explicit decision to use
SQL_ASCII, and NEVER creating a cluster or database with that encoding
by default. Ever. If we can't decide what the correct default encoding
is (say, if locale is "C") we should error out unless a specific flag is
set.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: get rid of SQL_ASCII?

From: Florian Weimer

On 09/06/2013 09:14 AM, Craig Ringer wrote:
> On 09/05/2013 08:47 PM, Peter Eisentraut wrote:
>> Other ideas?  Are there legitimate uses for SQL_ASCII?
>
> IMO people who want SQL_ASCII should actually be storing everything in
> `bytea`; that's a truer reflection of what they're actually storing,
> retrieving, and working with and how they're doing it.

Practically speaking, the escaping gets in the way, and there isn't full 
feature parity with TEXT.  Regular expression matching seems to be 
missing, for instance.

But apart from that, yes, BYTEA would be the more appropriate choice.
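
As a rough illustration of the escaping mentioned above: in bytea's hex format a value is written as `\x` followed by two hexadecimal digits per byte, so a client has to encode and decode every value instead of passing text straight through. The helper names below are hypothetical, and the sketch ignores the older "escape" format.

```python
def to_bytea_hex(data: bytes) -> str:
    """Render raw bytes in bytea hex notation, e.g. b'caf\\xe9' -> '\\x636166e9'."""
    return "\\x" + data.hex()


def from_bytea_hex(text: str) -> bytes:
    """Parse bytea hex notation back into raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("value is not in bytea hex format")
    return bytes.fromhex(text[2:])
```

A short string such as `b"caf\xe9"` round-trips through these helpers unchanged, but its stored representation is no longer readable as text, which is part of the inconvenience compared with TEXT.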

-- 
Florian Weimer / Red Hat Product Security Team



Re: get rid of SQL_ASCII?

From: Tom Lane

Craig Ringer <craig@2ndquadrant.com> writes:
> What we SHOULD be doing is making it an explicit decision to use
> SQL_ASCII, and NEVER creating a cluster or database with that encoding
> by default. Ever. If we can't decide what the correct default encoding
> is (say, if locale is "C") we should error out unless a specific flag is
> set.

There's a large undercurrent of "I say it's bad for you" in this thread,
with frankly nothing to back it up.  If we try to be as nanny-ish as
you're suggesting here, we'll just annoy users.

And just to push back on the specific point: SQL_ASCII *is* the correct
default encoding for C locale.  Both are agnostic about the meaning of
anything outside the 7-bit ASCII set, while not rejecting such data.

        regards, tom lane



Re: get rid of SQL_ASCII?

From: Robert Haas

On Fri, Sep 6, 2013 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> There's a large undercurrent of "I say it's bad for you" in this thread,
> with frankly nothing to back it up.  If we try to be as nanny-ish as
> you're suggesting here, we'll just annoy users.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: get rid of SQL_ASCII?

From: Kevin Grittner

Robert Haas <robertmhaas@gmail.com> wrote:
> Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There's a large undercurrent of "I say it's bad for you" in
>> this thread, with frankly nothing to back it up.  If we try to
>> be as nanny-ish as you're suggesting here, we'll just annoy
>> users.
>
> +1.

+1

I can definitely see a place for an ASCII7 encoding which would
reject anything with the high bit set; but there is a clear place
for the current SQL_ASCII, too.  Eliminating it would be much pain
for no discernible gain.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company