Thread: get rid of SQL_ASCII?
Can we consider getting rid of the SQL_ASCII server-side "encoding"? I don't see any good use for it, and it's often a support annoyance, and it leaves warts all over the code. This would presumably be a multi-release effort.

As a first step in accommodating users who have existing SQL_ASCII databases, we could change SQL_ASCII into a real encoding with conversion routines to all other encodings that only convert 7-bit ASCII characters. That way, users who use SQL_ASCII as real ASCII or don't care could continue to use it. Others would be forced to either set SQL_ASCII as the client encoding or adjust the encoding on the server.

On the client side, the default libpq client "encoding" SQL_ASCII would be renamed to something like SAME or whatever, so the behavior would stay the same.

Other ideas? Are there legitimate uses for SQL_ASCII?
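To make the proposed first step concrete, here is a minimal Python sketch of what a conversion routine that "only converts 7-bit ASCII characters" might look like. The function name `convert_from_strict_ascii` and the choice of `ValueError` are hypothetical, invented for illustration; this is not PostgreSQL's conversion code, only a model of the behavior described above.

```python
def convert_from_strict_ascii(data: bytes, target_encoding: str) -> bytes:
    """Convert bytes held under a hypothetical 'real' SQL_ASCII encoding.

    Only 7-bit values (0x00-0x7F) are convertible. Any byte with the high
    bit set has no defined meaning, so the conversion refuses it.
    """
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            raise ValueError(
                f"byte 0x{byte:02X} at offset {offset} is not 7-bit ASCII "
                f"and cannot be converted to {target_encoding}"
            )
    # Pure ASCII decodes unambiguously, so re-encoding is safe.
    return data.decode("ascii").encode(target_encoding)
```

Under this model, a database holding only plain ASCII keeps working with any client encoding, while a database holding high-bit bytes fails on conversion unless the client also asks for SQL_ASCII (no conversion at all).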
On Thu, Sep 5, 2013 at 7:47 AM, Peter Eisentraut <peter_e@gmx.net> wrote: > Can we consider getting rid of the SQL_ASCII server-side "encoding"? I > don't see any good use for it, and it's often a support annoyance, and > it leaves warts all over the code. This would presumably be a > multi-release effort. > > As a first step in accommodating users who have existing SQL_ASCII > databases, we could change SQL_ASCII into a real encoding with > conversion routines to all other encodings that only convert 7-bit ASCII > characters. That way, users who use SQL_ASCII as real ASCII or don't > care could continue to use it. Others would be forced to either set > SQL_ASCII as the client encoding or adjust the encoding on the server. > > On the client side, the default libpq client "encoding" SQL_ASCII would > be renamed to something like SAME or whatever, so the behavior would > stay the same. > > Other ideas? Are there legitimate uses for SQL_ASCII? performance? merlin
On 05.09.2013 15:47, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side "encoding"? I
> don't see any good use for it, and it's often a support annoyance, and
> it leaves warts all over the code. This would presumably be a
> multi-release effort.

I think "warts all over the code" is an overstatement. There aren't that many places in the code that care about SQL_ASCII, and they're all related to encoding conversions.

> As a first step in accommodating users who have existing SQL_ASCII
> databases, we could change SQL_ASCII into a real encoding with
> conversion routines to all other encodings that only convert 7-bit ASCII
> characters. That way, users who use SQL_ASCII as real ASCII or don't
> care could continue to use it. Others would be forced to either set
> SQL_ASCII as the client encoding or adjust the encoding on the server.
>
> On the client side, the default libpq client "encoding" SQL_ASCII would
> be renamed to something like SAME or whatever, so the behavior would
> stay the same.
>
> Other ideas? Are there legitimate uses for SQL_ASCII?

One use is if you want to use some special encoding that's not supported by PostgreSQL, and you want PostgreSQL to just regurgitate any strings as is. It's not common, but it would be strange to remove that capability altogether, IMHO.

I agree it would be nice to have a "real" ASCII encoding, which only accepts 7-bit ASCII characters. And it would be nice if "SQL_ASCII" was called something else, like "UNDEFINED" or "BYTE_PER_CHAR", to make the meaning more clear. But I'm not in favor of deprecating it altogether.

Also, during backend initialization there is a phase where client_encoding has not been set yet, and we don't do any conversions yet. That's exactly what SQL_ASCII means, so even if we get rid of SQL_ASCII, we'd still need to have some encoding value in the backend to mean that intermediate state.

- Heikki
On Thu, Sep 05, 2013 at 08:47:32AM -0400, Peter Eisentraut wrote:
> Can we consider getting rid of the SQL_ASCII server-side "encoding"? I
> don't see any good use for it, and it's often a support annoyance, and
> it leaves warts all over the code. This would presumably be a
> multi-release effort.
>
> As a first step in accommodating users who have existing SQL_ASCII
> databases, we could change SQL_ASCII into a real encoding with
> conversion routines to all other encodings that only convert 7-bit ASCII
> characters. That way, users who use SQL_ASCII as real ASCII or don't
> care could continue to use it. Others would be forced to either set
> SQL_ASCII as the client encoding or adjust the encoding on the server.
>
> On the client side, the default libpq client "encoding" SQL_ASCII would
> be renamed to something like SAME or whatever, so the behavior would
> stay the same.
>
> Other ideas? Are there legitimate uses for SQL_ASCII?
>

Hi Peter,

Yes, we have processes that insert data from a large number of locales into the same database, and we need to process the information in a locale-agnostic way, just as a range of bytes. Not to mention how much faster it can be.

Regards,
Ken
Peter, > Other ideas? Are there legitimate uses for SQL_ASCII? Migrating from MySQL. We've had some projects where we couldn't fix MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the receiving side. If it hadn't been available, the user would have given up on Postgres. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 09/05/2013 09:42 AM, Josh Berkus wrote: > > Peter, > >> Other ideas? Are there legitimate uses for SQL_ASCII? > > Migrating from MySQL. We've had some projects where we couldn't fix > MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the > receiving side. If it hadn't been available, the user would have given > up on Postgres. iconv? > -- Command Prompt, Inc. - http://www.commandprompt.com/ 509-416-6579 PostgreSQL Support, Training, Professional Services and Development High Availability, Oracle Conversion, Postgres-XC, @cmdpromptinc For my dreams of your image that blossoms a rose in the deeps of my heart. - W.B. Yeats
Joshua D. Drake wrote: > > On 09/05/2013 09:42 AM, Josh Berkus wrote: > >>Other ideas? Are there legitimate uses for SQL_ASCII? > > > >Migrating from MySQL. We've had some projects where we couldn't fix > >MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the > >receiving side. If it hadn't been available, the user would have given > >up on Postgres. > > iconv? Command Prompt helped a customer normalize encodings in their data, which was a mixture of Latin1 and UTF8. PGLoader was used for this, in two stages; the first run in UTF8 saved the rejected data to a file which was loaded in the second run as Latin1. This worked like a charm. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
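The two-stage approach described above can be sketched in a few lines of Python. This is only a model of the idea (accept what validates as UTF-8, then reinterpret the rejects as Latin1); it does not use pgloader, and the function name `normalize_to_utf8` is made up for illustration.

```python
def normalize_to_utf8(rows):
    """Two-pass normalization of mixed UTF-8 / Latin1 byte strings.

    Pass 1 keeps every row that is already valid UTF-8 and sets the rest
    aside. Pass 2 treats the rejected rows as Latin1 and re-encodes them.
    """
    accepted, rejected = [], []
    for raw in rows:
        try:
            raw.decode("utf-8")      # validation only; the bytes are kept as is
            accepted.append(raw)
        except UnicodeDecodeError:
            rejected.append(raw)     # analogous to the reject file of the first run

    for raw in rejected:
        # Latin1 maps every byte to a character, so this pass cannot fail.
        accepted.append(raw.decode("latin-1").encode("utf-8"))
    return accepted
```

One caveat with this kind of sketch: because Latin1 accepts every byte sequence, the second pass never rejects anything, and a Latin1 string whose bytes happen to form valid UTF-8 would be accepted unchanged in the first pass. Whether that matters depends on the data.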
On Thu, Sep 05, 2013 at 09:42:17AM -0700, Josh Berkus wrote: > Peter, > > > Other ideas? Are there legitimate uses for SQL_ASCII? > > Migrating from MySQL. We've had some projects where we couldn't fix > MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the > receiving side. If it hadn't been available, the user would have given > up on Postgres. > +++1 :) Ken
On 09/05/2013 10:02 AM, Alvaro Herrera wrote:
> Joshua D. Drake wrote:
>> iconv?
>
> Command Prompt helped a customer normalize encodings in their data,
> which was a mixture of Latin1 and UTF8. PGLoader was used for this, in
> two stages; the first run in UTF8 saved the rejected data to a file
> which was loaded in the second run as Latin1. This worked like a charm.

There are certainly alternatives. But all of the alternatives increase the cost of the migration (either in staff time or in downtime), which increases the likelihood that the organization will abandon the migration.

Anyway, I think we've established that there are enough "legitimate" uses for SQL_ASCII that we can't casually discard it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On Thu, Sep 05, 2013 at 09:53:18AM -0700, Joshua D. Drake wrote: > > On 09/05/2013 09:42 AM, Josh Berkus wrote: > > > >Peter, > > > >>Other ideas? Are there legitimate uses for SQL_ASCII? > > > >Migrating from MySQL. We've had some projects where we couldn't fix > >MySQL's non-enforcement text garbage, and had to use SQL_ASCII on the > >receiving side. If it hadn't been available, the user would have given > >up on Postgres. > > iconv? > Yes, you can use iconv but then you have to check that it generated values that do not break your system including the application logic. That can prove a major stumbling block to changing DBs. Regards, Ken
On 09/05/2013 08:47 PM, Peter Eisentraut wrote:
> Other ideas? Are there legitimate uses for SQL_ASCII?

IMO people who want SQL_ASCII should actually be storing everything in `bytea`; that's a truer reflection of what they're actually storing, retrieving, and working with and how they're doing it.

Unfortunately there'll be enough users of it around that I don't think we can drop it.

What we SHOULD be doing is making it an explicit decision to use SQL_ASCII, and NEVER creating a cluster or database with that encoding by default. Ever. If we can't decide what the correct default encoding is (say, if locale is "C") we should error out unless a specific flag is set.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On 09/06/2013 09:14 AM, Craig Ringer wrote: > On 09/05/2013 08:47 PM, Peter Eisentraut wrote: >> Other ideas? Are there legitimate uses for SQL_ASCII? > > IMO people who want SQL_ASCII should actually be storing everything in > `bytea`; that's a truer reflection of what they're actually storing, > retrieving, and working with and how they're doing it. Practically speaking, the escaping gets in the way, and there isn't full feature parity with TEXT. Regular expression matching seems to be missing, for instance. But apart from that, yes, BYTEA would be the more appropriate choice. -- Florian Weimer / Red Hat Product Security Team
Craig Ringer <craig@2ndquadrant.com> writes: > What we SHOULD be doing is making it an explicit decision to use > SQL_ASCII, and NEVER creating a cluster or database with that encoding > by default. Ever. If we can't decide what the correct default encoding > is (say, if locale is "C") we should error out unless a specific flag is > set. There's a large undercurrent of "I say it's bad for you" in this thread, with frankly nothing to back it up. If we try to be as nanny-ish as you're suggesting here, we'll just annoy users. And just to push back on the specific point: SQL_ASCII *is* the correct default encoding for C locale. Both are agnostic about the meaning of anything outside the 7-bit ASCII set, while not rejecting such data. regards, tom lane
On Fri, Sep 6, 2013 at 10:19 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > There's a large undercurrent of "I say it's bad for you" in this thread, > with frankly nothing to back it up. If we try to be as nanny-ish as > you're suggesting here, we'll just annoy users. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> wrote: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There's a large undercurrent of "I say it's bad for you" in >> this thread, with frankly nothing to back it up. If we try to >> be as nanny-ish as you're suggesting here, we'll just annoy >> users. > > +1. +1 I can definitely see a place for an ASCII7 encoding which would reject anything with the high bit set; but there is a clear place for the current SQL_ASCII, too. Eliminating it would be much pain for no discernible gain. -- Kevin Grittner EDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
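For readers following the distinction drawn here, the difference between today's SQL_ASCII and a hypothetical ASCII7 encoding can be modeled roughly as follows. Both function names are invented for this sketch, and it ignores any other checks a real server performs on input; it only illustrates "agnostic about high-bit bytes" versus "rejects high-bit bytes".

```python
def accept_sql_ascii(data: bytes) -> bytes:
    """Model of SQL_ASCII as described in the thread.

    Bytes above 0x7F are neither interpreted nor rejected; they are
    stored and returned unchanged.
    """
    return data


def accept_ascii7(data: bytes) -> bytes:
    """Model of a hypothetical strict ASCII7 encoding.

    Any byte with the high bit set is refused at input time.
    """
    if not data.isascii():  # bytes.isascii() requires Python 3.7+
        raise ValueError("input contains bytes outside the 7-bit ASCII range")
    return data
```

With these two models, `b"caf\xe9"` passes through `accept_sql_ascii` untouched but raises in `accept_ascii7`, while plain `b"cafe"` is accepted by both.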