Thread: Re: [SQL] Case Preservation disregarding case sensitivity?
On Friday 27 October 2006 19:38, Joe wrote:
> Hi Beau,
>
> On Fri, 2006-10-27 at 16:23 -0700, beau hargis wrote:
> > I am hoping that there is an easy way to obtain case-preservation with
> > case-insensitivity, or at the very least, case-preservation and complete
> > case-sensitivity, or case-preservation and a consistent case-conversion
> > strategy.
> >
> > The case of the column names needs to be preserved because that is the way
> > the schema is designed and most importantly (VERY, VERY IMPORTANT),
> > column names are used in apps as hash values, or as named references
> > which are case sensitive and as such need to be delivered to the client
> > in exactly the manner specified at the time of table creation.
>
> I went through the same issue in my conversion from MySQL to Postgres
> and (since I had a small application) I ended up changing all my
> tables and columns from "UserProfile" to user_profile.
>
> I'm afraid however, that it's MySQL that is the odd man out. I haven't
> researched this completely but I believe PG follows either the FIPS-127
> or SQL-92 standard with respect to what are called "delimited
> identifiers". Basically, this says if you want case sensitivity in
> identifier names, you have to use double quotes wherever you refer to
> the identifier. Without the double quotes, the SQL implementor can
> either use UPPERCASE (as I believe Oracle and DB2 do) or lowercase (as
> PG does) when it displays those identifiers.
>
> Joe

Again, I am at the same point I was at when searching and searching for information on the problem, and I am not sure what the SQL standard has to say about it: I do not care about case-sensitivity, and I don't think that anyone else who is struggling to deal with the problem does either. I am interested in case preservation of column names.
I do, indeed, want identifiers treated in a case insensitive way, but I want the case PRESERVED in the table definitions and I want that case, as preserved, to be reflected in the field names as returned by the server to any client library that connects and initiates a query.

Case-preservation is not the same as case-sensitivity; nor is case-normalization the same as case-insensitivity. What PostgreSQL is doing is converting any, and all, identifiers to lower case and then matching those against the identifiers (as stored in the table definition) in a case-sensitive manner. It 'normalizes' the case of the identifiers so that it has a common internal representation; the desires of the programmer and database architect be damned.

Referenced specification details:

From FIPS-127:
=======================================
3. Delimited identifiers. In the previous ANSI SQL specification, it was not possible for an application to specify identifiers with spaces or other special symbols. Also, it was not possible to protect against future assaults on the name space for (identifier) by additions to the (reserved word) list. The new facility for (delimited identifier) allows a user to enclose all identifiers in double-quotation marks, thereby ensuring that the name defined or referenced may contain spaces or other special symbols and will not be impacted by future additions to the (reserved word) list.
=======================================

From SQL-92/Sec. 5.2:
=======================================
10) The <identifier body> of a <regular identifier> is equivalent to an <identifier body> in which every letter that is a lower-case letter is replaced by the equivalent upper-case letter or letters. This treatment includes determination of equivalence, representation in the Information and Definition Schemas, representation in the diagnostics area, and similar uses.

11) The <identifier body> of a <regular identifier> (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters), treated as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT, shall not be equal, according to the comparison rules in Subclause 8.2, "<comparison predicate>", to any <reserved word> (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters), treated as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT.

12) Two <regular identifier>s are equivalent if their <identifier body>s, considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".

13) A <regular identifier> and a <delimited identifier> are equivalent if the <identifier body> of the <regular identifier> (with every letter that is a lower-case letter replaced by the equivalent upper-case letter or letters) and the <delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of <doublequote symbol> replaced by <double quote>), considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".

14) Two <delimited identifier>s are equivalent if their <delimited identifier body>s (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of <doublequote symbol> replaced by <doublequote>), considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_TEXT and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".
========================================

Having installed DB2 Enterprise today and taken it for a spin, I can say it does indeed behave in a similar manner. However, after reading through both specifications, it seems that DB2 follows more of the spec than PostgreSQL. The specifications state that for the purpose of comparing identifiers, both shall be converted to upper-case. DB2 displays all identifiers in upper-case whereas PostgreSQL displays all identifiers in lower-case. This alone would be a deviation from the specification.

It seems, however, that the regular-identifier/delimited-identifier distinction is for the embedding of spaces or normally invalid characters from other character sets, not for actually creating case-sensitive identifiers. That is only a side-effect. The specification does not enforce any method for the representation of the identifiers which the client expects, either. Since a query can be constructed which specifies aliases (... AS ...), the presentation of the data is implementation/runtime specific.

Considering the differences that already exist between database systems and their varying compliance with SQL and the various extensions that have been created, I do not consider that the preservation of case for identifiers would violate any SQL standard. The end result would be easier migration, the continued operation of cross-platform tools and applications, and a database designer or data architect having more flexibility in the construction and presentation of a database.

Again, the distinction between case-sensitivity and case-preservation, and between case-insensitivity and case-normalization, needs to be recognized. Having case-preservation defined in the CREATE TABLE syntax and/or a server configuration seems, to me, to be a valuable addition. Of course, making it an option would not interfere with the current state of affairs.

I have looked through the source code and have identified places which need to be patched to make this possible. Tomorrow I will work on that and see what it will impact in the code. Let me know what you think.
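
The alias remark in the message above ("... AS ...") can be turned into a client-side workaround that needs no server change: PostgreSQL returns a double-quoted column alias with its case intact. The following is only a sketch under stated assumptions. The helper names `quote_ident` and `select_with_preserved_case` are invented for this illustration, and it assumes the table and columns were created with unquoted names, so PostgreSQL stored them lower-cased.

```python
def quote_ident(name: str) -> str:
    """Wrap a name in double quotes, doubling any embedded double quote."""
    return '"' + name.replace('"', '""') + '"'


def select_with_preserved_case(table: str, columns: list[str]) -> str:
    """Build a SELECT whose output labels keep the application's spelling.

    Assumption: every name was created unquoted, so the stored name is the
    lower-cased form. Each column is aliased to its quoted original spelling.
    """
    items = [f"{col.lower()} AS {quote_ident(col)}" for col in columns]
    return f"SELECT {', '.join(items)} FROM {table.lower()}"


# Hypothetical table and column names, used only for illustration.
query = select_with_preserved_case("UserProfile", ["UserId", "DisplayName"])
print(query)
```

The application keeps using its mixed-case keys for hash lookups, while the comparison rules on the server stay untouched. The cost is that every query has to carry the alias list.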
beau hargis <beauh@bluefrogmobile.com> writes:
> Considering the differences that already exist between database systems and
> their varying compliance with SQL and the various extensions that have been
> created, I do not consider that the preservation of case for identifiers
> would violate any SQL standard.

That's not how I read the spec. It is true that we are not 100% spec compliant, but that isn't a good argument for moving further away from spec. Not to mention breaking backwards compatibility with our historical behavior.

The change you propose would fix your application at the cost of breaking other people's applications. Perhaps you should consider fixing your app instead.

regards, tom lane
At Teradata, we certainly interpreted the spec to allow case-preserving, but case-insensitive, identifiers. Users really liked it that way: if you re-created a CREATE TABLE statement from the catalog, you could get back exactly the case the user had entered, but people using the table didn't need to worry about case. And column titles in reports would carry the nicely cased names.

It is like how Windows systems treat file names: the case is preserved, but you don't need to know it to access the file.

I know UNIX users usually think "case-preserving with case-insensitive" a foreign concept, but that doesn't mean the average user feels the same. If I want my column named "WeeklyTotalSales", I really don't want to have to always quote it and type in the exact case.

-----Original Message-----
From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Tom Lane
Sent: Monday, October 30, 2006 7:24 PM
To: beau hargis
Cc: pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?

"Chuck McDevitt" <cmcdevitt@greenplum.com> writes:
> At Teradata, we certainly interpreted the spec to allow case-preserving,
> but case-insensitive, identifiers.

Really?

As I see it, the controlling parts of the SQL spec are (SQL99 sec 5.2)

26) A <regular identifier> and a <delimited identifier> are equivalent if the <identifier body> of the <regular identifier> (with every letter that is a lower-case letter replaced by the corresponding upper-case letter or letters) and the <delimited identifier body> of the <delimited identifier> (with all occurrences of <quote> replaced by <quote symbol> and all occurrences of <doublequote symbol> replaced by <doublequote>), considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".

27) Two <delimited identifier>s are equivalent if their <delimited identifier body>s, considered as the repetition of a <character string literal> that specifies a <character set specification> of SQL_IDENTIFIER and an implementation-defined collation that is sensitive to case, compare equally according to the comparison rules in Subclause 8.2, "<comparison predicate>".

Note well the "sensitive to case" bits there. Now consider

    CREATE TABLE tab (
        "foobar" int,
        "FooBar" timestamp,
        "FOOBAR" varchar(3)
    );

We can *not* reject this as containing duplicate column names, else we have certainly violated rule 27. Now what will you do with

    SELECT fooBar FROM tab;

? The spec is unquestionably on the side of "you selected the varchar column"; historical Postgres practice is on the side of "you selected the int column". AFAICS a case-insensitive approach would have to fail with some "I can't identify which column you mean" error. I am interested to see where you find support for that in the spec...

regards, tom lane
We treated quoted identifiers as case-specific, as the spec requires.

In the catalog, we stored TWO columns: the column name with case converted as appropriate (as PostgreSQL already does), used for looking up the attribute, and a second column, which was the column name with the case exactly as entered by the user.

So, your example would work just fine.

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Monday, October 30, 2006 10:35 PM
To: Chuck McDevitt
Cc: beau hargis; pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case sensitivity?
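
Tom Lane's three-column example can be modeled in a few lines to make the three possible answers to `SELECT fooBar FROM tab` concrete. This is a toy model of the matching rules as described in the thread, not PostgreSQL code; the `resolve` function and the policy names are invented for this sketch.

```python
# Columns of the example table. All three were created with quoted names,
# so they are stored exactly as written.
COLUMNS = {"foobar": "int", "FooBar": "timestamp", "FOOBAR": "varchar(3)"}


def resolve(unquoted_name: str, policy: str) -> list[str]:
    """Return the stored column names that an unquoted reference matches."""
    if policy == "spec":
        # SQL-92/SQL99: fold the unquoted name to upper case, compare exactly.
        folded = unquoted_name.upper()
        return [c for c in COLUMNS if c == folded]
    if policy == "postgres":
        # Historical PostgreSQL: fold to lower case, compare exactly.
        folded = unquoted_name.lower()
        return [c for c in COLUMNS if c == folded]
    if policy == "insensitive":
        # Hypothetical fully case-insensitive matching.
        return [c for c in COLUMNS if c.lower() == unquoted_name.lower()]
    raise ValueError(f"unknown policy: {policy}")


for policy in ("spec", "postgres", "insensitive"):
    matches = resolve("fooBar", policy)
    print(policy, [(m, COLUMNS[m]) for m in matches])
```

Under the "spec" policy the reference finds only the varchar column, under the "postgres" policy only the int column, and under the "insensitive" policy it matches all three, which is the ambiguity Tom Lane describes.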
Oh... And Microsoft SQLServer does something similar. At Greenplum, we've already gotten complaints from customers about this when they were switching from MSSQL to GP's PostgreSQL-based database.
There was some discussion a couple of years ago on the -hackers list about it, so you might like to review the archives. The consensus seemed to be that behaviour would need to be set no later than createdb time. The options I thought of were:

. current postgres behaviour (we need to do this for legacy reasons, of course, as well as to keep happy the legions who hate using upper case for anything)
. strictly spec compliant (same as current behaviour, but folding to upper case for unquoted identifiers rather than lower)
. fully case sensitive even for unquoted identifiers (not spec compliant at all, but nevertheless possibly attractive especially for people migrating from MS SQLServer, where it is an option, IIRC).

To this you propose, as I understand it, to have a fourth possibility which would be spec compliant for comparison purposes but would label result set columns with the case preserved name originally used (or would you use the casing used in the query?).

These could be accomplished, I think, with a second catalog column like you suggest, in a number of places, but making sure all the code paths were covered might be somewhat laborious. We could probably add the second option without being nearly so invasive, though, and some people might feel that that would be sufficient.

cheers

andrew

Chuck McDevitt wrote:
> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for looking
> up the attribute,
> And a second column, which was the column name with the case exactly as
> entered by the user.
>
> So, your example would work just fine.
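
The two-column catalog that Chuck McDevitt describes, and that Andrew Dunstan refers to as "a second catalog column", can be sketched as follows. This is an illustration of the idea only, not Teradata's or PostgreSQL's catalog layout: `CatalogColumn`, `catalog_entry` and `find_column` are hypothetical names, and the lookup name is folded to lower case on the assumption that PostgreSQL's current folding is kept.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogColumn:
    lookup_name: str   # folded form, used to find the column
    display_name: str  # spelling exactly as entered, used only for output


def catalog_entry(identifier: str) -> CatalogColumn:
    """Create a catalog row from an identifier as written in CREATE TABLE."""
    if len(identifier) >= 2 and identifier.startswith('"') and identifier.endswith('"'):
        # Quoted: case-specific, stored exactly as written.
        exact = identifier[1:-1].replace('""', '"')
        return CatalogColumn(lookup_name=exact, display_name=exact)
    # Unquoted: fold for lookup, keep the original spelling for display.
    return CatalogColumn(lookup_name=identifier.lower(), display_name=identifier)


def find_column(catalog: list[CatalogColumn], reference: str) -> CatalogColumn:
    """Resolve a reference from a query using only the lookup name."""
    wanted = catalog_entry(reference).lookup_name
    for column in catalog:
        if column.lookup_name == wanted:
            return column
    raise KeyError(reference)


# Illustrative column names; the second one is quoted and therefore case-specific.
catalog = [catalog_entry("WeeklyTotalSales"), catalog_entry('"SomeStrangeName"')]
print(find_column(catalog, "weeklytotalsales").display_name)
print(find_column(catalog, "WEEKLYTOTALSALES").display_name)
```

Lookups never consult `display_name`, so comparison behaviour is unchanged. Only the label handed back to the user differs from what PostgreSQL reports today.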
On Tue, Oct 31, 2006 at 12:55:46PM -0500, Andrew Dunstan wrote:
> To this you propose, as I understand it, to have a fourth possibility
> which would be spec compliant for comparison purposes but would label
> result set columns with the case preserved name originally used (or
> would you use the casing used in the query?).

The big issue I see with this is that it would break PQfname on the client end, since that's case sensitive too. Most client languages are, so you really are between a rock and a hard place. Making PQfname case-insensitive also screws up in Tom's example.

One way to approach this is to consider this a setting of the collation of the name datatype. If a case-insensitive collation is selected at initdb time, then Tom's example would indeed fail, but that's a choice someone made. Problem being, you'd have to export that choice to clients to make PQfname work, and that's going to be messy.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
On Tue, 31 Oct 2006, Chuck McDevitt wrote:
> We treated quoted identifiers as case-specific, as the spec requires.
>
> In the catalog, we stored TWO columns... The column name with case
> converted as appropriate (as PostgreSQL already does), used for looking
> up the attribute,
> And a second column, which was the column name with the case exactly as
> entered by the user.

Wouldn't using that second column's value tend to often violate 5.2SR10 (at least that's the reference item in SQL92)? AFAICT, that rule basically says that the regular identifier is equivalent to the case-folded one for purposes of information and definition schema and similar purposes which seems like it would be intended to include things like column labeling for output. There's a little bit of flexibility there on both similar purposes and equivalence, though.

----
10) The <identifier body> of a <regular identifier> is equivalent to an <identifier body> in which every letter that is a lower-case letter is replaced by the equivalent upper-case letter or letters. This treatment includes determination of equivalence, representation in the Information and Definition Schemas, representation in the diagnostics area, and similar uses.
-----Original Message-----
From: Stephan Szabo [mailto:sszabo@megazone.bigpanda.com]
Sent: Tuesday, October 31, 2006 10:23 AM
To: Chuck McDevitt
Cc: Tom Lane; beau hargis; pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case

> Wouldn't using that second column's value tend to often violate 5.2SR10
> (at least that's the reference item in SQL92)? AFAICT, that rule basically
> says that the regular identifier is equivalent to the case-folded one for
> purposes of information and definition schema and similar purposes which
> seems like it would be intended to include things like column labeling for
> output. There's a little bit of flexibility there on both similar purposes
> and equivalence, though.

Equivalent, yes. But I can interpret that clause to mean I can show either the case folded or non-case-folded value in the information schema, as they are equivalent.

Anyway, we have many things that are "enhancements" beyond the spec, and this could also be considered an enhancement.

"Chuck McDevitt" <cmcdevitt@greenplum.com> writes:
> Equivalent, yes. But I can interpret that clause it mean I can show
> either the case folded or non-case-folded value in the information
> schema, as they are equivalent.

Well, that's an interesting bit of specs-lawyering, but I don't see how you can defend it against these rules in SQL99 5.2:

21) For every <identifier body> IB there is exactly one corresponding case-normal form CNF. CNF is an <identifier body> derived from IB as follows.

    Let n be the number of characters in IB. For i ranging from 1 (one) to n, the i-th character M(i) of IB is translated into the corresponding character or characters of CNF as follows.

    Case:

    a) If M(i) is a lower case character or a title case character for which an equivalent upper case sequence U is defined by Unicode, then let j be the number of characters in U; the next j characters of CNF are U.

    b) Otherwise, the next character of CNF is M(i).

22) The case-normal form of the <identifier body> of a <regular identifier> is used for purposes such as and including determination of identifier equivalence, representation in the Definition and Information Schemas, and representation in diagnostics areas.

    NOTE 44 - Any lower-case letters for which there are no upper-case equivalents are left in their lower-case form.

Again, obviously we are not compliant because we fold to lower rather than upper case, but I do not see how you can read (22) as not requiring the information schema to show the upper-cased form. The output of functions such as PQfname() might be considered closer to diagnostics info than information schema, but that's covered too.

But the really serious problem with what you propose is that it would allow two table columns with names that the system considers distinct to show as the same string in the information schema and diagnostic outputs. That can't be acceptable --- it's going to break any application that does any nontrivial analysis of what it sees there, not to mention that it violates various primary key constraints in the information schema specification.

regards, tom lane
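
Rule 21's case-normal form and the collision Tom Lane warns about can be illustrated with a short model. It rests on assumptions: Python's `str.upper()` stands in for the spec's "equivalent upper case sequence ... defined by Unicode" and may differ from it in detail, and `preserved_label` is a hypothetical stand-in for the "name as entered" that the case-preserving proposal would display.

```python
def case_normal_form(identifier_body: str) -> str:
    """Approximate rule 21 using Python's full Unicode upper-case mapping."""
    return identifier_body.upper()


def stored_name(identifier: str) -> str:
    """Name used for equivalence: quoted names as written, unquoted ones folded."""
    if identifier.startswith('"'):
        return identifier.strip('"')
    return case_normal_form(identifier)


def preserved_label(identifier: str) -> str:
    """Hypothetical 'as entered' label shown to users under the proposal."""
    return identifier.strip('"')


# Two columns of one table: the first unquoted, the second quoted.
declared = ["FooBar", '"FooBar"']

stored = [stored_name(d) for d in declared]
labels = [preserved_label(d) for d in declared]
print(stored)  # names the system compares
print(labels)  # names an application would see
```

The two stored names differ, so the table is legal, yet both labels come out as the same string. Any output that reported only the preserved label could not tell these columns apart.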
Sorry, my last mail wasn't well thought out. Yes, the information_schema needs the case-folded name (although it might be ok to add additional columns to the information_schema for extra information).

But, stepping back from all that, what is it the users want?

1) When re-creating a CREATE TABLE statement from whatever catalog info, they'd like the names to come back exactly as they entered them. If I do:

    CREATE TABLE BobsTable (WeeklySales numeric(10,2), "SomeStrangeName" int);

They'd like to see exactly that when the CREATE TABLE gets re-created, not what we do now:

    CREATE TABLE bobstable (weeklysales numeric(10,2), "SomeStrangeName" int);

2) When doing reports, they'd like the name as entered to be the title of the column:

    Select * from bobstable;

Would be nice if they saw this:

    WeeklySales   SomeStrangeName
    -----------   ---------------

For compatibility with existing apps and spec compliance, you'd still want PQfname() to return the case-folded name. But there isn't any reason you couldn't also return a "suggested title" field (PQftitle?) which preserves the user's case.

You could also extend the concept of a PQftitle to make nicer titles for expressions. Instead of

    SELECT sum(WeeklySales) from BobsTable;

producing "?column?" or somesuch to use in the report, it could return a title like "sum(WeeklySales)".

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, October 31, 2006 10:38 PM
To: Chuck McDevitt
Cc: Stephan Szabo; beau hargis; pgsql-sql@postgresql.org; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] [SQL] Case Preservation disregarding case
On Wed, 2006-11-01 at 11:31 -0500, Chuck McDevitt wrote:
> But, stepping back from all that, what is it the users want?
>
> 1) When re-creating a CREATE TABLE statement from whatever catalog
> info, they'd like the names to come back exactly as they entered them.
> If I do:
>     CREATE TABLE BobsTable (WeeklySales numeric(10,2),
>     "SomeStrangeName" int);
>
> They'd like to see exactly that when the CREATE TABLE gets
> re-created, not what we do now:
>
>     CREATE TABLE bobstable (weeklysales numeric(10,2),
>     "SomeStrangeName" int);

This would be very good indEEd.

It can be very annoying trying to locate a table when the user swears they called it one thing and actually the case or quotation is different. Current behaviour isn't useful, even if it is "onspec" (or is that OnSpec?). Would be better to make this behaviour a userset switchable between the exactly compliant and the more intuitive.

We have namespaces to differentiate between two sources of object names, so anybody who creates a schema where MyColumn is not the same thing as myColumn is not following sensible rules for conceptual distance. It's certainly an error of best practice, even if it's not actually a bug.

> 2) When doing reports, they'd like the name as entered to be the title
> of the column:
>     Select * from bobstable;
>
> Would be nice if they saw this:
>     WeeklySales   SomeStrangeName
>     -----------   ---------------
...
> Producing "?column?" or somesuch to use in the report, it could return a
> title like "sum(WeeklySales)"

That would be just great. I'm not sure the spec says what the titles should be, does it?

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com

"Simon Riggs" <simon@2ndquadrant.com> writes:
> We have namespaces to differentiate between two sources of object names,
> so anybody who creates a schema where MyColumn is not the same thing as
> myColumn is not following sensible rules for conceptual distance.

I'd agree that that is not a good design practice, but the fact remains that they *are* different per spec.

> Would be better to make this behaviour a userset
> switchable between the exactly compliant and the more intuitive.

That's certainly not happening --- if you make any changes in the semantics of equality of type name, it would have to be frozen no later than initdb time, for exactly the same reasons we freeze locale then (hint: index ordering).

regards, tom lane
beau hargis wrote:
> Having installed DB2 Enterprise today and taking it for a spin, it does indeed
> behave in a similar manner. However, after reading through both
> specifications, it seems that DB2 follows more of the spec than PostgreSQL.
> The specifications state that for purpose of comparing identifiers, both
> shall be converted to upper-case. DB2 displays all identifiers in upper-case
> whereas PostgreSQL displays all identifiers in lower-case. This alone would
> be a deviation from the specification.

True. We lowercase because historically we have, and because all-upper-case is hard to read.

--
Bruce Momjian bruce@momjian.us
EnterpriseDB http://www.enterprisedb.com
+ If your life is a hard drive, Christ can be your backup. +
On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > We have namespaces to differentiate between two sources of object names,
> > so anybody who creates a schema where MyColumn is not the same thing as
> > myColumn is not following sensible rules for conceptual distance.
>
> I'd agree that that is not a good design practice, but the fact remains
> that they *are* different per spec.
>
> > Would be better to make this behaviour a userset
> > switchable between the exactly compliant and the more intuitive.
>
> That's certainly not happening --- if you make any changes in the
> semantics of equality of type name, it would have to be frozen no
> later than initdb time, for exactly the same reasons we freeze
> locale then (hint: index ordering).

[Re-read all of this after Bruce's post got me thinking.]

My summary of the thread, with TODO items noted:

1. PostgreSQL doesn't follow the spec, but almost does, with regard to comparison of unquoted and quoted identifiers. DB2 does this per spec.

2. TODO: We could follow the spec, but it would need an initdb option; some non-SQL:2003 standard PostgreSQL programs would not work as they do now. This is considered a minor, low priority item, though.

3. TODO: We could set column headers better if we wanted to (rather than ?column? we could use e.g. Sum_ColumnName etc)

--
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
On Nov 14, 2006, at 2:42 PM, Simon Riggs wrote:
> On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>> We have namespaces to differentiate between two sources of object
>>> names,
>>> so anybody who creates a schema where MyColumn is not the same
>>> thing as
>>> myColumn is not following sensible rules for conceptual distance.
>>
>> I'd agree that that is not a good design practice, but the fact
>> remains
>> that they *are* different per spec.
>>
>>> Would be better to make this behaviour a userset
>>> switchable between the exactly compliant and the more intuitive.
>>
>> That's certainly not happening --- if you make any changes in the
>> semantics of equality of type name, it would have to be frozen no
>> later than initdb time, for exactly the same reasons we freeze
>> locale then (hint: index ordering).
>
> [Re-read all of this after Bruce's post got me thinking.]
>
> My summary of the thread, with TODO items noted:
>
> 1. PostgreSQL doesn't follow the spec, but almost does, with regard to
> comparison of unquoted and quoted identifiers. DB2 does this per spec.
>
> 2. TODO: We could follow the spec, but it would need an initdb option;
> some non-SQL:2003 standard PostgreSQL programs would not work as
> they do
> now. This is considered a minor, low priority item, though.
>
> 3. TODO: We could set column headers better if we wanted to (rather
> than ?column? we could use e.g. Sum_ColumnName etc)

Did the idea of preserving the original case and using that for output column names, \d, etc. get shot down? I thought it would be a useful addition...

--
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
Jim Nasby wrote:
> On Nov 14, 2006, at 2:42 PM, Simon Riggs wrote:
>> On Thu, 2006-11-02 at 10:51 -0500, Tom Lane wrote:
>>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>>> We have namespaces to differentiate between two sources of object
>>>> names,
>>>> so anybody who creates a schema where MyColumn is not the same
>>>> thing as
>>>> myColumn is not following sensible rules for conceptual distance.
>>>
>>> I'd agree that that is not a good design practice, but the fact remains
>>> that they *are* different per spec.
>>>
>>>> Would be better to make this behaviour a userset
>>>> switchable between the exactly compliant and the more intuitive.
>>>
>>> That's certainly not happening --- if you make any changes in the
>>> semantics of equality of type name, it would have to be frozen no
>>> later than initdb time, for exactly the same reasons we freeze
>>> locale then (hint: index ordering).
>>
>> [Re-read all of this after Bruce's post got me thinking.]
>>
>> My summary of the thread, with TODO items noted:
>>
>> 1. PostgreSQL doesn't follow the spec, but almost does, with regard to
>> comparison of unquoted and quoted identifiers. DB2 does this per spec.
>>
>> 2. TODO: We could follow the spec, but it would need an initdb option;
>> some non-SQL:2003 standard PostgreSQL programs would not work as they do
>> now. This is considered a minor, low priority item, though.
>>
>> 3. TODO: We could set column headers better if we wanted to (rather
>> than ?column? we could use e.g. Sum_ColumnName etc)
>
> Did the idea of preserving the original case and using that for output
> column names, \d, etc. get shot down? I thought it would be a useful
> addition...

I think there is broad agreement that we need to provide optional, minimally spec-compliant behaviour (fold unquoted to upper case, otherwise as now). I am not sure how invasive either the "case preserved + case insensitive comparison" or the "case preserved + case sensitive comparison" option would be.

I don't know that anything has been ruled out. Until someone produces a patch or a definite design and analysis we are a bit in the dark.

Personally, I would like to see all of these as options ;-)

I think Simon's item 3 above is a separate issue.

cheers

andrew
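To make the options above concrete, here is a toy Python model of a name catalog under each behaviour. It is only a sketch under stated assumptions: the `Catalog` class, its method names, and the four mode strings are invented for illustration, and the "preserve_insensitive" mode is assumed to ignore case even for quoted references, which the thread has not settled.

```python
MODES = ("fold_lower", "fold_upper", "preserve_insensitive", "preserve_sensitive")


class Catalog:
    """Toy model of identifier storage and lookup (hypothetical, not PostgreSQL code)."""

    def __init__(self, mode: str):
        if mode not in MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.names = []

    def _stored_form(self, name: str, quoted: bool) -> str:
        # Quoted names, and all names in the case-preserving modes, are kept as typed.
        if quoted or self.mode.startswith("preserve"):
            return name
        return name.lower() if self.mode == "fold_lower" else name.upper()

    def create(self, name: str, quoted: bool = False) -> str:
        stored = self._stored_form(name, quoted)
        self.names.append(stored)
        return stored

    def resolve(self, name: str, quoted: bool = False):
        """Return the stored name a reference matches (what a client would see), or None."""
        if self.mode == "preserve_insensitive":
            wanted = name.casefold()
            matches = [n for n in self.names if n.casefold() == wanted]
        else:
            wanted = self._stored_form(name, quoted)
            matches = [n for n in self.names if n == wanted]
        return matches[0] if matches else None


for mode in MODES:
    cat = Catalog(mode)
    cat.create("WeeklySales")  # unquoted in the CREATE TABLE
    print(mode, "->", cat.resolve("weeklysales"), "/", cat.resolve("WeeklySales", quoted=True))
```

Only the two case-preserving modes can hand `WeeklySales` back to the client as typed, and only the case-insensitive one also lets an unquoted `weeklysales` find it.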
Chuck McDevitt wrote:
> At Teradata, we certainly interpreted the spec to allow case-preserving,
> but case-insensitive, identifiers.
> Users really liked it that way

My 2 thoughts:

1: It seems like this behavior of case-sensitive-or-not identifiers could/should be a config option -- either globally for the server, per database, or at the connection/session level. Other databases *do* support this type of granular config of misc SQL behavior -- it's essential for shared hosting environments. Without it some users just *can't* make the switch. Quoting all of an app's identifiers -- or renaming camel-case to underscored -- is a show stopper.

2: Even though the spec states otherwise (that identifiers should be treated as case sensitive or else folded), precedent seems to have changed that:

a) The databases that enforce this rule are fewer, I believe. IMO SQL is now considered even higher than a 4GL language because its use is so widespread - laymen need to use it.

b) The fact that different identifiers of mixed case could even coexist in a table's columns, or in 'AS' or 'JOIN' clauses, really represents more of a flawed design -- and a case-insensitive option would detect this (unlike the current behavior). It would throw an immediate ("fail fast") runtime exception. So I think it's *safer*. (If tbl.rowId and tbl.rowid both exist in a table or as AS identifiers, something bad _will_ happen when someone takes over a project.)

If there were a new default behavior (or just a config option added), my vote would, without a doubt, be for case-insensitive (yet case-preserving) mode... even when using quoted identifiers. The case-sensitive behavior doesn't seem to offer any advantage/safety.

ken
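The "fail fast" check described in point 2b can be sketched as a small Python helper that reports column names differing only by case. The function name `find_case_collisions` is hypothetical, and `str.casefold()` is an assumed stand-in for whatever comparison a server would really use.

```python
from collections import defaultdict


def find_case_collisions(names):
    """Group identifiers that are distinct as typed but equal when case is ignored."""
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only groups containing at least two different spellings.
    return [group for group in groups.values() if len(set(group)) > 1]


columns = ["rowId", "rowid", "firstName", "lastname"]
collisions = find_case_collisions(columns)
if collisions:
    # A case-insensitive server would reject this table definition up front.
    print("ambiguous identifiers:", collisions)
```

With today's case-sensitive matching of quoted names, `"rowId"` and `"rowid"` can coexist silently; a check like this is what would turn that into an immediate error.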
On Sat, Dec 02, 2006 at 12:41:37AM -0700, Ken Johanson wrote:
> 1: It seems like this behavior of case sensitive-or-not-identifiers
> could/should be a config option -- either globally for the server,
> database, or at the connection/session level. Other databases *do*
> support this type of granular config of misc SQL behavior -- its
> essential for shared hosting environments. Without it some users just
> *cant* make the switch. Quoting all an app's identifiers -- or renaming
> camel-case to underscored -- show stopper.

What about option 3: use camelcase without underscores and don't quote. Then you get case-insensitivity and it's still readable.

You're obviously talking about an app which isn't quoting identifiers, so I'm not sure what the actual problem is.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.
Martijn van Oosterhout wrote:
> On Sat, Dec 02, 2006 at 12:41:37AM -0700, Ken Johanson wrote:
>> 1: It seems like this behavior of case sensitive-or-not-identifiers
>> could/should be a config option -- either globally for the server,
>> database, or at the connection/session level. Other databases *do*
>> support this type of granular config of misc SQL behavior -- its
>> essential for shared hosting environments. Without it some users just
>> *cant* make the switch. Quoting all an app's identifiers -- or renaming
>> camel-case to underscored -- show stopper.
>
> What about option 3: use camelcase without underscores and don't quote.
> Then you get case-insensitivity and it's still readable.
>
> You're obviously talking about an app which isn't quoting identifiers,
> so I'm not sure what the actual problem is.
>
> Have a nice day,

Yes, I do routinely use non-quoted identifiers. The problem is that they are case-folded (to lower in PG's case), so my camel-case does not match. For the query to work I MUST quote identifiers that have camel-case:

SELECT pers."firstName", pers.lastname, ...

Has your experience with PG been different? If so, I presume you have found a config that allows:

SELECT pers.firstName, pers.lastname,

Ken
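One client-side workaround for the quoting burden described above is to let a helper decide when quotes are required. This is a minimal Python sketch: `quote_if_needed` and `qualified` are hypothetical names, the "safe" pattern is deliberately conservative, and reserved words are not handled at all. PostgreSQL's own `quote_ident()` SQL function serves a similar purpose on the server side.

```python
import re

# Names made only of lower-case ASCII letters, digits and underscores survive
# lower-case folding unchanged, so they can be left unquoted.
# Assumption: reserved words are ignored in this sketch.
_SAFE_UNQUOTED = re.compile(r"[a-z_][a-z0-9_]*\Z")


def quote_if_needed(name: str) -> str:
    """Return the identifier, double-quoted if folding would otherwise alter it."""
    if _SAFE_UNQUOTED.match(name):
        return name
    # Embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def qualified(alias: str, column: str) -> str:
    """Build an alias.column reference with quoting applied where required."""
    return f"{quote_if_needed(alias)}.{quote_if_needed(column)}"


select_list = ", ".join(qualified("pers", col) for col in ["firstName", "lastname"])
print(f"SELECT {select_list} FROM person pers")
```

The table name `person` is made up for the example. The generated select list matches the hand-quoted query in the message: `pers."firstName", pers.lastname`.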
An update on pg_comparator as an efficient way to do master-slave replication.

I have been using it for 2 years on a "products" table that has grown from 12,000 rows to 24,000 rows. There are 3 slaves and 1 master. It is synced every 10 minutes.

It has never failed or caused problems.

On 23039 rows, with under 100 rows changed, over a 3 Mbit internet connection, the sync takes 3.3 seconds, 0.94 seconds of which is CPU time (1.86 GHz Intel dual core). Most of the time is spent waiting for the network, and that could be sped up considerably with compression (maybe 5-10 times for my data)... I don't think the postgres communications protocol considers compression an option.

I do not synchronize all the columns ... just the 15 most important ones.

Average number of bytes per row is 284.

Primary key is an autoincrement integer id.

Databases are all on the internet at cheap colocation centers with supposedly 10 Mbit high-speed connections that realistically get about 3 Mbit.

I ship a backup and restore of the table every week... in case there are tons of changes and the system burps when there are too many.... I also schedule scripts that might make lots of changes to happen before the dump/restore.

In my 15 years as a DBA, I have never had "replication" (which some say this isn't, and I say that's a matter of how you define it) work so well.
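For scale, the figures quoted above allow a rough back-of-the-envelope comparison with shipping the whole table on every sync. The Python sketch below rests on assumptions not stated in the message: that 284 bytes is the per-row size of the synchronized data, that "3 Mbit" means 3,000,000 bits per second, and that protocol overhead and latency can be ignored.

```python
ROWS = 23039                   # rows in the table (from the message)
AVG_ROW_BYTES = 284            # average bytes per row (from the message)
CHANGED_ROWS = 100             # upper bound on changed rows per sync (from the message)
LINK_BITS_PER_SEC = 3_000_000  # assumed meaning of "3 Mbit"


def transfer_seconds(n_bytes: int, bits_per_sec: int) -> float:
    """Idealised time to push n_bytes over a link, with no overhead or latency."""
    return n_bytes * 8 / bits_per_sec


full_copy = transfer_seconds(ROWS * AVG_ROW_BYTES, LINK_BITS_PER_SEC)
changed_only = transfer_seconds(CHANGED_ROWS * AVG_ROW_BYTES, LINK_BITS_PER_SEC)

print(f"full table copy:   {full_copy:.1f} s")
print(f"changed rows only: {changed_only:.3f} s")
```

Under these assumptions a full copy would take about 17 seconds, against the reported 3.3 seconds per sync. The changed rows themselves account for well under a tenth of a second of transfer time, which fits the observation that most of the elapsed time is spent waiting on the network.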