Thread: The case for preserving case.
Hello, postgresql hackers. I am working with a client with a 20k record MySQL database (that will shortly expand to 100k/1m) and a few thousand lines of PHP code that does a lot of DB interaction. Their application, with a lot of relationships between data and a bunch of data integrity requirements is perfectly suited to postgresql. The PHP code follows a coding standard wherein variables are assigned CamelCase identifiers. All of the objects persist themselves to the DB, with a variable per column; on object initialization db columns are read from the db and added as attributes of the object. All of this breaks when I start to use postgresql, because all of the attributes become lowercased. Fixing this problem involves one of three things: 1.) rewriting all the code to have lowercased identifiers. This is effectively renaming everything, as long camel case attributes become much harder to read when they're lowercased. This also changes the clients' preferred coding standard. 2.) using double quotes around all identifiers in sql statements. As you're probably aware, there's no string format in PHP that lets you write double quote marks unescaped (and do variable substitution), so this involves rewriting hundreds of lines and imposing ongoing overhead for every SQL query. 3.) escaping 4 lines in src/backend/parser/scansup.c , where identifiers are lowercased. I understand that the reason for lowercasing is because odbc connections, etc expect case insensitivity, but the current behaviour isn't an SQL standard nor is it really case insensitivity. I would love case insensitivity with case preservation, but since that evidently is a more complicated option, I would like to know how I can formulate the 'case preserving' option in a way to make it palatable for inclusion. -- nothing can happen inside a sphere that you could not inscribe upon it. ~mindlace http://mindlace.net
Hi EMF, emf wrote: > Hello, postgresql hackers. > All of this breaks when I start to use postgresql, because all of the > attributes become lowercased. Well, I'm afraid that something will have to change in the app. Your email arrives while we are in the middle of a small war around the subject of case changing, but none of the options going around at the moment will match your request. The two options considered are leaving things as they are (con: non standard complient) or folding to uppercase (con: ugly and requires transition, unfortunetly, in this order). > > Fixing this problem involves one of three things: > > 1.) rewriting all the code to have lowercased identifiers. This is > effectively renaming everything, as long camel case attributes become > much harder to read when they're lowercased. This also changes the > clients' preferred coding standard. Ok, so don't do it. > 2.) using double quotes around all identifiers in sql statements. As > you're probably aware, there's no string format in PHP that lets you > write double quote marks unescaped (and do variable substitution), so > this involves rewriting hundreds of lines and imposing ongoing > overhead for every SQL query. But is consistant with both SQL standard, and with current MySQL setup. This will also make porting the app to further database, such as commercial ones, easier. I think that is the proper way or you to go. > > 3.) escaping 4 lines in src/backend/parser/scansup.c , where > identifiers are lowercased. You can do that for your own postgres installation, if you really like. I'm not a member of the steering commitee, but I doubt you'll manage to get such a patch accepted. Maybe as a per-session setting.... > > I understand that the reason for lowercasing is because odbc > connections, etc expect case insensitivity, but the current behaviour > isn't an SQL standard Neither is yours, though. If we are going to change it, we might as well change it to the standard, don't you think? > nor is it really case insensitivity. It's as close as you can get without losing the benefits of doing hash lookups. > I would love case insensitivity with case preservation, but since that > evidently is a more complicated option, I would like to know how I can > formulate the 'case preserving' option in a way to make it palatable > for inclusion. Can't help you there, unless you want to compile Postgres yourself. Shachar -- Shachar Shemesh Lingnu Open Source Consulting http://www.lingnu.com/
On Apr 24, 2004, at 00:48, Tom Lane wrote: > You do realize that any such patch would be at least a thousand times > larger than that? I am coming from a state of ignorance past the fact that commenting out four lines of code appeared to create the behaviour I desired. I knew that just changing it to match the behaviour *I* wanted isn't the same thing as making a change that could work for everyone; that's why I asked what sort of implementation of this behaviour would be acceptable. > And have vast repercussions on existing client code? I don't want to impose this on anyone else, I just want a postgresql that doesn't mangle my case, as case carries meaning in my application. From what I've seen online, other people migratingaway from MySQL would like this behaviour to be an option as well. > I'm willing to debate this, but not with people who claim it's a > four-line change. Do some research. You are welcome to not pay attention to what I have to say; I will probably never be deeply involved in the PostgreSQL codebase. I am willing to do more work to make this option that is very useful to me more widely acceptable. -- Living on earth and in space are the same class of problem. In one, the environment is harshly inimical to humans: in the other, the inverse is true. ~mindlace http://mindlace.net
On Apr 24, 2004, at 11:17, Jan Wieck wrote: > I don't think that we will break backward compatibility for existing > PostgreSQL specific code in order to gain CamelCase+MySQL porting ease > by adopting an even less standard compliant behaviour than we > currently have. I understand and agree that breaking backward compatibility is not an option. > As things are today, we are case insensitive for unquoted identifiers > and breaking that is not an option. What you do to unquoted identifiers is not case insensitivity, but lowercase folding. > I see a chance for getting your desired behaviour, case preservation, > only as a side effect if a larger move towards the standard. That sounds great! I'd like to help if I can. > This would not be a simple per postmaster config option or even a > compiletime setting, but rather a per database option in the > pg_database system catalog, chosen at CREATE DATABASE time. This also sounds good, but with my vast ignorance of postgresql, I have no idea the proper way to tell scansup.c to knock it off (or fold up, or fold down) based on something in the pg_database system catalog. > The real problem with this is that it has far greater side effects > than you seem to imagine yet. [snip described problem] I am probably still not understanding, since if the internals always quote in their queries, it would seem that the internals could continue to use lowercase identifiers regardless of the DB setting. > I am certain that most of us are open for a more complete proposal > that includes moving towards the ANSI standard, but the change you > outlined below is not acceptable. I understand that. I am willing to do work to make a more complete proposal, but I would appreciate some guidance as to how to code something that would be more acceptable. I read in another thread that the stuff going on in scansup.c isn't allowed/shouldn't talk to the database, so I freely admit I don't know how to approach a palatable solution. Worst case scenario, I'm content with keeping my hacked version of postgresql so that I can get this application ported faster. Making that happen fast and taking advantage of postgresql's superior features will help convince my client that the thousands of dollars he's spending in this port were worthwhile ... once I've done that, gradually transitioning to case-insensitive identifiers is possible, but right now all he sees is big transition pain for gain he hasn't seen yet. -- nothing can happen inside a sphere that you could not inscribe upon it. ~mindlace http://mindlace.net