Thread: The case for preserving case.

The case for preserving case.

From
emf
Date:
Hello, postgresql hackers.

I am working with a client with a 20k record MySQL database (that will 
shortly expand to 100k/1m) and a few thousand lines of PHP code that 
does a lot of DB interaction.

Their application, with a lot of relationships between data and a bunch 
of data integrity requirements is perfectly suited to postgresql.

The PHP code follows a coding standard wherein variables are assigned 
CamelCase identifiers. All of the objects persist themselves to the DB, 
with a variable per column; on object initialization db columns are 
read from the db and added as attributes of the object.

All of this breaks when I start to use postgresql, because all of the 
attributes become lowercased.

Fixing this problem involves one of three things:

1.) rewriting all the code to have lowercased identifiers. This is 
effectively renaming everything, as long camel case attributes become 
much harder to read when they're lowercased. This also changes the 
clients' preferred coding standard.

2.) using double quotes around all identifiers in sql statements. As 
you're probably aware, there's no string format in PHP that lets you 
write double quote marks unescaped (and do variable substitution), so 
this involves rewriting hundreds of lines and imposing ongoing overhead 
for every SQL query.

3.) escaping 4 lines in src/backend/parser/scansup.c , where 
identifiers are lowercased.

I understand that the reason for lowercasing is because odbc 
connections, etc expect case insensitivity, but the current behaviour 
isn't an SQL standard nor is it really case insensitivity. I would love 
case insensitivity with case preservation, but since that evidently is 
a more complicated option, I would like to know how I can formulate the 
'case preserving' option in a way to make it palatable for inclusion.
--
nothing can happen inside a sphere
that you could not inscribe upon it.
~mindlace        http://mindlace.net



Re: The case for preserving case.

From
Shachar Shemesh
Date:
Hi EMF,

emf wrote:

> Hello, postgresql hackers.
> All of this breaks when I start to use postgresql, because all of the 
> attributes become lowercased.

Well, I'm afraid that something will have to change in the app. Your 
email arrives while we are in the middle of a small war around the 
subject of case changing, but none of the options going around at the 
moment will match your request.

The two options considered are leaving things as they are (con: non 
standard complient) or folding to uppercase (con: ugly and requires 
transition, unfortunetly, in this order).

>
> Fixing this problem involves one of three things:
>
> 1.) rewriting all the code to have lowercased identifiers. This is 
> effectively renaming everything, as long camel case attributes become 
> much harder to read when they're lowercased. This also changes the 
> clients' preferred coding standard.

Ok, so don't do it.

> 2.) using double quotes around all identifiers in sql statements. As 
> you're probably aware, there's no string format in PHP that lets you 
> write double quote marks unescaped (and do variable substitution), so 
> this involves rewriting hundreds of lines and imposing ongoing 
> overhead for every SQL query.

But is consistant with both SQL standard, and with current MySQL setup. 
This will also make porting the app to further database, such as 
commercial ones, easier. I think that is the proper way or you to go.

>
> 3.) escaping 4 lines in src/backend/parser/scansup.c , where 
> identifiers are lowercased.

You can do that for your own postgres installation, if you really like. 
I'm not a member of the steering commitee, but I doubt you'll manage to 
get such a patch accepted. Maybe as a per-session setting....

>
> I understand that the reason for lowercasing is because odbc 
> connections, etc expect case insensitivity, but the current behaviour 
> isn't an SQL standard

Neither is yours, though. If we are going to change it, we might as well 
change it to the standard, don't you think?

> nor is it really case insensitivity.

It's as close as you can get without losing the benefits of doing hash 
lookups.

> I would love case insensitivity with case preservation, but since that 
> evidently is a more complicated option, I would like to know how I can 
> formulate the 'case preserving' option in a way to make it palatable 
> for inclusion.

Can't help you there, unless you want to compile Postgres yourself.
        Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/



Re: The case for preserving case.

From
emf
Date:
On Apr 24, 2004, at 00:48, Tom Lane wrote:
> You do realize that any such patch would be at least a thousand times
> larger than that?

I am coming from a state of ignorance past the fact that commenting out 
four lines of code appeared to create the behaviour I desired. I knew 
that just changing it to match the behaviour *I* wanted isn't the same 
thing as making a change that could work for everyone; that's why I 
asked what sort of implementation of this behaviour would be 
acceptable.

>  And have vast repercussions on existing client code?

I don't want to impose this on anyone else, I just want a postgresql 
that doesn't mangle my case, as case carries meaning in my application. From what I've seen online, other people
migratingaway from MySQL 
 
would like this behaviour to be an option as well.

> I'm willing to debate this, but not with people who claim it's a
> four-line change.  Do some research.

You are welcome to not pay attention to what I have to say; I will 
probably never be deeply involved in the PostgreSQL codebase.

I am willing to do more work to make this option that is very useful to 
me more widely acceptable.
--
Living on earth and in space are the same class of
problem. In one, the environment is harshly inimical to
humans: in the other, the inverse is true.
~mindlace                            http://mindlace.net



Re: The case for preserving case.

From
emf
Date:
On Apr 24, 2004, at 11:17, Jan Wieck wrote:
> I don't think that we will break backward compatibility for existing 
> PostgreSQL specific code in order to gain CamelCase+MySQL porting ease 
> by adopting an even less standard compliant behaviour than we 
> currently have.

I understand and agree that breaking backward compatibility is not an 
option.

>  As things are today, we are case insensitive for unquoted identifiers 
> and breaking that is not an option.

What you do to unquoted identifiers is not case insensitivity, but 
lowercase folding.

>  I see a chance for getting your desired behaviour, case preservation, 
> only as a side effect if a larger move towards the standard.

That sounds great! I'd like to help if I can.

> This would not be a simple per postmaster config option or even a 
> compiletime setting, but rather a per database option in the 
> pg_database system catalog, chosen at CREATE DATABASE time.

This also sounds good, but with my vast ignorance of postgresql, I have 
no idea the proper way to tell scansup.c to knock it off (or fold up, 
or fold down) based on something in the pg_database system catalog.

> The real problem with this is that it has far greater side effects 
> than you seem to imagine yet.

[snip described problem] I am probably still not understanding, since 
if the internals always quote in their queries, it would seem that the 
internals could continue to use lowercase identifiers regardless of the 
DB setting.

> I am certain that most of us are open for a more complete proposal 
> that includes moving towards the ANSI standard, but the change you 
> outlined below is not acceptable.

I understand that. I am willing to do work to make a more complete 
proposal, but I would appreciate some guidance as to how to code 
something that would be more acceptable. I read in another thread that 
the stuff going on in scansup.c isn't allowed/shouldn't talk to the 
database, so I freely admit I don't know how to approach a palatable 
solution.

Worst case scenario, I'm content with keeping my hacked version of 
postgresql so that I can get this application ported faster. Making 
that happen fast and taking advantage of postgresql's superior features 
will help convince my client that the thousands of dollars he's 
spending in this port were worthwhile ... once I've done that, 
gradually transitioning to case-insensitive identifiers is possible, 
but right now all he sees is big transition pain for gain he hasn't 
seen yet.
--
nothing can happen inside a sphere
that you could not inscribe upon it.
~mindlace        http://mindlace.net