Thread: Case preserving - suggestions

Case preserving - suggestions

From
Shachar Shemesh
Date:
Hi list,

A postgresql migration I am doing (the same one for which the OLE DB 
driver was written) has finally passed the proof-of-concept stage 
(phew). I now have lots and lots of tidbits, tricks and tips for SQL 
Server migration, which I would love to put online. Is pgFoundry the 
right place? I understand that the code snippets section is not yet 
operative, but I would still love to put it online ASAP (i.e. - before I 
forget), and to have it all in one place.

One problem detected during that stage, however, was that the program 
pretty much relies on the collation being case insensitive. I am now 
trying to gather the info regarding adding case preserving to 
Postgresql. I already suggested that we do that by changing the 
procedures, and the idea was turned down. For example, a column UNIQUE 
constraint must enforce that only one instance of a string be present, 
case insensitive. Then again, making everything lower/upper case before 
putting it in was also rejected. Case preserving is what we are looking for.

Now, one idea that floated through my mind, and I have not yet looked 
into how difficult it would be to implement was to define a new system 
wide collation, called, for example, en_USCI. Have that collation define 
'a' and 'A' as "the same character". I'm looking for someone with more 
experience with these things than me (i.e. - just about anyone) to say 
whether such a thing is doable. I know I can reorder sort criteria using 
collation, but can I make two characters be actually the same? As a side 
note, I'll mention that MsSQL uses the collation field to define case 
insensitivity.

Assuming that fails, how hard would it be to create a case insensitive 
PostgreSQL? Would that be more like changing a couple of places (say, 
hash computation and string compares), or would that entail making 
hundreds of little changes all over the code? Is there anything in the 
regression testing infrastructure that can help check such a change?

Many thanks,            Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/



Re: Case preserving - suggestions

From
Tom Lane
Date:
Shachar Shemesh <psql@shemesh.biz> writes:
> One problem detected during that stage, however, was that the program 
> pretty much relies on the collation being case insensitive. I am now 
> trying to gather the info regarding adding case preserving to 
> Postgresql. I already suggested that we do that by changing the 
> procedures, and the idea was turned down. For example, a column UNIQUE 
> constraint must enforce that only one instance of a string be present, 
> case insensitive.

Is there a reason you can't just docreate unique index on mytable (upper(mycolumn))
?

> Now, one idea that floated through my mind, and I have not yet looked 
> into how difficult it would be to implement was to define a new system 
> wide collation, called, for example, en_USCI.

I don't know anything about the difficulty of this either.  I suspect
that it would be a bad idea to create a collation in which strcoll could
claim that nonidentical strings are equal --- you would likely find that
surprising things break.  You could try it though and see what happens.

> Assuming that fails, how hard would it be to create a case insensitive 
> PostgreSQL?

It would be a serious mistake to try to make a "case insensitive
PostgreSQL", if by that you mean hacking the system structure to enforce
case insensitivity globally.  You could reasonably think about inventing
a case-insensitive textual datatype, however.  (In fact, I'd not be
surprised if someone did that already.)  All you'd really need is to
write a set of case-insensitive comparison operators and define a btree
opclass with them.  Everything else you might want to do could be
handled by making the type binary-equivalent to text and piggybacking
on the existing functions for text.
        regards, tom lane


Re: Case preserving - suggestions

From
Robert Treat
Date:
On Sunday 06 June 2004 13:47, Shachar Shemesh wrote:
> Hi list,
>
> A postgresql migration I am doing (the same one for which the OLE DB
> driver was written) has finally passed the proof-of-concept stage
> (phew). I now have lots and lots of tidbits, tricks and tips for SQL
> Server migration, which I would love to put online. Is pgFoundry the
> right place? I understand that the code snippets section is not yet
> operative, but I would still love to put it online ASAP (i.e. - before I
> forget), and to have it all in one place.
>

We have a couple of articles about sql server migration up on techdocs now, so 
that seems the most appropriate place for documenting the process you went 
through.  

Robert Treat
-- 
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: Case preserving - suggestions

From
Shachar Shemesh
Date:
Robert Treat wrote:

>On Sunday 06 June 2004 13:47, Shachar Shemesh wrote:
>  
>
>>Hi list,
>>
>>A postgresql migration I am doing (the same one for which the OLE DB
>>driver was written) has finally passed the proof-of-concept stage
>>(phew). I now have lots and lots of tidbits, tricks and tips for SQL
>>Server migration, which I would love to put online. Is pgFoundry the
>>right place? I understand that the code snippets section is not yet
>>operative, but I would still love to put it online ASAP (i.e. - before I
>>forget), and to have it all in one place.
>>
>>    
>>
>
>We have a couple of articles about sql server migration up on techdocs now, so 
>that seems the most appropriate place for documenting the process you went 
>through.  
>
>Robert Treat
>  
>
We wrote a new data type (a tinyint replacement), that will not be part 
of postgresql any time soon. We need to put the code up somewhere.
We wrote a perl script that takes the output of the database dump done 
by MS-SQL, and converts it into an SQL script for postgres (including 
yanking the data from the MS-SQL tables, and inlining it into the 
Postgresql script using copy). That needs to go somewhere.
It is highly likely that people will find bugs, or want to make 
additions, to the above two, so source control seems like a necessary idea.
In short, I think a pgFoundry project seems better suited for our needs 
than static documentation.
            Shachar

-- 
Shachar Shemesh
Lingnu Open Source Consulting
http://www.lingnu.com/



Re: Case preserving - suggestions

From
Andrew Dunstan
Date:
Shachar Shemesh wrote:

> Robert Treat wrote:
>
>> On Sunday 06 June 2004 13:47, Shachar Shemesh wrote:
>>  
>>
>>> Hi list,
>>>
>>> A postgresql migration I am doing (the same one for which the OLE DB
>>> driver was written) has finally passed the proof-of-concept stage
>>> (phew). I now have lots and lots of tidbits, tricks and tips for SQL
>>> Server migration, which I would love to put online. Is pgFoundry the
>>> right place? I understand that the code snippets section is not yet
>>> operative, but I would still love to put it online ASAP (i.e. - 
>>> before I
>>> forget), and to have it all in one place.
>>>
>>>   
>>
>>
>> We have a couple of articles about sql server migration up on 
>> techdocs now, so that seems the most appropriate place for 
>> documenting the process you went through. 
>> Robert Treat
>>  
>>
> We wrote a new data type (a tinyint replacement), that will not be 
> part of postgresql any time soon. We need to put the code up somewhere.
> We wrote a perl script that takes the output of the database dump done 
> by MS-SQL, and converts it into an SQL script for postgres (including 
> yanking the data from the MS-SQL tables, and inlining it into the 
> Postgresql script using copy). That needs to go somewhere.
> It is highly likely that people will find bugs, or want to make 
> additions, to the above two, so source control seems like a necessary 
> idea.
> In short, I think a pgFoundry project seems better suited for our 
> needs than static documentation.
>
>             Shachar
>

Feel free to apply for a pgFoundry project - most are approved very quickly.

cheers

andrew