Thread: Case insensitive collation
The subject has been discussed on this mailing list before, recently.
To be able to switch from SQL Server to Postgresql, for me this is essential.
Therefore the question: are there plans to create a set of case insensitive, and maybe also accent insensitive collations in the near future?
I have no idea how complex this is, but it seems to me, looking at MySQL and SQL Server that it's not such a strange thing to expect from a database server.
I know I can use "lower" (even on indexes) and citext, but this feels like patchwork, and would mean a lot of work when converting our database (with 122 tables).
Regards,
Marcel van Pinxteren
Marcel van Pinxteren
Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> writes: > Therefore the question: are there plans to create a set of case > insensitive, and maybe also accent insensitive collations in the near > future? Not from the Postgres project -- we just use the collations supplied by the operating system. regards, tom lane
From the Microsoft site I learned
that they combine collation and "ComparisonStyle" to a collation name.
I thought that case insensitivity had to be built into the collation, but apparently MS built case sensitivity in the database engine.
This would mean that Postgresql would need to build case (in)sensitivity into her engine as well.
Judging from the small amount of discussion on this subject, I am afraid this is not going to happen anytime soon. Alas, we will stay with SQL Server then (or maybe MySQL, but I will have to investigate).
Met vriendelijke groet,
Marcel van Pinxteren
--------------------------
Volg onze bouw op http://nieuwekampen.blogspot.com
Marcel van Pinxteren
--------------------------
Volg onze bouw op http://nieuwekampen.blogspot.com
On Wed, Jan 16, 2013 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> writes:Not from the Postgres project -- we just use the collations supplied by
> Therefore the question: are there plans to create a set of case
> insensitive, and maybe also accent insensitive collations in the near
> future?
the operating system.
regards, tom lane
On 2013-01-16, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote: > --90e6ba6140da259e8204d36d0fa3 > Content-Type: text/plain; charset=ISO-8859-1 > > From the Microsoft site I learned > http://msdn.microsoft.com/en-us/library/ms188046(v=sql.105).aspx > that they combine collation and "ComparisonStyle" to a collation name. > > I thought that case insensitivity had to be built into the collation, but > apparently MS built case sensitivity in the database engine. > This would mean that Postgresql would need to build case (in)sensitivity > into her engine as well. what result are you tring to get? -- ⚂⚃ 100% natural
Desired behaviour:
1. If there is a row with 'ABC' (in a unique column) in the table, a row with 'abc' should not be allowed
2. If I do SELECT * FROM aTable WHERE aColumn = 'ABC', I should see a row with 'abc' as well (if there is one in the table)
This has been described in this mailing list a few months ago, in more detail.
You could look into running the DB on an OS that does support case insensitive collation. It'll likely perform better too.
On 16 January 2013 20:40, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote:
From the Microsoft site I learnedthat they combine collation and "ComparisonStyle" to a collation name.I thought that case insensitivity had to be built into the collation, but apparently MS built case sensitivity in the database engine.This would mean that Postgresql would need to build case (in)sensitivity into her engine as well.Judging from the small amount of discussion on this subject, I am afraid this is not going to happen anytime soon. Alas, we will stay with SQL Server then (or maybe MySQL, but I will have to investigate).Met vriendelijke groet,
Marcel van Pinxteren
--------------------------
Volg onze bouw op http://nieuwekampen.blogspot.comOn Wed, Jan 16, 2013 at 8:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> writes:Not from the Postgres project -- we just use the collations supplied by
> Therefore the question: are there plans to create a set of case
> insensitive, and maybe also accent insensitive collations in the near
> future?
the operating system.
regards, tom lane
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
Marcel van Pinxteren, 18.01.2013 14:13: > Desired behaviour: > 1. If there is a row with 'ABC' (in a unique column) in the table, a row with 'abc' should not be allowed That's an easy one: create unique index on foo (lower(the_column));
On Fri, Jan 18, 2013 at 6:13 AM, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote:
Desired behaviour:1. If there is a row with 'ABC' (in a unique column) in the table, a row with 'abc' should not be allowed2. If I do SELECT * FROM aTable WHERE aColumn = 'ABC', I should see a row with 'abc' as well (if there is one in the table)This has been described in this mailing list a few months ago, in more detail.
Have you seen the citext module?: http://www.postgresql.org/docs/current/static/citext.html.
It does pretty much that (albeit by having columns be the citext type instead of text or varchar).
As I mentioned in my original post, I don't want to use citext or lower().
I tested on Windows, but as I mentioned in one of my first posts, collation and case sensitivity are separate things.
With this, we are back at the beginning of the circle, so I'll leave it there.
Maybe I'll check back in a year or so, to see if case insensitivity has been implemented.
Good luck
Marcel van Pinxteren, 21.01.2013 13:22: > As I mentioned in my original post, I don't want to use citext or lower(). Why not for the unique index/constraint?
On Mon, Jan 21, 2013 at 5:22 AM, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote: > As I mentioned in my original post, I don't want to use citext or lower(). > I tested on Windows, but as I mentioned in one of my first posts, collation > and case sensitivity are separate things. Wait, is there an actual reason for not using it, or you just don't want to? I don't think the postgresql hackers are gonna find "I don't want to do it that way" a compelling reason to work on your issue.
To be honest, the reason I don't want to use citext and lower(), is me being lazy. If I have to use these features, there is more work for me converting from SQL Server to Postgresql. I have to make more changes to my database, and more to my software.
But, developers are generally lazy, so you could argue that this reason is "compelling".
The other reason, is that I assume that "lower()" adds overhead, so makes things slower than they need to be.
Whether that is true, and if that is a compelling reason, I don't know.
Whether that is true, and if that is a compelling reason, I don't know.
Marcel
On 21 January 2013 17:25, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote:
Case insensitive collation adds overhead too. It wouldn't surprise me if that were more than lower() adds - collation is complicated stuff.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
The other reason, is that I assume that "lower()" adds overhead, so makes things slower than they need to be.
Whether that is true, and if that is a compelling reason, I don't know.
--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.
On Mon, Jan 21, 2013 at 9:25 AM, Marcel van Pinxteren <marcel.van.pinxteren@gmail.com> wrote: > To be honest, the reason I don't want to use citext and lower(), is me being > lazy. If I have to use these features, there is more work for me converting > from SQL Server to Postgresql. I have to make more changes to my database, > and more to my software. > But, developers are generally lazy, so you could argue that this reason is > "compelling". > The other reason, is that I assume that "lower()" adds overhead, so makes > things slower than they need to be. > Whether that is true, and if that is a compelling reason, I don't know. Honestly as a lazy DBA I have to say it'd be pretty easy to write a script to convert any unique text index into a unique text index with a upper() in it. As another poster added, collation ain't free either. I'd say you should test it to see. My experience tells me that having an upper() (or lower()) index is not a big performance hit. If the storage of the index would be too much due to large text fields then make it a md5(lower()) index, which WILL cost more CPU wise, but allow for > 3k or so of text in a column to be indexed and cost less IO wise.
Scott Marlowe wrote: > Honestly as a lazy DBA I have to say it'd be pretty easy to write a > script to convert any unique text index into a unique text index with > a upper() in it. As another poster added, collation ain't free > either. I'd say you should test it to see. My experience tells me > that having an upper() (or lower()) index is not a big performance > hit. If the storage of the index would be too much due to large text > fields then make it a md5(lower()) index, which WILL cost more CPU > wise, but allow for > 3k or so of text in a column to be indexed and > cost less IO wise. Depending on what sort of search you want to do, it might be dead simple to use tsearch2 (which is case insensitive) or trigram indexing (for which a similarity search is case insensitive). -Kevin
Marcel van Pinxteren wrote on 21.01.2013 17:25: > The other reason, is that I assume that "lower()" adds overhead It won't add any noticeable overhead for the unique index.
On Mon, Jan 21, 2013 at 1:45 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote: > On Mon, Jan 21, 2013 at 9:25 AM, Marcel van Pinxteren > <marcel.van.pinxteren@gmail.com> wrote: >> To be honest, the reason I don't want to use citext and lower(), is me being >> lazy. If I have to use these features, there is more work for me converting >> from SQL Server to Postgresql. I have to make more changes to my database, >> and more to my software. >> But, developers are generally lazy, so you could argue that this reason is >> "compelling". >> The other reason, is that I assume that "lower()" adds overhead, so makes >> things slower than they need to be. >> Whether that is true, and if that is a compelling reason, I don't know. > > Honestly as a lazy DBA I have to say it'd be pretty easy to write a > script to convert any unique text index into a unique text index with > a upper() in it. But changing the application to create queries with upper() in the queries could be very hard. And without that, the function index would be useless. That objection doesn't apply to citext. I don't know what object to that is. Cheers, Jeff