Thread: PostgreSQL Gotchas
http://sql-info.de/postgresql/postgres-gotchas.html Any comments from folks on the list ? Cheers, Aly. -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
Compared to MySQL ditching referential integrity because of a typo, I would consider these 'gotchas' extremely minor, hence the reason I use Postgresql not MySQL. Postgresql does what you expect from an RDBMS system out of the box in 99.99% of cases. I don't have to toggle things on special like, or install additional modules, or check every time I create a foreign key that the system will actualy enforce it.
In short Postgresql makes less work for a DBA than MySQL when you actualy care about your data. It's cleaner, faster and easier to deal with in all cases of real world database applications.
I'm not even gonna talk about the query planner in MySQL trying to deal with an eight way join, let's just say that if you plan on using anything even near third normal, MySQL will dog. Oh - yeah - and the fact that it doesn't scale...
If those 'gotchas' are all one has against Postgresql... I don't know why people are still even using Oracle or DB2 ;) MySQL you say, I thought we were talking about RDBMSes here.
MySQL is to linux, what Jet is to Windows IMHO, oh wait - Jet has foreign keys by default...
Alex
In short Postgresql makes less work for a DBA than MySQL when you actualy care about your data. It's cleaner, faster and easier to deal with in all cases of real world database applications.
I'm not even gonna talk about the query planner in MySQL trying to deal with an eight way join, let's just say that if you plan on using anything even near third normal, MySQL will dog. Oh - yeah - and the fact that it doesn't scale...
If those 'gotchas' are all one has against Postgresql... I don't know why people are still even using Oracle or DB2 ;) MySQL you say, I thought we were talking about RDBMSes here.
MySQL is to linux, what Jet is to Windows IMHO, oh wait - Jet has foreign keys by default...
Alex
On 10/6/05, Aly S.P Dharshi <aly.dharshi@telus.net> wrote:
http://sql-info.de/postgresql/postgres-gotchas.html
Any comments from folks on the list ?
Cheers,
Aly.
--
Aly S.P Dharshi
aly.dharshi@telus.net
"A good speech is like a good dress
that's short enough to be interesting
and long enough to cover the subject"
---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
This sure sounds like a flamewar bait email? On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote: > http://sql-info.de/postgresql/postgres-gotchas.html > > Any comments from folks on the list ? > > Cheers, > > Aly. > > -- > Aly S.P Dharshi > aly.dharshi@telus.net > > "A good speech is like a good dress > that's short enough to be interesting > and long enough to cover the subject" > > ---------------------------(end of > broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > Gavin M. Roy 800 Pound Gorilla gmr@ehpg.net
No flamewar here, I am just trying to see if opinions of others on this, as Jim had posted a MySQL one, and that there was a PostgreSQL one, so I wanted to see if these are valid, if they aren't then that site should be updated to reflect this. Cheers, Aly. On Thu, 6 Oct 2005, Gavin M. Roy wrote: > This sure sounds like a flamewar bait email? > > On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote: > >> http://sql-info.de/postgresql/postgres-gotchas.html >> >> Any comments from folks on the list ? >> >> Cheers, >> >> Aly. >> >> -- Aly S.P Dharshi >> aly.dharshi@telus.net >> >> "A good speech is like a good dress >> that's short enough to be interesting >> and long enough to cover the subject" >> >> ---------------------------(end of broadcast)--------------------------- >> TIP 9: In versions below 8.0, the planner will ignore your desire to >> choose an index scan if your joining column's datatypes do not >> match >> > > Gavin M. Roy > 800 Pound Gorilla > gmr@ehpg.net > > -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
Then wouldn't it be more appropriate to ask: What's the status of the things listed on the PostgreSQL gotchas. Are they bugs? Are they valid? Have the been resolved? What does the community thing of those gotchas? I personally don't take an open ended question like "URL: Any comments?" as trying to be productive. But I could also just be burnt out on the whole PgSQL vs MySQL flame war or reading too much into what I appraised to be terse email. If so I apologize. Regards, Gavin On Oct 6, 2005, at 12:01 PM, Aly S.P Dharshi wrote: > No flamewar here, I am just trying to see if opinions of others on > this, as Jim had posted a MySQL one, and that there was a > PostgreSQL one, so I wanted to see if these are valid, if they > aren't then that site should be updated to reflect this. > > Cheers, > > Aly. > > On Thu, 6 Oct 2005, Gavin M. Roy wrote: > > >> This sure sounds like a flamewar bait email? >> >> On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote: >> >> >>> http://sql-info.de/postgresql/postgres-gotchas.html >>> Any comments from folks on the list ? >>> Cheers, >>> Aly. >>> -- Aly S.P Dharshi >>> aly.dharshi@telus.net >>> >>> "A good speech is like a good dress >>> that's short enough to be interesting >>> and long enough to cover the subject" >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 9: In versions below 8.0, the planner will ignore your desire to >>> choose an index scan if your joining column's datatypes do not >>> match >>> >> >> Gavin M. Roy >> 800 Pound Gorilla >> gmr@ehpg.net >> >> >> > > -- > Aly S.P Dharshi > aly.dharshi@telus.net > > "A good speech is like a good dress > that's short enough to be interesting > and long enough to cover the subject" >
They're all valid, but most apply to versions < 8.0 or < 7.4 even, and the others are pretty esoteric issues that you don't see often. The missing from clause thing is likely to be the biggest surprise most folks run into. I find the supposed bad performance of aggregates is bunk. On my workstation (not even a server, just my personal workstation with all kinds of other crap running on it) I get a response time of about 1 second for a count(*) in version 7.4 of postgresql: select count(*) from locators; count --------- 1000000 (1 row) takes anywhere from .75 to 1.0 second. The same dataset, on oracle takes 0.75 seconds on a very fast oracle server, and anywhere from 5 to 20 seconds on a slower server, like one equivalent to my workstation. On Thu, 2005-10-06 at 14:01, Aly S.P Dharshi wrote: > No flamewar here, I am just trying to see if opinions of others on this, > as Jim had posted a MySQL one, and that there was a PostgreSQL one, so I > wanted to see if these are valid, if they aren't then that site should be > updated to reflect this. > > Cheers, > > Aly. > > On Thu, 6 Oct 2005, Gavin M. Roy wrote: > > > This sure sounds like a flamewar bait email? > > > > On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote: > > > >> http://sql-info.de/postgresql/postgres-gotchas.html > >> > >> Any comments from folks on the list ? > >> > >> Cheers, > >> > >> Aly. > >> > >> -- Aly S.P Dharshi > >> aly.dharshi@telus.net > >> > >> "A good speech is like a good dress > >> that's short enough to be interesting > >> and long enough to cover the subject" > >> > >> ---------------------------(end of broadcast)--------------------------- > >> TIP 9: In versions below 8.0, the planner will ignore your desire to > >> choose an index scan if your joining column's datatypes do not > >> match > >> > > > > Gavin M. Roy > > 800 Pound Gorilla > > gmr@ehpg.net > > > >
Okay that is a fair statement to make, hence restating: - What is the status of those items listed on the PostgreSQL gotchas - Are they bugs ? - Are they valid statements ? - If they are bugs are they resolved ? - What does the PG community thing of this list of gotchas ? http://sql-info.de/postgresql/postgres-gotchas.html On Thu, 6 Oct 2005, Gavin M. Roy wrote: > Then wouldn't it be more appropriate to ask: > > What's the status of the things listed on the PostgreSQL gotchas. Are they > bugs? Are they valid? Have the been resolved? What does the community thing > of those gotchas? > > I personally don't take an open ended question like "URL: Any comments?" as > trying to be productive. But I could also just be burnt out on the whole > PgSQL vs MySQL flame war or reading too much into what I appraised to be terse > email. If so I apologize. > > Regards, > > Gavin -- Aly S.P Dharshi aly.dharshi@telus.net "A good speech is like a good dress that's short enough to be interesting and long enough to cover the subject"
On Thu, Oct 06, 2005 at 12:07:12PM -0600, Aly S.P Dharshi wrote: > http://sql-info.de/postgresql/postgres-gotchas.html > > Any comments from folks on the list ? The implicit from and mildly non-standard case folding are the only two that approach being real issues people may stumble over, IMO. If you look at the others that may be problematic in real use they're mostly for archaic versions of PG. The others that are still valid with 8.0 are... not so much problems with PG as differences in the way PG works that users should be aware of. Cheers, Steve
Don't think so. The author sounds like a PostgreSQL proponent to me. It also sounds like most of the issues have been addressed with recent builds. > -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of Gavin M. Roy > Sent: Thursday, October 06, 2005 11:40 AM > To: Aly S.P Dharshi > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] PostgreSQL Gotchas > > This sure sounds like a flamewar bait email? > > On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote: > > > http://sql-info.de/postgresql/postgres-gotchas.html > > > > Any comments from folks on the list ? > > > > Cheers, > > > > Aly. > > > > -- > > Aly S.P Dharshi > > aly.dharshi@telus.net > > > > "A good speech is like a good dress > > that's short enough to be interesting > > and long enough to cover the subject" > > > > ---------------------------(end of > > broadcast)--------------------------- > > TIP 9: In versions below 8.0, the planner will ignore your desire to > > choose an index scan if your joining column's datatypes do not > > match > > > > Gavin M. Roy > 800 Pound Gorilla > gmr@ehpg.net > > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster
Sorry.
Alex
Alex
On 10/6/05, Gavin M. Roy <gmr@ehpg.net> wrote:
This sure sounds like a flamewar bait email?
On Oct 6, 2005, at 11:07 AM, Aly S.P Dharshi wrote:
> http://sql-info.de/postgresql/postgres-gotchas.html
>
> Any comments from folks on the list ?
>
> Cheers,
>
> Aly.
>
> --
> Aly S.P Dharshi
> aly.dharshi@telus.net
>
> "A good speech is like a good dress
> that's short enough to be interesting
> and long enough to cover the subject"
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
> choose an index scan if your joining column's datatypes do not
> match
>
Gavin M. Roy
800 Pound Gorilla
gmr@ehpg.net
---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
> On 10/6/05, Aly S.P Dharshi <aly ( dot ) dharshi ( at ) telus ( dot ) net> wrote: > > http://sql-info.de/postgresql/postgres-gotchas.html > > Any comments from folks on the list ? - It's a lot shorter than MySQL's gotchas list. - 8 of the 13 are for versions of PostgreSQL <= 8.1 - Of the remaining, I consider "select as" to be really trivial (and it appears a work-around can be hacked). - lowercase folding. I DO sometimes wish I could use fieldID, etc. without quoting it. - I've never found count(*) to be slow. - I don't know enough about the "UNICODE means "UTF-8"" and "RANDOM() failures" to comment. CSN __________________________________ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com
On Thu, Oct 06, 2005 at 12:07:12PM -0600, Aly S.P Dharshi wrote: > http://sql-info.de/postgresql/postgres-gotchas.html > > Any comments from folks on the list ? Several. First, it looks to me like this is a pretty transparent attempt to troll, so I'm not going to go there. Second, if you actually go through that list, you find: * Problems fixed in the last version or three of PostgreSQL. This is an incentive to upgrade, not a gotcha. * Problems that will be fixed in the next version of PostgreSQL. This means that problems get on developers' radar and get fixed. I suppose by some extremely un-generous method of assessment, this could be a gotcha. * Incomplete information, e.g. not mentioning Slony-I as an upgrade path. * One deviation from the standard that won't be changed: fold-to-lower. * Nits so minor as not to be worth addressing (non-optional AS in FROM clauses vs. SQL standard's making that AS optional there) Third, there are real gotchas, but those are generally reported as bugs on pgsql-bugs and acted upon as such. Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
--- CSN <cool_screen_name90001@yahoo.com> wrote: > - 8 of the 13 are for versions of PostgreSQL <= 8.1 Doh! - 8 of the 13 are for versions of PostgreSQL < 8.1! __________________________________ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com
On Thu, 2005-06-10 at 12:07 -0600, Aly S.P Dharshi wrote: > http://sql-info.de/postgresql/postgres-gotchas.html > > Any comments from folks on the list ? "SELECT column alias, ...": this is a known issue. AFAIK it is not easy to solve. "Unquoted object names fold to lower case": this is intentional, both because the developers prefer this behavior and because it is consistent with the behavior of prior PostgreSQL versions. "Implicit FROM item and unintended cross joins": fixed in 8.1, as the gotcha notes. "COUNT(*) very slow": this is a known issue -- see the -hackers archives for many prior discussions. MVCC makes this hard to solve effectively (whether applications should actually be using COUNT(*) on large tables with no WHERE clause is another matter...) -Neil
CSN wrote: >- I don't know enough about the "UNICODE means >"UTF-8"" and "RANDOM() failures" to comment. > > I'm hardly an expert, but I've done enough with unicode to know that you can easily convert utf-8 to any other flavor of unicode you might want to use. Though, why you'd want to use something other than utf-8 in the first place I don't understand.
* Aly S.P Dharshi (aly.dharshi@telus.net) wrote: > - What is the status of those items listed on the PostgreSQL gotchas > - Are they bugs ? > - Are they valid statements ? > - If they are bugs are they resolved ? > - What does the PG community thing of this list of gotchas ? > > http://sql-info.de/postgresql/postgres-gotchas.html There's only 6 things on there which apply to the current version of PostgreSQL, and one of those will be fixed in 8.1. The remaining items are: SELECT column alias,... This issue looks like it has probably been hashed over a few times and at the end of the day the answer is "Best to use AS for clarity anyway, and it is documented to be necessary, and Postgres upstream has decided to require it instead of making alot of things reserved which wouldn't otherwise need to be." Unquoted object names fold to lower case I don't really see this as too much of an issue, personally, but I do know some people have run into it. The example they give seems a bit off tho, as I thought Oracle just folded to upper-case (in which case "table" != table in Oracle, but "table" = table in Postgres, kind of depends on how you look at it). I'm not sure the spec says one way or another. I've had problems with this when people were building things against a MySQL database w/o proper quoting and assuming it was case-sensitive when under Postgres it wasn't. RANDOM() failures This looks like an interesting problem, and might be a bug to be fixed. I think it depends on the what the SQL specification says for this case, if it says anything at all, as to if it should be considered a bug or not. On the surface it certainly does look like an oddity though. COUNT(*) very slow As someone else has pointed out, it's only slow if you've got a large dataset. There's certainly workarounds for this issue (generally involving a couple of functions for keeping track of the number of rows). In this way, Postgres actually gives the user the flexibility to choose the performance loss on insert/deletes which is necessary to track the number of rows seperately, or not to. UNICODE means "UTF-8" This is an interesting point. To be perfectly honest, it seems like Postgres' UNICODE/UTF-8/etc support could stand to be somewhat better. I've not used it much myself but I do see comments about it on the lists from time to time. I think there was some work done on this for 8.1 and I expect support will only continue to improve in these areas. Just my 2c. Thanks, Stephen
Attachment
Stephen Frost wrote: > UNICODE means "UTF-8" > This is an interesting point. To be perfectly honest, it seems like > Postgres' UNICODE/UTF-8/etc support could stand to be somewhat better. > I've not used it much myself but I do see comments about it on the > lists from time to time. I think there was some work done on this > for 8.1 and I expect support will only continue to improve in these > areas. Yes, this is completely fixed in 8.1, except we allow UNICODE to be a synonym for UTF-8. We also added Win32 support for UTF8 in 8.1. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania 19073
On Thu, Oct 06, 2005 at 01:19:33PM -0600, Aly S.P Dharshi wrote: > Okay that is a fair statement to make, hence restating: > > - What is the status of those items listed on the PostgreSQL gotchas Some of them are for sure gotchas. For users of tranditionally-non-toy (as opposed to recently-non-toy) systems, I'd think that there are three serious issues: the column alias (because it strictly violates the SQL definition), the folding to lower case (same reason), and the UNICODE==UTF-8 issue (because it violates the UNICODE definition). > - Are they valid statements ? > - If they are bugs are they resolved ? Some of them are valid and resolved (the site actually notes as much in many cases). One thing I do note is that there's a workaround for the have-to-dump issue: you can use Slony to upgrade (by design). It won't work for everyone, but it's a big step up in many cases. A -- Andrew Sullivan | ajs@crankycanuck.ca "The year's penultimate month" is not in truth a good way of saying November. --H.W. Fowler
On Thu, Oct 06, 2005 at 12:29:51PM -0700, David Fetter wrote: > * Problems that will be fixed in the next version of PostgreSQL. > This means that problems get on developers' radar and get fixed. I > suppose by some extremely un-generous method of assessment, this > could be a gotcha. > > * Incomplete information, e.g. not mentioning Slony-I as an upgrade > path. From the few times I've emailed the author he seems happy to accept better text. I'm sure he'd be happy to mention slony as an upgrade option (though it would be best if there was a good doc describing this we could point to). Likewise I believe he'd note things that will be fixed in a given version (such as 8.1 or 8.2). > * One deviation from the standard that won't be changed: fold-to-lower. Does the standard even specify which case you fold to? I agree, this is only a gotcha if you're used to the very non-standard behavior in MySQL. > * Nits so minor as not to be worth addressing (non-optional AS in FROM > clauses vs. SQL standard's making that AS optional there) *shrug* I think it's useful to be able to see why something you might expect to work doesn't. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Oct 06, 2005 at 04:18:03PM -0400, Stephen Frost wrote: > COUNT(*) very slow > As someone else has pointed out, it's only slow if you've got a large > dataset. There's certainly workarounds for this issue (generally > involving a couple of functions for keeping track of the number of > rows). In this way, Postgres actually gives the user the flexibility > to choose the performance loss on insert/deletes which is necessary to > track the number of rows seperately, or not to. Actually, I think the real issue here is that unlike every other database I've used, PostgreSQL can't do covering index scans, where only the index is read and not the base table. This is due to not having MVCC visibility info in the index. But there has been discussion on -hackers about a way to get close to this behavior; something along the lines of setting a bit in the index once a tuple is visible to all running transactions. This would be a win on larger indexes that don't have a lot of insert/update/delete activity. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Oct 06, 2005 at 02:30:53PM -0400, Alex Turner wrote: > MySQL is to linux, what Jet is to Windows IMHO, oh wait - Jet has foreign > keys by default... MySQL is the WindowsME of databases <- first hit searching for MySQL on CafePress. :) -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Thu, Oct 06, 2005 at 05:29:14PM -0500, Jim C. Nasby wrote: > On Thu, Oct 06, 2005 at 12:29:51PM -0700, David Fetter wrote: > > * Problems that will be fixed in the next version of PostgreSQL. > > This means that problems get on developers' radar and get fixed. > > I suppose by some extremely un-generous method of assessment, this > > could be a gotcha. > > > > * Incomplete information, e.g. not mentioning Slony-I as an > > upgrade path. > > From the few times I've emailed the author he seems happy to accept > better text. I'm sure he'd be happy to mention slony as an upgrade > option (though it would be best if there was a good doc describing > this we could point to). Likewise I believe he'd note things that > will be fixed in a given version (such as 8.1 or 8.2). I'll draft something up :) > > * One deviation from the standard that won't be changed: fold-to-lower. > > Does the standard even specify which case you fold to? It specifies fold-to-upper, and that's just wrong. :P > I agree, this is only a gotcha if you're used to the very > non-standard behavior in MySQL. > > > * Nits so minor as not to be worth addressing (non-optional AS in > > FROM clauses vs. SQL standard's making that AS optional there) > > *shrug* I think it's useful to be able to see why something you > might expect to work doesn't. I suppose it's good to have those. There are other things I think of as bugs, and would like to have addressed internally before I send them off to gotchas land. PG's incomplete support for DOMAINs, for example, or the not-quite-it handling of composite types, or things that are missing (imho) important features... Cheers, D -- David Fetter david@fetter.org http://fetter.org/ phone: +1 510 893 6100 mobile: +1 415 235 3778 Remember to vote!
Neil Conway wrote: > >"COUNT(*) very slow": this is a known issue -- see the -hackers archives >for many prior discussions. MVCC makes this hard to solve effectively >(whether applications should actually be using COUNT(*) on large tables >with no WHERE clause is another matter...) > >-Neil > > And it's not like a count(*) on an Oracle database of any decently-sized dataset is blazing fast, or even in blazing's ballpark. The only thing I could see actually being an issue is the random() one and add missing from. The rest are trivial. The random() thing is interesting, esoteric, and probably has never been a problem in a real situation. (Or has exactly once, when he wrote that gotcha) Jeff
On Thu, Oct 06, 2005 at 12:54:43PM -0700, CSN wrote: > - lowercase folding. I DO sometimes wish I could use > fieldID, etc. without quoting it. I believe that may be against ANSI SQL. In any case, the only databases I can think of that don't fold-case in some form are MySQL and Access. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
Yep, I think the SQL spec says fold to uppercase. I'm not sure why PostgreSQL folds to lowercase instead, but if folding has to occur, I prefer lowercase. CSN --- "Jim C. Nasby" <jnasby@pervasive.com> wrote: > On Thu, Oct 06, 2005 at 12:54:43PM -0700, CSN wrote: > > - lowercase folding. I DO sometimes wish I could > use > > fieldID, etc. without quoting it. > > I believe that may be against ANSI SQL. In any case, > the only databases > I can think of that don't fold-case in some form are > MySQL and Access. > -- > Jim C. Nasby, Sr. Engineering Consultant > jnasby@pervasive.com > Pervasive Software http://pervasive.com > work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf > cell: 512-569-9461 > __________________________________ Yahoo! Mail - PC Magazine Editors' Choice 2005 http://mail.yahoo.com
Stephen Frost <sfrost@snowman.net> writes: > Unquoted object names fold to lower case > I don't really see this as too much of an issue, personally, but I do > know some people have run into it. The example they give seems a bit > off tho, as I thought Oracle just folded to upper-case (in which case > "table" != table in Oracle, but "table" = table in Postgres, kind of > depends on how you look at it). I'm not sure the spec says one way or > another. Yeah, it does, and the spec sides with Oracle. This is explained exactly in the manual: : Quoting an identifier also makes it case-sensitive, whereas unquoted : names are always folded to lower case. For example, the identifiers FOO, : foo, and "foo" are considered the same by PostgreSQL, but "Foo" and : "FOO" are different from these three and each other. (The folding of : unquoted names to lower case in PostgreSQL is incompatible with the SQL : standard, which says that unquoted names should be folded to upper : case. Thus, foo should be equivalent to "FOO" not "foo" according to the : standard. If you want to write portable applications you are advised to : always quote a particular name or never quote it.) > RANDOM() failures > This looks like an interesting problem, and might be a bug to be > fixed. I think it depends on the what the SQL specification says for > this case, if it says anything at all, as to if it should be > considered a bug or not. On the surface it certainly does look like > an oddity though. Well, it's surely legitimate material as a "gotcha". The example is taken from http://archives.postgresql.org/pgsql-general/2004-11/msg01375.php and the "previous discussion" referred to is this thread: http://archives.postgresql.org/pgsql-hackers/2004-10/msg00082.php As you can see, there wasn't a lot of agreement that we ought to change it. Arguably, if we did change it we'd get ripped for the "gotcha" of poor optimization when the user forgets to mark nonvolatile functions properly. (Personally, though, I'm in favor of tightening it up.) regards, tom lane
On Thu, Oct 06, 2005 at 05:47:36PM -0500, Jeffrey Melloy wrote: > The only thing I could see actually being an issue is the random() one > and add missing from. The rest are trivial. The random() thing is > interesting, esoteric, and probably has never been a problem in a real > situation. (Or has exactly once, when he wrote that gotcha) The random() issue has a workaround that the gotchas page doesn't mention: SELECT id, is_true FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table OFFSET 0) AS t_tmp WHERE is_true; Tom Lane brought up the OFFSET 0 trick a couple of days ago in the "Avoiding evaluating functions twice" thread: http://archives.postgresql.org/pgsql-general/2005-10/msg00107.php -- Michael Fuhr
On 10/7/05, Michael Fuhr <mike@fuhr.org> wrote: > On Thu, Oct 06, 2005 at 05:47:36PM -0500, Jeffrey Melloy wrote: > > The only thing I could see actually being an issue is the random() one > > and add missing from. The rest are trivial. The random() thing is > > interesting, esoteric, and probably has never been a problem in a real > > situation. (Or has exactly once, when he wrote that gotcha) > > The random() issue has a workaround that the gotchas page doesn't > mention: > > SELECT id, is_true > FROM (SELECT id, RANDOM() < 0.5 AS is_true FROM some_table OFFSET 0) AS t_tmp > WHERE is_true; > > Tom Lane brought up the OFFSET 0 trick a couple of days ago in the > "Avoiding evaluating functions twice" thread: > > http://archives.postgresql.org/pgsql-general/2005-10/msg00107.php Thanks, I've updated the list. I'd like to take the opportunity to point out that following the original, unexpected "success" of the MySQL gotchas list, I created one for PostgreSQL for the sake of "balance". Though I'm really having to scrape the barrel for material ;-). The MySQL list is a little outdated; I'm going through it with updates for version 5. Ian Barwick
CSN wrote: > Yep, I think the SQL spec says fold to uppercase. I'm > not sure why PostgreSQL folds to lowercase instead, > but if folding has to occur, I prefer lowercase. I think preference was why lowercase was chosen many moons ago. It's stayed that way because otherwise existing users would be storming the -hackers list with pitchforks and flaming torches. -- Richard Huxton Archonet Ltd
On Thu, Oct 06, 2005 at 07:29:44PM -0400, Tom Lane wrote: > Well, it's surely legitimate material as a "gotcha". The example is > taken from > http://archives.postgresql.org/pgsql-general/2004-11/msg01375.php > and the "previous discussion" referred to is this thread: > http://archives.postgresql.org/pgsql-hackers/2004-10/msg00082.php > > As you can see, there wasn't a lot of agreement that we ought to change > it. Arguably, if we did change it we'd get ripped for the "gotcha" of > poor optimization when the user forgets to mark nonvolatile functions > properly. (Personally, though, I'm in favor of tightening it up.) Aren't there a good number of performance issues if you mis-mark a function? In any case, ISTM it'd be much better to perform poorly rather than give bad/wrong results. I don't really see much dissention there, so I'd vote for making the change for 8.2. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
On Fri, Oct 07, 2005 at 08:44:34AM +0100, Richard Huxton wrote: > CSN wrote: > >Yep, I think the SQL spec says fold to uppercase. I'm > >not sure why PostgreSQL folds to lowercase instead, > >but if folding has to occur, I prefer lowercase. > > I think preference was why lowercase was chosen many moons ago. It's > stayed that way because otherwise existing users would be storming the > -hackers list with pitchforks and flaming torches. If any change was made I'm sure it would be to allow the user to decide which way case was folded. But IMHO, anyone messing around with object names that won't fold is asking for trouble anyway. -- Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com Pervasive Software http://pervasive.com work: 512-231-6117 vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
I use PostgeSQL less than year. Before I worked with MS SQL Server 2000, MySQL 3&4, Oracle8i and Interbase. Also, I studied standards SQL:1999 and SQL:2003. So, after switching to PostgreSQL I've encountered with several things that seem strange to me. Of course, several of them are simply not implemented yet and are in the list of unsopported features: http://www.postgresql.org/docs/8.0/interactive/unsupported-features-sql-standard.html. But some seem to be 'old diseases'. Here is my list of the issues: http://chernowiki.ru/Dev/PostgreSQLComparedWithSQL2003AndOracleSQLServerDB2Etc Perhaps I'm wrong with some issues - any comments are welcome. On 08/10/05, Jim C. Nasby <jnasby@pervasive.com> wrote: > On Fri, Oct 07, 2005 at 08:44:34AM +0100, Richard Huxton wrote: > > CSN wrote: > > >Yep, I think the SQL spec says fold to uppercase. I'm > > >not sure why PostgreSQL folds to lowercase instead, > > >but if folding has to occur, I prefer lowercase. > > > > I think preference was why lowercase was chosen many moons ago. It's > > stayed that way because otherwise existing users would be storming the > > -hackers list with pitchforks and flaming torches. > > If any change was made I'm sure it would be to allow the user to decide > which way case was folded. But IMHO, anyone messing around with object > names that won't fold is asking for trouble anyway. > -- > Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com > Pervasive Software http://pervasive.com work: 512-231-6117 > vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- Best regards, Nikolay
On Sat, Oct 08, 2005 at 03:32:00PM +0400, Nikolay Samokhvalov wrote: > http://chernowiki.ru/Dev/PostgreSQLComparedWithSQL2003AndOracleSQLServerDB2Etc > > Perhaps I'm wrong with some issues - any comments are welcome. The only thing I can comment on is updatable views. You can make updatable views using RULEs. The only thing is that they're not *automatically* updateable. In theory, if someone came up will a program that from given <view definition> produced the appropriate INSERT, UPDATE and DELETE rules, it might be incorporated. Currently you just have to do it manually... Case-insensetive text comparisons can be acheived using the citext module on gborg. http://gborg.postgresql.org/project/citext/projdisplay.php Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On 08/10/05, Martijn van Oosterhout <kleptog@svana.org> wrote: > On Sat, Oct 08, 2005 at 03:32:00PM +0400, Nikolay Samokhvalov wrote: > > http://chernowiki.ru/Dev/PostgreSQLComparedWithSQL2003AndOracleSQLServerDB2Etc > > > > Perhaps I'm wrong with some issues - any comments are welcome. > > The only thing I can comment on is updatable views. You can make > updatable views using RULEs. The only thing is that they're not > *automatically* updateable. OK, I'll make this correction. But for me, updatable views are views for which DBMS supports insert/update/delete operations as for tables. Ideally, people shouldn't distinguish table and view - that's what theory stands for (see Date's thoutghs about it: http://www.dbmsmag.com/int9410.html, he also has a cycle of articles on this theme: http://www.dbdebunk.citymax.com/page/page/622302.htm). PostgreSQL doesn't support updates even for simple views such as select-with-restriction. What it does support - not updatable views, but some kind of INSTEAD OFF triggers (another form of). > > In theory, if someone came up will a program that from given <view > definition> produced the appropriate INSERT, UPDATE and DELETE rules, > it might be incorporated. Currently you just have to do it manually... There is a good theory (Date), but it cannot be implemented for any practical DBMS. The cause lies in differences between theory and practice. And the major difference is possibility to define tables w/o PK (in other words, possible duplicate rows). Nevertheless, all major commercial RDMSs support some subset of views that can be updated.. SQL:2003 defines a quite large subset, but the definition is pretty mazy... > > Case-insensetive text comparisons can be acheived using the citext > module on gborg. > > http://gborg.postgresql.org/project/citext/projdisplay.php Thanks for the link, I'll try it. However, I suppose that such basic feature as support collations should be implemented in core. > > Have a nice day, > -- > Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > > tool for doing 5% of the work and then sitting around waiting for someone > > else to do the other 95% so you can sue them. > > > -- Best regards, Nikolay
On Sat, Oct 08, 2005 at 06:05:29PM +0400, Nikolay Samokhvalov wrote: > On 08/10/05, Martijn van Oosterhout <kleptog@svana.org> wrote: > > The only thing I can comment on is updatable views. You can make > > updatable views using RULEs. The only thing is that they're not > > *automatically* updateable. > > OK, I'll make this correction. But for me, updatable views are views > for which DBMS supports insert/update/delete operations as for > tables. Ideally, people shouldn't distinguish table and view - that's > what theory stands for (see Date's thoutghs about it: > http://www.dbmsmag.com/int9410.html, he also has a cycle of articles > on this theme: http://www.dbdebunk.citymax.com/page/page/622302.htm). > PostgreSQL doesn't support updates even for simple views such as > select-with-restriction. What it does support - not updatable views, > but some kind of INSTEAD OFF triggers (another form of). You've got me confused. What are INSTEAD OF triggers? PostgreSQL does support views that look exactly like tables. You can decide on INSERT what to do whith fields not in the view, which columns you allow UPDATE and what the semantics should be for DELETE if the view is a join on multiple tables. All PostgreSQL doesn't do is create these rules for you. For an example see here, all the way at the bottom. RULEs are not TRIGGERs: http://www.varlena.com/varlena/GeneralBits/82.php <snip> > PK (in other words, possible duplicate rows). Nevertheless, all major > commercial RDMSs support some subset of views that can be updated.. > SQL:2003 defines a quite large subset, but the definition is pretty > mazy... PostgreSQL allows any view to be updatable, no matter how complex it is. You just have to create the rules yourself. There have been attempts to automate the process, they just havn't been clean enough to pass muster. And people who really want updateable views can make them already. Hope this clarifies things for you, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Neil Conway wrote: > "SELECT column alias, ...": this is a known issue. AFAIK it is not > easy to solve. It's easy to solve. There's even a patch linked from the page. I seem to remember that some (versions of some?) other SQL implementation did not *allow* the AS there, complicating matters for people moving to PostgreSQL. -- Peter Eisentraut http://developer.postgresql.org/~petere/
On 10/8/05, Nikolay Samokhvalov <samokhvalov@gmail.com> wrote: > I use PostgeSQL less than year. Before I worked with MS SQL Server > 2000, MySQL 3&4, Oracle8i and Interbase. Also, I studied standards > SQL:1999 and SQL:2003. So, after switching to PostgreSQL I've > encountered with several things that seem strange to me. Of course, > several of them are simply not implemented yet and are in the list of > unsopported features: > http://www.postgresql.org/docs/8.0/interactive/unsupported-features-sql-standard.html. > But some seem to be 'old diseases'. > > Here is my list of the issues: > http://chernowiki.ru/Dev/PostgreSQLComparedWithSQL2003AndOracleSQLServerDB2Etc > > Perhaps I'm wrong with some issues - any comments are welcome. > From the referenced page... "We can insert into column of type TIME the value such as '12:15' and then obtain '12:15:00' making select, but we couldn't do the same with dates." Huh? Minutes and seconds have a valid value of zero. Months and days don't. Date types need to contain a valid date, and time types need to contain a valid time. Anything else seems like, well, MySQL.
Tom Lane wrote: >Stephen Frost <sfrost@snowman.net> writes: > > >>Unquoted object names fold to lower case >> I don't really see this as too much of an issue, personally, but I do >> know some people have run into it. The example they give seems a bit >> off tho, as I thought Oracle just folded to upper-case (in which case >> "table" != table in Oracle, but "table" = table in Postgres, kind of >> depends on how you look at it). I'm not sure the spec says one way or >> another. >> >> > >Yeah, it does, and the spec sides with Oracle. This is explained >exactly in the manual: > >: Quoting an identifier also makes it case-sensitive, whereas unquoted >: names are always folded to lower case. For example, the identifiers FOO, >: foo, and "foo" are considered the same by PostgreSQL, but "Foo" and >: "FOO" are different from these three and each other. (The folding of >: unquoted names to lower case in PostgreSQL is incompatible with the SQL >: standard, which says that unquoted names should be folded to upper >: case. Thus, foo should be equivalent to "FOO" not "foo" according to the >: standard. If you want to write portable applications you are advised to >: always quote a particular name or never quote it.) > > Any chance of adding a configuration option for future versions in order to set case folding behavior? It seems that PostgreSQL has really attempted to be as standards-compliant as possible and this is one area where improvement could be made without breaking backward compatibility... We already have configuration options for setting whether NULL = NULL returns TRUE or NULL, and a few other legacy options for preventing past non-standard behavior from breaking compatibility with current versions. > > > >>RANDOM() failures >> This looks like an interesting problem, and might be a bug to be >> fixed. I think it depends on the what the SQL specification says for >> this case, if it says anything at all, as to if it should be >> considered a bug or not. On the surface it certainly does look like >> an oddity though. >> >> > >Well, it's surely legitimate material as a "gotcha". The example is >taken from >http://archives.postgresql.org/pgsql-general/2004-11/msg01375.php >and the "previous discussion" referred to is this thread: >http://archives.postgresql.org/pgsql-hackers/2004-10/msg00082.php > >As you can see, there wasn't a lot of agreement that we ought to change >it. Arguably, if we did change it we'd get ripped for the "gotcha" of >poor optimization when the user forgets to mark nonvolatile functions >properly. (Personally, though, I'm in favor of tightening it up.) > > It all depends on which is worse? Incorrect results or bad performance on poorly written functions? I would side with bad performance any day if it was user error and could be fixed with the help of the people on the PERFORM list. If we didn't believe that, we would all be using MySQL, right? Best Wishes, Chris Travers Metatron Technology Consulting
On Thu, 2005-10-13 at 13:09, Chris Travers wrote: > Any chance of adding a configuration option for future versions in order > to set case folding behavior? It seems that PostgreSQL has really > attempted to be as standards-compliant as possible and this is one area > where improvement could be made without breaking backward compatibility... > > We already have configuration options for setting whether NULL = NULL > returns TRUE or NULL, and a few other legacy options for preventing past > non-standard behavior from breaking compatibility with current versions. The problem, as explained to me when I floated this idea, is that the CATALOGS are all in lower case, and many references to them are behind the case folding mechanism (i.e. they get accessed by their lower case names directly by backend software). So, one would also have to have some way of working around that.
Nikolay Samokhvalov wrote: >On 08/10/05, Martijn van Oosterhout <kleptog@svana.org> wrote: > > >>On Sat, Oct 08, 2005 at 03:32:00PM +0400, Nikolay Samokhvalov wrote: >> >> >>>http://chernowiki.ru/Dev/PostgreSQLComparedWithSQL2003AndOracleSQLServerDB2Etc >>> >>>Perhaps I'm wrong with some issues - any comments are welcome. >>> >>> >>The only thing I can comment on is updatable views. You can make >>updatable views using RULEs. The only thing is that they're not >>*automatically* updateable. >> >> > >OK, I'll make this correction. But for me, updatable views are views >for which DBMS supports insert/update/delete operations as for >tables. Ideally, people shouldn't distinguish table and view - that's >what theory stands for (see Date's thoutghs about it: >http://www.dbmsmag.com/int9410.html, he also has a cycle of articles >on this theme: http://www.dbdebunk.citymax.com/page/page/622302.htm). > > Who do you mean by "people?" Users? DBA's? Now, obviously the DBA will always be able to distinguish between them. Otherwise he/she wouldn't be much of a DBA would he/she? >PostgreSQL doesn't support updates even for simple views such as >select-with-restriction. What it does support - not updatable views, >but some kind of INSTEAD OFF triggers (another form of). > > Ok... Your complaint is that PostgreSQL doesn't support automatically generating insert/update/delete rules for views. Ultimately, however, this may not be done simply because there are other priorities which people are devoting their time and energy to instead. I know some people have put in some work on this item but I don't know what the status is. Also, what SQL Server 2003 calles a trigger, we call a rule. What we call a trigger is different and I am not sure SQL Server 2003 has such an equivalent. > > >>In theory, if someone came up will a program that from given <view >>definition> produced the appropriate INSERT, UPDATE and DELETE rules, >>it might be incorporated. Currently you just have to do it manually... >> >> > >There is a good theory (Date), but it cannot be implemented for any >practical DBMS. The cause lies in differences between theory and >practice. And the major difference is possibility to define tables w/o >PK (in other words, possible duplicate rows). Nevertheless, all major >commercial RDMSs support some subset of views that can be updated.. >SQL:2003 defines a quite large subset, but the definition is pretty >mazy... > > Again, this is likely doable. There are hidden fields that I suppose could be extended in a view to reference unique rows (maybe ctid since the whole thing is expanded in a single SQL statement, but I haven't tried it). Best Wishes, Chris Travers Metatron Technology Consulting
Chris Travers <chris@travelamericas.com> writes: > Any chance of adding a configuration option for future versions in order > to set case folding behavior? It seems that PostgreSQL has really > attempted to be as standards-compliant as possible and this is one area > where improvement could be made without breaking backward compatibility... Fixing the case issue is not *anywhere* near that easy. See past discussions. >> http://archives.postgresql.org/pgsql-general/2004-11/msg01375.php >> and the "previous discussion" referred to is this thread: >> http://archives.postgresql.org/pgsql-hackers/2004-10/msg00082.php >> >> As you can see, there wasn't a lot of agreement that we ought to change >> it. Arguably, if we did change it we'd get ripped for the "gotcha" of >> poor optimization when the user forgets to mark nonvolatile functions >> properly. (Personally, though, I'm in favor of tightening it up.) > > It all depends on which is worse? Incorrect results or bad performance > on poorly written functions? I would side with bad performance any day > if it was user error and could be fixed with the help of the people on > the PERFORM list. Well, as I said, I'm personally in favor of tightening up the query-flattening rules, but that will not in itself make volatile functions in subselects 100% gotcha-free. regards, tom lane
Scott Marlowe wrote: >On Thu, 2005-10-13 at 13:09, Chris Travers wrote: > >The problem, as explained to me when I floated this idea, is that the >CATALOGS are all in lower case, and many references to them are behind >the case folding mechanism (i.e. they get accessed by their lower case >names directly by backend software). > >So, one would also have to have some way of working around that. > > > So basically, the problem is that any fix for case folding would touch a fair bit of code and possibly cause other problems. However, I haven't seen anyone worry about performance issues in such a fix, just that it might be a fair bit of work. However, if it is not on the TODO list already, it seems that it should be there as it really is necessary for standards-compliance. Best Wishes, Chris Travers Metatron Technology Consulting
Chris Travers <chris@travelamericas.com> writes: > So basically, the problem is that any fix for case folding would touch a > fair bit of code and possibly cause other problems. However, I haven't > seen anyone worry about performance issues in such a fix, just that it > might be a fair bit of work. More that it's likely to create serious forwards, backwards, and sideways compatibility problems :-(. The work involved is mostly in figuring out how to deal with that. Since the end reward for all this work would be having to read CATALOGS WRITTEN IN ALL UPPER CASE, none of the key developers seem very interested ... regards, tom lane
Tom Lane wrote: >Chris Travers <chris@travelamericas.com> writes: > > >>So basically, the problem is that any fix for case folding would touch a >>fair bit of code and possibly cause other problems. However, I haven't >>seen anyone worry about performance issues in such a fix, just that it >>might be a fair bit of work. >> >> > >More that it's likely to create serious forwards, backwards, and >sideways compatibility problems :-(. The work involved is mostly >in figuring out how to deal with that. > > Just thinking based on this conversation.... Obviously, you and others are the authorities on the code that I am not. However, here are the thoughts that occur to me.... First, I am not sure that it is necessary to make things that bad. It seems that the main objection has to do with table and column names in system catalogs. Otherwise, whichever case a table name is in, it is in. I don't see anyone running out and saying "we should match either upper or lower cases in a single session without knowing which one a table is in" though I admit that the usability issues might lead to such a request in the future. Perhaps this is the biggest sideways compatibility issue you were mentioning. Otherwise, it might be sufficient for the backend to quote the names of the catalogs anytime it uses them. Again, this means touching a lot of code and possibly creating bugs, etc. But I haven't heard any reason to think that it would be a (significant) performance hit. >Since the end reward for all this work would be having to read CATALOGS >WRITTEN IN ALL UPPER CASE, none of the key developers seem very >interested ... > > Why would this be required? If an individual developer wants to have a db with folding to uppercase, that developer can easily quote any references to system tables himself. I don't see why this is a requirement. "Case sensitivity" is "Case sensitivity" is "Case sensitivity" ... Furthermore, if you are going to go there, then you have to deal with the inevitable question of what we do when people request the same behavior for other tables. See above. However, none of this is required to be standards-compatibile, unless I am missing something else.... Best Wishes, Chris Travers Metatron Technology Consulting
Chris Travers <chris@travelamericas.com> writes: > Tom Lane wrote: >> Since the end reward for all this work would be having to read CATALOGS >> WRITTEN IN ALL UPPER CASE, none of the key developers seem very >> interested ... >> > Why would this be required? If you write, say, select max(relpages) from pg_class; and the lexer thinks that it should fold unquoted identifiers to upper case, then the catalog entries defining these names had better read PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain today. So this wouldn't be something you could flip on-the-fly --- at the latest, an installation would have to commit to upper or lower case at initdb time, because the initial contents of all the system catalogs would need to match the choice. Please read the previous discussions on the topic, if you want to pontificate about it. regards, tom lane
On Thu, Oct 13, 2005 at 12:21:14PM -0700, Chris Travers wrote: > >Since the end reward for all this work would be having to read CATALOGS > >WRITTEN IN ALL UPPER CASE, none of the key developers seem very > >interested ... > > > > > Why would this be required? If an individual developer wants to have a > db with folding to uppercase, that developer can easily quote any > references to system tables himself. I don't see why this is a > requirement. "Case sensitivity" is "Case sensitivity" is "Case > sensitivity" ... I think you're slightly missing the point. Most people won't need to change any queries because unquoted identifiers are case-folded anyway. Which way doesn't matter, as long as it's consistant. No the main objection would be this (see below). Does anyone really want this? I'll take lowercase anyday... (Note, it's not exact, some things aren't SQL identifiers so would be left alone. But idea is the same.) TEST=# \l List of databases Name | Owner | Encoding -----------+----------+---------- TEMPLATE0 | POSTGRES | UNICODE TEMPLATE1 | POSTGRES | UNICODE <snip> (6 rows) TEST=# \df+ bit List of functions Result data type | Schema | Name | Argument data types | Owner | Language | Source code | Description ------------------+------------+------+-----------------------+----------+----------+-------------+------------------------------ BIT | PG_CATALOG | BIT | BIGINT | POSTGRES | INTERNAL | BITFROMINT8 | INT8 To BITSTRING BIT | PG_CATALOG | BIT | BIT, INTEGER, BOOLEAN | POSTGRES | INTERNAL | BIT | Adjust BIT() to typomodlength BIT | PG_CATALOG | BIT | INTEGER | POSTGRES | INTERNAL | BITFROMINT4 | INT4 To BITSTRING (3 rows) TEST=# \d website Table "PUBLIC.WEBSITE" Column | Type | Modifiers -----------+---------+----------- FILENAME | TEXT | TITLE | TEXT | PARENT | TEXT | HITS | INTEGER | HIDDEN | BOOLEAN | HITSMONTH | INTEGER | Indexes: "WEBSITE_FILENAME" UNIQUE, BTREE (FILENAME) -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Ok. here are some indepth thoughts after reviewing as many prior threads as I could find on the archives. Tom Lane wrote: >Chris Travers <chris@travelamericas.com> writes: > > >>Tom Lane wrote: >> >> >>>Since the end reward for all this work would be having to read CATALOGS >>>WRITTEN IN ALL UPPER CASE, none of the key developers seem very >>>interested ... >>> >>> >>> >>Why would this be required? >> >> > >If you write, say, > > select max(relpages) from pg_class; > > I have gone back and read the previous discussion and I still do not understand what the real impediment is (to at least putting it on the TODO list, at least). I see a lot of reasons that don't make any sense to me, and have come across one substantial obstacle that has not been mentioned yet to my knowledge. I understood this exmaple from your previous comment. But aside from the aggregate issue, I fail to see why fixing it is a requirement. Perhaps I am being unclear in my thoughts and we are talking past eachother. If I write that statement, and it gets back an error saying that no table is named PG_CLASS, that is my fault as an individual developer. After all I set the configuration option to fold to uppercase, right? The relevant question is very simple. Where does one draw the line between between the responsibility of the programmer and the responsibility of the DBA? Personally I think it is important to offer modes that offer as much standards-compliance as possible. From the previous discussion, it was mentioned that the backend treats identifiers as quoted. It seems to me that this should make it *easier* rather than *harder* to impliment because this is largely a change regarding what a given SQL statement means. I.e. most of the work I had foreseen has already been done. (My proposal would have been to have the backend treat identifiers used internally as already double-quoted.) If you are folding to lower case, your example, select max(relpages) from pg_class, is the same as SELECT MAX("relpages") FROM "pg_class"; If you are folding to upper case, your example is the same as SELECT MAX("RELPAGES") FROM "PG_CLASS"; Of course, we don't expect this to give us any results today. In essence, I don't see why we would expect this to return any results. If you issue an incorrect SQL statement, whose fault is that? Now, the one place where this might create a problem is in the information_schema. The problem here is not the same as any issue I have seen discussed before, but the fact that case folding could create non-standard behavior here absent other changes. The only option I see here is to create a second INFORMATION_SCHEMA with upper case view names. >and the lexer thinks that it should fold unquoted identifiers to upper >case, then the catalog entries defining these names had better read >PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain >today. > Ok. so the only problem out of these three that I see is with MAX. PG_CLASS and RELPAGES are the responsibility of the developer, IMHO. Also, with functions and aggregates, this is not the problem that it is with tables (as the name isn't usually sent back to the client), so I don't know how much logic it would be to differentiate between table/column names which might need to be folded and function/aggregates which could continue the way that they are currently done at least for now. It might also be possible to create duplicate entries to the catelogs for builtin functions/aggregates in the catalogs so that this case folding is not causing the same type of problem. However, for builtin functions/aggregates, I am not sure if this is likely to have any significant performance hit. > So this wouldn't be something you could flip on-the-fly --- at >the latest, an installation would have to commit to upper or lower case >at initdb time, because the initial contents of all the system catalogs >would need to match the choice. > > Ok, so I see the objection basically being that changing the semantics of the SQL statement would need to be done in a way that prevents user-issued queries from having to know which way this is done. There is no way around this objection because it is inconsistant with the very idea of semantic changes to the SQL parser. Yet we have done this in the past in areas I have previously mentioned. So the question really is what is really required to make this work in a semantically clean way. IMO, the following requirement is acceptable: SELECT MAX("relpages") FROM "pg_class" But the following is not: SELECT "max"("relpages") FROM "pg_class" However, I am sure that there will be people who don't see that as OK. So, I would suggest that if they neeed to avoid quoting relpages and pg_class, then they can create a PG_CATALOG schema and a PG_CLASS view. Maybe this could be a pgfoundry project even. But I don't see it as a requirement of the core team. The bigger issue is with "MAX" v. "max" and in "INFORMATION_SCHEMA" v. "information_schema." This might require duplicate entries in the system catalogs. >Please read the previous discussions on the topic, if you want to >pontificate about it. > > Is there still any specific reason why this does not belong on the TODO list? I am not arguing that this should be a priority in development. I am however arguing that since the behavior is non-standard, it might be worth acknowledging this and at least suggesting that it should be fixed at some indefinite point in the future.... If this was a requirement for me, I would hire someone to make the changes and submit a patch... It is just an attempt to ensure that it is on the roadmap at this time. Best Wishes, Chris Travers Metatron Technology Consulting
Martijn van Oosterhout wrote: >On Thu, Oct 13, 2005 at 12:21:14PM -0700, Chris Travers wrote: > > >>>Since the end reward for all this work would be having to read CATALOGS >>>WRITTEN IN ALL UPPER CASE, none of the key developers seem very >>>interested ... >>> >>> >>> >>> > >I think you're slightly missing the point. Most people won't need to >change any queries because unquoted identifiers are case-folded anyway. > > Right. That is part of the case for at least putting this on the TODO list. >Which way doesn't matter, as long as it's consistant. No the main >objection would be this (see below). Does anyone really want this? I'll >take lowercase anyday... (Note, it's not exact, some things aren't SQL >identifiers so would be left alone. But idea is the same.) > > Ok. If someone wanted it, we would see a patch already. I understand this. Probably making AS optional would be a bigger priority because such a patch exists already :-) However..... This is not only non-standard behavior but can also cause headaches in certain porting circumstances. Adding it to the TODO list would at least invite others to take this up *if* they (or their customers) required it. The work I see required would minimally be: 1) Add a GUC variable to allow for changing the case folding behavior. 2) Add duplicate entries for builtin aggregates and functions in the catalogs but with upper-case names 3) Adding a duplicate INFORMATION_SCHEMA with views off information_schema Additionally if people depend on case folding for catalogs, you would want to create a PG_CATALOG schema with views off pg_catalog. However, this seems to be peripheral and not central to the issues of managing the data given that identifiers are treated as already double-quoted. I *could* see a case where a company that does portability work (like EnterpriseDB, which specializes in Oracle portability) might find such a feature as important. Adding the item to the TODO list would invite patches where they are needed. Best Wishes, Chris Travers Metatron Technology Consulting
Tom Lane <tgl@sss.pgh.pa.us> writes: > If you write, say, > > select max(relpages) from pg_class; > > and the lexer thinks that it should fold unquoted identifiers to upper > case, then the catalog entries defining these names had better read > PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain > today. Well the case of unquoted identifiers could be finessed by having it match RELPAGES first and fail over to relpages second. It could even be made to match RelPages and whatever if there isn't any ambiguity. I think the problem case arises when you have code that has a quoted "relpages". In that case the code is just not going to work unless the column really is named "relpages" not "RELPAGES". So to be completely backward compatible you need "relpages" as well as an unquoted relpages to work. And to be spec compliant you need "RELPAGES" to work too. That makes things a bit sticky. -- greg
Greg Stark <gsstark@mit.edu> writes: > Tom Lane <tgl@sss.pgh.pa.us> writes: >> and the lexer thinks that it should fold unquoted identifiers to upper >> case, then the catalog entries defining these names had better read >> PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain >> today. > Well the case of unquoted identifiers could be finessed by having it match > RELPAGES first and fail over to relpages second. It could even be made to > match RelPages and whatever if there isn't any ambiguity. [ shrug... ] Sure, you could invent some rule that might sort of work most of the time. But then you've abandoned the sole rationale for the entire project, which is to *adhere to the standard*. Any kind of funny business with the case folding rules will make things worse not better from that standpoint. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > Tom Lane <tgl@sss.pgh.pa.us> writes: > >> and the lexer thinks that it should fold unquoted identifiers to upper > >> case, then the catalog entries defining these names had better read > >> PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain > >> today. > > > Well the case of unquoted identifiers could be finessed by having it match > > RELPAGES first and fail over to relpages second. It could even be made to > > match RelPages and whatever if there isn't any ambiguity. > > [ shrug... ] Sure, you could invent some rule that might sort of work > most of the time. But then you've abandoned the sole rationale for the > entire project, which is to *adhere to the standard*. Any kind of funny > business with the case folding rules will make things worse not better > from that standpoint. Well sure, it would only be worthwhile if you could come up with rules that complied with the standard 100% of the time that the standard specifies behaviour. But if you could do that and satisfy 99% of the backwards compatibility issues including any catalog related issues then it seems like it would be worthwhile. But on further thought, if you want to have pg_dump et al output lowercase names (which I certainly prefer) then I think what you would have to do is have a bit that travels with every identifier that indicates whether it was quoted or not. So two identifiers match if either is an unquoted identifier and they match case insensitively. Or if both are quoted and they match case sensitively. Actually I think you can get closer to the standard if you interpret "case insensitively" above to mean they match after downcasing the unquoted identifier(s) or upcasing the unquoted identifier(s) but not if one of them is unquoted and the other is quoted and mixed case. There's still some funny business: if you create both "foo" and "FOO" and then refer to one of them with an unquoted foo. Since it's an unquoted identifier it matches both. If you give preference to the "FOO" then that would follow the standard. Also if you try to create both a "foo" and an unquoted foo then this method would say that's a conflict whereas the standard would say it was acceptable. I think this is an improvement on what's there now though. It lets Postgres have case-preserving case-insensitive unquoted identifiers which people have asked about multiple times and it solves most of the inter-database compatibility problems with case sensitivity. -- greg
On 10/13/2005 2:40 PM, Tom Lane wrote: > Chris Travers <chris@travelamericas.com> writes: >> So basically, the problem is that any fix for case folding would touch a >> fair bit of code and possibly cause other problems. However, I haven't >> seen anyone worry about performance issues in such a fix, just that it >> might be a fair bit of work. > > More that it's likely to create serious forwards, backwards, and > sideways compatibility problems :-(. The work involved is mostly > in figuring out how to deal with that. > > Since the end reward for all this work would be having to read CATALOGS > WRITTEN IN ALL UPPER CASE, none of the key developers seem very > interested ... Just an idea ... if the release that offers UPPER case folding as an option also makes sure that all internal and tool references to system catalog objects are properly quoted (as they IMHO should be anyway), then it would be reduced to a third party tool/application problem accessing the system catalog in a database that has this new config option selected. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Greg Stark <gsstark@mit.edu> writes: > So two identifiers match if either is an unquoted identifier and they match > case insensitively. Or if both are quoted and they match case sensitively. Which part of "adhere to the standard" are you failing to get? The standard is 100% clear about what it means by case folding, and the above is not it. regards, tom lane
Tom Lane <tgl@sss.pgh.pa.us> writes: > Greg Stark <gsstark@mit.edu> writes: > > So two identifiers match if either is an unquoted identifier and they match > > case insensitively. Or if both are quoted and they match case sensitively. > > Which part of "adhere to the standard" are you failing to get? The > standard is 100% clear about what it means by case folding, and the > above is not it. I guess I had a few unstated assumptions: 1) That we don't care what algorithm the standard specifies as long as we're functionally equivalent. 2) That we don't care about producing errors in all the cases where following the standard would produce an error. Ie, that as long as we accept any code that the standard says should work we don't care if we sometimes accept code that the standard says should produce errors like "column not found". Also my position somewhat evolved between the beginning of the mail and the end. Specifically I think we can aim to be compatible with specific common standards-compliant idioms even if we don't follow the letter of the standard. For example I think there's a ton of code out there that is nigh impossible to port to Postgres only because it defines tables with columns like "FOO" and then refers to them as foo or FOO. And vice versa, code that defines columns as foo or FOO and then refers to them in queries as "FOO". There's no reason to say it's useless to support that and make lots of code work fine under Postgres even if that doesn't mean we fit the standard 100%. -- greg
On Fri, Oct 14, 2005 at 10:34:53AM -0400, Jan Wieck wrote: > On 10/13/2005 2:40 PM, Tom Lane wrote: > >Since the end reward for all this work would be having to read CATALOGS > >WRITTEN IN ALL UPPER CASE, none of the key developers seem very > >interested ... > > Just an idea ... > > if the release that offers UPPER case folding as an option also makes > sure that all internal and tool references to system catalog objects are > properly quoted (as they IMHO should be anyway), then it would be > reduced to a third party tool/application problem accessing the system > catalog in a database that has this new config option selected. The main problem I see is that the most people who are complaining are people who create their tables as "MixedCaseQuoted" and then refer to them as Unquoted. Even following the SQL standard precisely will not fix this, given that behaviour is non-standard also. Seems to me we'd be better off creating an option "lowercase_quoted_anyway" which solves everything, at the expense of being even less compliant. Once you've loaded the schema, you pg_dump it again and then you can reload it without the option. All fixed... -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > The main problem I see is that the most people who are complaining are > people who create their tables as "MixedCaseQuoted" and then refer to > them as Unquoted. Even following the SQL standard precisely will not > fix this, given that behaviour is non-standard also. > Seems to me we'd be better off creating an option > "lowercase_quoted_anyway" which solves everything, at the expense of > being even less compliant. Hmm. That solves the problem for people who are used to MySQL or Access or whatever it is that behaves like that, but that problem hasn't really been on the radar screen that I've noticed. The push-back I've been paying attention to is from people who say "this application expects the standard-mandated behavior, why aren't you following the standard?" Although ... it's true that there seem to be very few apps relying on case sensitivity per se, ie, expecting "Foo" and "foo" to be different. The complaints that I can remember were about programs that expected "FOO" and FOO (not quoted) to be the same. So always-smash-to-lower- case might indeed solve most of the real-world problem for the Oracle camp as well. Comments anyone? regards, tom lane
Tom Lane wrote: > Although ... it's true that there seem to be very few apps relying on > case sensitivity per se, ie, expecting "Foo" and "foo" to be > different. The complaints that I can remember were about programs > that expected "FOO" and FOO (not quoted) to be the same. So > always-smash-to-lower- case might indeed solve most of the real-world > problem for the Oracle camp as well. That seems to be the right compromise. -- Peter Eisentraut http://developer.postgresql.org/~petere/
Greg Stark wrote: >Tom Lane <tgl@sss.pgh.pa.us> writes: > > > >>If you write, say, >> >> select max(relpages) from pg_class; >> >>and the lexer thinks that it should fold unquoted identifiers to upper >>case, then the catalog entries defining these names had better read >>PG_CLASS, RELPAGES, and MAX, not the lower-case names they contain >>today. >> >> > >Well the case of unquoted identifiers could be finessed by having it match >RELPAGES first and fail over to relpages second. It could even be made to >match RelPages and whatever if there isn't any ambiguity. > > Personally I don't see why relpages and pg_class are problems. After all, the backend treats identifiers as already quoted internally. If you change what the language means, you are going to get backward compatibility issues. End of story. But we have done this sort of thing before so it is not the end of the world as long as we provide a way to get back. Indeed treating identifiers as already quoted should prevent a huge number of bugs that might otherwise be problematic. But because there are no guarantees that the system catalogs won't change with each new major release, this isn't a problem. I.e. it isn't the responsibility of the core development team (IMO) to make this work. The information_schema is a bigger problem, but I don't see why you can't have two information schemas with cases to match each behavior. Sure it means a little more disk space, but I don't really see why it would be a major issue. You do have an issue with MAX v. max which could get nasty. I have proposed solving this one with a duplicate entry in the system catalogs with upper names for aggregates and builtin functions. Basically, if what I have read is accurate, this seems to be the least amount of work to make the following statement work: SELECT max("relpages") FROM "pg_class"; What we don't want to see is the requirement for a statement even less standard, like: SELECT "max"("relpage") FROM "pg_class"; Note that if you allow system catalogs to be created in upper case, and the backend treats identifiers as already double-quoted, then you have to go through and case fold these again, which is problematic, impacts performance, and adds the potential for many bugs. >I think the problem case arises when you have code that has a quoted >"relpages". In that case the code is just not going to work unless the column >really is named "relpages" not "RELPAGES". > >So to be completely backward compatible you need "relpages" as well as an >unquoted relpages to work. And to be spec compliant you need "RELPAGES" to >work too. That makes things a bit sticky. > > Is this really a requirement given that the system catalogs are not guaranteed to be backwards compatible and have been changed in the past? Best Wishes, Chris Travers
At 11:43 AM 10/15/2005 -0400, Tom Lane wrote: >Martijn van Oosterhout <kleptog@svana.org> writes: > > Seems to me we'd be better off creating an option > > "lowercase_quoted_anyway" which solves everything, at the expense of > > being even less compliant. I think that'll be a good option to have. >paying attention to is from people who say "this application expects the >standard-mandated behavior, why aren't you following the standard?" Because sometimes too many people don't follow the official standard. And sometimes the standard sucks ;). It have been normal to have everything in upper case decades ago, but nowadays the new norm appears to be everything in lower case (ever seen what kids nowadays type/txt out?). Not saying that's right, but I find everything in lower case easier to read than everything in upper case. >Although ... it's true that there seem to be very few apps relying on >case sensitivity per se, ie, expecting "Foo" and "foo" to be different. >The complaints that I can remember were about programs that expected >"FOO" and FOO (not quoted) to be the same. So always-smash-to-lower- >case might indeed solve most of the real-world problem for the Oracle >camp as well. > >Comments anyone? If it's an option and doesn't cost very much I'm for it. It must be a very small subset who rely on "Foo" and "foo" to be different AND also rely on "FOO" and FOO to be the same. Wonder what's the term for the even smaller subset who intentionally would want that. Link.
On Sun, Oct 16, 2005 at 10:00:25PM +0800, Lincoln Yeoh wrote: > >Martijn van Oosterhout <kleptog@svana.org> writes: > >> Seems to me we'd be better off creating an option > >> "lowercase_quoted_anyway" which solves everything, at the expense of > >> being even less compliant. > > I think that'll be a good option to have. Actually, perhaps an even more restricted version would be better. Lowercase quoted identifiers only if they are all uppercase. So then: Foo == FOO = foo == "foo" == "FOO" != "Foo" assuming it can be implemented with some mediocum of efficiency. > It must be a very small subset who rely on "Foo" and "foo" to be different > AND also rely on "FOO" and FOO to be the same. Wonder what's the term for > the even smaller subset who intentionally would want that. The strictly standards compliant group? But then, they'd probably want us to remove LIMIT/OFFSET so I'm not sure we need to worry about that. Note to implementor: In 'SELECT 1 as "Title"', the quoted string should not be lowercased, even if you are lowercasing everything else... Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
For the record, I am trying to flesh out my thoughts on this issue. I am not asking Tom to do this BTW :-) because I am sure he has more pressing contributions to make. However, because it can be an issue for those migrating from, say, Oracle, it would be good to have a discussion on what would be minimally required to make this happen. > >Well sure, it would only be worthwhile if you could come up with rules that >complied with the standard 100% of the time that the standard specifies >behaviour. > I doubt that the standard says anything about system catalogs. If our system catalogs are in lower case, but case is normally folded to upper case, then this can create some problems, but if the backend is already treating identifiers as already quoted when it looks in the catalogs, I see this as a non-issue. The system catalogs are not guaranteed to be backwards compatible anyway. For those cases of aggregates and functions where we get problems like not finding the MAX aggregate, we can solve that by having two MAX aggregates which are identical, but one is in upper case and the other in lower case. The same would be required > But if you could do that and satisfy 99% of the backwards >compatibility issues including any catalog related issues then it seems like >it would be worthwhile. > > What you are asking here is functionally impossible if you expect user queries to operate as is against the catalogs when the semantics of the language have changed. When you have semantic issues regarding standards-compliance, you are going to have to choose between standards-compliance and backwards-compatibility. Hence I was suggesting that an option be provided to the user to choose between these options. Scratches head.... I guess if you really wanted to allow the queries to really go without modifications, I guess you could create a second (duplicate) PG_CATALOG schema with views like PG_CLASS.... This seems to me to be iceing on the cake, however, and not part of the minimal requirements. Gotta love VIEWs :-) I guess that would solve every case of backwards-compatibility that I can find where the problem is not bad application design (of the sort that currently causes problems). >But on further thought, if you want to have pg_dump et al output lowercase >names (which I certainly prefer) then I think what you would have to do is >have a bit that travels with every identifier that indicates whether it was >quoted or not. > > If this is what you want, then leave folding to lower case. Simple :-) >So two identifiers match if either is an unquoted identifier and they match >case insensitively. Or if both are quoted and they match case sensitively. > > Sorry, not standards-compliant, as Tom pointed out.... Here are the basic assumptions I would make in such a solution: 1) We want the option of choosing standards-compliance over backwards-compatibility. 2) When in doubt, throw and error and abort the transaction. 3) NEVER EVER guess as to what the user really meant when the exact instructions are ambiguous. If we didn't make these assumptions, we would be using MySQL :-) Best Wishes, Chris Travers Metatron Technology COnsulting
Attachment
Jan Wieck wrote: > On 10/13/2005 2:40 PM, Tom Lane wrote: > >> Chris Travers <chris@travelamericas.com> writes: >> >>> So basically, the problem is that any fix for case folding would >>> touch a fair bit of code and possibly cause other problems. >>> However, I haven't seen anyone worry about performance issues in >>> such a fix, just that it might be a fair bit of work. >> >> >> More that it's likely to create serious forwards, backwards, and >> sideways compatibility problems :-(. The work involved is mostly >> in figuring out how to deal with that. >> >> Since the end reward for all this work would be having to read CATALOGS >> WRITTEN IN ALL UPPER CASE, none of the key developers seem very >> interested ... > > > Just an idea ... > > if the release that offers UPPER case folding as an option also makes > sure that all internal and tool references to system catalog objects > are properly quoted (as they IMHO should be anyway), then it would be > reduced to a third party tool/application problem accessing the system > catalog in a database that has this new config option selected. This is largely what I have been suggesting. However, you still have two (small) problems that could be solved fairly easily I think. Take Tom's example: SELECT MAX(relpages) FROM pg_class; Now, I am perfectly happy to require relpages and pg_class to be quoted (if you don't want to, there are always views....), however, MAX is also an identifier, which if I understand Tom correctly is also folded to lower case. If you fold MAX to upper case, you may not find the aggregate. I don't see an easy answer aside from having separate entries for "max" and "MAX" which are functionally the same. Secondly, you have one other area where you could get non-standard behavior if you require the user to quote lower-case system identifiers. This is the information_schema. Solving this would simply require a separate INFORMATION_SCHEMA, maybe composed as views off the views and tables in information_schema. Finally, if this was such a pressing issue, I am sure it would already be on the TODO list. My suggestion is that we put it on the TODO list simply as a way of ensuring that when people run into this problem, we are showing that we are open to contributions in this area. I am sure that there are firms out there which will run into it eventually because Oracle is standards-compliant in this area and there are many applications that people are trying to port over which could run into these sort of issues. This is open source and I see no problem with pushing the work in an area like this back to the contributor who is helping his/her customer. After all, this helps companies which do consulting make money... If someone can make a case for preserving case in identifiers (which I doubt can be made given the fact that not all identifiers are created equal, so is MAX() the same as Max() the same as mAx()?), then we can put that in the TODO list as well. The TODO list is already long, so I don't see any reason not to put an entry there. Best Wishes, Chris Travers Metatron Technology Consulting
Attachment
Martijn van Oosterhout <kleptog@svana.org> writes: > Actually, perhaps an even more restricted version would be better. > Lowercase quoted identifiers only if they are all uppercase. So then: No, I think the original proposal was better. This one doesn't fix things for the lusers who think "MixedCase" and unquoted MixedCase should be the same. > Note to implementor: In 'SELECT 1 as "Title"', the quoted string should > not be lowercased, even if you are lowercasing everything else... You don't get to have that, I think, because the lexer is not context aware. It's not clear to me why it's a good idea anyway. regards, tom lane
On Sat, Oct 15, 2005 at 05:46:03PM -0700, Chris Travers wrote: > This is largely what I have been suggesting. However, you still have > two (small) problems that could be solved fairly easily I think. Take > Tom's example: > > SELECT MAX(relpages) FROM pg_class; > > Now, I am perfectly happy to require relpages and pg_class to be quoted > (if you don't want to, there are always views....), however, MAX is also > an identifier, which if I understand Tom correctly is also folded to > lower case. If you fold MAX to upper case, you may not find the > aggregate. I don't see an easy answer aside from having separate > entries for "max" and "MAX" which are functionally the same. I think you're certainly waving away a lot of potential issues with saying that "as long as system identifiers are quoted we're fine". Tom's example is just the tip of the iceberg. - consider functions like nextval/currval/substring. Some of these are created and used internally too, should they always be quoted? Would these functions all appear twice in the functions list? - consider oid,xmin,xmax,etc are they system identifiers? You can't provide two copies of those, since each column can only appear once in the catalogs. - consider pg_dump, psql and other such tools with intimate knowledge of the catalogs, they would all need to updated. - consider the number of third party apps that uses these "system identifiers". Updating all of those is a massive task, for what gain? There is some sympathy around here for the view that in this case the standard is simply wrong. As said on this thread already, uppercase only was appropriate for twenty years ago but not now. Older programming languages tended to have their keywords in uppercase, yet no modern language would consider such an act. So, I think in the long run there is going to be some sympathy for helping people with "QUOTEDUPPERCASE" identifiers to port their stuff but I don't think PostgreSQL itself will ever default to folding up. Hence the discussion about folding quoted identifiers also, which solves the issue without masses of changes everywhere. Curiously, MySQL does as we do, although they don't accept standard quoting at all (using backquotes). Do people complain about them too? Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
On Sun, Oct 16, 2005 at 12:40:25PM -0400, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: > > Actually, perhaps an even more restricted version would be better. > > Lowercase quoted identifiers only if they are all uppercase. So then: > > No, I think the original proposal was better. This one doesn't fix > things for the lusers who think "MixedCase" and unquoted MixedCase > should be the same. Okay, how do we refer to the "RI_FKey_cascade_del" function then? Or trigger "RI_ConstraintTrigger_2204908". Using $$ quoting? Invent another quoting style? > > Note to implementor: In 'SELECT 1 as "Title"', the quoted string should > > not be lowercased, even if you are lowercasing everything else... > > You don't get to have that, I think, because the lexer is not context > aware. It's not clear to me why it's a good idea anyway. Depends at what level you do the conversion. But lowercasing here means that people producing HTML tables using psql and providing multi word titles are going to be mightily disappointed if their titles are lowercased on them. Then again, if it's an optional switch, maybe it doesn't matter... Lowercasing only full uppercased identifiers is more something you might have turned on all the time. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a > tool for doing 5% of the work and then sitting around waiting for someone > else to do the other 95% so you can sue them.
Attachment
Tom Lane wrote: > Although ... it's true that there seem to be very few apps relying on > case sensitivity per se, ie, expecting "Foo" and "foo" to be different. > The complaints that I can remember were about programs that expected > "FOO" and FOO (not quoted) to be the same. So always-smash-to-lower- > case might indeed solve most of the real-world problem for the Oracle > camp as well. Is it worth having a GUC variable that enables / disable this? I hate to add GUCs but this seems like a fairly significant change in expected behavior. Not sure it's a good idea, but I thought I should ask the question anyway. Matt
Tom Lane wrote: > > Note to implementor: In 'SELECT 1 as "Title"', the quoted string > > should not be lowercased, even if you are lowercasing everything > > else... > > You don't get to have that, I think, because the lexer is not context > aware. It's not clear to me why it's a good idea anyway. This is used to create pretty column headers in table output. We'd surely get criticism if this no longer worked. I guess you could get around that if you leave the case-folding in the lexer as is but instead make the nameeq function case insensitive. This is sort of the way the Windows file system works (I think). (And we know that *never* leads to problems... :-/) -- Peter Eisentraut http://developer.postgresql.org/~petere/
Martijn van Oosterhout <kleptog@svana.org> writes: > Okay, how do we refer to the "RI_FKey_cascade_del" function then? Or > trigger "RI_ConstraintTrigger_2204908". Using $$ quoting? Invent > another quoting style? Yeah, I was wondering about those myself. Probably we could just change their names to something that could still be spelled. Or do nothing --- I can't see any obvious reason why the average user would need to reference those functions/triggers anyway, and if he did, he could turn off the force-lowercase flag. > Depends at what level you do the conversion. But lowercasing here means > that people producing HTML tables using psql and providing multi word > titles are going to be mightily disappointed if their titles are > lowercased on them. If they care about case, they wouldn't be turning this flag on anyway. What we are looking for here is a simple, understandable hack that solves 95% of the problem for legacy applications. Not something that we'd encourage people to have on at all times. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > I guess you could get around that if you leave the case-folding in the > lexer as is but instead make the nameeq function case insensitive. If you do that then you have to nail down the behavior at initdb time (else flipping the switch will corrupt all your system catalog indexes). To be selective about which occurrences of identifiers get smashed to lower, I think we would have to postpone the smashing into analyze.c. This would be a lot messier than doing it once in the lexer --- but on the other hand, it would avoid violating the rule stated in gram.y, namely that lexer and grammar behavior must not vary depending on GUC variable settings. On the third hand, we could get around that restriction by not allowing the GUC variable to change within a session, which might be a good idea anyway to avoid confusing applications. regards, tom lane
Matthew wrote: > Tom Lane wrote: > >> Although ... it's true that there seem to be very few apps relying on >> case sensitivity per se, ie, expecting "Foo" and "foo" to be different. >> The complaints that I can remember were about programs that expected >> "FOO" and FOO (not quoted) to be the same. So always-smash-to-lower- >> case might indeed solve most of the real-world problem for the Oracle >> camp as well. > > > Is it worth having a GUC variable that enables / disable this? I > hate to add GUCs but this seems like a fairly significant change in > expected behavior. Not sure it's a good idea, but I thought I should > ask the question anyway. It is a good idea for a very simple reason. When given a choice between standards-compliance and backwards-compatibility with prior versions, it is important to give the option to make this change. The only other option is to make it a initdb time decision. Autocommit is a bad example. A better option is the GUC variable that allows you to go back to the way we used to do things and allow NULL = NULL to return TRUE instead of NULL. In both these cases, the difference is a semantic change in what a given SQL statement means. Best Wishes, Chris Travers Metatron Technology Consulting
Matthew <matthew@zeut.net> writes: > Is it worth having a GUC variable that enables / disable this? That's a given, I think. We're certainly not going to make smash-to- lower-case the only available behavior. One issue we might want to reflect on is how easy it should be to change the variable's setting. Our bad experience with autocommit leaves me a bit wary of fundamental behavioral changes that can be flicked on and off at whim. However, we have pretty much the same problem staring us in the face for backslash-in-literal-strings behavior, so maybe we can find an answer we like for both issues at the same time. regards, tom lane
Martijn van Oosterhout wrote: >On Sat, Oct 15, 2005 at 05:46:03PM -0700, Chris Travers wrote: > > >>This is largely what I have been suggesting. However, you still have >>two (small) problems that could be solved fairly easily I think. Take >>Tom's example: >> >>SELECT MAX(relpages) FROM pg_class; >> >>Now, I am perfectly happy to require relpages and pg_class to be quoted >>(if you don't want to, there are always views....), however, MAX is also >>an identifier, which if I understand Tom correctly is also folded to >>lower case. If you fold MAX to upper case, you may not find the >>aggregate. I don't see an easy answer aside from having separate >>entries for "max" and "MAX" which are functionally the same. >> >> > >I think you're certainly waving away a lot of potential issues with saying >that "as long as system identifiers are quoted we're fine". Tom's >example is just the tip of the iceberg. > >- consider functions like nextval/currval/substring. Some of these are >created and used internally too, should they always be quoted? Would >these functions all appear twice in the functions list? > > I think I mentioned that the case issue is of particular importance for functions and aggregates because we don't want to type: SELECT "max"("relpages") FROM "pg_class" and this is definitely not standard anyway. I think any functions that are shipped with PostgreSQL should come in both an upper case and lower case form. I am willing to commit to building these in the system catalogs. Same with basic types (like "INT" and "int") for example. Or another option would just be to lower case all reserved SQL words though this would lead to a wide range of possible issues. >- consider oid,xmin,xmax,etc are they system identifiers? You can't >provide two copies of those, since each column can only appear once in >the catalogs. > > No you can't but if you need to refer to oid, xmin, and xmax from outside the backend, you can always quote them. Besides if you really nead that level of compatibility, consider the following.... CREATE VIEW "PG_CLASS" AS SELECT "oid" AS "OID", "relname" as "RELNAME", ..... FROM "pg_class"; and then create associated rules. So yes, we can have two entries in the system catalogs if we need to because views are wonderful things :-) The only real exception is with functions and agregates, and these are bigger issues. BTW, not to split hairs here too much, but for those customers that need this now, I suppose one could always use this same trick. CREATE SCHEMA "PUBLIC"; CREATE VIEW "PUBLIC"."TABLE1" AS SELECT "field1" AS "FIELD1".... FROM "public"."table1"; CREATE RULE irule AS ON INSERT INTO "TABLE1" DO INSTEAD INSERT INTO "public"."table1" ("field1"...) VALUES (NEW."FIELD1"...); etc.... Could be a lot of work, but it should be possible to write a Perl script to generate these for you :-) Sounds like a project for me :-P >- consider pg_dump, psql and other such tools with intimate knowledge >of the catalogs, they would all need to updated. > > If you need to create backward compatibility via views, my proposal would allow for that. Also if this is a GUC variable, there is no reason it can't be set by the application where needed. Also, as far as I am concerned, I would rather leave the system catalogs alone and change them as *little* as possible for these reasons. >- consider the number of third party apps that uses these "system >identifiers". Updating all of those is a massive task, for what gain? > > See above regarding views. The only difference between shipping it with this mode enabled is that it provides an easy means for customers to avoid the problem and we have a centralized solution. >There is some sympathy around here for the view that in this case the >standard is simply wrong. As said on this thread already, uppercase >only was appropriate for twenty years ago but not now. Older >programming languages tended to have their keywords in uppercase, yet >no modern language would consider such an act. > >So, I think in the long run there is going to be some sympathy for >helping people with "QUOTEDUPPERCASE" identifiers to port their stuff >but I don't think PostgreSQL itself will ever default to folding up. > > Who ever said anything about defaulting. I simply said an option should be present. FWIW I prefer the way PostgreSQL does it, but I have to recognize that it will bite people migrating from RDBMS's that do follow the standards. >Hence the discussion about folding quoted identifiers also, which >solves the issue without masses of changes everywhere. > >Curiously, MySQL does as we do, although they don't accept standard >quoting at all (using backquotes). Do people complain about them too? > > MySQL's unquoted table names being case sensitive is even more broken than ours ;-) Besides if people using MySQL cared about standards, I don't think they would still be using MySQL... It seems to me that the first step in any solution is going to be creating a tool that automatically generates views in appropriate schema using upper case (or maybe optionally lower case) table and column names. This should be fairly simple to do. Then such a tool could be used to provide whatever sort of compatibility we need. Indeed this may in fact be more flexible in helping people migrate to BrokenSystemsLikeMySQL because it would allow you to create views off tables using lower case names as well. A second step will be to ensure that the backend really does treat identifiers as already double-quoted. However, this is only a partial fix. It doesn't solve the larger issue of (albeit badly written but standards-compliant) queries like: SELECT "MYCOLUMN" FROM MYTABLE; My fundamental point is that the above statement means something specific according to the SQL standard, and it means something *different* in PostgreSQL. The time may not be ripe for a solution but if it is at least on the TODO list, then maybe people will be less bit by it if and when we actually do support this behavior. Best Wishes, Chris Travers Metatron Technology Consulting
Chris Travers <chris@verkiel.metatrontech.com> writes: > I make the following assumptions: > 1) All backend references to identifiers are treated as quoted by the > backend as mentioned in a previous thread. > 2) Same with official clients like psql. > 3) We don't guarantee complete backward compatibility in the system > catalogs anyway (note that this is a trivial problem to solve if 1 above > is solved-- schemas, views, and rules can accomplish some cool tricks > :-) ). None of these assumptions are acceptable, particularly not #2. In reality #2 is a proxy for "I'm willing to break all existing clients that look at the system catalogs". Which we're not. regards, tom lane
Since nobody has told me why my proposal doesn't work, I figure I will point out that it does solve this problem nicely. However, it makes the assumptions that we treat the identifiers as quoted when the backend uses them to look them up in the system catalogs. Also I am rethinking a few details.... I make the following assumptions: 1) All backend references to identifiers are treated as quoted by the backend as mentioned in a previous thread. 2) Same with official clients like psql. 3) We don't guarantee complete backward compatibility in the system catalogs anyway (note that this is a trivial problem to solve if 1 above is solved-- schemas, views, and rules can accomplish some cool tricks :-) ). If these are not done, then it would be too much work to do at the moment even if a customer required it. However, if this is the case, we need to get these areas done first (they probably should be done anyway). Quite frankly if system catalog backwards-compatibility is that much of an issue for people, if others commit to the rest of the work, I will be happy to create a PG_CATALOG schema with views off pg_catalog.... All then that we need to do is to put it in the search path :-) Oh, and the same for information_schema and INFORMATION_SCHEMA tool. > >Okay, how do we refer to the "RI_FKey_cascade_del" function then? Or >trigger "RI_ConstraintTrigger_2204908". Using $$ quoting? Invent >another quoting style? > > Right, and when we start reqiring non-standard quoting styles, then we are not standards-compliant either :-) Also it occurs to me that we that there are three options for dealing with case issues. These do *not* need to be done at the parser level as long as the backend treats catalog identifiers as previously quoted. But there are really two options... 1) Try to differentiate between function names and aggregates in the parser. This seems like a non-starter to me. But if I am missing something I am open to suggestions.... 2) My original idea of duplicate catalog entries. I.e. one for "MAX" and one for "max" Again, I am willing to do this work if people decide that they are going to go ahead and try to get this in the next release. Best Wishes, Chris Travers Metatron Technology Consulting
Attachment
On 10/16/2005 12:40 PM, Tom Lane wrote: > Martijn van Oosterhout <kleptog@svana.org> writes: >> Actually, perhaps an even more restricted version would be better. >> Lowercase quoted identifiers only if they are all uppercase. So then: > > No, I think the original proposal was better. This one doesn't fix > things for the lusers who think "MixedCase" and unquoted MixedCase > should be the same. What is bad about leaving pg_catalog all lower case and expect everyone to query the catalog quoted? We would be totally free as to the flexibility of the new option. One setting could even act 100% according to the standard ... a feature I'm missing so far in all the proposals. Jan > >> Note to implementor: In 'SELECT 1 as "Title"', the quoted string should >> not be lowercased, even if you are lowercasing everything else... > > You don't get to have that, I think, because the lexer is not context > aware. It's not clear to me why it's a good idea anyway. > > regards, tom lane -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Jan Wieck <JanWieck@Yahoo.com> writes: > What is bad about leaving pg_catalog all lower case and expect everyone > to query the catalog quoted? The fact that it will break every nontrivial client currently in existence. Those quotes aren't there in the clients and we can't suddenly mandate them to become there. Also, that approach does not work for built-in functions (MAX, etc) regards, tom lane
On 10/17/2005 10:16 AM, Tom Lane wrote: > Jan Wieck <JanWieck@Yahoo.com> writes: >> What is bad about leaving pg_catalog all lower case and expect everyone >> to query the catalog quoted? > > The fact that it will break every nontrivial client currently in > existence. Those quotes aren't there in the clients and we can't > suddenly mandate them to become there. Also, that approach does not > work for built-in functions (MAX, etc) Ugh ... got me on the MAX part. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
Tom Lane wrote: >Chris Travers <chris@verkiel.metatrontech.com> writes: > > >>I make the following assumptions: >> >> > > > >>1) All backend references to identifiers are treated as quoted by the >>backend as mentioned in a previous thread. >>2) Same with official clients like psql. >>3) We don't guarantee complete backward compatibility in the system >>catalogs anyway (note that this is a trivial problem to solve if 1 above >>is solved-- schemas, views, and rules can accomplish some cool tricks >>:-) ). >> >> > >None of these assumptions are acceptable, particularly not #2. In >reality #2 is a proxy for "I'm willing to break all existing clients >that look at the system catalogs". Which we're not. > > Ok then. #1 Is the biggie. If it is not done, it should be done before one can go any further. If it is solved, however, then it is trivial to fix 2 and 3. This week I am going to begin work on a "shadow upper case views" tool that could be used to automatically solve #'s 2 and 3. This tool would be used to automatically create separate schemas and views with all upper case names (including OID. If CTID, XMAX, and XMIN are needed, they could be added easily enough. I.e. #1 is the only assumption which will fully *break* my proposal and is the only clear prerequisite. #2 and #3 were voiced with the idea of making the change minimalist. However, again, there are tricks with views that can be used to solve these problems and it is largely an issue of automating this process so we don't have to create views by hand for every table in the pg_catalog. On second thought, this automation would be needed anyway because the information schema tables and views would also be effected. And this automation may be good enough to resolve many of the issues in existing versions that people may have with mixed quoting environments. Best Wishes, Chris Travers Metatron Technology Consulting
Jan Wieck wrote: > On 10/17/2005 10:16 AM, Tom Lane wrote: > >> Jan Wieck <JanWieck@Yahoo.com> writes: >> >>> What is bad about leaving pg_catalog all lower case and expect >>> everyone to query the catalog quoted? >> >> >> The fact that it will break every nontrivial client currently in >> existence. Those quotes aren't there in the clients and we can't >> suddenly mandate them to become there. Also, that approach does not >> work for built-in functions (MAX, etc) > > > Ugh ... got me on the MAX part. For built-in functions and aggregates, I see no option other than having duplicate entries in the catalogs. I.e. have an aggregate called "max" and a duplicate aggregate called "MAX" (nobody has explained to me why this is a bad idea aside from the single point of truth concerns). The rest can be solved with clever uses of views :-) Furthermore the creation of these views should be easily automated. I am willing to write a tool to automate this.
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Matthew <matthew@zeut.net> writes: > > Is it worth having a GUC variable that enables / disable this? > > That's a given, I think. We're certainly not going to make smash-to- > lower-case the only available behavior. A GUC variable for this would be quite nice.. I had some difficulty porting a MySQL application to PostgreSQL because it used PHP and PEAR, which allows you to create a database 'object' and then reference things inside it which are actually column names, ie: $db = ...; $myvar = $db->Column1; That kind of stuff, where it did quoting on the actual DB request and made moving it from MySQL to PostgreSQL a big pain, even though the rest of the application was pretty good SQL. What made it that much more difficult was that there were places in the application which used direct SQL statements and the identifiers in those weren't quoted. :/ Or maybe that was the original problem... Anyhow, I *think* PEAR has been changed now to not quote but I'm not 100% sure; I just fixed the application, noted the issue, and moved on. > One issue we might want to reflect on is how easy it should be to change > the variable's setting. Our bad experience with autocommit leaves me a > bit wary of fundamental behavioral changes that can be flicked on and > off at whim. However, we have pretty much the same problem staring us > in the face for backslash-in-literal-strings behavior, so maybe we can > find an answer we like for both issues at the same time. That'd be very nice... Thanks, Stephen