Thread: DatabaseMetaData.getExtraNameCharacters
Hi all, We are developing a very simple application that permits users to define their own tables. We would like to leave the user the freedom to use any characters for table and field identifiers. We tried using accented letters, like è or ò, and it works. Then we saw the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we thought about verifing our identifiers against this method. The problem is that getExtraNameCharacters() returns an empty string, at least using a 7.4 jdbc driver. Is this the right behaviour? (I mean: should we refrain to use "strange" letters?) Thanks a lot, Giuseppe
On Wed, 25 May 2005, Giuseppe Sacco wrote: > We are developing a very simple application that permits users to define > their own tables. We would like to leave the user the freedom to use any > characters for table and field identifiers. > > We tried using accented letters, like è or ò, and it works. Then we saw > the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we > thought about verifing our identifiers against this method. > > The problem is that getExtraNameCharacters() returns an empty string, at > least using a 7.4 jdbc driver. > Postgresql allows pretty much any character that is valid in your encoding to be used in an unquoted identifier, so trying to list them all would be impossible, but there should be no problem using them. Kris Jurka
Hi Kris, Il giorno mer, 25-05-2005 alle 12:08 -0500, Kris Jurka ha scritto: [...] > Postgresql allows pretty much any character that is valid in your encoding > to be used in an unquoted identifier, so trying to list them all would be > impossible, but there should be no problem using them. Thanks for your reply, but I would really like to have a list of valid or invalid characters in order to test the string before executing the statement. There are characters that should be excluded: $%^°§[} ... If the valid characters list is really huge, is there any list of invalid characters and a JDBC3.0 method that return it? I checked that mySQL gives '#$' as result string. What are the results given by other JDBCs? Bye, Giuseppe
On Wed, 25 May 2005, Giuseppe Sacco wrote: > Thanks for your reply, but I would really like to have a list of valid > or invalid characters in order to test the string before executing the > statement. There are characters that should be excluded: $%^°§[} ... > > If the valid characters list is really huge, is there any list of > invalid characters and a JDBC3.0 method that return it? The backend lexing rules (src/backend/parser/scan.l) show this for identifiers: ident_start [A-Za-z\200-\377_] ident_cont [A-Za-z\200-\377_0-9\$] identifier {ident_start}{ident_cont}* So \200-\377 is octal for any character with the high bit set. The list of what characters this could map to numbers in the tens of thousands for unicode, so it's not really feasibly to return in this method. JDBC does not offer a method to retrieve the list of invalid characters. I'm guessing because for other databases that number would be the extremely large list we have for valid ones. The above lexing rules will tell you though. Kris Jurka
Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto: [...] > ident_start [A-Za-z\200-\377_] > ident_cont [A-Za-z\200-\377_0-9\$] > identifier {ident_start}{ident_cont}* > > So \200-\377 is octal for any character with the high bit set. The list > of what characters this could map to numbers in the tens of thousands for > unicode, so it's not really feasibly to return in this method. [...] Thanks again Kris, if I understand correclty, the valid charset is the one computed by the attached class. It seems to me that this is 191 characters long. Could you please let me know where I am wrong? Giuseppe class pgchars { public pgchars() {} static public void main(String[] args) { String x="ABCDEFGHIJKLMNOPQRSTUVWXYZ"; x += x.toLowerCase(); x += "0123456789"; for (int i=2*8*8; i < 3*8*8+7*8+7; i++) { char[] c = { (char)i }; x += new String(c); } x += "_$"; System.out.println("final string size: " + x.length()); } }
On Wed, 25 May 2005, Giuseppe Sacco wrote: > Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto: > [...] > > ident_start [A-Za-z\200-\377_] > > ident_cont [A-Za-z\200-\377_0-9\$] > > identifier {ident_start}{ident_cont}* > > > > So \200-\377 is octal for any character with the high bit set. The list > > of what characters this could map to numbers in the tens of thousands for > > unicode, so it's not really feasibly to return in this method. > > if I understand correclty, the valid charset is the one computed by the > attached class. It seems to me that this is 191 characters long. > Could you please let me know where I am wrong? You are assuming that each character is only one byte. The backend lexing rules are a byte by byte operation, but the JDBC side is returning a String of characters. Consider the character "Latin Small Letter s with Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal. This is one character in the result of getExtraNameCharacters. Kris Jurka
Il giorno mer, 25-05-2005 alle 14:59 -0500, Kris Jurka ha scritto: [...] > You are assuming that each character is only one byte. The backend lexing > rules are a byte by byte operation, but the JDBC side is returning a > String of characters. Consider the character "Latin Small Letter s with > Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal. This > is one character in the result of getExtraNameCharacters. Thank you very much for your reply. Yes, I assumed that this was a one-byte character encoding and I didn't know that \200 < \305\233 < \377. I was also assuming (maybe wrongly) that JDBC driver convert java utf-8 characters to ASCII when connecting to postgresql backend. This is why I counted 191 characters. I also made a test: "CREATE TABLE A\u015B ( A VARCHAR(30) )" fails when calling java.sql.Statement.execute(). The error is: ERROR: syntax error at or near "[" Looking at the postgresql log, I found the statement was translated in "CREATE TABLE A[ ( A VARCHAR(30))" Similarly, "CREATE TABLE A\u00C0 ( A VARCHAR(30) )" succeeds. so maybe postgresql doesn't accept a broad range of utf-8 characters, while it might accept the ASCII characters from \200 to \377 when it is a Character.isLetterOrDigit(). May it be something dependant on the database encoding? Thanks again, Giuseppe