Thread: DatabaseMetaData.getExtraNameCharacters

DatabaseMetaData.getExtraNameCharacters

From

Giuseppe Sacco

Date:

25 May 2005, 12:14:55

Hi all,
We are developing a very simple application that permits users to define
their own tables.
We would like to leave the user the freedom to use any characters for
table and field identifiers.

We tried using accented letters, like è or ò, and it works. Then we saw
the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we
thought about verifing our identifiers against this method.

The problem is that getExtraNameCharacters() returns an empty string, at
least using a 7.4 jdbc driver.

Is this the right behaviour? (I mean: should we refrain to use "strange"
letters?)

Thanks a lot,
Giuseppe

Re: DatabaseMetaData.getExtraNameCharacters

From

Kris Jurka

Date:

25 May 2005, 14:08:54

On Wed, 25 May 2005, Giuseppe Sacco wrote:

> We are developing a very simple application that permits users to define
> their own tables. We would like to leave the user the freedom to use any
> characters for table and field identifiers.
>
> We tried using accented letters, like è or ò, and it works. Then we saw
> the method java.sql.DatabaseMetaData.getExtraNameCharacters() and we
> thought about verifing our identifiers against this method.
>
> The problem is that getExtraNameCharacters() returns an empty string, at
> least using a 7.4 jdbc driver.
>

Postgresql allows pretty much any character that is valid in your encoding
to be used in an unquoted identifier, so trying to list them all would be
impossible, but there should be no problem using them.

Kris Jurka

Re: DatabaseMetaData.getExtraNameCharacters

From

Giuseppe Sacco

Date:

25 May 2005, 15:11:50

Hi Kris,

Il giorno mer, 25-05-2005 alle 12:08 -0500, Kris Jurka ha scritto:
[...]
> Postgresql allows pretty much any character that is valid in your encoding
> to be used in an unquoted identifier, so trying to list them all would be
> impossible, but there should be no problem using them.

Thanks for your reply, but I would really like to have a list of valid
or invalid characters in order to test the string before executing the
statement. There are characters that should be excluded: $%&#^°§[} ...

If the valid characters list is really huge, is there any list of
invalid characters and a JDBC3.0 method that return it?

I checked that mySQL gives '#$' as result string. What are the results
given by other JDBCs?

Bye,
Giuseppe

Re: DatabaseMetaData.getExtraNameCharacters

From

Kris Jurka

Date:

25 May 2005, 15:25:45

On Wed, 25 May 2005, Giuseppe Sacco wrote:

> Thanks for your reply, but I would really like to have a list of valid
> or invalid characters in order to test the string before executing the
> statement. There are characters that should be excluded: $%&#^°§[} ...
>
> If the valid characters list is really huge, is there any list of
> invalid characters and a JDBC3.0 method that return it?

The backend lexing rules (src/backend/parser/scan.l) show this for
identifiers:

ident_start             [A-Za-z\200-\377_]
ident_cont              [A-Za-z\200-\377_0-9\$]
identifier              {ident_start}{ident_cont}*

So \200-\377 is octal for any character with the high bit set.  The list
of what characters this could map to numbers in the tens of thousands for
unicode, so it's not really feasibly to return in this method.

JDBC does not offer a method to retrieve the list of invalid characters.
I'm guessing because for other databases that number would be the
extremely large list we have for valid ones.  The above lexing rules will
tell you though.

Kris Jurka

Re: DatabaseMetaData.getExtraNameCharacters

From

Giuseppe Sacco

Date:

25 May 2005, 15:52:33

Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto:
[...]
> ident_start             [A-Za-z\200-\377_]
> ident_cont              [A-Za-z\200-\377_0-9\$]
> identifier              {ident_start}{ident_cont}*
>
> So \200-\377 is octal for any character with the high bit set.  The list
> of what characters this could map to numbers in the tens of thousands for
> unicode, so it's not really feasibly to return in this method.
[...]

Thanks again Kris,
if I understand correclty, the valid charset is the one computed by the
attached class. It seems to me that this is 191 characters long.
Could you please let me know where I am wrong?

Giuseppe
class pgchars {
    public pgchars() {}
    static public void main(String[] args) {
        String x="ABCDEFGHIJKLMNOPQRSTUVWXYZ";

        x += x.toLowerCase();
        x += "0123456789";

        for (int i=2*8*8; i < 3*8*8+7*8+7; i++) {
            char[] c = { (char)i };
            x += new String(c);
        }

        x += "_$";

        System.out.println("final string size: " + x.length());
    }
}

Re: DatabaseMetaData.getExtraNameCharacters

From

Kris Jurka

Date:

25 May 2005, 17:00:08

On Wed, 25 May 2005, Giuseppe Sacco wrote:

> Il giorno mer, 25-05-2005 alle 13:25 -0500, Kris Jurka ha scritto:
> [...]
> > ident_start             [A-Za-z\200-\377_]
> > ident_cont              [A-Za-z\200-\377_0-9\$]
> > identifier              {ident_start}{ident_cont}*
> >
> > So \200-\377 is octal for any character with the high bit set.  The list
> > of what characters this could map to numbers in the tens of thousands for
> > unicode, so it's not really feasibly to return in this method.
>
> if I understand correclty, the valid charset is the one computed by the
> attached class. It seems to me that this is 191 characters long.
> Could you please let me know where I am wrong?

You are assuming that each character is only one byte.  The backend lexing
rules are a byte by byte operation, but the JDBC side is returning a
String of characters.  Consider the character "Latin Small Letter s with
Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal.  This
is one character in the result of getExtraNameCharacters.

Kris Jurka

Re: DatabaseMetaData.getExtraNameCharacters

From

Giuseppe Sacco

Date:

26 May 2005, 05:46:47

Il giorno mer, 25-05-2005 alle 14:59 -0500, Kris Jurka ha scritto:
[...]
> You are assuming that each character is only one byte.  The backend lexing
> rules are a byte by byte operation, but the JDBC side is returning a
> String of characters.  Consider the character "Latin Small Letter s with
> Acute" (\u015B) gets encoded in UTF-8 as C5 9B or \305\233 in octal.  This
> is one character in the result of getExtraNameCharacters.

Thank you very much for your reply.

Yes, I assumed that this was a one-byte character encoding and I didn't
know that \200 < \305\233 < \377.

I was also assuming (maybe wrongly) that JDBC driver convert java utf-8
characters to ASCII when connecting to postgresql backend. This is why I
counted 191 characters.

I also made a test: "CREATE TABLE A\u015B ( A VARCHAR(30) )"

fails when calling java.sql.Statement.execute(). The error is:
ERROR: syntax error at or near "["

Looking at the postgresql log, I found the statement was translated in
"CREATE TABLE A[ ( A VARCHAR(30))"

Similarly, "CREATE TABLE A\u00C0 ( A VARCHAR(30) )" succeeds.

so maybe postgresql doesn't accept a broad range of utf-8 characters,
while it might accept the ASCII characters from \200 to \377 when it is
a Character.isLetterOrDigit().

May it be something dependant on the database encoding?

Thanks again,
Giuseppe