Thread: 7.0.3 dumps aren't accessible via JDBC in 7.1

7.0.3 dumps aren't accessible via JDBC in 7.1

From
pgsql-bugs@postgresql.org
Date:
Rainer Mager (rmager@vgkk.com) reports a bug with a severity of 1
The lower the number the more severe it is.

Short Description
7.0.3 dumps aren't accessible via JDBC in 7.1

Long Description
If the character '\255' exists in a 7.0.3 dump then JDBC barfs on reading this character in 7.1. Apparently this
character is a dash character in Unicode (not UTF-8), 0x00ad. The problem is that it is getting dumped (and restored) as
a single byte and when JDBC reads it as 0xad it expects another byte after it (as according to the UTF-8 spec, anything
over 0x7f must have another byte).
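For readers following along, here is a minimal Java sketch of the byte-level difference being described. It is an illustration only, not code from the report or the driver; the class name SoftHyphenBytes and the helper hex() are made up for this example. It shows that U+00AD takes two bytes in UTF-8 but a single byte (0xAD, octal \255) in a single-byte encoding such as ISO-8859-1. A lone 0xAD has the bit pattern of a UTF-8 continuation byte, so it cannot stand on its own in a UTF-8 stream.

```java
import java.nio.charset.StandardCharsets;

class SoftHyphenBytes {
    // Render a byte array as space-separated uppercase hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00AD"; // the character from the report

        // Well-formed UTF-8 needs a lead byte plus a continuation byte.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));      // C2 AD

        // A single-byte encoding stores the same character as one byte.
        System.out.println(hex(s.getBytes(StandardCharsets.ISO_8859_1))); // AD
    }
}
```

If the dump really contains the one-byte form, it matches what a single-byte encoding would produce, which is consistent with (but does not prove) an encoding mismatch somewhere between dump and restore.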

Re: 7.0.3 dumps aren't accessible via JDBC in 7.1

From
Peter T Mount
Date:
Quoting pgsql-bugs@postgresql.org:

> Rainer Mager (rmager@vgkk.com) reports a bug with a severity of 1
> The lower the number the more severe it is.
>
> Short Description
> 7.0.3 dumps aren't accessible via JDBC in 7.1

I'd say this is not JDBC as JDBC doesn't deal with dumps, but...

> Long Description
> If the character '\255' exists in a 7.0.3 dump then JDBC barfs on
> reading this character in 7.1. Apparently this character is a dash
> character in unicode (not UTF-8), 0x00ad. The problem is that it is
> getting dumped (and restored) as a single byte and when JDBC reads it as
> 0xad it expects another byte after it (as according to the UTF-8 spec,
> anything over 0x7f must have another byte).

Hmmm, this sounds like either a backend issue, or something is misconfigured.
Have you got unicode support enabled in the backend?

Peter

--
Peter Mount peter@retep.org.uk
PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/
RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/

RE: 7.0.3 dumps aren't accessible via JDBC in 7.1

From
"Rainer Mager"
Date:
Hi Peter and all,

    I may have described this poorly; let me try again.

1. We have a Unicode database that has a particular dash character in it
that gets dumped incorrectly. When dumped (from 7.0.x) the dash becomes the
character 0xAD but is not properly encoded in UTF-8 (at least my limited
knowledge of UTF-8 says so). My understanding is that all characters above
0x7F should be encoded but this particular character is not encoded/escaped
at all in the dump.

2. The given dump can be imported into 7.1 without any visible problem.
The character in question can be viewed via queries in psql. The only time
that there is a problem is when the character is accessed via JDBC. The byte
array returned from the backend to the JDBC driver is supposed to be in UTF-8
format but (I believe) is incorrectly formatted. The character in question
is, still, by itself as 0xAD and this is not valid UTF-8. When the Java
UTF-8 to internal Unicode converter hits this character it dies and the
resulting string is truncated right before the character. I think there may
be a bug in Java that an InvalidEncoding exception isn't thrown but
nonetheless the bytes aren't valid UTF-8.

3. My guess is that some part of Postgres' UTF-8 conversion routines is
wrong. I looked at the code but couldn't find the relevant parts.
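
One way to check the claim in points 1 and 2 independently of the driver is to run the suspect bytes through a strict UTF-8 decoder. The following is a rough sketch, not the JDBC driver's actual code path: the class name StrictUtf8 and the method isValidUtf8() are hypothetical, and it relies on java.nio.charset, which is newer than the JDKs current when this thread was written. A decoder configured with CodingErrorAction.REPORT throws a CharacterCodingException on malformed input instead of silently replacing or dropping it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8 {
    // Returns true only if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Malformed input, e.g. a continuation byte with no lead byte.
            return false;
        }
    }

    public static void main(String[] args) {
        // 'a', a lone 0xAD, 'b': the shape of data described in the report.
        byte[] asDumped = {0x61, (byte) 0xAD, 0x62};
        // The same text with U+00AD encoded as the two bytes C2 AD.
        byte[] wellFormed = {0x61, (byte) 0xC2, (byte) 0xAD, 0x62};

        System.out.println(isValidUtf8(asDumped));   // false
        System.out.println(isValidUtf8(wellFormed)); // true
    }
}
```

By contrast, new String(bytes, StandardCharsets.UTF_8) does not throw on such input; it substitutes the replacement character, so a strict decoder like the one above is the more reliable way to detect the problem.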


I hope this is clearer.


Thanks,

--Rainer


> -----Original Message-----
> From: Peter T Mount [mailto:peter@retep.org.uk]
> Sent: Friday, April 20, 2001 8:13 PM
> To: rmager@vgkk.com; pgsql-bugs@postgresql.org; pgsql-bugs@postgresql.org
> Cc: pgsql-bugs@postgresql.org
> Subject: Re: [BUGS] 7.0.3 dumps aren't accessible via JDBC in 7.1
>
>
> Hmmm, this sounds like either a backend issue, or something is
> misconfigured.
> Have you got unicode support enabled in the backend?
>
> Peter
>
> --
> Peter Mount peter@retep.org.uk
> PostgreSQL JDBC Driver: http://www.retep.org.uk/postgres/
> RetepPDF PDF library for Java: http://www.retep.org.uk/pdf/