Thread: TypeInfoCache

TypeInfoCache

From

Daniel Migowski

Date:

19 December 2007, 10:38:40

Hello dear developers,

I just updated the TypeInfoCache to return LONGVARCHAR for the
PostgreSQL Type text. Currently VARCHAR(0) is returned, which seems not
correct in my eyes, and confuses some major software packets like
Chrystal Reports.

Patch attached,
with best regards,
Daniel Migowski
--

 |¯¯|¯¯|    *IKOffice GmbH             Daniel Migowski*
 |  |  |/|                            Mail: dmigowski ÄT ikoffice.de
 |  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52
 |  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55
 |__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76

            Geschäftsführer: Ingo Kuhlmann, Daniel Migowski
            Amtsgericht Oldenburg, HRB 201467
            Steuernummer: 64/211/01864

Index: TypeInfoCache.java
===================================================================
RCS file: /cvsroot/jdbc/pgjdbc/org/postgresql/jdbc2/TypeInfoCache.java,v
retrieving revision 1.11
diff -u -r1.11 TypeInfoCache.java
--- TypeInfoCache.java    2 Dec 2007 06:48:43 -0000    1.11
+++ TypeInfoCache.java    19 Dec 2007 14:36:22 -0000
@@ -71,7 +71,7 @@
         {"char", new Integer(Oid.CHAR), new Integer(Types.CHAR), "java.lang.String", new Integer(Oid.CHAR_ARRAY)},
         {"bpchar", new Integer(Oid.BPCHAR), new Integer(Types.CHAR), "java.lang.String", new
Integer(Oid.BPCHAR_ARRAY)},
         {"varchar", new Integer(Oid.VARCHAR), new Integer(Types.VARCHAR), "java.lang.String", new
Integer(Oid.VARCHAR_ARRAY)},
-        {"text", new Integer(Oid.TEXT), new Integer(Types.VARCHAR), "java.lang.String", new Integer(Oid.TEXT_ARRAY)},
+        {"text", new Integer(Oid.TEXT), new Integer(Types.LONGVARCHAR), "java.lang.String", new
Integer(Oid.TEXT_ARRAY)},
         {"name", new Integer(Oid.NAME), new Integer(Types.VARCHAR), "java.lang.String", new Integer(Oid.NAME_ARRAY)},
         {"bytea", new Integer(Oid.BYTEA), new Integer(Types.BINARY), "[B", new Integer(Oid.BYTEA_ARRAY)},
         {"bool", new Integer(Oid.BOOL), new Integer(Types.BIT), "java.lang.Boolean", new Integer(Oid.BOOL_ARRAY)},

Re: TypeInfoCache

From

Gregory Stark

Date:

19 December 2007, 11:17:55

"Daniel Migowski" <dmigowski@ikoffice.de> writes:

> Hello dear developers,
>
> I just updated the TypeInfoCache to return LONGVARCHAR for the PostgreSQL Type
> text. Currently VARCHAR(0) is returned, which seems not correct in my eyes, and
> confuses some major software packets like Chrystal Reports.

Why?

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: TypeInfoCache

From

Daniel Migowski

Date:

19 December 2007, 11:34:44

Gregory Stark schrieb:
> "Daniel Migowski" <dmigowski@ikoffice.de> writes:
>
>> Hello dear developers,
>>
>> I just updated the TypeInfoCache to return LONGVARCHAR for the PostgreSQL Type
>> text. Currently VARCHAR(0) is returned, which seems not correct in my eyes, and
>> confuses some major software packets like Chrystal Reports.
>>
>
> Why?
Because VARCHAR (in my understanding) has some limit, like 256 or 50 or
even 8192, whatever. LONGVARCHAR is unlimited as I understand and much
better matches what i understood what "text" is for. While looking at
the JDBC specification "JSR-000221 JDBC 4.0 Final Release Evaluation",
they didn't make a clear statement on this, but AFAIK varchar always
carries a length with it. "text" does not, so its a LONGVARCHAR. It
doesn't matters, because all functions to get the data are equally
defined for both datatypes (spec page 199, table B-6).

At all, the driver returns a length of -1 for the "text" type, since its
no varchar type, and software using the metadata might expect to get
something >0 for the length of a VARCHAR, like defined in the SQL spec
(don't know the page there, but the postgresql handbook says, varchar
without a parameter is a postgres extension).

With best regards,
Daniel Migowski
--

 |¯¯|¯¯|    *IKOffice GmbH             Daniel Migowski*
 |  |  |/|                            Mail: dmigowski@ikoffice.de <mailto:dmigowski@ikoffice.de>
 |  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52
 |  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55
 |__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76

            Geschäftsführer: Ingo Kuhlmann, Daniel Migowski
            Amtsgericht Oldenburg, HRB 201467
            Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Daniel Migowski

Date:

19 December 2007, 11:40:59

Daniel Migowski schrieb: <blockquote cite="mid:47693B10.3040804@ikoffice.de" type="cite">Gregory Stark schrieb: <br
/><blockquotetype="cite">"Daniel Migowski" <a class="moz-txt-link-rfc2396E"
href="mailto:dmigowski@ikoffice.de"><dmigowski@ikoffice.de></a>writes: <br />   <blockquote type="cite">Hello
deardevelopers, <br /><br /> I just updated the TypeInfoCache to return LONGVARCHAR for the PostgreSQL Type <br />
text.Currently VARCHAR(0) is returned, which seems not correct in my eyes, and <br /> confuses some major software
packetslike Chrystal Reports. <br />     </blockquote><br /> Why? <br /></blockquote> Because VARCHAR (in my
understanding)has some limit, like 256 or 50 or even 8192, whatever. LONGVARCHAR is unlimited as I understand and much
bettermatches what i understood what "text" is for. While looking at the JDBC specification "JSR-000221 JDBC 4.0 Final
ReleaseEvaluation", they didn't make a clear statement on this, but AFAIK varchar always carries a length with it.
"text"does not, so its a LONGVARCHAR. It doesn't matters, because all functions to get the data are equally defined for
bothdatatypes (spec page 199, table B-6). <br /><br /> At all, the driver returns a length of -1 for the "text" type,
sinceits no varchar type, and software using the metadata might expect to get something >0 for the length of a
VARCHAR,like defined in the SQL spec (don't know the page there, but the postgresql handbook says, varchar without a
parameteris a postgres extension). <br /><br /> With best regards, <br /> Daniel Migowski <br /></blockquote> Btw, a
varcharwithout length in PostgreSQL should be treated as LONGVARCHAR, too, but i didn't implement that for now. <br
/><br/> With best regards,<br /> Daniel Migowski<br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice
GmbH            Daniel Migowski</b>|  |  |/|                            Mail: <a
href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>| | // |  Nordstr. 10               Tel.: +49 (441) 21 98
8952|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

Daniel Migowski

Date:

19 December 2007, 11:46:23

Daniel Migowski schrieb: <blockquote cite="mid:47693B10.3040804@ikoffice.de" type="cite">Gregory Stark schrieb: <br
/><blockquotetype="cite">"Daniel Migowski" <a class="moz-txt-link-rfc2396E"
href="mailto:dmigowski@ikoffice.de"><dmigowski@ikoffice.de></a>writes: <br />   <blockquote type="cite">Hello
deardevelopers, <br /><br /> I just updated the TypeInfoCache to return LONGVARCHAR for the PostgreSQL Type <br />
text.Currently VARCHAR(0) is returned, which seems not correct in my eyes, and <br /> confuses some major software
packetslike Chrystal Reports. <br />     </blockquote><br /> Why? <br /></blockquote> Because VARCHAR (in my
understanding)has some limit, like 256 or 50 or even 8192, whatever. LONGVARCHAR is unlimited as I understand and much
bettermatches what i understood what "text" is for. While looking at the JDBC specification "JSR-000221 JDBC 4.0 Final
ReleaseEvaluation", they didn't make a clear statement on this, but AFAIK varchar always carries a length with it.
"text"does not, so its a LONGVARCHAR. It doesn't matters, because all functions to get the data are equally defined for
bothdatatypes (spec page 199, table B-6). <br /><br /> At all, the driver returns a length of -1 for the "text" type,
sinceits no varchar type, and software using the metadata might expect to get something >0 for the length of a
VARCHAR,like defined in the SQL spec (don't know the page there, but the postgresql handbook says, varchar without a
parameteris a postgres extension). <br /><br /> With best regards, <br /> Daniel Migowski <br /></blockquote> See <a
class="moz-txt-link-freetext"
href="http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html">http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html</a>,
Chapter8.9.7, how others map a very large text type. The biggest they support as varchar is 8.000 chars (MS SQL
Server).Everything over that is LONGVARCHAR (and "text" in postgres is up to 1GB).<br /><br /> With best regards,<br />
DanielMigowski<br /><br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice GmbH             Daniel
Migowski</b>| |  |/|                            Mail: <a href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>|
| // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98
8955|__|__|\|  <a href="http://www.ikoffice.de">http://www.ikoffice.de</a>    Mob.: +49 (176) 22 31 20 76
Geschäftsführer:Ingo Kuhlmann, Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer:
64/211/01864</pre></div>

Re: TypeInfoCache

From

Gregory Stark

Date:

19 December 2007, 12:11:35

"Daniel Migowski" <dmigowski@ikoffice.de> writes:

>> Why?
>
> Because VARCHAR (in my understanding) has some limit, like 256 or 50 or even
> 8192, whatever.

yes, 2GB, same as text.

> LONGVARCHAR is unlimited as I understand and much better
> matches what i understood what "text" is for.

Well you haven't explained what you understand "text" is for but in Postgres
they can be used pretty much interchangeably.

I think this has come up before, you should check the mail archives. The
problem is that describing "text" as if it's not a simple varchar type of type
confuses other applications into restricting what you can do with it. They
assume it has the kind of restrictions other databases impose.

Generally in Postgres you're probably best off using "text" unless you have
some specific limit you need to impose. In most cases Postgres will silently
cast your varchars to text when necessary but every now and then you might
find a case where it doesn't and fails to use an index or optimize a query
where it could.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's On-Demand Production Tuning

Re: TypeInfoCache

From

Daniel Migowski

Date:

19 December 2007, 14:13:27

Gregory Stark schrieb: <blockquote cite="mid:873atymzpu.fsf@oxford.xeocode.com" type="cite"><pre wrap="">"Daniel
Migowski"<a class="moz-txt-link-rfc2396E" href="mailto:dmigowski@ikoffice.de"><dmigowski@ikoffice.de></a> writes:
</pre><blockquotetype="cite"><blockquote type="cite"><pre wrap="">Why?     </pre></blockquote><pre wrap="">Because
VARCHAR(in my understanding) has some limit, like 256 or 50 or even
 
8192, whatever.   </pre></blockquote><pre wrap="">
yes, 2GB, same as text. </pre></blockquote> I meant the limit you give it. Not the internal one. VARCHAR(50) has a
limitof 50, right? <blockquote cite="mid:873atymzpu.fsf@oxford.xeocode.com" type="cite"><blockquote type="cite"><pre
wrap="">LONGVARCHARis unlimited as I understand and much better
 
matches what i understood what "text" is for.    </pre></blockquote><pre wrap="">
Well you haven't explained what you understand "text" is for but in Postgres
they can be used pretty much interchangeably. </pre></blockquote> I think a VARCHAR(50) and text are not
interchangeable.<br/><blockquote cite="mid:873atymzpu.fsf@oxford.xeocode.com" type="cite"><pre wrap="">
 
I think this has come up before, you should check the mail archives. The
problem is that describing "text" as if it's not a simple varchar type of type
confuses other applications into restricting what you can do with it. They
assume it has the kind of restrictions other databases impose. </pre></blockquote> Which restrictions does an
JDBC-LONGVARCHARimpose? Read the JDBC spec, please, where they say they are interchangeable reagrding all Access
methods?Like in PostgreSQL. But a LONGVARCHAR is IMHO commonly regarded as "very much text", while a VARCHAR(n) is
regardedas "up to n chars" of text.<br /><blockquote cite="mid:873atymzpu.fsf@oxford.xeocode.com" type="cite"><pre
wrap="">Generallyin Postgres you're probably best off using "text" unless you have
 
some specific limit you need to impose. In most cases Postgres will silently
cast your varchars to text when necessary but every now and then you might
find a case where it doesn't and fails to use an index or optimize a query
where it could. </pre></blockquote> This one is new to me. Does this mean even storage is done the same for text and
varchar?Does this mean I could savely convert all my varchar's to text (if my client application accepts this?)<br
/><br/> With best regards,<br /> Daniel Migowski<br /><br /> PS: Now searching the archives...<br /><br /><div
class="moz-signature">--<br /><pre> |¯¯|¯¯|    <b>IKOffice GmbH             Daniel Migowski</b>|  |  |/|
           Mail: <a href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>|  | // |  Nordstr. 10
Tel.:+49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

Oliver Jowett

Date:

19 December 2007, 17:14:55

Daniel Migowski wrote:

> I think a VARCHAR(50) and text are not interchangeable.

But a "varchar" (with no limit) and "text" *are* interchangeable, which
is why we identify text as VARCHAR

-O

Re: TypeInfoCache

From

Tom Lane

Date:

19 December 2007, 21:26:05

Oliver Jowett <oliver@opencloud.com> writes:
> Daniel Migowski wrote:
>> I think a VARCHAR(50) and text are not interchangeable.

> But a "varchar" (with no limit) and "text" *are* interchangeable, which
> is why we identify text as VARCHAR

But note that varchar-with-no-limit is itself a Postgres-ism: it's
not allowed by the standard.

            regards, tom lane

TypeInfoCache

From

"Jan de Visser"

Date:

19 December 2007, 21:39:39

On 12/19/07, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Oliver Jowett <oliver@opencloud.com> writes:
> > Daniel Migowski wrote:
> >> I think a VARCHAR(50) and text are not interchangeable.
>
> > But a "varchar" (with no limit) and "text" *are* interchangeable, which
> > is why we identify text as VARCHAR
>
> But note that varchar-with-no-limit is itself a Postgres-ism: it's
> not allowed by the standard.

I was about to write the same thing. For educational value, this is
what a popular closed-source dbms will tell you:

SQL>  create table a ( a varchar );
 create table a ( a varchar )
                            *
ERROR at line 1:
xxx-00906: missing left parenthesis

SQL> create table a (a varchar(4001));
create table a (a varchar(4001))
                          *
ERROR at line 1:
xxx-00910: specified length too long for its datatype

SQL> create table a (a varchar(4000));

Table created.

That's right, textual data longer than 4000 characters needs to go
through the CLOB interface. And my license doesn't permit me to say
what that does to performance :)

So what Daniel is trying to say here is that Crystal Report probably
croaks on a column for the type is Types.VARCHAR but which does not
have a maximum length associated with it.

I think his patch is good.

jan

Re: TypeInfoCache

From

Kris Jurka

Date:

20 December 2007, 00:41:12

On Wed, 19 Dec 2007, Tom Lane wrote:

> Oliver Jowett <oliver@opencloud.com> writes:
>
>> But a "varchar" (with no limit) and "text" *are* interchangeable, which
>> is why we identify text as VARCHAR
>
> But note that varchar-with-no-limit is itself a Postgres-ism: it's
> not allowed by the standard.
>

So should varchar(10000000) be returned as VARCHAR or LONGVARCHAR?

Right now we return 0 for the precision of text or varchar without length.
Perhaps we should return something else for that similar to how we changed
the result of ResultSetMetaData.getColumnDisplaySize to return
Integer.MAX_VALUE instead of -1 for types without lengths.

Kris Jurka

Re: TypeInfoCache

From

Gregory Stark

Date:

20 December 2007, 06:05:22

"Gregory Stark" <stark@enterprisedb.com> writes:

> "Daniel Migowski" <dmigowski@ikoffice.de> writes:
>
>>> Why?
>>
>> Because VARCHAR (in my understanding) has some limit, like 256 or 50 or even
>> 8192, whatever.
>
> yes, 2GB, same as text.

Oops, sorry, 1GB.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: TypeInfoCache

From

Daniel Migowski

Date:

20 December 2007, 06:11:05

Thank you for your opinions so far. I just looked through the mailing lists to get the points from former discussions
andtry to sum it all up here now:<br /><ul><li>At first, VARCHAR as defined by SQL and as used by all other JDBC
driversalways has a upper limit. LONGVARCHAR is for very large quantities of text, so IMHO returning a "text" or an
unrestricted"varchar" as VARCHAR just breaks the specs and the expectations.<li>It is said by Oliver Jowett, that
"varchar"(with no limit) and "text" <b class="moz-txt-star"><span class="moz-txt-tag">*</span>are<span
class="moz-txt-tag">*</span></b>interchangeable. This is completly true. But then <b>*both*</b> have to be described by
thedriver as LONGVARCHAR, not as VARCHAR. As Tom Lane said, VARCHAR without length is a postgres specific extension.
<li>TheJDBC-Spec recommends for Fieldtypes of type LONGVARCHAR to use the getStream() function to get the data, instead
ofgetString() for VARCHAR. A few years ago the argument was, that an application should use getString() also for "text"
types,since the data is on the heap already in the current driver design. This has some flaws: <br /> 1. The internals
ofthe driver just don't care when programming a well defined interface like JDBC. <br /> 2. An application accessing a
LONGVARCHARmight very well just use the first 100 chars to display a field preview, so a getStream() call might be
completlyreasonable. Note it could be up to one Gig of data.<br /> 3. The drivers stores the data internally as a
byte[]and wrapped it into a string when the getString() method is called. This clones the data anyway, so for very
largedata getString() is even more inperformant than getStream() which pulls only a bit of data. If the application
wantseverything, it will call getString() anyway, regardless of the type... <br /> 4. An application, that knows the
datastored in the db, because is db schema is by the db developer, can also use getString() on a LONGVARCHAR if it
knows,that the "text" field shouldn't very big.<br /><li>A real generic JDBC application will also work with other JDBC
drivers,and will assume all behave the same. As I already stated, this Link (<a
href="http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html">http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html</a>)
at8.9.7 shows what other DB drivers deliver.<br /><li>Someone told a few years ago, that all their data in Postgres is
storedas "text" and that exporting this as LONGVARCHAR would break its application. WTF? He declares unbounded text
fieldsand then his application should handle them.<li>For backwards compatibily we could have users switch from "text"
to"varchar" without bounds, which will also be shown as VARCHAR with my patch. My patch currently jsut affects the
"text"type.<li>The current driver implementation breaks ORM mappers, and any other software, that tries to understand
thedatabase schema. As far as I see, those software completly reasonable relies on a VARCHAR returning a length</ul>
Pleasegive me any good reasons not to apply my patch, with would further improve standards conformance.<br /> With best
regards,<br/><br /> Daniel Migowski<br /><br /><br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice
GmbH            Daniel Migowski</b>|  |  |/|                            Mail: <a
href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>| | // |  Nordstr. 10               Tel.: +49 (441) 21 98
8952|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

Oliver Jowett

Date:

20 December 2007, 06:43:53

Daniel Migowski wrote:

> Please give me any good reasons not to apply my patch, with would
> further improve standards conformance.

My main concern is that 'text' is a very common type to use in
PostgreSQL based designs, and that JDBC applications are more likely to
understand how to interpret a field that claims to be VARCHAR than one
that is LONGVARCHAR, given that LONGVARCHAR is a relatively strange type
and at best poorly defined.

i.e. - there are likely to be applications out there that depend on the
current behaviour - what are you going to do to support them?

This is the first time that mapping 'text' to LONGVARCHAR has been
suggested, as far as I can recall, so I think your "this breaks ORM
mappers and anything else that tries to understand the database schema"
claim is perhaps a bit of an exaggeration. If it does, where are all the
bug reports?

-O

Re: TypeInfoCache

From

Oliver Jowett

Date:

20 December 2007, 06:48:35

Oliver Jowett wrote:
> Daniel Migowski wrote:
>
>> Please give me any good reasons not to apply my patch, with would
>> further improve standards conformance.
>
> My main concern is that 'text' is a very common type to use in
> PostgreSQL based designs, and that JDBC applications are more likely to
> understand how to interpret a field that claims to be VARCHAR than one
> that is LONGVARCHAR, given that LONGVARCHAR is a relatively strange type
> and at best poorly defined.

Also worth noting that 'text' is commonly used even for very short
strings. Do you really want to hint to JDBC clients that they should be
using the streaming interface and expect to deal with very long strings
in that common case? I suspect (with no hard data to back it up,
admittedly!) that 'text' fields in general use are more commonly under
100 characters than over a megabyte.

-O

Re: TypeInfoCache

From

Gregory Stark

Date:

20 December 2007, 06:51:55

"Daniel Migowski" <dmigowski@ikoffice.de> writes:


You keep sending blank emails.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's RemoteDBA services!

Re: TypeInfoCache

From

Kris Jurka

Date:

20 December 2007, 06:53:41

On Thu, 20 Dec 2007, Oliver Jowett wrote:

> My main concern is that 'text' is a very common type to use in PostgreSQL
> based designs, and that JDBC applications are more likely to understand how
> to interpret a field that claims to be VARCHAR than one that is LONGVARCHAR,
> given that LONGVARCHAR is a relatively strange type and at best poorly
> defined.

This is my concern as well, which is why I suggested that changing the
precision value might be a better solution.  Daniel, any opinion on that
alternative?

Kris Jurka

Re: TypeInfoCache

From

Daniel Migowski

Date:

20 December 2007, 06:53:53

[Resend... the first one seemed to be blank for at least Gregory Stark]

Thank you for your opinions so far. I just looked through the mailing lists to get the points from former discussions and try to sum it all up here now:

At first, VARCHAR as defined by SQL and as used by all other JDBC drivers always has a upper limit. LONGVARCHAR is for very large quantities of text, so IMHO returning a "text" or an unrestricted "varchar" as VARCHAR just breaks the specs and the expectations.
It is said by Oliver Jowett, that "varchar" (with no limit) and "text" *are* interchangeable. This is completly true. But then *both* have to be described by the driver as LONGVARCHAR, not as VARCHAR. As Tom Lane said, VARCHAR without length is a postgres specific extension.
The JDBC-Spec recommends for Fieldtypes of type LONGVARCHAR to use the getStream() function to get the data, instead of getString() for VARCHAR. A few years ago the argument was, that an application should use getString() also for "text" types, since the data is on the heap already in the current driver design. This has some flaws:
1. The internals of the driver just don't care when programming a well defined interface like JDBC.
2. An application accessing a LONGVARCHAR might very well just use the first 100 chars to display a field preview, so a getStream() call might be completly reasonable. Note it could be up to one Gig of data.
3. The drivers stores the data internally as a byte[] and wrapped it into a string when the getString() method is called. This clones the data anyway, so for very large data getString() is even more inperformant than getStream() which pulls only a bit of data. If the application wants everything, it will call getString() anyway, regardless of the type...
4. An application, that knows the data stored in the db, because is db schema is by the db developer, can also use getString() on a LONGVARCHAR if it knows, that the "text" field shouldn't very big.
A real generic JDBC application will also work with other JDBC drivers, and will assume all behave the same. As I already stated, this Link (http://java.sun.com/j2se/1.3/docs/guide/jdbc/getstart/mapping.html) at 8.9.7 shows what other DB drivers deliver.
Someone told a few years ago, that all their data in Postgres is stored as "text" and that exporting this as LONGVARCHAR would break its application. WTF? He declares unbounded text fields and then his application should handle them.
For backwards compatibily we could have users switch from "text" to "varchar" without bounds, which will also be shown as VARCHAR with my patch. My patch currently jsut affects the "text" type.
The current driver implementation breaks ORM mappers, and any other software, that tries to understand the database schema. As far as I see, those software completly reasonable relies on a VARCHAR returning a length

Please give me any good reasons not to apply my patch, with would further improve standards conformance.
With best regards,

Daniel Migowski

 |¯¯|¯¯|    IKOffice GmbH             Daniel Migowski|  |  |/|                            Mail: dmigowski@ikoffice.de|  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo Kuhlmann, Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Kris Jurka

Date:

20 December 2007, 06:55:04


On Thu, 20 Dec 2007, Gregory Stark wrote:

> You keep sending blank emails.
>

Just HTML only emails.

Kris Jurka

Re: TypeInfoCache

From

Daniel Migowski

Date:

20 December 2007, 07:09:04

Hello Kris, hello Oliver,<br /><br /> Kris Jurka schrieb:<br /><blockquote
cite="mid:Pine.BSO.4.64.0712200547450.15638@leary.csoft.net"type="cite"><blockquote type="cite">My main concern is that
'text'is a very common type to use in PostgreSQL based designs, and that JDBC applications are more likely to
understandhow to interpret a field that claims to be VARCHAR than one that is LONGVARCHAR, given that LONGVARCHAR is a
relativelystrange type and at best poorly defined. <br /></blockquote> This is my concern as well, which is why I
suggestedthat changing the precision value might be a better solution.  Daniel, any opinion on that alternative? <br
/></blockquote>I thought about this, too. In this case we deliver a VARCHAR(1073741824) for every "text". I think about
thisas being very strange alternative to just saying LONGVARCHAR. Also because of my argument of better using
getStream()for big text fields IS a good alternative. <br /><br /> Okey, how if we give this as an connection parameter
tothe driver? describeUnboundTextAs=VARCHAR(1073741824) (default, current behaviour) vs. showTextAs=LONGVARCHAR. This
shouldbe extended to show postgres varchar(unbound) in the same type. <br /><br /> IMHO should LONGVARCHAR be the
default.If the sql/jdbc spec leaves it relatively undefined, the application (which is generic jdbc and could also use
otherdbs at the backend), should handle those as relatively undefined, too. And IF the application knows about postgres
atthe backend, then there is no problem with LONGVARCHAR in the description, too.<br /><br /> Best regards,<br />
DanielMigowski<br /><br /><br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice GmbH
DanielMigowski</b>|  |  |/|                            Mail: <a
href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>| | // |  Nordstr. 10               Tel.: +49 (441) 21 98
8952|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

Daniel Migowski

Date:

20 December 2007, 08:35:39

Kris Jurka schrieb:

On Thu, 20 Dec 2007, Oliver Jowett wrote:
My main concern is that 'text' is a very common type to use in PostgreSQL based designs, and that JDBC applications are more likely to understand how to interpret a field that claims to be VARCHAR than one that is LONGVARCHAR, given that LONGVARCHAR is a relatively strange type and at best poorly defined.
This is my concern as well, which is why I suggested that changing the precision value might be a better solution. Daniel, any opinion on that alternative?

Btw., you say that "JDBC applications" are more likely to understand how to interpret a VARCHAR field, than an LONGVARCHAR field. Which? I don't know any.And currently a JDBC application (Chrystal Reports) broke, because we did't send a LONGVARCHAR! Every JDBC application should be able to handle both, since JDBC defines both. And, most important, both types can, according to JDBC spec, treated equally (same access functions,etc.). So we shouldn't create a broken driver for applications, that _might_ be broken. And if they are, a parameter option should be a fair deal to give to the driver, to let bad behaving applications work. We don't have to stick to bugs just for backwards compatibility, don't we? We are not Microsoft, i think.

With best regards,
Daniel Migowski

 |¯¯|¯¯|    IKOffice GmbH             Daniel Migowski|  |  |/|                            Mail: dmigowski@ikoffice.de|  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo Kuhlmann, Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Tom Lane

Date:

20 December 2007, 11:38:57

Daniel Migowski <dmigowski@ikoffice.de> writes:
>     * At first, VARCHAR as defined by SQL and as used by all other JDBC
>       drivers always has a upper limit. LONGVARCHAR is for very large
>       quantities of text, so IMHO returning a "text" or an unrestricted
>       "varchar" as VARCHAR just breaks the specs and the expectations.

Unfortunately, LONGVARCHAR is no more standard than TEXT, at least
as far as the non-JDBC world is concerned.

I concur with the complaints that LONGVARCHAR is likely to prompt
applications to do things that might be enormously inefficient overkill
for typical-size fields.  If the driver had a way to know which fields
are likely to be wide, it'd be OK to translate them to LONGVARCHAR,
but I'm dubious about doing that for text fields in general.

            regards, tom lane

Re: TypeInfoCache

From

Daniel Migowski

Date:

20 December 2007, 11:51:54

Hello Tom,<br /><br /> Tom Lane schrieb: <blockquote cite="mid:11679.1198165116@sss.pgh.pa.us" type="cite"><pre
wrap="">DanielMigowski <a class="moz-txt-link-rfc2396E"
href="mailto:dmigowski@ikoffice.de"><dmigowski@ikoffice.de></a>writes: </pre><blockquote type="cite"><pre
wrap="">   * At first, VARCHAR as defined by SQL and as used by all other JDBC     drivers always has a upper limit.
LONGVARCHARis for very large     quantities of text, so IMHO returning a "text" or an unrestricted     "varchar" as
VARCHARjust breaks the specs and the expectations.   </pre></blockquote><pre wrap="">
 
Unfortunately, LONGVARCHAR is no more standard than TEXT, at least
as far as the non-JDBC world is concerned.

I concur with the complaints that LONGVARCHAR is likely to prompt
applications to do things that might be enormously inefficient overkill
for typical-size fields.  If the driver had a way to know which fields
are likely to be wide, it'd be OK to translate them to LONGVARCHAR,
but I'm dubious about doing that for text fields in general.
        regards, tom lane</pre></blockquote> Okey, we have 3 Scenarios:<br /><ol><li>Scenario 1: A JDBC conformant
applicationtrying to use JDBC to do arbitrary things with arbitrary databases and artbitrary JDBC drivers (Like some
generalJDBC database manager). This application indeed has to use getStream() for LONGVARCHAR fields, because it cannot
knowabout the size of the contents. But there is a difference in what the application wants to do with that stream:<br
/><br/><ol><li>If it wants to get all the contents, it will call getString() anyway, playing the loop into the
driver-<li>Ifit wants to show a bit of the content, it will call getStream(), and request e.g. just the first 100
chars.</ol></ol><blockquote>Ineither case it is BETTER to deliver a LONGVARCHAR because now the application can DECIDE
whatto do. For VARCHAR it will always use getString(), because of the spec.<br /></blockquote><ol
start="2"><li>Scenario2: The application either knows about the data itself, because it provided DDL. Or the
applicationhas to work with different DBs and different Drivers but is the same application, and just has some
O/R-Mappingbelow its feet. Every other DBMS behaves by returning LONGVARCHAR, only for PostgreSQL this application has
towrite a workaround and to use the VARCHAR.</ol><ol start="3"><li>Scenario 3: Specialized Java Application with one
driverand one postgres db knowing exactly about the backend and just using JDBC as an abstraction. This app can easily
workaround the LONGVARCHAR because it knows the containing data anyway.</ol> Sorry, but no Scenario profits from the
decisionof not returning LONGVARCHAR. Does someone have any scenario that does?<br /><br /> With best regards,<br />
DanielMigowski<br /><br /><br /><br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice GmbH
DanielMigowski</b>|  |  |/|                            Mail: <a
href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>| | // |  Nordstr. 10               Tel.: +49 (441) 21 98
8952|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

danap@ttc-cmc.net

Date:

20 December 2007, 12:15:25

I'm trying to create a generic interface for multiple databases.
Please don't ask me why I'm doing such a stupid thing, but I
just know that this change you guys are talking about is going
to break it. Certain data types need to be handled in special
ways in the application. No big deal I can accommodate the change,
but I'm really already baffled by some of the deviations on data
types in PostgreSQL. I would prefer adhering as much as possible
to standards in SQL and the SUN JDBC. If this changes accomplish
this so be it.

MyJSQLView Project Manager
Dana Proctor

Re: TypeInfoCache

From

Oliver Jowett

Date:

20 December 2007, 17:08:39

danap@ttc-cmc.net wrote:

> but I'm really already baffled by some of the deviations on data
> types in PostgreSQL.

Can you send through some examples? That's exactly the sort of feedback
we need to be able to improve the driver.

> I would prefer adhering as much as possible
> to standards in SQL and the SUN JDBC. If this changes accomplish
> this so be it.

It's arguable whether it will or not, given that 'text' isn't a standard
type anyway.

-O

Re: TypeInfoCache

From

dmp

Date:

21 December 2007, 16:21:24

This is the standard output I obtain from a database about
its data types. I use the defined data types from manuals. This
is important to the MyJSQLView program because if a database
defines a data type I need to assume that a user somewhere
will use it so I have to be able to handle it. This seems sensible
from the perspective of a generic database GUI access too. I
admit two areas I have not addressed yet in the current version
of the application are user defined and arrays. Noticed the asterisked
rows. These are data types that are specific to PostgreSQL. Granted
each database is going to have a few of these. Notice the returned
Class & Type. The 'object' class returned are fine for the Network
Address types, since the Sun Java JDBC defines methods for objects
in the getter/setter methods, but the the Interval and Geometric data
types return specific types that are not defined by the Sun Java JDBC,
'PGxxx'. I have learned how to properly handle these types without
importing them from the org.postgresql driver, but I would call them
deviants. By the way importing such deviants from a database driver
in a generic GUI is not particular acceptable. Maybe I'm missing
something there; Anyway I would think that standard data types should
really relate more to hardware and the the specifics of the nature of
program's, in this case database, requirements. Example blob/bytea.
This is a obvious specific to a database program and so therefore should
be an extension to the standard data types. If points and circles are going
to be added is then next perhaps mole?

danap

-----------------------
PostgreSQL 8.2.5

ColumnName, Modified ColumnName, ColumnClassName, ColumnClassType,
ColumnPreferedSize

1 data_type_id Data Type Id java.lang.Integer int4 11
2 smallint_type Smallint Type java.lang.Integer int2 6
3 int_type Int Type java.lang.Integer int4 11
4 bigint_type Bigint Type java.lang.Long int8 20
5 decimal_type Decimal Type java.math.BigDecimal numeric 18
6 numeric_type Numeric Type java.math.BigDecimal numeric 12
7 real_type Real Type java.lang.Float float4 14
8 doubleprecision_type Doubleprecision Type java.lang.Double float8 24
9 serial_type Serial Type java.lang.Integer int4 11
10 bigserial_type Bigserial Type java.lang.Long int8 20
11 varchar_type Varchar Type java.lang.String varchar 30
12 char_type Char Type java.lang.String bpchar 30
13 text_type Text Type java.lang.String text 2147483647
14 bytea_type Bytea Type [B bytea 2147483647
15 date_type Date Type java.sql.Date date 13
16 time_type Time Type java.sql.Time time 15
17 timetz_type Timetz Type java.sql.Time timetz 21
18 timestamp_type Timestamp Type java.sql.Timestamp timestamp 29
19 timestamptz_type Timestamptz Type java.sql.Timestamp timestamptz 35
*20 interval_type Interval Type org.postgresql.util.PGInterval interval 49
21 boolean_type Boolean Type java.lang.Boolean bool 1
*22 point_type Point Type org.postgresql.geometric.PGpoint point 2147483647
*23 linesegment_type Linesegment Type org.postgresql.geometric.PGlseg
lseg 2147483647
*24 box_type Box Type org.postgresql.geometric.PGbox box 2147483647
*25 path_type Path Type org.postgresql.geometric.PGpath path 2147483647
*26 polygon_type Polygon Type org.postgresql.geometric.PGpolygon polygon
2147483647
*27 circle_type Circle Type org.postgresql.geometric.PGcircle circle
2147483647
*28 cidr_type Cidr Type java.lang.Object cidr 2147483647
*29 inet_type Inet Type java.lang.Object inet 2147483647
*30 macaddr_type Macaddr Type java.lang.Object macaddr 2147483647
31 bit2_type Bit2 Type java.lang.Boolean bit 2
32 bitvarying2_type Bitvarying2 Type java.lang.Object varbit 5

Re: TypeInfoCache

From

Oliver Jowett

Date:

21 December 2007, 17:36:48

dmp wrote:

> Noticed the asterisked
> rows. These are data types that are specific to PostgreSQL. Granted
> each database is going to have a few of these. Notice the returned
> Class & Type. The 'object' class returned are fine for the Network
> Address types, since the Sun Java JDBC defines methods for objects
> in the getter/setter methods, but the the Interval and Geometric data
> types return specific types that are not defined by the Sun Java JDBC,
> 'PGxxx'. I have learned how to properly handle these types without
> importing them from the org.postgresql driver, but I would call them
> deviants. By the way importing such deviants from a database driver
> in a generic GUI is not particular acceptable.

Can't you just use getString() or getObject() on any type you don't
recognize and let the driver deal with any necessary casting?

> Maybe I'm missing
> something there; Anyway I would think that standard data types should
> really relate more to hardware and the the specifics of the nature of
> program's, in this case database, requirements. Example blob/bytea.
> This is a obvious specific to a database program and so therefore should
> be an extension to the standard data types. If points and circles are going
> to be added is then next perhaps mole?

They're mappings of standard PostgreSQL types, and they use driver
infrastructure that allows adding of specific support for any
user-defined type that happens to get installed in the DB (type
extensibility is a fairly big thing in PostgreSQL, why would we ignore
it?). The geometric types are quite a bit more than just another data
structure, there are various "spatial" indexes and queries you can run
against them. Same is generally true of other custom types, they have
their own indexing and operator behaviour which is why they're there in
the first place.

If you don't want to use the type the driver implements you don't have
to, getString() still works just fine (or somewhat more indirectly -
getObject() -> I don't recognize the returned object -> call toString()
on it)

How would you rather have the driver handle these types, if not in this way?

-O

Re: TypeInfoCache

From

dmp

Date:

22 December 2007, 17:08:04

dmp wrote:

 > Noticed the asterisked
 > rows. These are data types that are specific to PostgreSQL. Granted
 > each database is going to have a few of these. Notice the returned
 > Class & Type. The 'object' class returned are fine for the Network
 > Address types, since the Sun Java JDBC defines methods for objects
 > in the getter/setter methods, but the the Interval and Geometric data
 > types return specific types that are not defined by the Sun Java JDBC,
 > 'PGxxx'. I have learned how to properly handle these types without
 > importing them from the org.postgresql driver, but I would call them
 > deviants. By the way importing such deviants from a database driver
 > in a generic GUI is not particular acceptable.

 > Can't you just use getString() or getObject() on any type you don't
 >recognize and let the driver deal with any necessary casting?

Yes, and this is exactly what the application does. The same is not true of
the setter methods. When I first created the TableEntryForm it did not pay
attention to any of the data types, but obvious ones like date and time.
The form would just use setObject(). Later it was decided to do some checks
on data before it was submitted to the database, for instance numeric
characters were sent for integers. This then involved checking and using
the appropriate java.sql.Type setter for the type. Any types not caught
would
drop through and use setObject. With the PostgreSQL JDBC driver this does
not work. You have to use the appropriate setter, unfortunately the standard
JDBC does not define the interval, network, or geometric types. So you have
to CAST or import the type from the postgreSQL driver. Since the MyJSQLView
program also supports MySQL the program would then need the postgreSQL
driver to function even if a user only wanted to use MySQL. This creates a
dependency outside of the standard JDBC. Well a CAST was the solution, but
still the application has to catch all defined data types and not let
them drop through
to a default setObject().

 > Maybe I'm missing
 > something there; Anyway I would think that standard data types should
 > really relate more to hardware and the the specifics of the nature of
 > program's, in this case database, requirements. Example blob/bytea.
 > This is a obvious specific to a database program and so therefore should
 > be an extension to the standard data types. If points and circles are
going
 > to be added is then next perhaps mole?

 > They're mappings of standard PostgreSQL types, and they use driver
 > infrastructure that allows adding of specific support for any
 > user-defined type that happens to get installed in the DB (type
 > extensibility is a fairly big thing in PostgreSQL, why would we ignore
 > it?).

I'm sorry I can not really answer this. All you have done is state a
reality.
We already know PostgreSQL is calling these standard types and or course
if they are then the driver I guess would then support them. If you call
them
one of the standard types then again yes users can use them for a basis for
building more complex user-defined types. Extendibility is an admirable
feature for a program, I guess my argument here is that maybe it should be
limited with regard to data types. I have never seen in my limited
experience
a programing language that defined a point as a data type. In reality it
is a
concept that is composed of data types, real/integer numbers. Yes it is nice
to have these predisposed types, but a user could of created a point through
a user-defined type just like an example I saw in the manual regarding a
complex type.

 > The geometric types are quite a bit more than just another data
 > structure, there are various "spatial" indexes and queries you can run
 > against them. Same is generally true of other custom types, they have
 > their own indexing and operator behaviour which is why they're there in
 > the first place.

Now were are getting somewhere. If the internals of the database are using
these geometric types in the algorithms for indexes and queries then perhaps
they are justified as some form of data structure. Notice that I have quoted
you from above, you are now even calling them data structures, and not
types. If this is the case then it might be understandable in defining them
as available 'data types'. If this is not the case then you are again
just making
a statement in support of providing the ability to use something that was
created. If you create a type then obviously you have to provide indexing
and operators for it, the latter does not justify its existence.

 > If you don't want to use the type the driver implements you don't have
 > to, getString() still works just fine (or somewhat more indirectly -
 > getObject() -> I don't recognize the returned object -> call toString()
 >on it)

I already answered this.

 > How would you rather have the driver handle these types, if not in
this way?

I would rather not have to CAST or import non-standard types which are not
defined in the Sun JDBC. If you wish to have them at least make the driver
except a generic string setter method like the getter. Which is where we
started
with this thread in which I think was done. I have been busy trying to get a
PostgreSQL export working so I have not checked. Anyway I think I'm happy,
how about everyone else? I really do appreciate the support/information
this forum
and the others PostgreSQL provides. Thanks guys/gals.

danap.

Re: TypeInfoCache

From

Oliver Jowett

Date:

22 December 2007, 17:28:45

dmp wrote:

> I would rather not have to CAST or import non-standard types which are not
> defined in the Sun JDBC. If you wish to have them at least make the driver
> except a generic string setter method like the getter.

You can now do this in CVS HEAD by using setObject(n, someStringObject,
Types.OTHER). Or you can enable it globally for all strings by using
"stringtype=unspecified" at the cost of less type safety (which I
wouldn't recommend)

Normally the meaning of setString() is pretty explicit, though - it says
that you want the target type to be text or varchar. If the underlying
database type is something different, I don't think it's unreasonable to
require you to either use a different method, or put a cast in your query.

-O

Re: TypeInfoCache

From

Daniel Migowski

Date:

27 December 2007, 04:40:15

Hello, dear JDBC mailing list members,

At first I hope you all had a merry christmas. Regarding the patch I
sent, another important argument is this one: I createt a benchmark for
getCharacterStream vs. getStream running on a table with a varchar(50)
and a text column, each containing 50 chars of data.  These are the
benchmarks for the differences of calling 1000 times the given function
on a Core2Duo 2Ghz and JDK1.5:

    getString on text_col took 8.27384ms
    getCharacterStream on text_col took 14.26489ms
    Factor: 1.724095462324628
    getString on varchar_col took 9.57153ms
    getCharacterStream on varchar_col took 12.76733ms
    Factor: 1.3338860140437319

Please note only the getString() vs.
getCharacterStream()+creatingAString are benchmarked, not the DB
connection itself (which is the real bottleneck). And please note only
those applications that really use getCharacterStream are a bit slower,
but as I explained in my previous mails, they do it for a good reason.

Conclusion: There is no real performance loss even in the worst case
(which should never occur anyway), so please accept my patch. I wish you
all a happy new year!

With best regards,
Daniel Migowski

package de.ikoffice.jdbc;

import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class StringVsStreamTest {

    public static void main(String args[]) throws Exception {
        long nanos; int count;
        char[] buffer = new char[1000];
        long tempNanos1;
        long tempNanos2;

        Class.forName("org.postgresql.Driver");

        Connection c = DriverManager.getConnection("jdbc:postgresql://192.168.1.3:5432/Test","ik","ik0000");
        Statement s = c.createStatement();
        ResultSet r = s.executeQuery("SELECT * FROM tbl_test");
        r.next();

        for( int j = 0; j < 10; j++ ) {
            // Ignore the times for the first run to be sure no JIT compilation slows us down.
            for( int i=0; i<1000; i++ ) {
                String string = r.getString("varchar_col");
            }
            for( int i=0; i<1000; i++ ) {
                Reader reader = r.getCharacterStream("varchar_col");
                count = reader.read(buffer, 0, 1000);
                String string = new String(buffer,0,count);
            }
            for( int i=0; i<1000; i++ ) {
                String string = r.getString("text_col");
            }
            for( int i=0; i<1000; i++ ) {
                Reader reader = r.getCharacterStream("text_col");
                count = reader.read(buffer, 0, 1000);
                String string = new String(buffer,0,count);
            }
        }

        // Now do the real measurement.
        nanos = System.nanoTime();
        for( int i=0; i<1000; i++ ) {
            String string = r.getString("text_col");
        }
        tempNanos1 = System.nanoTime()-nanos;
        System.out.println("getString on text_col took "+((tempNanos1)/100000d)+"ms");

        nanos = System.nanoTime();
        for( int i=0; i<1000; i++ ) {
            Reader reader = r.getCharacterStream("text_col");
            count = reader.read(buffer, 0, 1000);
            String string = new String(buffer,0,count);
        }
        tempNanos2 = System.nanoTime()-nanos;
        System.out.println("getCharacterStream on text_col took "+((tempNanos2)/100000d)+"ms");

        System.out.println("Factor: " + (tempNanos2*1d/tempNanos1));

        nanos = System.nanoTime();
        for( int i=0; i<1000; i++ ) {
            String string = r.getString("varchar_col");
        }
        tempNanos1 = System.nanoTime()-nanos;
        System.out.println("getString on varchar_col took "+((tempNanos1)/100000d)+"ms");

        nanos = System.nanoTime();
        for( int i=0; i<1000; i++ ) {
            Reader reader = r.getCharacterStream("varchar_col");
            count = reader.read(buffer, 0, 1000);
            String string = new String(buffer,0,count);
        }
        tempNanos2 = System.nanoTime()-nanos;
        System.out.println("getCharacterStream on varchar_col took "+((tempNanos2)/100000d)+"ms");

        System.out.println("Factor: " + (tempNanos2*1d/tempNanos1));

    }

}

Re: TypeInfoCache

From

Oliver Jowett

Date:

27 December 2007, 05:02:14

Daniel Migowski wrote:

>    getString on text_col took 8.27384ms
>    getCharacterStream on text_col took 14.26489ms
>    Factor: 1.724095462324628
>    getString on varchar_col took 9.57153ms
>    getCharacterStream on varchar_col took 12.76733ms
>    Factor: 1.3338860140437319

> Conclusion: There is no real performance loss even in the worst case
> (which should never occur anyway), so please accept my patch. I wish you
> all a happy new year!

I'm confused. Don't your numbers show a 30%-70% slowdown when using
getCharacterStream()?

Also some quick notes about microbenchmarks:

- 1000 iterations isn't really enough to get useful numbers, try 20,000+
for JIT effects or considerably more for GC effects
- server vs. client JVM can have quite different behaviours, you should
check both
- put your inner benchmark loops in a separate method as some JIT has
method-level granularity
- benchmark the same code path in your warmup loop as you actually
measure the numbers from. In your code, you're benchmarking a different
code path to the one you run in your warmup loop.

-O

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 12:36:04

Hello, together,

some news or comments about my patch? Will it be accepted or am i doomed
with ignorance, now ;)

With best regards,
Daniel Migowski

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 12:38:42

Sorry, somehow most [JDBC] mails managed to get into the spam filter... will follow up now... sorry for MY ignorance
:(<br/><br /> With best regards,<br /> Daniel Migowski<br /><br /> Daniel Migowski schrieb: <blockquote
cite="mid:47825391.9050304@ikoffice.de"type="cite">Hello, together, <br /><br /> some news or comments about my patch?
Willit be accepted or am i doomed with ignorance, now ;) <br /><br /> With best regards, <br /> Daniel Migowski <br
/><br/></blockquote><br /><br /><div class="moz-signature">-- <br /><pre> |¯¯|¯¯|    <b>IKOffice GmbH
DanielMigowski</b>|  |  |/|                            Mail: <a
href="mailto:dmigowski@ikoffice.de">dmigowski@ikoffice.de</a>| | // |  Nordstr. 10               Tel.: +49 (441) 21 98
8952|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  <a
href="http://www.ikoffice.de">http://www.ikoffice.de</a>   Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo
Kuhlmann,Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864</pre></div>

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 12:38:50

Sorry, somehow most [JDBC] mails managed to get into the spam filter... will follow up now... sorry for MY ignorance :(

With best regards,
Daniel Migowski

Daniel Migowski schrieb:

Hello, together,

some news or comments about my patch? Will it be accepted or am i doomed with ignorance, now ;)

With best regards,
Daniel Migowski

 |¯¯|¯¯|    IKOffice GmbH             Daniel Migowski|  |  |/|                            Mail: dmigowski@ikoffice.de|  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo Kuhlmann, Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 12:41:14

Sorry, somehow most [JDBC] mails managed to get into the spam filter... will follow up now... sorry for MY ignorance :(

With best regards,
Daniel Migowski

Daniel Migowski schrieb:

Hello, together,

some news or comments about my patch? Will it be accepted or am i doomed with ignorance, now ;)

With best regards,
Daniel Migowski

 |¯¯|¯¯|    IKOffice GmbH             Daniel Migowski|  |  |/|                            Mail: dmigowski@ikoffice.de|  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52|  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55|__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76           Geschäftsführer: Ingo Kuhlmann, Daniel Migowski           Amtsgericht Oldenburg, HRB 201467           Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 12:49:04

Sorry, somehow most [JDBC] mails since 2007-12-19 managed to get into
the spam filter... will follow up now... sorry for MY ignorance :(

With best regards,
Daniel Migowski

Daniel Migowski schrieb:
> Hello, together,
>
> some news or comments about my patch? Will it be accepted or am i
> doomed with ignorance, now ;)
>
> With best regards,
> Daniel Migowski
>


--

 |¯¯|¯¯|    *IKOffice GmbH             Daniel Migowski*
 |  |  |/|                            Mail: dmigowski@ikoffice.de <mailto:dmigowski@ikoffice.de>
 |  | // |  Nordstr. 10               Tel.: +49 (441) 21 98 89 52
 |  | \\ |  26135 Oldenburg           Fax.: +49 (441) 21 98 89 55
 |__|__|\|  http://www.ikoffice.de    Mob.: +49 (176) 22 31 20 76

            Geschäftsführer: Ingo Kuhlmann, Daniel Migowski
            Amtsgericht Oldenburg, HRB 201467
            Steuernummer: 64/211/01864

Re: TypeInfoCache

From

Daniel Migowski

Date:

07 January 2008, 15:17:11

Hello Oliver,

I redid the benchmark, now with 10,000,000 iterations and externalized
functions. Thank you for your hints on writing good benchmarks. These
are the results:

    getString on text_col took 9015.563444ms
    getCharacterStream on text_col took 12239.496833ms
    Factor: 1.3575964396485478

    getString on varchar_col took 9047.101624ms
    getCharacterStream on varchar_col took 12268.299423ms
    Factor: 1.3560474871261377

A feel a bit ashamed of the different results, now, but nontheless do
they still speak what I wanted to say: We have a 35% percent slowdown
JUST on the getStream-Method, regardless of a varchar or a text type.
This is not much if seen in a real life application where the actual
waiting for the data is the real bottleneck.

And this is just a small slowdown, if the applicatin does not use
getString() on the LONGVARCHAR anyway!

So I don't see any problems here.

I can understand your concerns, if you are used to applications that
store everything in text-fields on the database. But a problem is only
if you have such an applicatoin, and forgets about its database fields
and uses the metadata to remember what its fields have been, now notices
the data is in LONGVARCHARs, and uses the stream-methods. But normally,
if you create such a database, you created it just for postgresql
anyway, want to stay with postgresql, and have access to the code that
accesses the JDBC driver. But this again will only contain
getString()-Methods, as it was used to, and so your code does not have
any penalty if we apply the patch.

Please think about this again. Only the very generic applications need
the metadata. And if the fields might contain very large data, it is
good if they use the "slow" interface. The 33% will be reduced to max 1%
at all anyways if mixed in the other code.

Thank you and with best regards,
Daniel Migowski

Oliver Jowett schrieb:
> Daniel Migowski wrote:
>
>>    getString on text_col took 8.27384ms
>>    getCharacterStream on text_col took 14.26489ms
>>    Factor: 1.724095462324628
>>    getString on varchar_col took 9.57153ms
>>    getCharacterStream on varchar_col took 12.76733ms
>>    Factor: 1.3338860140437319
>
>> Conclusion: There is no real performance loss even in the worst case
>> (which should never occur anyway), so please accept my patch. I wish
>> you all a happy new year!
>
> I'm confused. Don't your numbers show a 30%-70% slowdown when using
> getCharacterStream()?
>
> Also some quick notes about microbenchmarks:
>
> - 1000 iterations isn't really enough to get useful numbers, try
> 20,000+ for JIT effects or considerably more for GC effects
> - server vs. client JVM can have quite different behaviours, you
> should check both
> - put your inner benchmark loops in a separate method as some JIT has
> method-level granularity
> - benchmark the same code path in your warmup loop as you actually
> measure the numbers from. In your code, you're benchmarking a
> different code path to the one you run in your warmup loop.
>
> -O
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

Re: TypeInfoCache

From

Gregory Stark

Date:

07 January 2008, 16:09:21

"Daniel Migowski" <dmigowski@ikoffice.de> writes:

> I can understand your concerns, if you are used to applications that store
> everything in text-fields on the database. But a problem is only if you have
> such an applicatoin, and forgets about its database fields and uses the
> metadata to remember what its fields have been, now notices the data is in
> LONGVARCHARs, and uses the stream-methods.

I think that's quite likely though if you build an application and then later
throw some generic tool at it such as a reporting tool, or a schema design
tool, or a migration tool or something like that.

But I wouldn't be too worried about a slowdown. I would be more worried about
having said tool see LONGVARCHAR and throw its hands in the air and refuse to
include it in your reports. Or insist on migrating it to or from MEMO fields
instead of plain strings.

Perhaps this is all just FUD though. I haven't seen such a case myself. I was
under the impression such cases had been previously posted on list but if
you've searched and not found anything then perhaps I'm thinking of some other
scenario.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!

Re: TypeInfoCache

From

Daniel Migowski

Date:

08 January 2008, 06:30:25

Hello Gregory,

Gregory Stark schrieb:
> "Daniel Migowski" <dmigowski@ikoffice.de> writes:
>
>
>> I can understand your concerns, if you are used to applications that store
>> everything in text-fields on the database. But a problem is only if you have
>> such an applicatoin, and forgets about its database fields and uses the
>> metadata to remember what its fields have been, now notices the data is in
>> LONGVARCHARs, and uses the stream-methods.
>>
>
> I think that's quite likely though if you build an application and then later
> throw some generic tool at it such as a reporting tool, or a schema design
> tool, or a migration tool or something like that.
>
Is reasonable, yes.
> But I wouldn't be too worried about a slowdown. I would be more worried about
> having said tool see LONGVARCHAR and throw its hands in the air and refuse to
> include it in your reports. Or insist on migrating it to or from MEMO fields
> instead of plain strings.
>
What would be the problem with this? text fields ARE memo fields. Use
varchar(n) if you want length constrained fields :).
> Perhaps this is all just FUD though.
In fact LONGVARCHAR is made easy in JDBC, since it is required to be
accessible by the same functions as VARCHAR.
> I haven't seen such a case myself. I was
> under the impression such cases had been previously posted on list but if
> you've searched and not found anything then perhaps I'm thinking of some other
> scenario.
>
In fact I searched for LONGVARCHAR on the list, and everything I got are
complaints that it is not supported in the metadata (in about 4
threads). Of course we could not get complaints for the reversed case,
in which LONGVARCHAR itself was a problem, yet.

With best regards,
Daniel Migowski

Re: TypeInfoCache

From

Gregory Stark

Date:

08 January 2008, 08:16:48

"Daniel Migowski" <dmigowski@ikoffice.de> writes:

> What would be the problem with this? text fields ARE memo fields. Use
> varchar(n) if you want length constrained fields :).

Well if I understand correctly memo fields are much less functional than text
fields. In short the mapping between Postgres types and types of other
databases is imperfect and what type is the closest match will depend on which
properties are most important. IIUC memo fields cannot be used in queries the
way text fields can, passed to functions, and stored in variables etc. The
ability to not declare a length constraint is a fairly minor distinguishing
property that doesn't impact the application much.

>> Perhaps this is all just FUD though.
>
> In fact LONGVARCHAR is made easy in JDBC, since it is required to be accessible
> by the same functions as VARCHAR.

Sure, but that doesn't mean tools will make the right decision. It'll be
positively weird for a tool to provide a different set of options to the user
if it sees a text field rather than a varchar for example since in Postgres
they're almost exactly equivalent.

>> I haven't seen such a case myself. I was
>> under the impression such cases had been previously posted on list but if
>> you've searched and not found anything then perhaps I'm thinking of some other
>> scenario.
>
> In fact I searched for LONGVARCHAR on the list, and everything I got are
> complaints that it is not supported in the metadata (in about 4 threads). Of
> course we could not get complaints for the reversed case, in which LONGVARCHAR
> itself was a problem, yet.

Right, searching for LONGVARCHAR isn't going to find problems since it's not
the way Postgres worked in the past. Perhaps searching for "memo" or "lob" or
something like that might work. But I'm being unfair, I guess I have to do
this search myself if I still think there's a problem :)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!