Thread: information_schema.columns changes needed for OLEDB

information_schema.columns changes needed for OLEDB

From
Konstantin Izmailov
Date:
As we discussed at pgcon2009 there are some changes/fixes necessary in information_schema.columns to allow correct work of applications and services via OLEDB on Windows. Here are some:

1. data_type field contains types names that are not recognized by MS apps.

Code around: rename types on the fly, e.g.

integer -> int

character varying -> varchar

character -> char

timestamp without time zone -> datetime

bytea -> image

2. character_maximum_length field

Code around: change value for text abd bytea types

[text] 1073741823

[bytea] 2147483647

3. character_octet_length should always be double of character_maximum_length (due to Unicode character size on Windows which is 2).

4. datetime_precision field is not always correct

Code around: change value of the fly, e.g. if value is not null then

[numeric] keep the value (ok)

[bigint] set value to 19

all other set to 10

5. numeric_precision_radix field should always be equal to 10

6. datetime_precision field, minor changes

Code around: change value on the fly, e.g.

[date] set value to zero

[datetime] set value to 3

Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
> As we discussed at pgcon2009 there are some changes/fixes necessary in
> information_schema.columns to allow correct work of applications and
> services via OLEDB on Windows. Here are some:
>
> 1. data_type field contains types names that are not recognized by MS apps.
>
> Code around: rename types on the fly, e.g.
>
> integer -> int
>
> character varying -> varchar
>
> character -> char

The spelling of these types in the information schema is fixed by the SQL 
standard.  We can't change that.

> timestamp without time zone -> datetime

And that would certainly be wrong for other applications, because PostgreSQL 
doesn't have a datetime type.

> bytea -> image

And that we certainly can't do either.

> 2. character_maximum_length field
>
> Code around: change value for text abd bytea types
>
> [text] 1073741823

(see next item)

> [bytea] 2147483647

But bytea is not a character type in the first place, so this value is 
meaningless.

> 3. character_octet_length should always be double of
> character_maximum_length (due to Unicode character size on Windows which is
> 2).

We could do something like that if we exposed the maximum octet length of a 
character per encoding.  But what I wonder is whether this should reflect the 
server or the client encoding.  How do your applications use this value?

> 4. datetime_precision field is not always correct
>
> Code around: change value of the fly, e.g. if value is not null then
>
> [numeric] keep the value (ok)
>
> [bigint] set value to 19
>
> all other set to 10

Why would numeric and bigint affect *datetime*_precision at all?

> 5. numeric_precision_radix field should always be equal to 10

Why?

> 6. datetime_precision field, minor changes
>
> Code around: change value on the fly, e.g.
>
> [date] set value to zero

Makes sense.  I think this is not correct at the moment.

> [datetime] set value to 3

Well, it really depends on what you set it to when you declared the column, 
no?


Re: information_schema.columns changes needed for OLEDB

From
Konstantin Izmailov
Date:
Number 4 is actually numeric_precision (I typed incorrectly). My recollection is that numeric_precision sometimes
expressedin radix 2 and it caused issues for Windows apps.<br /><br />I agree on other issues. I was curious if
databasecan help OLEDB driver (to make it simpler). Anyway it can emulate values for specific Windows apps on the fly.
Thankyou!<br /><br /><div class="gmail_quote">On Sat, May 23, 2009 at 2:57 PM, Peter Eisentraut <span dir="ltr"><<a
href="mailto:peter_e@gmx.net">peter_e@gmx.net</a>></span>wrote:<br /><blockquote class="gmail_quote"
style="border-left:1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;"><div class="im">On
Friday22 May 2009 18:27:01 Konstantin Izmailov wrote:<br /> > As we discussed at pgcon2009 there are some
changes/fixesnecessary in<br /> > information_schema.columns to allow correct work of applications and<br /> >
servicesvia OLEDB on Windows. Here are some:<br /> ><br /> > 1. data_type field contains types names that are not
recognizedby MS apps.<br /> ><br /> > Code around: rename types on the fly, e.g.<br /> ><br /> > integer
->int<br /> ><br /> > character varying -> varchar<br /> ><br /> > character -> char<br /><br
/></div>Thespelling of these types in the information schema is fixed by the SQL<br /> standard.  We can't change
that.<br/><div class="im"><br /> > timestamp without time zone -> datetime<br /><br /></div>And that would
certainlybe wrong for other applications, because PostgreSQL<br /> doesn't have a datetime type.<br /><br /> > bytea
->image<br /><br /> And that we certainly can't do either.<br /><div class="im"><br /> > 2.
character_maximum_lengthfield<br /> ><br /> > Code around: change value for text abd bytea types<br /> ><br />
>[text] 1073741823<br /><br /></div>(see next item)<br /><br /> > [bytea] 2147483647<br /><br /> But bytea is not
acharacter type in the first place, so this value is<br /> meaningless.<br /><div class="im"><br /> > 3.
character_octet_lengthshould always be double of<br /> > character_maximum_length (due to Unicode character size on
Windowswhich is<br /> > 2).<br /><br /></div>We could do something like that if we exposed the maximum octet length
ofa<br /> character per encoding.  But what I wonder is whether this should reflect the<br /> server or the client
encoding. How do your applications use this value?<br /><div class="im"><br /> > 4. datetime_precision field is not
alwayscorrect<br /> ><br /> > Code around: change value of the fly, e.g. if value is not null then<br /> ><br
/>> [numeric] keep the value (ok)<br /> ><br /> > [bigint] set value to 19<br /> ><br /> > all other set
to10<br /><br /></div>Why would numeric and bigint affect *datetime*_precision at all?<br /><div class="im"><br /> >
5.numeric_precision_radix field should always be equal to 10<br /><br /></div>Why?<br /><div class="im"><br /> > 6.
datetime_precisionfield, minor changes<br /> ><br /> > Code around: change value on the fly, e.g.<br /> ><br
/>> [date] set value to zero<br /><br /></div>Makes sense.  I think this is not correct at the moment.<br /><div
class="im"><br/> > [datetime] set value to 3<br /><br /></div>Well, it really depends on what you set it to when you
declaredthe column,<br /> no?<br /></blockquote></div><br /> 

Re: information_schema.columns changes needed for OLEDB

From
Josh Berkus
Date:
On 5/23/09 7:37 PM, Konstantin Izmailov wrote:
> Number 4 is actually numeric_precision (I typed incorrectly). My
> recollection is that numeric_precision sometimes expressed in radix 2
> and it caused issues for Windows apps.
>
> I agree on other issues. I was curious if database can help OLEDB driver
> (to make it simpler). Anyway it can emulate values for specific Windows
> apps on the fly. Thank you!

You could, of course, create your own ms_information_schema which had 
ms_friendly views of the IS.

-- 
Josh Berkus
PostgreSQL Experts Inc.
www.pgexperts.com


Re: information_schema.columns changes needed for OLEDB

From
Gevik Babakhani
Date:
Josh Berkus wrote:
> On 5/23/09 7:37 PM, Konstantin Izmailov wrote:
>> Number 4 is actually numeric_precision (I typed incorrectly). My
>> recollection is that numeric_precision sometimes expressed in radix 2
>> and it caused issues for Windows apps.
>>
>> I agree on other issues. I was curious if database can help OLEDB driver
>> (to make it simpler). Anyway it can emulate values for specific Windows
>> apps on the fly. Thank you!
>
> You could, of course, create your own ms_information_schema which had 
> ms_friendly views of the IS.
>
This is what I have done for a past project. I do not think we should 
part from the standard SQL schema in order to satisfy a certain third 
party component.
If the information_schema does not provide all the information, one 
could always query the pg_* tables for needed data.

-- 
Regards,
Gevik



Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Sunday 24 May 2009 03:37:28 Konstantin Izmailov wrote:
> Number 4 is actually numeric_precision (I typed incorrectly). My
> recollection is that numeric_precision sometimes expressed in radix 2 and
> it caused issues for Windows apps.

It is measured in radix 2 for floating-point types and in radix 10 for fixed-
point types.



Re: information_schema.columns changes needed for OLEDB

From
Konstantin Izmailov
Date:
Hi,<br />I'm not sure that it is related to information_schema but I wanted to let you know that some Postgres
functionsare listed in pg_proc while others are not. For example, all Data Type Formatting function are in pg_proc
(to_char,to_hex, ...). While several of the Date/Time Functions are not there (extract, localtime, ...).<br /><br />Why
differentPostgres function are not equal???<br /><br />This causes issues to Windows integration as well.<br
/>Konstantin<br/> 

Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Konstantin Izmailov <pgfizm@gmail.com> writes:
> you know that some Postgres functions are listed in pg_proc while others are
> not. For example, all Data Type Formatting function are in pg_proc (to_char,
> to_hex, ...). While several of the Date/Time Functions are not there
> (extract, localtime, ...).

The ones that appear not to be there are ones that the SQL standard
demands special weird syntax for.  The grammar translates such calls
to standard function calls to underlying functions, which usually are
named a bit differently to avoid confusion.  For instance
extract(field from some_expr) becomes date_part('field', some_expr).

If you want to know what all of these are, see the func_expr production
in parser/gram.y.

> This causes issues to Windows integration as well.

Complain to the SQL standards committee, especially to those members
who seem to think COBOL represented the apex of programming language
syntax design :-(
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Konstantin Izmailov
Date:
Tom,
this is very helpful - thank you so much!
 
I had to discover those 'missing' functions one by one, usually after users' complaints.
 
Konstantin

On Fri, May 29, 2009 at 11:35 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Konstantin Izmailov <pgfizm@gmail.com> writes:
> you know that some Postgres functions are listed in pg_proc while others are
> not. For example, all Data Type Formatting function are in pg_proc (to_char,
> to_hex, ...). While several of the Date/Time Functions are not there
> (extract, localtime, ...).

The ones that appear not to be there are ones that the SQL standard
demands special weird syntax for.  The grammar translates such calls
to standard function calls to underlying functions, which usually are
named a bit differently to avoid confusion.  For instance
extract(field from some_expr) becomes date_part('field', some_expr).

If you want to know what all of these are, see the func_expr production
in parser/gram.y.

> This causes issues to Windows integration as well.

Complain to the SQL standards committee, especially to those members
who seem to think COBOL represented the apex of programming language
syntax design :-(

                       regards, tom lane

Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
>> [ discussion of some details of information_schema results ]

Are we going to do anything about any of these issues?
In particular,

>> 6. datetime_precision field, minor changes
>> 
>> Code around: change value on the fly, e.g.
>> 
>> [date] set value to zero

> Makes sense.  I think this is not correct at the moment.

AFAICS, the SQL standard demands that precision and scale fields be
non-null all the time for those data types where they make sense
(this is encoded in the CHECK CONSTRAINTs that are declared for the
various information-schema tables, see particularly 21.15
DATA_TYPE_DESCRIPTOR base table in SQL99).  DATE is clearly wrong
per spec, but it's not the only problem.

Our interpretation has been to set these values to null if the typmod
is defaulted, which is reasonable in the abstract but it's still a
violation of spec.  I wonder whether we should be inserting some large
limit value instead.

For the datetime types we actually have a perfectly good default
precision value, namely six digits, if the DB is using integer datetimes
--- and I don't think there's a strong argument not to use this value
for float timestamps too.

I'd prefer to avoid a catversion bump at this stage of the cycle,
but it looks like any changes here would merely involve the bodies of
some functions in information_schema.sql.  I think we could just change
them without a catversion bump.  Any beta testers who actually care
could easily insert the new definitions without an initdb.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Robert Haas
Date:
On Sun, May 31, 2009 at 11:41 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I'd prefer to avoid a catversion bump at this stage of the cycle,
> but it looks like any changes here would merely involve the bodies of
> some functions in information_schema.sql.  I think we could just change
> them without a catversion bump.  Any beta testers who actually care
> could easily insert the new definitions without an initdb.

Is this a regression relative to 8.3?  If not, why not leave it for
8.5?  It seems a little bit late in the game to be messing with this.
I admit I don't have any particular reason to think it will break
anything, but there are other minor patches floating around that meet
that criteria too, so if this one then why not those?

...Robert


Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> Is this a regression relative to 8.3?  If not, why not leave it for
> 8.5?

No. it's not a regression.  What I was concerned about was mainly that
it seemed likely to drop off the radar screen without being addressed at
all, since most of the other things Konstantin complained about seemed
to be we-won't-change-this items.  I have no objection to a conscious
decision to leave it alone for 8.4, so long as it gets memorialized as
either a pending patch or a TODO item.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
> AFAICS, the SQL standard demands that precision and scale fields be
> non-null all the time for those data types where they make sense
> (this is encoded in the CHECK CONSTRAINTs that are declared for the
> various information-schema tables, see particularly 21.15
> DATA_TYPE_DESCRIPTOR base table in SQL99).  DATE is clearly wrong
> per spec, but it's not the only problem.

The DATE change is the only thing I'd be prepared to make right now.

> Our interpretation has been to set these values to null if the typmod
> is defaulted, which is reasonable in the abstract but it's still a
> violation of spec.  I wonder whether we should be inserting some large
> limit value instead.

That is something to think about, but it needs more time.  We also have some 
inconsistency there; for example we produce a large limit value for octet 
length.  Needs more thought.  And if we go down that route, it should also 
require less hardcoding of numbers into information_schema.sql.




Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
>> AFAICS, the SQL standard demands that precision and scale fields be
>> non-null all the time for those data types where they make sense
>> (this is encoded in the CHECK CONSTRAINTs that are declared for the
>> various information-schema tables, see particularly 21.15
>> DATA_TYPE_DESCRIPTOR base table in SQL99).  DATE is clearly wrong
>> per spec, but it's not the only problem.

> The DATE change is the only thing I'd be prepared to make right now.

At this point I think the clear decision is "we're not changing anything
for 8.4".  I've put the issue on the TODO list for future development
cycles.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Monday 08 June 2009 07:12:33 Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On Sunday 31 May 2009 18:41:55 Tom Lane wrote:
> >> AFAICS, the SQL standard demands that precision and scale fields be
> >> non-null all the time for those data types where they make sense
> >> (this is encoded in the CHECK CONSTRAINTs that are declared for the
> >> various information-schema tables, see particularly 21.15
> >> DATA_TYPE_DESCRIPTOR base table in SQL99).  DATE is clearly wrong
> >> per spec, but it's not the only problem.
> >
> > The DATE change is the only thing I'd be prepared to make right now.
>
> At this point I think the clear decision is "we're not changing anything
> for 8.4".  I've put the issue on the TODO list for future development
> cycles.

After gathering that there will probably be some other changes before
release that will require an initdb (even without catversion bump), and
after reexamining the issue, I think it's trivial and uncontroversial to
fix the datetime issues:

diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
index 9c5672f..cb0296a 100644
--- a/src/backend/catalog/information_schema.sql
+++ b/src/backend/catalog/information_schema.sql
@@ -160,12 +160,12 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer    RETURNS NULL ON
NULLINPUT    AS$$SELECT
 
-  CASE WHEN $2 = -1 /* default typmod */
-       THEN null
+  CASE WHEN $1 IN (1082) /* date */
+           THEN 0       WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
-       THEN $2
+           THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END       WHEN $1 IN (1186) /* interval */
-       THEN $2 & 65535
+           THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END       ELSE null  END$$;


I have also prepared a patch that creates more realistic values for
character_octet_length based on encoding information, which I will propose
for 8.5.  The issue of whether to report null or some large value for
"unlimited" length data types needs some more thought.



Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> diff --git a/src/backend/catalog/information_schema.sql b/src/backend/catalog/information_schema.sql
> index 9c5672f..cb0296a 100644
> --- a/src/backend/catalog/information_schema.sql
> +++ b/src/backend/catalog/information_schema.sql
> @@ -160,12 +160,12 @@ CREATE FUNCTION _pg_datetime_precision(typid oid, typmod int4) RETURNS integer
>      RETURNS NULL ON NULL INPUT
>      AS
>  $$SELECT
> -  CASE WHEN $2 = -1 /* default typmod */
> -       THEN null
> +  CASE WHEN $1 IN (1082) /* date */
> +           THEN 0
>         WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
> -       THEN $2
> +           THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END
>         WHEN $1 IN (1186) /* interval */
> -       THEN $2 & 65535
> +           THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END
>         ELSE null
>    END$$;

Just for the record, this is a perfect example of why -u format sucks.
How many people think they can look at this and know exactly what the
change does?

I'll be back to weigh in on the merits of the patch after I've converted
it to -c format so I can understand it.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> After gathering that there will probably be some other changes before
> release that will require an initdb (even without catversion bump), and
> after reexamining the issue, I think it's trivial and uncontroversial to
> fix the datetime issues:

For the benefit of anyone else finding this hard to read, I've
reformatted as

*** src/backend/catalog/information_schema.sql.orig    Tue Feb 24 11:10:16 2009
--- src/backend/catalog/information_schema.sql    Tue Jun  9 14:21:37 2009
***************
*** 160,171 ****     RETURNS NULL ON NULL INPUT     AS $$SELECT
!   CASE WHEN $2 = -1 /* default typmod */
!        THEN null        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
!        THEN $2        WHEN $1 IN (1186) /* interval */
!        THEN $2 & 65535        ELSE null   END$$; 
--- 160,171 ----     RETURNS NULL ON NULL INPUT     AS $$SELECT
!   CASE WHEN $1 IN (1082) /* date */
!            THEN 0        WHEN $1 IN (1083, 1114, 1184, 1266) /* time, timestamp, same + tz */
!            THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 END        WHEN $1 IN (1186) /* interval */
!            THEN CASE WHEN $2 = -1 THEN 6 ELSE $2 & 65535 END        ELSE null   END$$;

I would suggest coding the inner cases asCASE WHEN $2 < 0 THEN ...default...
since the general practice in the C code is to treat any negative
value as meaning "unspecified".  Otherwise, +1.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Friday 22 May 2009 18:27:01 Konstantin Izmailov wrote:
> 3. character_octet_length should always be double of
> character_maximum_length (due to Unicode character size on Windows which is
> 2).

I have the attached patch that would make character_octet_length the product 
of character_octet_length and the maximum octet length of a single character 
in the selected server encoding.  So for UTF-8, this would be factor 4.  This 
doesn't exactly correspond to the behavior that you expect, but I think it's 
more correct overall anyway.


Re: information_schema.columns changes needed for OLEDB

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> I have the attached patch that would make character_octet_length the product 
> of character_octet_length and the maximum octet length of a single character 
> in the selected server encoding.  So for UTF-8, this would be factor 4.  This
> doesn't exactly correspond to the behavior that you expect, but I think it's 
> more correct overall anyway.

+1, but that new query isn't very schema-safe ... I think it needs a few
"pg_catalog." qualifications.
        regards, tom lane


Re: information_schema.columns changes needed for OLEDB

From
Peter Eisentraut
Date:
On Monday 06 July 2009 22:16:12 Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I have the attached patch that would make character_octet_length the
> > product of character_octet_length and the maximum octet length of a
> > single character in the selected server encoding.  So for UTF-8, this
> > would be factor 4.  This doesn't exactly correspond to the behavior that
> > you expect, but I think it's more correct overall anyway.
>
> +1, but that new query isn't very schema-safe ... I think it needs a few
> "pg_catalog." qualifications.

Applied with fixes.