Thread: Portable interfaces ...

Portable interfaces ...

From: "Preston A. Elder"
(I apologise in advance if this gets duplicated to the list; I
originally submitted it before subscribing.)

I wanted to know exactly which header files I can use portably.

For example, on my system (Gentoo Linux), libpq-fe.h and postgres_ext.h
are both installed in /usr/include - thus I'm assuming they (or at least
libpq-fe.h, which includes postgres_ext.h) are available on all unix
systems that have postgres installed with 'make install'.

However, also on Gentoo, a '/usr/include/postgresql' directory is
created with 'informix', 'internal', 'pgsql' and 'server'
subdirectories.  These subdirectories contain more headers, most of
which I'm assuming are for internal stuff.

I need to use two files out of this postgresql directory.  Specifically:

  server/catalog/pg_type.h, to get the OIDs for all the default types
  (I'm not using it for 'special' types, just the standard ones, eg.
  INT8OID, etc).

  server/mb/pg_wchar.h, for unicode translation support (I'm assuming
  that if I have a unicode field, it will actually be represented in a
  query as a multi-byte char array, which I can then use pg_mb2wchar to
  convert into a wchar_t *).

My question is: what is the likelihood of these headers being on most
unix systems with postgres installed?  What is the likelihood of them
being under /usr[/local]/postgresql?  What is the likelihood that these
file names may change, or move?  And what about Windows systems?

Unfortunately, the standard API as provided by libpq-fe.h is not really
extensive enough to be able to write any kind of reasonably complex
application that wants to do things like support localization, or verify
the data type is what we expect (or, let the database tell it the data
type without having to store it or know it in advance).

Incidentally, I do note that pg_config --includedir-server ==
/usr/include/postgresql/server on my system.  However, I do NOT know
whether pg_config is a standard script installed by 'make install' (and
thus whether I can rely on it to point to the server include directory
- or, as mentioned above, whether the relative paths listed above are
guaranteed to stay the same).

Any help appreciated - and if any PG devs are listening, is it possible
either to expand the 'default' API, or to document what users of libpq
(and its associated headers) can reasonably expect to be available on
any system?  I'd prefer to use the #define's and localization routines
within postgres itself, since it's the only way to ensure complete
compatibility and correctness ... but if they're non-portable, I'm going
to have to use my own #define's and portable calls, which increases the
chances of incompatibility with postgres :(

A coder in need ...

-- 
PreZ
Founder
The Neuromancy Society
http://www.neuromancy.net




Re: Portable interfaces ...

From: Tom Lane
"Preston A. Elder" <prez@neuromancy.net> writes:
> I wanted to know exactly which header files I can use portably.

pg_type.h is part of the "server-side" include tree.  When installing
from original sources, it is installed only if you type "make
install-all-headers".  When dealing with a Linux distribution's
packaging, it's completely at the whim of the packager whether it will
be installed at all, and if so where.

Sorry the news isn't better :-(.  My advice would actually be to copy
the #define's you need for type OIDs into a header in your own sources.
We do not make a practice of changing OIDs for built-in objects, so I
think your risk in that direction is really lower than your risk from
trying to #include pg_type.h on various random distributions.
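
For example, a minimal private header along these lines would do (a
sketch; the values are copied from pg_type.h, and you can cross-check
them at any time with "SELECT oid, typname FROM pg_type"):

    /* my_pg_oids.h - type OIDs copied from the server's pg_type.h.
     * Built-in type OIDs are not changed between releases, so these
     * values are stable. */
    #ifndef MY_PG_OIDS_H
    #define MY_PG_OIDS_H

    #define BOOLOID       16
    #define INT8OID       20
    #define INT2OID       21
    #define INT4OID       23
    #define TEXTOID       25
    #define FLOAT4OID    700
    #define FLOAT8OID    701
    #define VARCHAROID  1043

    #endif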

I don't have a comparably simple answer for your character encoding
issue, but again I fear that relying on PG's internal facilities would
be a bad idea for you.
        regards, tom lane


Re: Portable interfaces ...

From: "Preston A. Elder"
On Tue, 2004-03-23 at 00:05, Tom Lane wrote:
> I don't have a comparably simple answer for your character encoding
> issue, but again I fear that relying on PG's internal facilities would
> be a bad idea for you.

Does postgres have an API review team or anything?

I mean, not trying to offend anyone here, but I don't see how an API in
which you cannot find out details about a table and its columns (such as
the character encoding scheme in effect, the data type of a column (even
if only for pre-defined data types), field lengths for strings, etc.)
could be considered anything but deficient.

Add to this the "support" for unicode - I mean, it's almost like it was
added as an afterthought, since the API won't help you without using
non-portable facilities (i.e. headers/etc that may or may not exist on a
system).

I've programmed with other SQL databases before (specifically Microsoft
SQLServer and Oracle), both of which had standard APIs for finding out
exactly the kind of information I speak of - and to be honest, I find it
hard to believe I would be the first to request that these kinds of
things be made part of the standard API.

For now, I am (grudgingly) copying the OID codes, and going to use
system-based multibyte/unicode functions (and hope to heck that they are
compatible with postgres, but I get the feeling they are not fully
compatible after browsing pg_wchar.h) - and I'll be praying that I do
not need to find out any more information about a table or column that
would require me to use more 'server-side calls'.  As it is, right now,
I have no verification on things like string lengths, because there's no
real way to find out the maximum length of a field (encoding-specific -
i.e. for UTF-8, a length of 20 means 20 chars; for UTF-16, a length of
20 means 20 wchar_t's (40 bytes)).

If I sound accusatory in this email, I apologise - I'm just frustrated
(I am trying to write a library that will work on many different unix
variants, windows (compiled with BCB or MSVC) and on OSX - and these
kinds of things frustrate and complicate that process).  What are the
plans for the Postgres API - as I said, I find it hard to believe I am
the first to raise this issue, so are there plans to have function calls
to retrieve 'column properties' and the like?

-- 
PreZ
Founder
The Neuromancy Society
http://www.neuromancy.net




Re: Portable interfaces ...

From: Peter Eisentraut
Preston A. Elder wrote:
> I wanted to know exactly which header files I can use portably.

You can use exactly the interfaces (headers and functions) described in 
the documentation.  No more, no less.

> For example, on my system (Gentoo Linux), libpq-fe.h and
> postgres_ext.h are both installed in /usr/include - thus I'm assuming
> they (or at least libpq-fe.h, which includes postgres_ext.h) can be
> assumed to be available on all unix systems that have postgres
> installed with 'make install'.

Building libpq programs is described here: 
http://www.postgresql.org/docs/7.4/static/libpq-build.html

Any installation that claims to be ready for libpq development should 
provide libpq-fe.h and dependent headers in a documented location.

> However, also on Gentoo, a '/usr/include/postgresql' directory is
> created with 'informix', 'internal', 'pgsql' and 'server'
> subdirectories.  These subdirectories contain more headers, most of
> which I'm assuming are for internal stuff.

Just because a file exists doesn't mean you should concern yourself with 
how to use it.  The documentation of each interface describes what you 
need to use when, how, and for what.  In this case, these files relate 
to ECPG and server-side functions.  The pgsql directory looks bogus.

> I need to use two files out of this postgresql directory.
> Specifically:
>   server/catalog/pg_type.h, to get the OIDs for all the default types
>   (I'm not using it for 'special' types, just the standard ones, eg.
>   INT8OID, etc).

It might be preferable to get the type names by selecting from the 
system catalog pg_type.
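
For instance, something like this, run once per connection and cached
(a sketch; PQexecParams requires a 7.4 libpq):

    #include <stdlib.h>
    #include <libpq-fe.h>

    /* Look up the OID of a built-in type by name, instead of
     * hard-coding it from pg_type.h. */
    Oid lookup_type_oid(PGconn *conn, const char *typname)
    {
        const char *values[1] = { typname };
        Oid oid = InvalidOid;           /* from postgres_ext.h */
        PGresult *res = PQexecParams(conn,
            "SELECT oid FROM pg_type WHERE typname = $1",
            1, NULL, values, NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_TUPLES_OK && PQntuples(res) == 1)
            oid = (Oid) strtoul(PQgetvalue(res, 0, 0), NULL, 10);
        PQclear(res);
        return oid;
    }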

>   server/mb/pg_wchar.h for unicode translation support (I'm assuming
> if I have a unicode field, that it will actually be represented in a
> query as a multi-byte char array, which I can then use pg_mb2wchar to
> convert into a wchar_t *).

You could do that, but the PostgreSQL routines you refer to are not 
intended for client programs.  You should look in your C library for 
this functionality.
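
For example, with nothing but the C library (a sketch; it assumes the
process locale has been set to match the client encoding, e.g.
setlocale(LC_CTYPE, "en_US.UTF-8")):

    #include <stdlib.h>
    #include <wchar.h>

    /* Convert a multi-byte string from PQgetvalue() to a freshly
     * allocated wide string, using only standard C facilities. */
    wchar_t *mb_to_wide(const char *s)
    {
        size_t n = mbstowcs(NULL, s, 0);    /* measure first */
        wchar_t *w;

        if (n == (size_t) -1)
            return NULL;                    /* invalid sequence */
        w = malloc((n + 1) * sizeof(wchar_t));
        if (w != NULL)
            mbstowcs(w, s, n + 1);
        return w;
    }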

> My question is: what is the likelihood of these headers being on most
> unix systems with postgres installed?  What is the likelihood of them
> being under /usr[/local]/postgresql?  What is the likelihood that
> these file names may change, or move?  And what about Windows
> systems?

That's very difficult to tell.  My advice is to stick to the 
documentation, and if the steps described in the documentation don't 
work, complain to the sysadmin/distributor/installer/packager.

> Unfortunately, the standard API as provided by libpq-fe.h is not
> really extensive enough to be able to write any kind of reasonably
> complex application that wants to do things like support
> localization, or verify the data type is what we expect (or, let the
> database tell it the data type without having to store it or know it
> in advance).

libpq isn't supposed to be a very high-level API.  If you want to write 
in C, maybe you'll be happier with ODBC.

> Incidentally, I do note that pg_config --includedir-server ==
> /usr/include/postgresql/server on my system.  However, I do NOT know
> whether pg_config is a standard script installed by 'make install'
> (and thus whether I can rely on it to point to the server include
> directory - or, as mentioned above, whether the relative paths listed
> above are guaranteed to stay the same).

Again, follow the documentation.  If the documentation says to use 
pg_config and pg_config does not exist, complain to your packager.  
However, if pg_config were actually missing, a lot of other packages 
would break, so it's quite safe.



Re: Portable interfaces ...

From: Peter Eisentraut
Preston A. Elder wrote:
> Does postgres have an API review team or anything?

No.

> I mean, not trying to offend anyone here, but I don't see how any API
> where you cannot find out details about a table and its columns (such
> as character encoding schemes in effect, data type for the column
> (even if only for pre-defined data types), field lengths (for
> strings/etc), etc) could be considered anything but deficient.

http://www.postgresql.org/docs/7.4/static/information-schema.html
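
For example, to get the declared type and maximum length of column 'y'
in table 'x' (the information schema is new in 7.4):

    SELECT data_type, character_maximum_length
    FROM information_schema.columns
    WHERE table_name = 'x' AND column_name = 'y';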

> Add to this the "support" for unicode - I mean, it's almost like it
> was added as an afterthought,

Indeed.

> since the API won't help you, without
> using non-portable facilities (ie. headers/etc that may or may not
> exist on a system).

I don't understand what the API has to do with Unicode.  Either your 
strings are in Unicode or they are not.  This is set by your 
application.

> For now, I am (grudgingly) copying the OID codes, and going to use
> system-based multibyte/unicode functions (and hope to heck that they
> are compatible with postgres, but I get the feeling they are not
> fully compatible after browsing pg_wchar.h)

They should be.

> As it is, right now, I have no verification on things like string
> lengths, because there's no real way to find out the maximum length of
> a field (encoding-specific - i.e. for UTF-8, a length of 20 means 20
> chars; for UTF-16, a length of 20 means 20 wchar_t's (40 bytes)).

The libpq API gives you a way to determine the binary length of a 
returned datum.  You need to allocate that dynamically.
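
For example (a sketch; text-format values are returned NUL-terminated,
and PQgetlength() reports the size of the actual datum):

    #include <stdlib.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* Copy a field into a buffer sized from the returned datum,
     * not from any declared column width. */
    char *copy_field(const PGresult *res, int row, int col)
    {
        int len = PQgetlength(res, row, col);   /* length in bytes */
        char *buf = malloc(len + 1);

        if (buf != NULL) {
            memcpy(buf, PQgetvalue(res, row, col), len);
            buf[len] = '\0';
        }
        return buf;
    }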

> What are the plans for the Postgres API - as I said, I
> find it hard to believe I am the first to raise this issue, so are
> there plans to have function calls to retrieve 'column properties'
> and the like?

The plans are more or less that if you don't like it, you should try a 
different API, such as ODBC, or a different programming language than 
C.



Re: Portable interfaces ...

From: "Preston A. Elder"
On Fri, 2004-03-26 at 05:22, Peter Eisentraut wrote:
> I don't understand what the API has to do with Unicode.  Either your 
> strings are in Unicode or they are not.  This is set by your 
> application.

OK, let's say I'm writing a library that is not supposed to know anything
at all about the database structure.  I'm just writing something to give
the client-programmer a generic interface to many different storage
types (postgres, Berkeley DB, an XML file, etc).  In this instance, they
may give me a hint as to whether to allow standard or wide strings.

I need to:
1) verify that the database will allow wide (multi-byte) strings if they
have chosen to.
2) find out the encoding style of that db/table so I know how to encode
it into a multi-byte string (since I don't think the libpq API accepts
wchar_t strings, but I think it accepts multi-byte strings).
3) also know the encoding style so that I can actually encode my wchar_t
strings into multi-byte strings (and vice versa) - since this is very
locale dependent.

As I said, unicode in postgres seems to be an afterthought - this is
also evidenced by the fact that encoding can only be specified at the
database level, when most databases I've seen allow it down to the
column level (with inheritance from table, then database).

I found out the encoding is available with pg_result->content_encoding.
However:
1) The definition of pg_result is not part of the 'standard' API (it's
in the 'private' libpq headers), so again, I cannot rely on systems
having it defined.
2) It is an integer, which maps more to an internal encoding type than
to anything I can use with a standard multibyte<->wchar_t function.
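
The closest I can get through documented calls seems to be
PQparameterStatus() (new in 7.4), which at least reports the encodings
by name rather than by internal id - a sketch:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Report the connection's encodings using only documented API
     * (requires a 7.4 / protocol-3.0 server). */
    void show_encodings(PGconn *conn)
    {
        const char *client = PQparameterStatus(conn, "client_encoding");
        const char *server = PQparameterStatus(conn, "server_encoding");

        printf("client: %s, server: %s\n",
               client ? client : "(unknown)",
               server ? server : "(unknown)");
    }

But even then, a name like 'UNICODE' still has to be mapped by hand onto
whatever my C library's locale machinery expects.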

> > For now, I am (grudgingly) copying the OID codes, and going to use
> > system-based multibyte/unicode functions (and hope to heck that they
> > are compatible with postgres, but I get the feeling they are not
> > fully compatible after browsing pg_wchar.h)
> They should be.

I should also not have to 'hope it works' for standard (non-user-
defined) types.  Nor should I have to execute a select to figure out the
OID of a standard type - which incurs not just the penalty of having to
do a select and wait for SQL to get back to me, but also a string
comparison of the results (often repeatedly - strcmp(result,
"bool")==0, strcmp(result, "int4")==0, ...) to check what it is.  I
realise I could just do this on startup and build my own table, but it's
still a CPU/wait-time penalty.  And copying constants out of elsewhere,
when the constants should be provided, is also rather hackish.

> > As it is, right now, I have no verification on things like string
> > lengths, because there's no real way to find out the maximum length
> > of a field (encoding-specific - i.e. for UTF-8, a length of 20 means
> > 20 chars; for UTF-16, a length of 20 means 20 wchar_t's (40 bytes)).
> The libpq API gives you a way to determine the binary length of a 
> returned datum.  You need to allocate that dynamically.
I'm not talking about allocation here.  I'm talking about string
length - or more correctly, maximum string length.  I want to find out
from the database the maximum length of a string I can pass - even if
it's in bytes (in which case I divide it by 2 for 2-byte encoding
systems ... kind of).  Which is very different from how many bytes it
uses up in the database's storage.

> The plans are more or less that if you don't like it, you should try a
> different API, such as ODBC, or a different programming language than
> C.
Which is the wrong attitude for a database product trying to compete in
what is turning into a more and more crowded field.  I'm not using ODBC
for one simple reason - I don't want the users of my library to have to
configure an ODBC interface to their database, etc.  Nor do I want them
to have to install anything extra, beyond postgres itself, for ODBC
support.  I'm trying to keep my library as native as possible with
database interfaces, which is why I'm not even using libpq++.

For now, I'm going to have to hack my way around these issues with a
series of ugly workarounds - but there is no way I could ever in good
conscience recommend that any database application developer writing
code in C or C++ use postgres as their back-end database, without
serious API changes that actually provide decent information on standard
types and column/table/database configuration - without having to
manually run selects that are not documented anywhere.

For example, there is nowhere in the docs that tells me "To get the OID
of column 'y' in table 'x', do:
  SELECT atttypid FROM pg_attribute
  WHERE attrelid = (SELECT relfilenode FROM pg_class
                    WHERE relname = 'x')
  AND attname = 'y'
".  That would be an extremely useful statement to have in the docs.
Then, even if the API did not support such blatantly obvious things as
this, I would at least know how to work around it without having to
research what I imagine would be commonly needed functionality in any
reasonably complex application or library where the application/library
is not supposed to know, or rely on, the underlying data types.

-- 
PreZ
Founder
The Neuromancy Society
http://www.neuromancy.net




Re: Portable interfaces ...

From: Kris Jurka

On Sat, 27 Mar 2004, Preston A. Elder wrote:

> For example, there is nowhere in the docs that tells me "To get the OID
> of column 'y' in table 'x', do:
>   SELECT atttypid FROM pg_attribute
>   WHERE attrelid = (SELECT relfilenode FROM pg_class
>                     WHERE relname = 'x')
>   AND attname = 'y'
> ".  That would be an extremely useful statement to have in the docs. 

This is not correct; you want
"... attrelid = (SELECT oid FROM pg_class ..."
not relfilenode.
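
That is, the full query should read:

  SELECT atttypid FROM pg_attribute
  WHERE attrelid = (SELECT oid FROM pg_class
                    WHERE relname = 'x')
  AND attname = 'y'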

Kris Jurka