Thread: Automatically parsing in-line composite types

Automatically parsing in-line composite types

From: Mitar
Hi!

I am trying to understand how I could automatically parse an in-line
composite type. By in-line composite type I mean a type corresponding
to ROW. For example, in the following query:

SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body))
FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts

It looks like I can figure out that "comments" is an array of records.
But then there seems to be no way to work out how to parse those
records, i.e., what the types of the fields inside each record are.

I start the parsing process by looking at the types returned in the
RowDescription message and then reading their descriptions from the
pg_type table.
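
To illustrate, the catalog lookup I am doing is roughly the following,
just a sketch (assuming RowDescription reports the array-of-record type
for the "comments" column, OID 2287 if I read pg_type correctly):

SELECT t.typname, t.typtype, t.typelem, e.typname AS element_type
FROM pg_type AS t
LEFT JOIN pg_type AS e ON e.oid = t.typelem
WHERE t.oid = 2287;

This tells me that "comments" is _record, an array of record, but
nothing about the fields inside each record.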

Is there some other way to get the full type information for the
result, which I assume is available to PostgreSQL internally?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

Bumping my previous question. I find it surprising that this
information apparently cannot be reconstructed by the client, when the
server must have it internally. Is this a new feature request, or am I
missing something?

> I am trying to understand how I could automatically parse an in-line
> composite type. By in-line composite type I mean a type corresponding
> to ROW. For example, in the following query:
>
> SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body))
> FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts
>
> It looks like I can figure out that "comments" is an array of records.
> But then there seems to be no way to work out how to parse those
> records, i.e., what the types of the fields inside each record are.
>
> I start the parsing process by looking at the types returned in the
> RowDescription message and then reading their descriptions from the
> pg_type table.
>
> Is there some other way to get the full type information for the
> result, which I assume is available to PostgreSQL internally?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Dave Cramer


On Wed, 23 Oct 2019 at 15:50, Mitar <mmitar@gmail.com> wrote:
> Hi!
>
> Bumping my previous question. I find it surprising that this
> information apparently cannot be reconstructed by the client, when the
> server must have it internally. Is this a new feature request, or am I
> missing something?
>
> > I am trying to understand how I could automatically parse an in-line
> > composite type. By in-line composite type I mean a type corresponding
> > to ROW. For example, in the following query:
> >
> > SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body))
> > FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts
> >
> > It looks like I can figure out that "comments" is an array of records.
> > But then there seems to be no way to work out how to parse those
> > records, i.e., what the types of the fields inside each record are.
> >
> > I start the parsing process by looking at the types returned in the
> > RowDescription message and then reading their descriptions from the
> > pg_type table.
> >
> > Is there some other way to get the full type information for the
> > result, which I assume is available to PostgreSQL internally?



Reading the RowDescription is the only way I am aware of.


Re: Automatically parsing in-line composite types

From: Fabio Ugo Venchiarutti
On 29/10/2019 12:23, Dave Cramer wrote:
> 
> 
> On Wed, 23 Oct 2019 at 15:50, Mitar <mmitar@gmail.com> wrote:
>
> > Hi!
> >
> > Bumping my previous question. I find it surprising that this
> > information apparently cannot be reconstructed by the client, when the
> > server must have it internally. Is this a new feature request, or am I
> > missing something?
> >
> > > I am trying to understand how I could automatically parse an in-line
> > > composite type. By in-line composite type I mean a type corresponding
> > > to ROW. For example, in the following query:
> > >
> > > SELECT _id, body, (SELECT array_agg(ROW(comments._id, comments.body))
> > > FROM comments WHERE comments.post_id=posts._id) AS comments FROM posts
> > >
> > > It looks like I can figure out that "comments" is an array of records.
> > > But then there seems to be no way to work out how to parse those
> > > records, i.e., what the types of the fields inside each record are.
> > >
> > > I start the parsing process by looking at the types returned in the
> > > RowDescription message and then reading their descriptions from the
> > > pg_type table.
> > >
> > > Is there some other way to get the full type information for the
> > > result, which I assume is available to PostgreSQL internally?
>
> Reading the RowDescription is the only way I am aware of.
>
> Dave Cramer
> davec@postgresintl.com
> www.postgresintl.com


Perhaps I misunderstood your question, but that sounds like my average
use-case for the object-relational type system & JSON/JSONB
functions/types: defining nested structured types as temporary relations
in my queries and spewing out their hierarchical JSON representation -
often as a single big field (ironically, I hate storing JSON in
relational databases unless I'm storing something really opaque like
dashboard layouts).


EG:

SELECT
     t.relname AS t_name,
     array_to_json(ARRAY_AGG(ats)) AS fields_json
FROM
     pg_class AS t INNER JOIN (
         SELECT
             ia.attrelid AS table_id,
             ia.attnum AS column_number,
             ia.attname AS column_name
         FROM
             pg_attribute AS ia
     ) AS ats
     ON
         (t.relkind = 'r')
     AND
         (t.relname IN ('pg_type', 'pg_constraint'))
     AND
         (ats.table_id = t.oid)
GROUP BY
     t.relname


You can use subqueries and array_agg() to deepen your output tree all 
the way to a stack overflow, a single <whatever>_to_json() call at the 
top will recursively traverse and convert whatever you feed it.


In your case you can just emit your composite type as a JSON object or 
array thereof (types and relations are the same thing).
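
Applied to your original query, an untested sketch of what I mean would
be something like:

SELECT
     posts._id,
     posts.body,
     (
         SELECT json_agg(json_build_object('_id', comments._id, 'body', comments.body))
         FROM comments
         WHERE comments.post_id = posts._id
     ) AS comments
FROM posts

The client then only ever needs a JSON parser, whatever the nesting.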





-- 
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology




Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Tue, Oct 29, 2019 at 5:23 AM Dave Cramer <pg@fastcrypt.com> wrote:
> Reading the RowDescription is the only way I am aware of.

But that provides only the types for the top-level fields, not for the
inline composite types. If a top-level field is a registered composite
type, then yes, it works out, because you can then go and read the
definition of that type from the system tables. But in any other case,
for example where you subselect a list of columns from a table in a
sub-query, it does not work out.
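
For a registered composite type the lookup is doable, along these lines
(the type name here is hypothetical):

SELECT a.attnum, a.attname, a.atttypid, t.typname AS field_type
FROM pg_type AS ct
JOIN pg_attribute AS a ON a.attrelid = ct.typrelid
JOIN pg_type AS t ON t.oid = a.atttypid
WHERE ct.typname = 'my_composite_type'
  AND a.attnum > 0
  AND NOT a.attisdropped
ORDER BY a.attnum;

But for an anonymous ROW(...) there is no pg_type entry to join against.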

I think that ideally, with the introduction of composite types into
PostgreSQL, RowDescription should have been extended to provide
information for composite types as well, recursively. That way you
would not even have to go and fetch additional information about those
types, potentially hitting race conditions.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Tue, Oct 29, 2019 at 9:06 AM Fabio Ugo Venchiarutti
<f.venchiarutti@ocado.com> wrote:
> You can use subqueries and array_agg() to deepen your output tree all
> the way to a stack overflow, a single <whatever>_to_json() call at the
> top will recursively traverse and convert whatever you feed it.

Yes, what you are describing is exactly the sad state of things: the
only way to meaningfully retrieve inline composite types, which are
made when one aggregates things like that or subselects a set of
fields from a table in a sub-query, is to convert the whole thing to
JSON and transmit it that way, because that is the only way you can
parse things on the client. If you leave it as the raw composite type
encoding, you cannot really parse it correctly on the client in all
cases, because you do not know what types are stored inside the
composite types you are getting.

But JSON is not a lossless transport format: it does not support the
full floating point spec (no Infinity or NaN), and many types of
fields just get converted to their string representation, which can be
problematic, for example if you have binary blobs.
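
A few examples of what I mean, as far as I understand the behavior
(please correct me if I am wrong):

SELECT to_json('NaN'::double precision);  -- becomes the string "NaN"
SELECT to_json(now());                    -- becomes a timestamp string, the type is lost
SELECT to_json('\xdeadbeef'::bytea);      -- becomes the escaped text "\\xdeadbeef"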

So no, JSON is a workaround, but it is sad that we have to use it.
PostgreSQL seems to be almost there with its support for composite
types and nested query results; it is just that you cannot really
parse them out. I mean, why does PostgreSQL even have its own binary
format for results; it could just transmit everything as JSON. :-) But
that does not really work for many data types.

I think RowDescription should be extended to provide full recursive
metadata about all data types. That would be the best way to do it.


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Tom Lane
Mitar <mmitar@gmail.com> writes:
> I think RowDescription should be extended to provide full recursive
> metadata about all data types. That would be the best way to do it.

[ shrug... ]  In a world where stability of the wire protocol were
of zero value, maybe we would do that.  In the real world, don't
hold your breath.

Note that dealing with composite values in full generality is by no
means just a problem of getting this metadata.  Clients would also
have to be prepared to parse and de-escape the data representation,
which is not trivial in either text or binary cases.

On the whole I think it's generally better practice to explode your
composite types into separate fields for transmission to the client.
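
For instance, with the query shown upthread, something along these lines
transmits only plain scalar columns (just a sketch):

SELECT posts._id, posts.body, c._id AS comment_id, c.body AS comment_body
FROM posts
LEFT JOIN comments c ON c.post_id = posts._id
ORDER BY posts._id;

and the client groups the comment rows back together by posts._id.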

Note that the cases where JSON or XML shine are where you don't
necessarily have a consistent set of fields in different instances
of the composite values.  Even if we did extend RowDescription to
support describing composites' sub-fields, it wouldn't be in
much of a position to deal with that.

            regards, tom lane



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Tue, Oct 29, 2019 at 11:33 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> [ shrug... ]  In a world where stability of the wire protocol were
> of zero value, maybe we would do that.  In the real world, don't
> hold your breath.

Oh, yes. I would hope this would be possible in a backwards-compatible
way. I am not familiar enough with the wire protocol to know the answer
to that, though.

> Clients would also
> have to be prepared to parse and de-escape the data representation,
> which is not trivial in either text or binary cases.

Yes, but currently they cannot be prepared. They simply lack the
necessary information. So if they are not prepared, the state is the
same as it is currently: they get the composite type in its encoded
representation, as a value. But if they are prepared, they have the
necessary metadata to parse it.

> On the whole I think it's generally better practice to explode your
> composite types into separate fields for transmission to the client.

The issue here is that it is really hard to make a general client for
PostgreSQL. A user might want to run an arbitrary SQL query. I would
like to be able to parse the result automatically, without the user
having to additionally specify how to parse it, without requiring them
to change the SQL query, and without showing them the encoded
representation directly (not very user friendly).

I agree that in simple cases one could just change the SQL query, but
that is not always possible. For example, aggregating a related table
into an array is very useful because it makes the amount of data
transmitted over the wire much smaller (instead of having to repeat the
contents of the main table's rows again and again).

> Note that the cases where JSON or XML shine are where you don't
> necessarily have a consistent set of fields in different instances
> of the composite values.  Even if we did extend RowDescription to
> support describing composites' sub-fields, it wouldn't be in
> much of a position to deal with that.

Yes, but that case is already handled: you just have a column of type
"JSON" (or "JSONB") and it is clear how to automatically parse that.
What I am missing is a way to automatically parse composite types.
Those are generally not completely arbitrary, but are defined by the
query, not by the data.

What would be the next step to move this further in some direction?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Merlin Moncure
On Tue, Oct 29, 2019 at 12:58 PM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> On Tue, Oct 29, 2019 at 9:06 AM Fabio Ugo Venchiarutti
> <f.venchiarutti@ocado.com> wrote:
> > You can use subqueries and array_agg() to deepen your output tree all
> > the way to a stack overflow, a single <whatever>_to_json() call at the
> > top will recursively traverse and convert whatever you feed it.
>
> Yes, what you are describing is exactly the sad state of things: the
> only way to meaningfully retrieve inline composite types, which are
> made when one aggregates things like that or subselects a set of
> fields from a table in a sub-query, is to convert the whole thing to
> JSON and transmit it that way, because that is the only way you can
> parse things on the client. If you leave it as the raw composite type
> encoding, you cannot really parse it correctly on the client in all
> cases, because you do not know what types are stored inside the
> composite types you are getting.
>
> But JSON is not a lossless transport format: it does not support the
> full floating point spec (no Infinity or NaN), and many types of
> fields just get converted to their string representation, which can be
> problematic, for example if you have binary blobs.
>
> So no, JSON is a workaround, but it is sad that we have to use it.
> PostgreSQL seems to be almost there with its support for composite
> types and nested query results; it is just that you cannot really
> parse them out. I mean, why does PostgreSQL even have its own binary
> format for results; it could just transmit everything as JSON. :-) But
> that does not really work for many data types.
>
> I think RowDescription should be extended to provide full recursive
> metadata about all data types. That would be the best way to do it.

Check out libpqtypes: https://github.com/pgagarinov/libpqtypes

it does exactly what you want. It's a wrapper for libpq that provides
client side parsing for the binary protocol with array and composite
type parsing.  This library was written before the json train left the
station; its only use today (assuming you don't like the cute
'PQexecf') would be scenarios where performance requirements are
extreme and the client application is written in C (so you can benefit
from direct binary copying vs string serialization).  Virtually any
non-C client application really ought to be using json rather than the
custom binary structures libpqtypes would provide.  I co-wrote the
library, it works wonderfully, but I've since moved on to json.

JSON also undercuts the argument that the database ought to provide
deep type facilities on both sides of the protocol IMO; I used to be
interested in that, now I'm not; the world is moving on.

merlin



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure <mmoncure@gmail.com> wrote:
> Check out libpqtypes: https://github.com/pgagarinov/libpqtypes

Interesting. I have looked at the code a bit, but I cannot find how it
determines the type of inline compound types, like the ones that
appear in my original SQL query example. Could you maybe point me to
the piece of code there handling that? Because to my
understanding/exploration, that information is simply not exposed to
the client in any way. :-(

> it does exactly what you want. It's a wrapper for libpq that provides
> client side parsing for the binary protocol with array and composite
> type parsing.

It looks to me like it parses composite types only if they are
registered composite types, but not, for example, the ones you get if
you project a subset of fields from a table in a subquery. Those have
no registered composite type?

Also, how are you handling discovery of registered types: do you read
them on demand from the database? They are not provided over the wire?

> Virtually any
> non-C client application really ought to be using json rather than the
> custom binary structures libpqtypes would provide.

I thought that initially, too, but then found out that JSON has some
heavy limitations, because the implementation in PostgreSQL is
standards based. There is also no hook to do custom encoding of
non-JSON values, so binary blobs are converted in an ugly way (base64
would be better). You also lose a lot of meta-information, because
everything non-JSON gets converted to a string automatically, so you no
longer know, for example, what is a date. I think MongoDB with BSON
made much more sense here. It looks like the perfect balance between
the simplicity of the JSON structure and a few more useful data types.
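
The workaround I use now (the table and column names here are made up)
is to encode such values by hand before the JSON conversion:

SELECT json_build_object(
    '_id', attachments._id,
    'data', encode(attachments.data, 'base64')
) AS attachment
FROM attachments;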

But yes, JSON is also great because clients often have optimized JSON
readers, which can beat any other binary serialization format. In
node.js, it is simply the fastest way to transfer data:

https://mitar.tnode.com/post/in-nodejs-always-query-in-json-from-postgresql/


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Fabio Ugo Venchiarutti

On 30/10/2019 16:15, Mitar wrote:
> Hi!
> 
> On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure <mmoncure@gmail.com> wrote:
>> Check out libpqtypes: https://github.com/pgagarinov/libpqtypes
> 
> Interesting. I have looked at the code a bit, but I cannot find how it
> determines the type of inline compound types, like the ones that
> appear in my original SQL query example. Could you maybe point me to
> the piece of code there handling that? Because to my
> understanding/exploration, that information is simply not exposed to
> the client in any way. :-(
> 
>> it does exactly what you want. It's a wrapper for libpq that provides
>> client side parsing for the binary protocol with array and composite
>> type parsing.
> 
> It looks to me like it parses composite types only if they are
> registered composite types, but not, for example, the ones you get if
> you project a subset of fields from a table in a subquery. Those have
> no registered composite type?
> 
> Also, how are you handling discovery of registered types: do you read
> them on demand from the database? They are not provided over the wire?
> 
>> Virtually any
>> non-C client application really ought to be using json rather than the
>> custom binary structures libpqtypes would provide.
> 
> I thought that initially, too, but then found out that JSON has some
> heavy limitations, because the implementation in PostgreSQL is
> standards based. There is also no hook to do custom encoding of
> non-JSON values, so binary blobs are converted in an ugly way (base64
> would be better). You also lose a lot of meta-information, because
> everything non-JSON gets converted to a string automatically, so you no
> longer know, for example, what is a date. I think MongoDB with BSON
> made much more sense here. It looks like the perfect balance between
> the simplicity of the JSON structure and a few more useful data types.
> 
> But yes, JSON is also great because clients often have optimized JSON
> readers, which can beat any other binary serialization format. In
> node.js, it is simply the fastest way to transfer data:
> 
> https://mitar.tnode.com/post/in-nodejs-always-query-in-json-from-postgresql/
> 
> 
> Mitar
> 


Then perhaps, as opposed to wedging this into the tabular paradigm, a
transition to more targeted support for hierarchical result
representation would be preferable, just done directly by the backend and
rendered by libpq... (perhaps still encapsulated as a DataRow field, not
to break the traditional model, or perhaps a special RowDescription-like
message in the backend protocol? Not my place to strongly push proposals
there).


There's a lot of room for optimisation if done natively (think label 
deduplication at the source. Not sure if BSON works this way too).


There's also the problem of independent implementations of the
protocol... AFAIK the JDBC client is not a wrapper around libpq, and
they'd also have to break their result-surfacing paradigms to make it
work...


Sounds like an enormous risk & undertaking for the hackers TBH, and I
currently see another limiting factor to the idea's popularity: as it
stands, advanced SQL is daunting for much of the industry, and IMHO
the queries to generate arbitrarily structured & lightweight inline
types/relations are relatively verbose and deeply nested (eg: last time
I checked, stripping/renaming some attributes from a relation required
subselecting them, and - perhaps due to PEBCAK - I can't think of a way
to create results as associative arrays indexed by attributes).

For this to gain traction, more streamlined syntax/functions/operators
for precision work may be necessary, or the result would only satisfy a
narrow set of users who are already intimate with the state of affairs.


Can't help thinking that the current JSON-over-field pinhole may already
be at the sweet spot between usefulness and inter-operability with
existing systems. Just the SQL side of it could be less noisy and, yes,
the data type pigeonhole problem could benefit from something like a GUC
setting, function arguments or the like to electively break standard
JSON compatibility.




-- 
Regards

Fabio Ugo Venchiarutti
OSPCFC Network Engineering Dpt.
Ocado Technology




Re: Automatically parsing in-line composite types

From: Merlin Moncure
On Wed, Oct 30, 2019 at 11:15 AM Mitar <mmitar@gmail.com> wrote:
>
> Hi!
>
> On Wed, Oct 30, 2019 at 8:37 AM Merlin Moncure <mmoncure@gmail.com> wrote:
> > Check out libpqtypes: https://github.com/pgagarinov/libpqtypes
>
> Interesting. I have looked at the code a bit, but I cannot find how it
> determines the type of inline compound types, like the ones that
> appear in my original SQL query example. Could you maybe point me to
> the piece of code there handling that? Because to my
> understanding/exploration, that information is simply not exposed to
> the client in any way. :-(

It looks it up from the database.  See implementation in
https://github.com/pgagarinov/libpqtypes/blob/master/source/src/handler.c
 (look for macro LOOKUP_TYPES) and usage in
https://github.com/pgagarinov/libpqtypes/blob/master/source/src/regression-test.c.

> > it does exactly what you want. It's a wrapper for libpq that provides
> > client side parsing for the binary protocol with array and composite
> > type parsing.
>
> It looks to me like it parses composite types only if they are
> registered composite types, but not, for example, the ones you get if
> you project a subset of fields from a table in a subquery. Those have
> no registered composite type?

Correct. Only declared (via CREATE TYPE) composite types will work due
to protocol limitations.  For custom C programming this is fine, but
limiting if you are interested in writing a driver that can handle any
type of object the database can throw at you (and don't forget custom
types at the C level!).
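
For example (guessing at your column types here), declaring the type
first makes it something libpqtypes, or any other client, can look up:

CREATE TYPE comment_summary AS (_id integer, body text);

SELECT posts._id, posts.body,
       (SELECT array_agg(ROW(comments._id, comments.body)::comment_summary)
        FROM comments WHERE comments.post_id = posts._id) AS comments
FROM posts;

The result column is then an array of comment_summary, a registered
type, so its fields can be resolved from pg_type/pg_attribute.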

> Also, how are you handling discovery of registered types: do you read
> them on demand from the database? They are not provided over the wire?

The client application has to declare at connection time which types
it is interested in, and then they are looked up at the SQL level.  This
would be fairly typical of C applications, I think; but mostly this
works around the limitations of the binary protocol.

> > Virtually any
> > non-C client application really ought to be using json rather than the
> > custom binary structures libpqtypes would provide.
>
> I thought that initially, too, but then found out that JSON has some
> heavy limitations, because the implementation in PostgreSQL is
> standards based. There is also no hook to do custom encoding of
> non-JSON values, so binary blobs are converted in an ugly way (base64
> would be better). You also lose a lot of meta-information, because
> everything non-JSON gets converted to a string automatically, so you no
> longer know, for example, what is a date. I think MongoDB with BSON
> made much more sense here. It looks like the perfect balance between
> the simplicity of the JSON structure and a few more useful data types.
>
> But yes, JSON is also great because clients often have optimized JSON
> readers, which can beat any other binary serialization format. In
> node.js, it is simply the fastest way to transfer data:

Sure, JSON has only very, very basic type support.  In a practical
sense this means type safety has to be built above the json layer
(e.g. {"field": "foo", "type": "shape", "data": "(1,1), (2,2)"}), exactly
as we do with the textual sql protocol.
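
For example, on the SQL side that might look like this (table and column
made up), with the client dispatching on the type tag:

SELECT json_build_object(
    'field', 'bounding_box',
    'type',  'box',
    'data',  widgets.bounding_box::text
) FROM widgets;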

The postgres hacker community will tend to target major standards as a
matter of culture and prudence... this used to annoy me, now I support
this strongly.  Since the type system is extensible it's theoretically
possible to implement bson support or some other funky type-safe
format.  I would personally argue (perhaps with good company) against
such a type being incorporated in core, or even in contrib. In fact, I
argued (win some, lose some, heh!) that jsonb should be in contrib,
not core; we should be moving stuff OUT of the regular namespace and
into extensions, not the other way around...

Aside: now that stored procedures are good to go, there is one feature
left that IMNSHO postgres desperately needs, and that is external
package repository management (on the order of CPAN, npm, etc.) so that
3rd party repositories handle grooming, precompiling, packaging,
downloading, and deploying (via SQL) of extensions that interact with
the database at the C level.  Enterprise environments and managed
postgres providers will never allow custom-compiled C extensions, which
is the kiss of death; even if I wanted to use those extensions, I
can't.  So if you decided to scratch an itch and create a postgres
BSON type, no one would likely use it, since the chances of adoption
in core are slim to none.

I had seen your article, and liked it.  During the development of what
was to become the jsonb type, I had argued quite strenuously not to
have it completely displace the old json type variant on performance
and other grounds.

merlin



Re: Automatically parsing in-line composite types

From: Andres Freund
Hi,

On 2019-10-29 14:33:00 -0400, Tom Lane wrote:
> Mitar <mmitar@gmail.com> writes:
> > I think RowDescription should be extended to provide full recursive
> > metadata about all data types. That would be the best way to do it.
> 
> [ shrug... ]  In a world where stability of the wire protocol were
> of zero value, maybe we would do that.  In the real world, don't
> hold your breath.

Hm. Wouldn't it be fairly easy to allow the client to specify how much
metadata they'd want? I.e. opt-in into getting more complete metadata?

Presumably a lot of clients/applications wouldn't want the server to do
the extra work / use bandwidth for the full details anyway, so making a
more expansive RowDescription be explicitly opt-in would be good, even
if there were zero compatibility concerns.

There's different ways we could do the opt-in. We could use the "_pq_."
startup option stuff to opt in, we could make it an optional parameter
to D messages (it'd be mildly hacky because unfortunately
describe_target is not a counted text), we could use an additional
describe_type etc...

Greetings,

Andres Freund



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Wed, Oct 30, 2019 at 3:06 PM Merlin Moncure <mmoncure@gmail.com> wrote:
> It looks it up from the database.

Yes, this is how I started doing it in my prototype as well.

> Correct. Only declared (via CREATE TYPE) composite types will work due
> to protocol limitations.

Exactly. This is where I got stuck, which is why I started this thread. :-(

> So if you decided to scratch an itch and create a postgres
> BSON type, no one would likely use it, since the chances of adoption
> in core are slim to none.

Yea. :-( So we get back to square one. :-(

One other approach I was investigating was developing a Babel-like
transpiler for PostgreSQL SQL, so that I could have plugins which
would rewrite SQL queries to automatically encode values in JSON, and
then parse them back out once the results arrive. Because yes, as you
note, JSON is the only stable and supported format available across
all installations (except for the wire format, which has limitations).
So mapping to it and back, without the developer having to think
about it, might be the best solution.
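
The kind of rewrite such a transpiler would do is roughly (sketch):

SELECT row_to_json(q) AS row
FROM (
    SELECT _id, body,
           (SELECT json_agg(json_build_object('_id', comments._id, 'body', comments.body))
            FROM comments WHERE comments.post_id = posts._id) AS comments
    FROM posts
) AS q;

The inner ROW(...) also has to be rewritten (here into
json_build_object), otherwise anonymous record fields come out named
f1, f2, and so on.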


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



Re: Automatically parsing in-line composite types

From: Merlin Moncure
On Wed, Oct 30, 2019 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2019-10-29 14:33:00 -0400, Tom Lane wrote:
> > Mitar <mmitar@gmail.com> writes:
> > > I think RowDescription should be extended to provide full recursive
> > > metadata about all data types. That would be the best way to do it.
> >
> > [ shrug... ]  In a world where stability of the wire protocol were
> > of zero value, maybe we would do that.  In the real world, don't
> > hold your breath.
>
> Hm. Wouldn't it be fairly easy to allow the client to specify how much
> metadata they'd want? I.e. opt-in into getting more complete metadata?
>
> Presumably a lot of clients/applications wouldn't want the server to do
> the extra work / use bandwidth for the full details anyway, so making a
> more expansive RowDescription be explicitly opt-in would be good, even
> if there were zero compatibility concerns.
>
> There's different ways we could do the opt-in. We could use the "_pq_."
> startup option stuff to opt in, we could make it an optional parameter
> to D messages (it'd be mildly hacky because unfortunately
> describe_target is not a counted text), we could use an additional
> describe_type etc...

Suppose you had a set of 1000 records, with an array of composites (10
items per array).  How many times would the extra describing metadata
get sent following this approach?  The binary wire format is already
fatter in terms of bandwidth in typical cases (unless escaping is
involved, which can cause exponential growth of the text format).  If
the answer is 10k, I'd be worried about performance.

merlin



Re: Automatically parsing in-line composite types

From: Tom Lane
Merlin Moncure <mmoncure@gmail.com> writes:
> On Wed, Oct 30, 2019 at 5:41 PM Andres Freund <andres@anarazel.de> wrote:
>> Hm. Wouldn't it be fairly easy to allow the client to specify how much
>> metadata they'd want? I.e. opt-in into getting more complete metadata?

> Suppose you had a set of 1000 records, with an array of composites (10
> items per array).  How many times would the extra describing metadata
> get sent following this approach?

Presumably once per Describe request (or once per simple Query) ...
but that facile answer actually isn't very satisfactory.  It's
at least theoretically possible that a column of type "record" could
have varying actual rowtype from one tuple to the next.  I don't want
to swear that every part of Postgres is capable of coping with that,
but a lot is --- see record_out() itself for the prototypical coding
pattern.  The wire protocol is really pretty tightly bound to the
assumption that every tuple in a resultset has the same description,
but that's going to fall down if we want to transmit details about
what "record" means.

The same problem occurs with arrays, specifically anyarray --- and
here I don't have to think hard to come up with a query that will
break it.  "select * from pg_stats" is enough to do that.

It's not only the wire protocol that's going to have issues with
varying rowtypes.  How would we deal with that in libpq's resultset
API, for example?  I wonder whether the JDBC API could handle it either.

tl;dr: there are a lot more worms in this can than you might guess.

            regards, tom lane



Re: Automatically parsing in-line composite types

From: Mitar
Hi!

On Thu, Oct 31, 2019 at 7:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> tl;dr: there are a lot more worms in this can than you might guess.

But with all those issues, how do people then use compound types in
practice at all? Just at the top level, with custom code to parse
them out for every query they make?


Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m