Thread: Container Types

Container Types

From
Vik Fearing
Date:
The standard has several constructs for creating new types from other 
types.  I don't mean anything like CREATE TYPE here, I mean things like 
this:

- ROW(a, b, c), (<explicit row value constructor>)
- ARRAY[a, b, c], (<array value constructor by enumeration>)
- PERIOD(a, b), (<period predicand>)
- MULTISET[a, b, c], (<multiset value constructor by enumeration>)
- MDARRAY[x(1:3)][a, b, c], (<md-array value constructor by enumeration>)

I am not sure what magic we use for the row value constructor.  We 
handle ARRAY by creating an array type for every non-array type that is 
created.  Periods are very similar to range types and we have to create 
new functions such as int4range(a,b) and int8range(a,b) instead of some 
kind of generic RANGE(a, b, '[)') and not worrying about what the type 
is as long as there is a btree opclass for it.

Obviously there would have to be an actual type in order to store it in 
a table, but what I am most interested in here is being able to create 
them on the fly.  I do not think it is feasible to create N new types 
for every type like we do for arrays on the off-chance you would want to 
put it in a PERIOD for example.

For those who know the code much better than I do, what would be a 
plausible way forward to support these containers?
-- 
Vik Fearing



Re: Container Types

From
Jeff Davis
Date:
On Tue, 2022-12-20 at 10:24 +0100, Vik Fearing wrote:
> Obviously there would have to be an actual type in order to store it
> in
> a table, but what I am most interested in here is being able to
> create
> them on the fly.  I do not think it is feasible to create N new types
> for every type like we do for arrays on the off-chance you would want
> to
> put it in a PERIOD for example.

By "on the fly" do you mean when creating real objects, like a table?
In that case it might not be so hard, because we can just create an
ordinary entry in pg_type.

But for this to be a complete feature, I think we need the container
types to be useful when constructed within a query, too. E.g.

  SELECT two_things(v1, v2) FROM foo;

where the result of two_things is some new type two_things_int_text
which is based on the types of v1 and v2 and has never been used
before.

I don't think it's reasonable to create a permanent pg_type entry on
the fly to answer a read-only query. But we could introduce some notion
of an ephemeral in-memory pg_type entry with its own OID, and create
that on the fly.

One way to do that might be to reserve some of the system OID space
(e.g. 15000-16000) for OIDs for temporary catalog entries, and then
have some in-memory structure that holds those temporary entries. Any
lookups in that range would search the in-memory structure instead of
the real catalog. All of this is easier said than done, but I think it
could work.

We'd also need to think about how to infer types through a container
type, e.g.

  SELECT second_thing(two_things(v1,v2)) FROM foo;

should infer that the return type of second_thing() is the type of v2.
To do that, perhaps pg_proc entries can include some kind of type
sublanguage to do this inference, e.g. "a, b -> b" for second_thing(),
or "a, b -> a" for first_thing().

Regards,
    Jeff Davis




Re: Container Types

From
Andres Freund
Date:
Hi,

On 2023-10-25 15:03:04 -0700, Jeff Davis wrote:
> On Tue, 2022-12-20 at 10:24 +0100, Vik Fearing wrote:
> > Obviously there would have to be an actual type in order to store it
> > in 
> > a table, but what I am most interested in here is being able to
> > create 
> > them on the fly.  I do not think it is feasible to create N new types
> > for every type like we do for arrays on the off-chance you would want
> > to 
> > put it in a PERIOD for example.
> 
> By "on the fly" do you mean when creating real objects, like a table?
> In that case it might not be so hard, because we can just create an
> ordinary entry in pg_type.
> 
> But for this to be a complete feature, I think we need the container
> types to be useful when constructed within a query, too. E.g.
> 
>   SELECT two_things(v1, v2) FROM foo;
> 
> where the result of two_things is some new type two_things_int_text
> which is based on the types of v1 and v2 and has never been used
> before.
> 
> I don't think it's reasonable to create a permanent pg_type entry on
> ethe fly to answer a read-only query. But we could introduce some notion
> of an ephemeral in-memory pg_type entry with its own OID, and create
> that on the fly.

I don't think particularly like the idea of an in-memory pg_type entry. But
I'm not sure we need that anyway - we already have this problem with record
types.  We support both named record types (tables and explicitly created
composite types) and ad-hoc ones (created if you write ROW(foo, bar) or
something like that).  If a record's typmod is negative, it refers to an
anonymous row type, if positive it's a named typmod.

We even have support for sharing such ad-hoc rowtypes across backends for
parallel query...

I'd look whether you can generalize that infrastructure.

Greetings,

Andres Freund



Re: Container Types

From
Jeff Davis
Date:
On Wed, 2023-10-25 at 16:01 -0700, Andres Freund wrote:
> I'd look whether you can generalize that infrastructure.

I had briefly looked at using the record type mechanism before, and it
seemed like a challenge because it doesn't really work when passing
through a function call:

   CREATE TABLE t(a INT, b TEXT);
   INSERT INTO t VALUES(1, 'one');
   CREATE FUNCTION id(RECORD) RETURNS RECORD LANGUAGE plpgsql AS
     $$ BEGIN RETURN $1; END; $$;
   SELECT t.a FROM t; -- 1
   SELECT (id(t)).a FROM t; -- ERROR

But now that I think about it, that's really a type inference
limitation, and that needs to be solved regardless.

After the type inference figures out what the right type is, then I
think you're right that an OID is not required to track it, and however
we do track it should be able to reuse some of the existing
infrastructure for dealing with record types.

Regards,
    Jeff Davis