Re: VARIANT / ANYTYPE datatype - Mailing list pgsql-hackers

From Darren Duncan
Subject Re: VARIANT / ANYTYPE datatype
Date
Msg-id 4DC46E1C.3070405@darrenduncan.net
Whole thread Raw
In response to Re: VARIANT / ANYTYPE datatype  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: VARIANT / ANYTYPE datatype  (Darren Duncan <darren@darrenduncan.net>)
List pgsql-hackers
Thanks for picking up on my mentioning union types; I wasn't sure if anyone did.

Merlin Moncure wrote:
> On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>> On 05/06/2011 04:08 PM, Alvaro Herrera wrote:
>>> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>>>
>>>> I see VARIANT/ANYTYPE as the most general case of supporting union types,
>>>> which,
>>>> say, could have more specific examples of "allow any number or date here
>>>> but
>>>> nothing else".  If VARIANT is supported, unions in general ought to be
>>>> also.
>>> Okay, so aside from the performance (storage reduction) gained, there's
>>> this argument for having variant/union types.  It seems to me that this
>>> is indeed possible to build.  Completely general VARIANT, though, is
>>> rather complex.  A declared union, where you specify exactly which types
>>> can be part of the union, can be catalogued, so that the system knows
>>> exactly where to look when a type needs to be modified.  A general
>>> VARIANT however looks complex to me to solve.
>>>
>>> The problem is this: if an user attempts to drop a type, and this type
>>> is used in a variant somewhere, we would lose the stored data.  So the
>>> drop needs to be aborted.  Similarly, if we alter a type (easy example:
>>> a composite type) used in a variant, we need to cascade to modify all
>>> rows using that composite.
>>>
>>> If the unions that use a certain type are catalogued, we at least know
>>> what tables to scan to cascade.
>>>
>>> In a general variant, the system catalogs do not have the information of
>>> what type each variant masquerades as.  We would need to examine the
>>> variant's masqueraded types on each insert; if the current type is not
>>> found, add it.  This seems a bit expensive.
>> So how is a declared union going to look and operate? Something like this?
>>
>>    CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
>>    CREATE TABLE bar (myunion foo);
>>    INSERT INTO bar (myunion) VALUES (ival=>1), (tval=>'some text');
>>    UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;
>>
>>
>> Something like that could actually be quite nice for a number of purposes.
> 
> using your hypothetical example, could you cast types to the union?
> 
> select 1::int::foo;
> 
> record types would presumably work as well?  you could do a lot of
> *really* neat stuff that way...

Like with other respondents to this topic, I consider it much more useful and 
important, as well as practical, to support explicitly defined type unions than 
a wide-open ANYTYPE.

As for how it would operate:  The example above implies a union type 
implementation that is like C's concept of such, where you have to explicitly 
state how you want the value bit pattern to be interpreted, by naming ival/etc, 
rather than the DBMS just knowing that a particular value is of a specific type, 
because per-value stored metadata says so (like with SQLite).

While that might be best in practice for implementation, I had envisioned 
something more like set unions, so you could instead do it like this:
    CREATE TYPE foo AS UNION (int, text, timestamptz):    CREATE TABLE bar (myunion foo);    INSERT INTO bar (myunion)
VALUES(1), ('some text');    UPDATE bar SET myunion = CURRENT_TIMESTAMP;
 

Unlike a record type, where multiple attributes may have the same time, 
presumably with a union, they would all be distinct, and so you could use the 
type name itself to refer to each option; you don't have to make up "ival" or 
whatever ... unless there are situations where types don't have names.

When doing operations that are type-generic, such as equality test or 
assignment, especially assignment, between 2 things that are both declared to be 
of type foo, you could just do it with no verbosity same as if you were doing 2 
int or text or whatever.

When doing operations specific to int or text or whatever, or if you are 
assigning a foo-declared thing to an int/text-declared thing, then you'd want an 
explicit cast or assertion, such as "select myunion::int + 3 as answer from bar 
where is_a(myunion,'int')".

Whether you want an explicit cast to go the other way, I would assume you don't 
need to, like when you have DOMAINs; eg, I would expect the 4th line above to 
just work, because the system knows the type of CURRENT_TIMESTAMP and it knows 
that this is a member of the union type of myunion.  I see a UNION type as being 
like a DOMAIN type in reverse.

-- Darren Duncan


pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: Prefered Types
Next
From: Robert Haas
Date:
Subject: Re: patch for new feature: Buffer Cache Hibernation