Thread: VARIANT / ANYTYPE datatype

VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Hello,

A customer came to us with this request: a way to store "any" data in a
column.  We've gone back and forth trying to determine reasonable
implementation restrictions, safety and useful semantics for them.
I note that this has been requested in the past:
http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
and both Oracle and MS-SQL have it and apparently people find them
useful.  I didn't find any indication that SQL contains anything
resembling this.

The main idea is to be able to store column values in an audit table
like this:

change_time    timestamptz
table_name    name
column_name    name
old_value    variant
new_value    variant
So per-column changes, which is much more convenient than the regular
idea of storing the whole NEW and/or OLD record(s).

Currently, they use text for old_value and new_value, but this is, of
course, not very satisfactory.

My thought was that a variant type would store the datum as
<typid><data> so that it would be possible to identify the datatype
stored in each column/row and interpret adequately, calling the
appropriate output function etc.  On input it would be limited to come
only from inside the system, not from the outside world, as that would
have obvious security implications; so it'd be similar to pg_node_tree
in that regard.

Now this has obvious limitations: first, any query that tries to extract
data would need to include a cast of the variant value to the
appropriate type, so that the type can be resolved early.  Thus,
trying to extract rows of different types would be forbidden.

Also, there would be a security problem with a datum storing something
whose datatype later changes (consider a user-defined record type or
things like that).  My first reaction was to do something like CREATE TYPE foo VARIANT OF (int, text, timestamptz);
and then you could declare old_value with type foo, which would only
allow values of the declared types.  This makes it easy to catalogue
used types in any variant, and thus easy to restrict modifying or
dropping types that are used in some variant.  However, this idea was
rejected by the customer due to the unusability: you would have to
remember to edit the variant to add the new type anytime you added a new
column to a table, which would be cumbersome.

What the customer suggested was to have a new fork, which stores
type OIDs of datatypes used in the variant.  Then when a type is to be
altered, all forks would be scanned to determine if the type is used,
and raise an error if so.  I rejected that idea as unworkable.

However, as a middle ground we agreed that we could allow a declared
variant to store any pinned type without restrictions; those can't be
changed or dropped so there's no safety concern.  Storing other types
(user-defined types, records and so on) would require some safety net.

Before spending too much time exploring a detailed design, I'd like to
hear some opinions on the basic idea.

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>


Re: VARIANT / ANYTYPE datatype

From
"Kevin Grittner"
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> A customer came to us with this request: a way to store "any" data
> in a column.
+1
More than once (usually in replication, interface, or audit
situations) I've had to create a table with one column each of a
number of different data types, only one of which will be used in
each row.  It would make life much simpler for programming such
things if an "unknown" type could be stored.  I could either cast
the types in the rows based on the related identifier column, but it
would be even nicer if they could be read "unknown" and pg_typeof or
something similar could be used to control flow based on the type.
As one example, only *last night* I created the following table as
part of the development of our shop's next generation of software. 
(Yes I know about Slony, Londiste, and others -- we need to roll
something out which integrates with existing systems, without
production disruption, over the next few years.  This does require a
custom solution.)         Table "public.DbTranOpValue"    Column     |       Type        | Modifiers
----------------+-------------------+-----------countyNo       | "CountyNoT"       | not nullbackendPid     | integer
       | not nulltranStart      | "TimestampT"      | not nulllogRecordSeqNo | "LogRecordSeqNoT" | not nullcolumnName
 | "ColumnNameT"     | not nullisAfter        | "BooleanT"        | not nulltextValue      | text
|intValue      | bigint            |numericValue   | numeric           |binaryValue    | bytea
|booleanValue  | "BooleanT"        |dateValue      | "DateT"           |timeValue      | "TimeT"
|timestampValue| "TimestampT"      |
 
Indexes:   "DbTranOpValue_pkey" PRIMARY KEY, btree     ("backendPid", "tranStart", "logRecordSeqNo", "columnName")
I would much rather have had a "value" column of unknown type.
-Kevin


Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> A customer came to us with this request: a way to store "any" data in a
> column.  We've gone back and forth trying to determine reasonable
> implementation restrictions, safety and useful semantics for them.

Yes, it seems rather messy.

> The main idea is to be able to store column values in an audit table
> like this:
> old_value    variant
> new_value    variant
> Currently, they use text for old_value and new_value, but this is, of
> course, not very satisfactory.

Just out of curiosity, what actual functionality gain would ensue over
just using text?  It seems like doing anything useful with the audit
table contents would still require casting the column to text, or the
moral equivalent of that.

If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
it could perhaps be used in pg_statistic, in place of the rather ugly
anyarray hack that's there now.  But I note that nothing above the level
of C code can do anything very useful with the contents of pg_statistic,
and I'm not sure that having an official type would change that.

> However, as a middle ground we agreed that we could allow a declared
> variant to store any pinned type without restrictions; those can't be
> changed or dropped so there's no safety concern.

If you're going to accept that there are restrictions, I don't see that
there is a good argument against your thought of a declared "union"
type.  At least then it's clear what the restrictions are.  I am firmly
against exposing the notion of "pinned" types in any user-visible SQL
semantics, and even more firmly against the idea of creating fundamental
functionality that only works for built-in types and can't be extended
to add-on types.
        regards, tom lane


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
Alvaro Herrera wrote:
> A customer came to us with this request: a way to store "any" data in a
> column.  We've gone back and forth trying to determine reasonable
> implementation restrictions, safety and useful semantics for them.
> I note that this has been requested in the past:
> http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php
> and both Oracle and MS-SQL have it and apparently people find them
> useful.  I didn't find any indication that SQL contains anything
> resembling this.

I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
say, could have more specific examples of "allow any number or date here but 
nothing else".  If VARIANT is supported, unions in general ought to be also.

The most effective way of supporting VARIANT or union types in general is having 
an implementation where in the general case each value in the database knows its 
own data type rather than the data type being provided by a context such as what 
table column it is in.  For example, if rather than storing a data value 
directly we store a 2-attribute struct naming a data type and pointing to or 
holding the data value.

See how SQLite works as an example of how VARIANTs or unions could work, 
although that on its own would need to be made more comprehensive for Pg.

I claim ignorance as to how Pg currently implements these matters.

Where VARIANT/union types are supported by default, declaring more specific 
types is just a type constraint and an optimization.

Of course, when we know the type of a column/etc isn't going to be VARIANT or 
some other union type, then a simple optimization allows us to just store the 
value and have its type provided by context rather than the struct.

-- Darren Duncan


Re: VARIANT / ANYTYPE datatype

From
Merlin Moncure
Date:
On Wed, May 4, 2011 at 12:36 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
>> A customer came to us with this request: a way to store "any" data in a
>> column.  We've gone back and forth trying to determine reasonable
>> implementation restrictions, safety and useful semantics for them.
>
> Yes, it seems rather messy.
>
>> The main idea is to be able to store column values in an audit table
>> like this:
>> old_value     variant
>> new_value     variant
>> Currently, they use text for old_value and new_value, but this is, of
>> course, not very satisfactory.
>
> Just out of curiosity, what actual functionality gain would ensue over
> just using text?  It seems like doing anything useful with the audit
> table contents would still require casting the column to text, or the
> moral equivalent of that.

The problem with text is that for composite/table types you lose the
ability to cast back when the structure changes.   Most of the
auditing implementations I've done cast new/old to text and drop it
directly into a single column on the audit record.  It works ok, but
you have to messily update the text later when the table changes.  For
non-variant composites you can add columns down the line and it works
ok in dependent records without too much fuss.

I think though that getting this to work such that type dependency is
via row/column instead of just column is going to be tough.  Outside
of composites, I don't see much advantages vs the text route,
performance maybe?

merlin


Re: VARIANT / ANYTYPE datatype

From
Andrew Dunstan
Date:

On 05/04/2011 01:36 PM, Tom Lane wrote:
>
>> The main idea is to be able to store column values in an audit table
>> like this:
>> old_value    variant
>> new_value    variant
>> Currently, they use text for old_value and new_value, but this is, of
>> course, not very satisfactory.
> Just out of curiosity, what actual functionality gain would ensue over
> just using text?  It seems like doing anything useful with the audit
> table contents would still require casting the column to text, or the
> moral equivalent of that.
>

Yeah, I've been down this road once or twice, and I think that's the $64 
question.

I wrote a custom audit app two or three years ago. After several 
iterations the customer and I found that using an hstore for the old/new 
(or old record / changeset, which is what we actually use) was the most 
suitable for our use.

I think if we did this we'd need to add some sort of is_type()  and 
typeof() functions for variant objects.

cheers

andrew


Re: VARIANT / ANYTYPE datatype

From
Merlin Moncure
Date:
On Wed, May 4, 2011 at 2:55 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 05/04/2011 01:36 PM, Tom Lane wrote:
>>
>>> The main idea is to be able to store column values in an audit table
>>> like this:
>>> old_value       variant
>>> new_value       variant
>>> Currently, they use text for old_value and new_value, but this is, of
>>> course, not very satisfactory.
>>
>> Just out of curiosity, what actual functionality gain would ensue over
>> just using text?  It seems like doing anything useful with the audit
>> table contents would still require casting the column to text, or the
>> moral equivalent of that.
>>
>
> Yeah, I've been down this road once or twice, and I think that's the $64
> question.
>
> I wrote a custom audit app two or three years ago. After several iterations
> the customer and I found that using an hstore for the old/new (or old record
> / changeset, which is what we actually use) was the most suitable for our
> use.

yeah -- +1 on that method.  I think it's really the right way to go
with the recent hstore enhancements.

merlin


Re: VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
> Alvaro Herrera <alvherre@alvh.no-ip.org> writes:

> > The main idea is to be able to store column values in an audit table
> > like this:
> > old_value    variant
> > new_value    variant
> > Currently, they use text for old_value and new_value, but this is, of
> > course, not very satisfactory.
> 
> Just out of curiosity, what actual functionality gain would ensue over
> just using text?  It seems like doing anything useful with the audit
> table contents would still require casting the column to text, or the
> moral equivalent of that.

Storage efficiency.  These people have really huge databases; small
changes in how tight things are packed makes a large difference for
them.  (For example, we developed a type to store SHA-2 digests in a
more compact way than bytea mainly because of this reason.  Also, at
some time they also wanted to apply compression to hstore keys and
values.)

As someone commented downthread, they also want to have things such as a
"typeof" operator.  It could be used in (say) a plpgsql function to
choose different branches of code.

Things go wild when you think about using VARIANT as a function
parameter type.  Given how complex are our current function resolution
rules I'm not really sold on making this work at all, so initially I'm
aiming at just raising an error in that case.

> If we did have a concept of ANYTYPE, or really we'd need ANYTYPE[],
> it could perhaps be used in pg_statistic, in place of the rather ugly
> anyarray hack that's there now.  But I note that nothing above the level
> of C code can do anything very useful with the contents of pg_statistic,
> and I'm not sure that having an official type would change that.

Well, we could get rid of ugly hacks that are in various places in the
backend code to cope with this.  Nor sure how useful it'd be for doing
things at the SQL level.

> > However, as a middle ground we agreed that we could allow a declared
> > variant to store any pinned type without restrictions; those can't be
> > changed or dropped so there's no safety concern.
> 
> If you're going to accept that there are restrictions, I don't see that
> there is a good argument against your thought of a declared "union"
> type.  At least then it's clear what the restrictions are.  I am firmly
> against exposing the notion of "pinned" types in any user-visible SQL
> semantics, and even more firmly against the idea of creating fundamental
> functionality that only works for built-in types and can't be extended
> to add-on types.

The argument was "it's too cumbersome to use" (which makes sense: users
are certain to forget to add the new type to the declared union when
they add a new column to the table, possibly causing a run-time error if
the audit trigger is in place and attempts to load a new datum into the
log table.)

I understand the reluctancy to use pinned-ness in a user-visible way.
Back to the drawing board.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
>> Just out of curiosity, what actual functionality gain would ensue over
>> just using text?  It seems like doing anything useful with the audit
>> table contents would still require casting the column to text, or the
>> moral equivalent of that.

> Storage efficiency.  These people have really huge databases; small
> changes in how tight things are packed makes a large difference for
> them.  (For example, we developed a type to store SHA-2 digests in a
> more compact way than bytea mainly because of this reason.  Also, at
> some time they also wanted to apply compression to hstore keys and
> values.)

Hmm.  The prototypical case for this would probably be a 4-byte int,
which if you add an OID to it so you can resolve the type is going to
take 8 bytes, plus you are going to need a length word because there is
really no alternative to the "VARIANT" type being varlena overall, which
makes it 9 bytes if you're lucky on alignment and up to 16 if you're
not.  That is not shorter than the average length of the text
representation of an int.  The numbers don't seem a lot better for
8-byte quantities like int8, float8, or timestamp.  It might be
marginally worthwhile for timestamp, but surely this is a huge amount of
effort to substitute for thinking of a more compact text representation
for timestamps.

Pardon me for being unconvinced.
        regards, tom lane


Re: VARIANT / ANYTYPE datatype

From
"David E. Wheeler"
Date:
On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:

> As someone commented downthread, they also want to have things such as a
> "typeof" operator.  It could be used in (say) a plpgsql function to
> choose different branches of code.

FWIW, pg_typeof(any) has been in core since 9.0.

Best,

David



Re: VARIANT / ANYTYPE datatype

From
Andrew Dunstan
Date:

On 05/04/2011 07:05 PM, Tom Lane wrote:
> Alvaro Herrera<alvherre@commandprompt.com>  writes:
>> Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
>>> Just out of curiosity, what actual functionality gain would ensue over
>>> just using text?  It seems like doing anything useful with the audit
>>> table contents would still require casting the column to text, or the
>>> moral equivalent of that.
>> Storage efficiency.  These people have really huge databases; small
>> changes in how tight things are packed makes a large difference for
>> them.  (For example, we developed a type to store SHA-2 digests in a
>> more compact way than bytea mainly because of this reason.  Also, at
>> some time they also wanted to apply compression to hstore keys and
>> values.)
> Hmm.  The prototypical case for this would probably be a 4-byte int,
> which if you add an OID to it so you can resolve the type is going to
> take 8 bytes, plus you are going to need a length word because there is
> really no alternative to the "VARIANT" type being varlena overall, which
> makes it 9 bytes if you're lucky on alignment and up to 16 if you're
> not.  That is not shorter than the average length of the text
> representation of an int.  The numbers don't seem a lot better for
> 8-byte quantities like int8, float8, or timestamp.  It might be
> marginally worthwhile for timestamp, but surely this is a huge amount of
> effort to substitute for thinking of a more compact text representation
> for timestamps.
>
> Pardon me for being unconvinced.
>
>             

I'm far from convinced that storing deltas per column rather than per 
record is a win anyway. I don't have hard numbers to hand, but my vague 
recollection is that my tests showed it to be a design that used more space.

cheers

andrew


Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
"David E. Wheeler" <david@kineticode.com> writes:
> On May 4, 2011, at 3:04 PM, Alvaro Herrera wrote:
>> As someone commented downthread, they also want to have things such as a
>> "typeof" operator.  It could be used in (say) a plpgsql function to
>> choose different branches of code.

> FWIW, pg_typeof(any) has been in core since 9.0.

But note that pg_typeof() would surely say "VARIANT" if applied to such
a datatype.  You'd need some other, VARIANT-specific function that knew
enough to dig into the value at runtime and get the OID therein.
        regards, tom lane


Re: VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of mié may 04 20:05:54 -0300 2011:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Excerpts from Tom Lane's message of mié may 04 14:36:44 -0300 2011:
> >> Just out of curiosity, what actual functionality gain would ensue over
> >> just using text?  It seems like doing anything useful with the audit
> >> table contents would still require casting the column to text, or the
> >> moral equivalent of that.
> 
> > Storage efficiency.  These people have really huge databases; small
> > changes in how tight things are packed makes a large difference for
> > them.  (For example, we developed a type to store SHA-2 digests in a
> > more compact way than bytea mainly because of this reason.  Also, at
> > some time they also wanted to apply compression to hstore keys and
> > values.)
> 
> Hmm.  The prototypical case for this would probably be a 4-byte int,
> which if you add an OID to it so you can resolve the type is going to
> take 8 bytes, plus you are going to need a length word because there is
> really no alternative to the "VARIANT" type being varlena overall, which
> makes it 9 bytes if you're lucky on alignment and up to 16 if you're
> not.

Hmm, I was hoping that we could skip making it varlena at least in some
cases ... perhaps that's hopeless, in which case we'll have to reassess
the whole idea.  Still there's the new functionality argument, though.

As a followup idea there exists the desire to store records as records
and not text representation of same (given differing record types, of
course), for which it'd be more worthwhile.

> It might be
> marginally worthwhile for timestamp, but surely this is a huge amount of
> effort to substitute for thinking of a more compact text representation
> for timestamps.

Surely if it's just for timestamps it'd be a waste of time.

> Pardon me for being unconvinced.

:-)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> As a followup idea there exists the desire to store records as records
> and not text representation of same (given differing record types, of
> course), for which it'd be more worthwhile.

Maybe.  The conventional wisdom is that text representation of data is
more compact than PG's internal representation by a significant factor
--- our FAQ says up to 5x, in fact.  I know that that's including row
overhead and indexes and so on, but I still don't find it to be a given
that you're going to win on space with this sort of trick.

Another point here is that once the values in question get to be any
significant number of bytes apiece, both text and the proposed VARIANT
representation could be subject to compression, which would greatly
reduce any nominal advantage of one over the other.  Any
representational inefficiency in either would just be grist for the
compressor.
        regards, tom lane


Re: VARIANT / ANYTYPE datatype

From
Merlin Moncure
Date:
On Wed, May 4, 2011 at 8:03 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> As a followup idea there exists the desire to store records as records
>> and not text representation of same (given differing record types, of
>> course), for which it'd be more worthwhile.
>
> Maybe.  The conventional wisdom is that text representation of data is
> more compact than PG's internal representation by a significant factor
> --- our FAQ says up to 5x, in fact.  I know that that's including row
> overhead and indexes and so on, but I still don't find it to be a given
> that you're going to win on space with this sort of trick.

I've done a lot of testing of the text vs binary format on the wire
format...not exactly the same set of issues, but pretty close since
you have to send all the oids, lengths, etc.   Conventional wisdom is
correct although overstated for this topic.  Even in truly
pathological cases for text, for example in sending multiple levels of
redundant escaping in complex structures, the text format will almost
always be smaller.  For 'typical' data it can be significantly
smaller.  Two exceptions most people will run into are bytea obviously
and the timestamp family of types where binary style manipulation is a
huge win both in terms of space and performance.

For complex data (say 3+ levels of composites stacked in arrays),
binary type formats are much *faster*, albeit larger, via binary as
long as you are not bandwidth constrained, and presumably they would
be as well for variants. Perhaps even more so, because some of the
manipulations made converting tuple storage to binary wire formats
don't have to happen.  That said, while there are use cases for
sending highly structured data over the wire, I can't think of any for
direct storage on a table in variant type scenarios, at least not yet
:-).

merlin


Re: VARIANT / ANYTYPE datatype

From
Jim Nasby
Date:
On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote:
> I'm far from convinced that storing deltas per column rather than per record is a win anyway. I don't have hard
numbersto hand, but my vague recollection is that my tests showed it to be a design that used more space. 

It depends on how many fields you're changing in one go and how wide the table is. It's also a PITA to identify what
fieldsactually changed if you're storing everything. In the case of logging, I'd say that what's really needed is a way
tostore a table record that has an indicator of what fields actually changed (and possibly not storing anything for
fieldsthat didn't change). That table record would need to also deal with changes to the underlying table structure. 
--
Jim C. Nasby, Database Architect                   jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net




Re: VARIANT / ANYTYPE datatype

From
Andrew Dunstan
Date:

On 05/05/2011 01:00 PM, Jim Nasby wrote:
> On May 4, 2011, at 6:24 PM, Andrew Dunstan wrote:
>> I'm far from convinced that storing deltas per column rather than per record is a win anyway. I don't have hard
numbersto hand, but my vague recollection is that my tests showed it to be a design that used more space.
 
> It depends on how many fields you're changing in one go and how wide the table is. It's also a PITA to identify what
fieldsactually changed if you're storing everything.
 

No it's not. Instead of storing OLD/NEW, store a base record and a delta 
record (an hstore with just the changed fields) for an update. This 
saves space and means you only have to calculate what changed once.



cheers

andrew


Re: VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:

> I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
> say, could have more specific examples of "allow any number or date here but 
> nothing else".  If VARIANT is supported, unions in general ought to be also.

Okay, so aside from the performance (storage reduction) gained, there's
this argument for having variant/union types.  It seems to me that this
is indeed possible to build.  Completely general VARIANT, though, is
rather complex.  A declared union, where you specify exactly which types
can be part of the union, can be catalogued, so that the system knows
exactly where to look when a type needs to be modified.  A general
VARIANT however looks complex to me to solve.

The problem is this: if an user attempts to drop a type, and this type
is used in a variant somewhere, we would lose the stored data.  So the
drop needs to be aborted.  Similarly, if we alter a type (easy example:
a composite type) used in a variant, we need to cascade to modify all
rows using that composite.

If the unions that use a certain type are catalogued, we at least know
what tables to scan to cascade.

In a general variant, the system catalogs do not have the information of
what type each variant masquerades as.  We would need to examine the
variant's masqueraded types on each insert; if the current type is not
found, add it.  This seems a bit expensive.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: VARIANT / ANYTYPE datatype

From
Andrew Dunstan
Date:

On 05/06/2011 04:08 PM, Alvaro Herrera wrote:
> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>
>> I see VARIANT/ANYTYPE as the most general case of supporting union types, which,
>> say, could have more specific examples of "allow any number or date here but
>> nothing else".  If VARIANT is supported, unions in general ought to be also.
> Okay, so aside from the performance (storage reduction) gained, there's
> this argument for having variant/union types.  It seems to me that this
> is indeed possible to build.  Completely general VARIANT, though, is
> rather complex.  A declared union, where you specify exactly which types
> can be part of the union, can be catalogued, so that the system knows
> exactly where to look when a type needs to be modified.  A general
> VARIANT however looks complex to me to solve.
>
> The problem is this: if an user attempts to drop a type, and this type
> is used in a variant somewhere, we would lose the stored data.  So the
> drop needs to be aborted.  Similarly, if we alter a type (easy example:
> a composite type) used in a variant, we need to cascade to modify all
> rows using that composite.
>
> If the unions that use a certain type are catalogued, we at least know
> what tables to scan to cascade.
>
> In a general variant, the system catalogs do not have the information of
> what type each variant masquerades as.  We would need to examine the
> variant's masqueraded types on each insert; if the current type is not
> found, add it.  This seems a bit expensive.
>



So how is a declared union going to look and operate? Something like this?
    CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):    CREATE TABLE bar (myunion foo);    INSERT
INTObar (myunion) VALUES (ival=>1), (tval=>'some text');    UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;
 


Something like that could actually be quite nice for a number of purposes.


cheers

andrew


Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>> I see VARIANT/ANYTYPE as the most general case of supporting union types, which, 
>> say, could have more specific examples of "allow any number or date here but 
>> nothing else".  If VARIANT is supported, unions in general ought to be also.

> Okay, so aside from the performance (storage reduction) gained, there's
> this argument for having variant/union types.  It seems to me that this
> is indeed possible to build.  Completely general VARIANT, though, is
> rather complex.

Yeah.  I have no particular objection to a UNION over a specified set of
types, but am not very happy with the idea of an unconstrained union.

Also, a UNION declaration would allow attaching a field name to each
containable type, as I see Andrew just posted about.  That seems like
potentially a good idea.

> In a general variant, the system catalogs do not have the information of
> what type each variant masquerades as.  We would need to examine the
> variant's masqueraded types on each insert; if the current type is not
> found, add it.  This seems a bit expensive.

Not to mention race-condition-prone.  How do you ensure someone is not
inserting another instance of the variant, with some previously not used
content type, while this is going on?
        regards, tom lane


Re: VARIANT / ANYTYPE datatype

From
Merlin Moncure
Date:
On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>
> On 05/06/2011 04:08 PM, Alvaro Herrera wrote:
>>
>> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>>
>>> I see VARIANT/ANYTYPE as the most general case of supporting union types,
>>> which,
>>> say, could have more specific examples of "allow any number or date here
>>> but
>>> nothing else".  If VARIANT is supported, unions in general ought to be
>>> also.
>>
>> Okay, so aside from the performance (storage reduction) gained, there's
>> this argument for having variant/union types.  It seems to me that this
>> is indeed possible to build.  Completely general VARIANT, though, is
>> rather complex.  A declared union, where you specify exactly which types
>> can be part of the union, can be catalogued, so that the system knows
>> exactly where to look when a type needs to be modified.  A general
>> VARIANT however looks complex to me to solve.
>>
>> The problem is this: if an user attempts to drop a type, and this type
>> is used in a variant somewhere, we would lose the stored data.  So the
>> drop needs to be aborted.  Similarly, if we alter a type (easy example:
>> a composite type) used in a variant, we need to cascade to modify all
>> rows using that composite.
>>
>> If the unions that use a certain type are catalogued, we at least know
>> what tables to scan to cascade.
>>
>> In a general variant, the system catalogs do not have the information of
>> what type each variant masquerades as.  We would need to examine the
>> variant's masqueraded types on each insert; if the current type is not
>> found, add it.  This seems a bit expensive.
>
> So how is a declared union going to look and operate? Something like this?
>
>    CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
>    CREATE TABLE bar (myunion foo);
>    INSERT INTO bar (myunion) VALUES (ival=>1), (tval=>'some text');
>    UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;
>
>
> Something like that could actually be quite nice for a number of purposes.

using your hypothetical example, could you cast types to the union?

select 1::int::foo;

record types would presumably work as well?  you could do a lot of
*really* neat stuff that way...

merlin


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
Thanks for picking up on my mentioning union types; I wasn't sure if anyone did.

Merlin Moncure wrote:
> On Fri, May 6, 2011 at 3:18 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>
>> On 05/06/2011 04:08 PM, Alvaro Herrera wrote:
>>> Excerpts from Darren Duncan's message of mié may 04 15:33:33 -0300 2011:
>>>
>>>> I see VARIANT/ANYTYPE as the most general case of supporting union types,
>>>> which,
>>>> say, could have more specific examples of "allow any number or date here
>>>> but
>>>> nothing else".  If VARIANT is supported, unions in general ought to be
>>>> also.
>>> Okay, so aside from the performance (storage reduction) gained, there's
>>> this argument for having variant/union types.  It seems to me that this
>>> is indeed possible to build.  Completely general VARIANT, though, is
>>> rather complex.  A declared union, where you specify exactly which types
>>> can be part of the union, can be catalogued, so that the system knows
>>> exactly where to look when a type needs to be modified.  A general
>>> VARIANT however looks complex to me to solve.
>>>
>>> The problem is this: if an user attempts to drop a type, and this type
>>> is used in a variant somewhere, we would lose the stored data.  So the
>>> drop needs to be aborted.  Similarly, if we alter a type (easy example:
>>> a composite type) used in a variant, we need to cascade to modify all
>>> rows using that composite.
>>>
>>> If the unions that use a certain type are catalogued, we at least know
>>> what tables to scan to cascade.
>>>
>>> In a general variant, the system catalogs do not have the information of
>>> what type each variant masquerades as.  We would need to examine the
>>> variant's masqueraded types on each insert; if the current type is not
>>> found, add it.  This seems a bit expensive.
>> So how is a declared union going to look and operate? Something like this?
>>
>>    CREATE TYPE foo AS UNION (ival int, tval text, tsval timestamptz):
>>    CREATE TABLE bar (myunion foo);
>>    INSERT INTO bar (myunion) VALUES (ival=>1), (tval=>'some text');
>>    UPDATE bar SET myunion.tsval = CURRENT_TIMESTAMP;
>>
>>
>> Something like that could actually be quite nice for a number of purposes.
> 
> using your hypothetical example, could you cast types to the union?
> 
> select 1::int::foo;
> 
> record types would presumably work as well?  you could do a lot of
> *really* neat stuff that way...

Like with other respondents to this topic, I consider it much more useful and 
important, as well as practical, to support explicitly defined type unions than 
a wide-open ANYTYPE.

As for how it would operate:  The example above implies a union type 
implementation that is like C's concept of such, where you have to explicitly 
state how you want the value bit pattern to be interpreted, by naming ival/etc, 
rather than the DBMS just knowing that a particular value is of a specific type, 
because per-value stored metadata says so (like with SQLite).

While that might be best in practice for implementation, I had envisioned 
something more like set unions, so you could instead do it like this:
    CREATE TYPE foo AS UNION (int, text, timestamptz):    CREATE TABLE bar (myunion foo);    INSERT INTO bar (myunion)
VALUES(1), ('some text');    UPDATE bar SET myunion = CURRENT_TIMESTAMP;
 

Unlike a record type, where multiple attributes may have the same time, 
presumably with a union, they would all be distinct, and so you could use the 
type name itself to refer to each option; you don't have to make up "ival" or 
whatever ... unless there are situations where types don't have names.

When doing operations that are type-generic, such as equality test or 
assignment, especially assignment, between 2 things that are both declared to be 
of type foo, you could just do it with no verbosity same as if you were doing 2 
int or text or whatever.

When doing operations specific to int or text or whatever, or if you are 
assigning a foo-declared thing to an int/text-declared thing, then you'd want an 
explicit cast or assertion, such as "select myunion::int + 3 as answer from bar 
where is_a(myunion,'int')".

Whether you want an explicit cast to go the other way, I would assume you don't 
need to, like when you have DOMAINs; eg, I would expect the 4th line above to 
just work, because the system knows the type of CURRENT_TIMESTAMP and it knows 
that this is a member of the union type of myunion.  I see a UNION type as being 
like a DOMAIN type in reverse.

-- Darren Duncan


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan


Re: VARIANT / ANYTYPE datatype

From
Bruce Momjian
Date:
Darren Duncan wrote:
> To follow-up, an additional feature that would be useful and resembles union 
> types is the variant where you could declare a union type first and then 
> separately other types could declare they are a member of the union.  I'm 
> talking about loosely what mixins or type-roles or interfaces etc are in other 
> languages.  The most trivial example would be declaring an ENUM-alike first and 
> then separately declaring the component values where the latter declare they are 
> part of the ENUM, and this could make it easier to add or change ENUM values. 
> But keep in mind that this is a distinct concept from what we're otherwise 
> talking about as being union types. -- Darren Duncan

Should this be a TODO item?

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + It's impossible for everything to be true. +


Re: VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:
> Darren Duncan wrote:
> > To follow-up, an additional feature that would be useful and resembles union 
> > types is the variant where you could declare a union type first and then 
> > separately other types could declare they are a member of the union.  I'm 
> > talking about loosely what mixins or type-roles or interfaces etc are in other 
> > languages.  The most trivial example would be declaring an ENUM-alike first and 
> > then separately declaring the component values where the latter declare they are 
> > part of the ENUM, and this could make it easier to add or change ENUM values. 
> > But keep in mind that this is a distinct concept from what we're otherwise 
> > talking about as being union types. -- Darren Duncan
> 
> Should this be a TODO item?

The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
Alvaro Herrera wrote:
> Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:
>> Darren Duncan wrote:
>>> To follow-up, an additional feature that would be useful and resembles union 
>>> types is the variant where you could declare a union type first and then 
>>> separately other types could declare they are a member of the union.  I'm 
>>> talking about loosely what mixins or type-roles or interfaces etc are in other 
>>> languages.  The most trivial example would be declaring an ENUM-alike first and 
>>> then separately declaring the component values where the latter declare they are 
>>> part of the ENUM, and this could make it easier to add or change ENUM values. 
>>> But keep in mind that this is a distinct concept from what we're otherwise 
>>> talking about as being union types. -- Darren Duncan
>> Should this be a TODO item?
> 
> The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
> Seems like it'd be pretty painful.
From the perspective of users, the single greatest distinction between these 2 
kinds of unions is being closed versus being open, and that is the primary 
reason to choose one over the other.

A closed union is the C-style, where the union type declares what other types or 
values it ranges over.  The closed union is best when the union definer can 
reasonably assume that the union won't either ever or would rarely be changed, 
and in particular can assume that application or database code would have 
knowledge of the parts that it deals specially with, so it can be assumed that 
if the closed union type ever is changed then any code designed to use it may be 
changed at the same time.

A good example for a closed union would be a boolean type which just ranges over 
the two singletons false and true or an order type which ranges just over the 
three singletons decrease, same, increase.  Or a type which enumerates the 7 
days of the week, as this is unlikely to change in the life of a system.

An open union is the mixin style, where the component types declare they are 
part of the union.  The open union is best when it is likely that there would be 
either user-defined or extension-defined new types for the union to come along 
later, and we want to have code that can be generic or polymorphic for any types 
that can be used in particular ways.

Examples of open union types could be number, which all the numeric types 
compose, and so you can know say that you can use the generic numeric operators 
on values you have simply if their types compose the number union type, and it 
still works if more numeric types appear later.  Likewise, the string open union 
could include both text and blob, as both support catenation and substring 
matches or extraction, for example.

This would aid to operator overloading in a generic way, letting you use the 
same syntax for different types, but allowing types to mix is optional; eg, you 
could support "add(int,int)" and "add(real,real)" without supporting 
"add(int,real)" etc but the syntax "add(x,y)" is shared, and you do this while 
still having a strong type system; allowing the mixing is optional case-by-case.

Supporting the open union is closer to supporting ANYTYPE while the closed union 
isn't so much.

-- Darren Duncan


Re: VARIANT / ANYTYPE datatype

From
Simon Riggs
Date:
On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> A customer came to us with this request: a way to store "any" data in a
> column.  We've gone back and forth trying to determine reasonable
> implementation restrictions, safety and useful semantics for them.
> I note that this has been requested in the past:
> http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

I think its a reasonably common use case.

Would it be possible to do this with a "typed" hstore? Seems easier to
add something there than it would be to add the VARIANT type as
discussed here.


> both Oracle and MS-SQL have it

Do they? What types are they called?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: VARIANT / ANYTYPE datatype

From
Eric McKeeth
Date:
On Tue, May 10, 2011 at 3:57 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> A customer came to us with this request: a way to store "any" data in a
> column.  We've gone back and forth trying to determine reasonable
> implementation restrictions, safety and useful semantics for them.
> I note that this has been requested in the past:
> http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

I think its a reasonably common use case.

Would it be possible to do this with a "typed" hstore? Seems easier to
add something there than it would be to add the VARIANT type as
discussed here.


> both Oracle and MS-SQL have it

Do they? What types are they called?

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


MS SQL Server calls it's variant type 'sql_variant', but it's limited to a subset of the data types they support. Basically, it can store any numeric type, or any binary or text type with a constrained length. No timestamps, geometry, XML, user-defined types, etc. allowed. So it's not really as much of an "any value" type as it might look on the surface. Don't know any details of Oracle's implementation.

-Eric

Re: VARIANT / ANYTYPE datatype

From
Joseph Adams
Date:
It seems to me a reasonable way to implement VARIANT would be to have
a data type called VARIANT that stores an OID of the inner type at the
beginning, followed by the binary data.  When you say
pg_typeof(somevariant), you'll get 'variant'.  Instead, you'd use a
function like this:
   variant_typeof(VARIANT) returns REGTYPE

VARIANT could then be casted to/from other types.  I think the rules
for converting to/from variant should be a little stronger than normal
PostgreSQL casting rules.  For example:
   SELECT '12345'::TEXT::INT; -- allowed   SELECT '12345'::TEXT::VARIANT::INT; -- not allowed   SELECT
'12345'::TEXT::VARIANT::TEXT::INT;-- allowed
 

I'm not sure how conversions to/from VARIANT could be implemented
other than creating conversion functions for every type.  It'd be nice
if we could avoid that.

A more permissive way to convert out of VARIANT might be to have a
function like this:
   variant_unwrap(VARIANT) returns TEXT -- user casts to desired type

I suppose the in/out functions could prefix the value with the type
name and a colon:

> SELECT '12345'::INT::VARIANT;  variant
-------------
integer:12345

The VARIANT type, or similar, would be useful for the JSON data type
I've been intermittently working on, as it would allow us to create a
function like this:
   from_json(JSON) returns VARIANT

from_json would unwrap a JSON string/number/bool/null/array,
converting it to a VARIANT whose inner type is TEXT / (INT or BIGINT
or DOUBLE or NUMERIC) / BOOL / [null] / ARRAY.  In the [null] case,
from_json would actually return NULL (I see no need for VARIANT to
wrap nulls).

This is rather type-safe compared to what I currently have:
   from_json(JSON) returns TEXT -- user casts to desired type

By returning variant, we can then cast to the desired type, and if the
cast is invalid, a type error will occur.


Joey Adams


Re: VARIANT / ANYTYPE datatype

From
Joseph Adams
Date:
On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
> The VARIANT type, or similar, would be useful for the JSON data type
> I've been intermittently working on, as it would allow us to create a
> function like this:
>
>    from_json(JSON) returns VARIANT

This occurred to me: if PostgreSQL functions could return ANYELEMENT,
then we could just say:
   from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value

Likewise, the variant conversion functions (which could be invoked
automatically) could be done this way:
   to_variant(ANYELEMENT) returns VARIANT   from_variant(VARIANT) returns ANYELEMENT

However, I'm not familiar enough with the innards of PostgreSQL's type
system to know if returning ANYELEMENT would be possible and make
sense.


Re: VARIANT / ANYTYPE datatype

From
Pavel Stehule
Date:
2011/5/11 Joseph Adams <joeyadams3.14159@gmail.com>:
> On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
> <joeyadams3.14159@gmail.com> wrote:
>> The VARIANT type, or similar, would be useful for the JSON data type
>> I've been intermittently working on, as it would allow us to create a
>> function like this:
>>
>>    from_json(JSON) returns VARIANT
>
> This occurred to me: if PostgreSQL functions could return ANYELEMENT,
> then we could just say:
>
>    from_json(JSON) returns ANYELEMENT -- no intermediate VARIANT value
>
> Likewise, the variant conversion functions (which could be invoked
> automatically) could be done this way:
>
>    to_variant(ANYELEMENT) returns VARIANT
>    from_variant(VARIANT) returns ANYELEMENT
>
> However, I'm not familiar enough with the innards of PostgreSQL's type
> system to know if returning ANYELEMENT would be possible and make
> sense.
>

it isn't possible. You can have a ANYELEMENT on input and output, or
ANYELEMENT on input, but you ca't have ANYELEMENT just on output. The
analogy for ANYELEMENT is C++ template.

Regards

Pavel Stehule

VARIANT is maybe good idea - it can simplify a PL/pgSQL code - but
though anybody about using this type in planner?


> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: VARIANT / ANYTYPE datatype

From
Robert Haas
Date:
On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
> It seems to me a reasonable way to implement VARIANT would be to have
> a data type called VARIANT that stores an OID of the inner type at the
> beginning, followed by the binary data.

That's likely to be how it gets implemented, but you seem to have
missed the point of some of the discussion upthread: the big problem
with that is that someone might type "DROP TYPE foo", and when they
do, you need an efficient way to figure out whether foo is in use
inside an instance of the variant type anywhere in the system.  The
devil is in the details...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: VARIANT / ANYTYPE datatype

From
Andrew Dunstan
Date:

On 05/11/2011 07:53 AM, Robert Haas wrote:
> On Tue, May 10, 2011 at 10:29 PM, Joseph Adams
> <joeyadams3.14159@gmail.com>  wrote:
>> It seems to me a reasonable way to implement VARIANT would be to have
>> a data type called VARIANT that stores an OID of the inner type at the
>> beginning, followed by the binary data.
> That's likely to be how it gets implemented, but you seem to have
> missed the point of some of the discussion upthread: the big problem
> with that is that someone might type "DROP TYPE foo", and when they
> do, you need an efficient way to figure out whether foo is in use
> inside an instance of the variant type anywhere in the system.  The
> devil is in the details...
>

Actually, I thought that was the variant (pun intended) Tom objected to, 
and wanted declared variant types instead, precisely so we could know 
where we'd need to look if a type was dropped. Using that you might 
still use the type oid at the beginning of the field, but there are 
other possibilities too (e.g. a single byte indicating which known 
variant is in use).

cheers

andrew


Re: VARIANT / ANYTYPE datatype

From
Joseph Adams
Date:
On Wed, May 11, 2011 at 7:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> That's likely to be how it gets implemented, but you seem to have
> missed the point of some of the discussion upthread: the big problem
> with that is that someone might type "DROP TYPE foo", and when they
> do, you need an efficient way to figure out whether foo is in use
> inside an instance of the variant type anywhere in the system.  The
> devil is in the details...

Sorry, I missed that.  That in mind, I think I would lean more toward
the union proposal as well.  Can anyone think of a case where VARIANT
would be more useful?

As for using one or two bytes to store the type of a UNION, that
creates a problem when you want to extend the union in the future.
That is, if a UNION is simply a collection of possible types values of
the UNION type can hold.

If UNION is implemented more like a tagged union:
   CREATE TYPE token AS TAGGED UNION (identifier TEXT, keyword TEXT,
number INT);

Then the problem of altering it is much like the problem of altering an ENUM.

On Tue, May 10, 2011 at 5:19 PM, Darren Duncan <darren@darrenduncan.net> wrote:
> Examples of open union types could be number, which all the numeric types
> compose, and so you can know say that you can use the generic numeric
> operators on values you have simply if their types compose the number union
> type, and it still works if more numeric types appear later.  Likewise, the
> string open union could include both text and blob, as both support
> catenation and substring matches or extraction, for example.
>
> This would aid to operator overloading in a generic way, letting you use the
> same syntax for different types, but allowing types to mix is optional; eg,
> you could support "add(int,int)" and "add(real,real)" without supporting
> "add(int,real)" etc but the syntax "add(x,y)" is shared, and you do this
> while still having a strong type system; allowing the mixing is optional
> case-by-case.

Coming from a Haskell perspective, this is a great idea, but I don't
think the "union" feature should be used to implement it.  Closed
unions correspond to algebraic data types in Haskell, e.g.:
   data Ordering = LT | EQ | GT

while open unions are better-suited to type classes:
   (+) :: (Num a) => a -> a -> a

I, for one, would like to see PostgreSQL steal some features from
Haskell's type system.  PostgreSQL seems to implement a subset of
Haskell's system, without type classes and where functions can have
only one type variable (anyelement).

To express the (+) example in PostgreSQL, it would be tempting to simply say:
   add(real, real) returns real

However, what if each real is a different type (e.g. INT and FLOAT).
Is that allowed?  In the Haskell example above, (+) constraints both
of its arguments to the same type.  In ad-hoc syntax, it would look
like this in PostgreSQL:
   real anyelement => add(anyelement, anyelement) returns anyelement

Another thing to consider: attempting to use a type class as a column
type, e.g.:
   CREATE TABLE foo (n real);

Normally in Haskell, type information is passed implicitly as
parameters (hence the term "parametric polymorphism"), rather than
carried alongside values (like in object-oriented languages).  In the
case above, the type information would have to be carried with each
value.  Haskell actually supports this, but under a somewhat-weird
extension called "Existential types" (see
http://www.haskell.org/haskellwiki/Existential_type#Examples for an
example).  It isn't terribly useful in Haskell, and I don't think it
will be in PostgreSQL either.


Joey Adams


Re: VARIANT / ANYTYPE datatype

From
Alvaro Herrera
Date:
Excerpts from Simon Riggs's message of mar may 10 17:57:20 -0400 2011:
> On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:

> > both Oracle and MS-SQL have it
> 
> Do they? What types are they called?

ANYTYPE

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: VARIANT / ANYTYPE datatype

From
Robert Haas
Date:
On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
<joeyadams3.14159@gmail.com> wrote:
> On Tue, May 10, 2011 at 5:19 PM, Darren Duncan <darren@darrenduncan.net> wrote:
>> Examples of open union types could be number, which all the numeric types
>> compose, and so you can know say that you can use the generic numeric
>> operators on values you have simply if their types compose the number union
>> type, and it still works if more numeric types appear later.  Likewise, the
>> string open union could include both text and blob, as both support
>> catenation and substring matches or extraction, for example.
>>
>> This would aid to operator overloading in a generic way, letting you use the
>> same syntax for different types, but allowing types to mix is optional; eg,
>> you could support "add(int,int)" and "add(real,real)" without supporting
>> "add(int,real)" etc but the syntax "add(x,y)" is shared, and you do this
>> while still having a strong type system; allowing the mixing is optional
>> case-by-case.
>
> Coming from a Haskell perspective, this is a great idea, but I don't
> think the "union" feature should be used to implement it.

I'm unclear what the point of such a feature would be.  A union of all
the common numeric types is not much different from the existing type
"numeric".

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
Robert Haas wrote:
> On Wed, May 11, 2011 at 11:43 AM, Joseph Adams
> <joeyadams3.14159@gmail.com> wrote:
>> On Tue, May 10, 2011 at 5:19 PM, Darren Duncan <darren@darrenduncan.net> wrote:
>>> Examples of open union types could be number, which all the numeric types
>>> compose, and so you can know say that you can use the generic numeric
>>> operators on values you have simply if their types compose the number union
>>> type, and it still works if more numeric types appear later.  Likewise, the
>>> string open union could include both text and blob, as both support
>>> catenation and substring matches or extraction, for example.
>>>
>>> This would aid to operator overloading in a generic way, letting you use the
>>> same syntax for different types, but allowing types to mix is optional; eg,
>>> you could support "add(int,int)" and "add(real,real)" without supporting
>>> "add(int,real)" etc but the syntax "add(x,y)" is shared, and you do this
>>> while still having a strong type system; allowing the mixing is optional
>>> case-by-case.
>> Coming from a Haskell perspective, this is a great idea, but I don't
>> think the "union" feature should be used to implement it.
> 
> I'm unclear what the point of such a feature would be.  A union of all
> the common numeric types is not much different from the existing type
> "numeric".

In the case of the "open union" numeric, one point is that users or extensions 
could come up with new types that add themselves to the union, for example say a 
type for complex numbers (I didn't see a built-in such).

But I'm just citing numeric as an example; there would be a lot more in 
practice, potentially one for every individual type, so for example if operators 
were defined for the open union rather than for the base type, then 
users/extensions could define their own types and easily declare "you can use it 
like this type" but its different in some important way, which may just be an 
implementation difference.  Operations that don't care about the differences can 
just be written against the open union type where they just work and those that 
do care can be more specific.

Joseph Adams said:
> Coming from a Haskell perspective, this is a great idea, but I don't
> think the "union" feature should be used to implement it.  Closed
> unions correspond to algebraic data types in Haskell, e.g.:
> 
>     data Ordering = LT | EQ | GT
> 
> while open unions are better-suited to type classes:
> 
>     (+) :: (Num a) => a -> a -> a

While closed unions would often be used for your first examlpe, I see they're 
still useful for type classes as well.  Especially in the face of open unions 
being available, the closed unions let users say, no, I don't really want the 
meaning of this union type to change just because someone else declares a new 
type in scope (that adds itself to the union).  For example, one could declare a 
system_numeric closed union type that only includes Pg built-in numerics and 
users of that can be confident that nothing about it will change later unless 
the definition of system_numeric itself or the types it unions are changed.  But 
open unions would be preferred in places they wouldn't cause trouble, where you 
want to allow easier user extensibility.

> I, for one, would like to see PostgreSQL steal some features from
> Haskell's type system.  PostgreSQL seems to implement a subset of
> Haskell's system, without type classes and where functions can have
> only one type variable (anyelement).

I think that Haskell and other functional languages have a lot to teach 
relational DBMSs and I see them as being highly compatible.

-- Darren Duncan


Re: VARIANT / ANYTYPE datatype

From
Darren Duncan
Date:
To follow-up my earlier comments ...

I suspect for practical purposes we may want to limit the scope of some type 
features.

For example, the greatest benefits for "open union" / "mixin" types is with 
routines/operators, not so much with tables.

So, Pg could choose to support open unions but only for routines, where the 
declared types of table columns must still be other types.  Then you don't have 
to worry about searching your data for where one might be used, but only search 
your system catalog for routines/operators that use it.

But closed unions should be supported in tables too.

-- Darren Duncan



Re: VARIANT / ANYTYPE datatype

From
Tom Lane
Date:
Darren Duncan <darren@darrenduncan.net> writes:
> But I'm just citing numeric as an example; there would be a lot more
> in practice, potentially one for every individual type, so for example
> if operators were defined for the open union rather than for the base
> type, then users/extensions could define their own types and easily
> declare "you can use it like this type" but its different in some
> important way, which may just be an implementation difference.
> Operations that don't care about the differences can just be written
> against the open union type where they just work and those that do
> care can be more specific.

I'm just an old-school abstract data type hacker, but I don't see
anything in what you're saying that doesn't work today in our existing
type system: with overloaded and/or polymorphic operators and functions
you can get all those effects.  Maybe there would be some small gain in
ability to share code for tasks that fall between single-data-type and
works-for-anything cases, but it looks like a pretty marginal
improvement from here; probably not worth the cost and compatibility
implications of a major overhaul of the type system.
        regards, tom lane