Thread: equivalent of mysql's SET type?

equivalent of mysql's SET type?

From

Reece Hart

Date:

09 March 2011, 00:32:59

I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice about representing this type in PostgreSQL?

MySQL DDL excerpt:

CREATE TABLE `transcript_variation` (

`transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,

`transcript_stable_id` varchar(128) NOT NULL,

...

`consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')

) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;

I'm considering implementing this as a new type based on a bit vector, but I wonder if anyone has a more flexible general solution.

Thanks,

Reece

Re: equivalent of mysql's SET type?

From

Steve Atkins

Date:

09 March 2011, 00:52:02

On Mar 8, 2011, at 5:06 PM, Reece Hart wrote:

> I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice
aboutrepresenting this type in PostgreSQL? 
>
> MySQL DDL excerpt:
> CREATE TABLE `transcript_variation` (
>   `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>   `transcript_stable_id` varchar(128) NOT NULL,
>    ...
>   `consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>
>
> I'm considering implementing this as a new type based on a bit vector, but I wonder if anyone has a more flexible
generalsolution. 

tsvector or intarray might be other options, depending on how you're going to query it and whether those are really
stringsor more enum-ish. 

The more SQLy way of doing it would be to have that stored in a separate table, with a foreign key to this one.

Cheers,
  Steve

Re: equivalent of mysql's SET type?

From

Darren Duncan

Date:

09 March 2011, 01:19:11

Reece Hart wrote:
> I'm considering porting a MySQL database to PostgreSQL. That database uses
> MySQL's SET type. Does anyone have advice about representing this type in
> PostgreSQL?
>
> MySQL DDL excerpt:
> CREATE TABLE `transcript_variation` (
>   `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>   `transcript_stable_id` varchar(128) NOT NULL,
>    ...
>   `consequence_type`
> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>
> I'm considering implementing this as a new type based on a bit vector, but I
> wonder if anyone has a more flexible general solution.

Try starting with an enum type to define the possible values:

   CREATE TYPE Consequence_Type
     AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED',
       'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE');

... and then you could try using "ARRAY OF Consequence_Type" or some such.

-- Darren Duncan

Re: equivalent of mysql's SET type?

From

Reece Hart

Date:

09 March 2011, 03:48:53

On Tue, Mar 8, 2011 at 5:52 PM, Darren Duncan <darren@darrenduncan.net> wrote:

Try starting with an enum type to define the possible values:

CREATE TYPE Consequence_Type
AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED',
'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE');

... and then you could try using "ARRAY OF Consequence_Type" or some such.

Arrays occurred to me, but they don't give a set (i.e., a consequence type can appear more than once) unless I write the code to dedupe the array. However, if I were going to put that much effort into it, I might as well represent the set directly and obviate the dedupe. Or, am I missing something from your suggestion?

-Reece

Re: equivalent of mysql's SET type?

From

Reece Hart

Date:

09 March 2011, 03:55:05

On Tue, Mar 8, 2011 at 5:51 PM, Steve Atkins <steve@blighty.com> wrote:

tsvector or intarray might be other options, depending on how you're going to query it and whether those are really strings or more enum-ish.

The more SQLy way of doing it would be to have that stored in a separate table, with a foreign key to this one.

Using a separate table is what I've got now in an experimental conversion. This works well enough and might save me the bother of an enum update. Perhaps I should just move on.

Thanks Steve.

-Reece

Re: equivalent of mysql's SET type?

From

Darren Duncan

Date:

09 March 2011, 04:28:00

Reece Hart wrote:
> On Tue, Mar 8, 2011 at 5:52 PM, Darren Duncan <darren@darrenduncan.net>wrote:
>
>> Try starting with an enum type to define the possible values:
>>
>>  CREATE TYPE Consequence_Type
>>    AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED',
>>      'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE');
>>
>> ... and then you could try using "ARRAY OF Consequence_Type" or some such.
>
> Arrays occurred to me, but they don't give a set (i.e., a consequence type
> can appear more than once) unless I write the code to dedupe the array.
> However, if I were going to put that much effort into it, I might as well
> represent the set directly and obviate the dedupe. Or, am I missing
> something from your suggestion?

You also have another option, which is to split consequence_type into
a second table, and have a record in there for each value in the set.  You would
maintain a lack of duplicates by having an ordinary unique/primary key
constraint on the second table.  This second-table approach is logically
equivalent to your use of a set-valued field. -- Darren Duncan

Re: equivalent of mysql's SET type?

From

John R Pierce

Date:

09 March 2011, 04:41:57

On 03/08/11 5:06 PM, Reece Hart wrote:
> I'm considering porting a MySQL database to PostgreSQL. That database
> uses MySQL's SET type. Does anyone have advice about representing this
> type in PostgreSQL?
>
> MySQL DDL excerpt:
> CREATE TABLE `transcript_variation` (
>   `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>   `transcript_stable_id` varchar(128) NOT NULL,
>    ...
>   `consequence_type`
> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>
>

why not just have a set of booleans in the table for these individual
on/off attributes?   wouldn't that be simplest?

Re: equivalent of mysql's SET type?

From

Merlin Moncure

Date:

09 March 2011, 14:13:48

On Tue, Mar 8, 2011 at 11:41 PM, John R Pierce <pierce@hogranch.com> wrote:
> On 03/08/11 5:06 PM, Reece Hart wrote:
>>
>> I'm considering porting a MySQL database to PostgreSQL. That database uses
>> MySQL's SET type. Does anyone have advice about representing this type in
>> PostgreSQL?
>>
>> MySQL DDL excerpt:
>> CREATE TABLE `transcript_variation` (
>>  `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>>  `transcript_stable_id` varchar(128) NOT NULL,
>>   ...
>>  `consequence_type`
>> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
>> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>>
>>
>
> why not just have a set of booleans in the table for these individual on/off
> attributes?   wouldn't that be simplest?

+1

merlin

Re: equivalent of mysql's SET type?

From

Reece Hart

Date:

09 March 2011, 16:05:05

On Tue, Mar 8, 2011 at 9:41 PM, John R Pierce <pierce@hogranch.com> wrote:

why not just have a set of booleans in the table for these individual on/off attributes? wouldn't that be simplest?

I like that approach, but I think it's unlikely to fly in this specific case for a couple reasons.

First, there are actually 8 factors (I edited for clarity... sorry about that).

The original database is actively developed (released apx quarterly). I will need an approach that minimizes my burden when they edit the set factors.

And, I'd like to be compatible with mysql syntax and semantics for sets. If you hold your nose for a moment, you'll be able to read the following without becoming ill: mysql uses comma delimited strings to assign and query set types (but stored internally as bit vectors). So, one does validation_status = 'cluster,freq' to set those bits or validation_status like '%freq%' to query. Much to my chagrin, emulating this interface will make migration easier. However, implementing this string interface to set/get boolean columns is just too offensive to whatever modest design sensibilities I have. (For more pleasure reading, see http://dev.mysql.com/doc/refman/5.0/en/set.html. I particularly like the *warning* issued when one tries to add a value that's not part of the set.)

-Reece

Re: equivalent of mysql's SET type?

From

Merlin Moncure

Date:

09 March 2011, 16:17:07

On Wed, Mar 9, 2011 at 10:59 AM, Reece Hart <reece@harts.net> wrote:
> On Tue, Mar 8, 2011 at 9:41 PM, John R Pierce <pierce@hogranch.com> wrote:
>>
>> why not just have a set of booleans in the table for these individual
>> on/off attributes?   wouldn't that be simplest?
>
> I like that approach, but I think it's unlikely to fly in this specific case
> for a couple reasons.
> First, there are actually 8 factors (I edited for clarity... sorry about
> that).
> The original database is actively developed (released apx quarterly). I will
> need an approach that minimizes my burden when they edit the set factors.
> And, I'd like to be compatible with mysql syntax and semantics for sets. If
> you hold your nose for a moment, you'll be able to read the following
> without becoming ill: mysql uses comma delimited strings to assign and query
> set types (but stored internally as bit vectors). So, one does
> validation_status = 'cluster,freq' to set those bits or validation_status
> like '%freq%' to query. Much to my chagrin, emulating this interface will
> make migration easier. However, implementing this string interface to
> set/get boolean columns is just too offensive to whatever modest design
> sensibilities I have. (For more pleasure reading, see
> http://dev.mysql.com/doc/refman/5.0/en/set.html. I particularly like the
> *warning* issued when one tries to add a value that's not part of the set.)
> -Reece

create type validation_flags as
(
  cluster bool,
  freq bool
);

create function validation_flags_in(
  flags text, flags out validation_flags) returns validation_flags as
$$
  select row($1 ~ 'cluster', $1 ~ 'freq')::validation_flags
$$ language sql immutable;

create table foo (flags validation_flags);
insert into foo values (validation_flags_in('cluster'));

select * from foo;
select (flags).* from foo;

merlin

Re: equivalent of mysql's SET type?

From

Reece Hart

Date:

10 March 2011, 20:21:44

On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

create type validation_flags as
(
cluster bool,
freq bool
);

Wow. That solution is nearly sexy, and far and away better than any solution that I would have come up with. Thanks, Merlin!

-Reece

Re: equivalent of mysql's SET type?

From

Merlin Moncure

Date:

10 March 2011, 20:54:46

On Thu, Mar 10, 2011 at 3:21 PM, Reece Hart <reece@harts.net> wrote:
> On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>
>> create type validation_flags as
>> (
>>  cluster bool,
>>  freq bool
>> );
>
> Wow. That solution is nearly sexy, and far and away better than any solution
> that I would have come up with. Thanks, Merlin!

thanks -- I do what I do. fyi another thing is this only works if no
flags substrings of other flag -- no big deal to add a little guard
against that in the 'in' function though if you need to.
also one pain point with composite types is that you can't flip
specific fields like this:

update foo set (flags).freq = true;

you can work around that like this:

create function validation_flags_out(
 flags validation_flags, flags out text) returns text as
$$
 select array_to_string(array(
   select 'freq' where ($1).freq
     union all
   select 'cluster' where ($1).cluster
   ), ',')
$$ language sql immutable;

set a flag:
update foo set flags = validation_flags_in(validation_flags_out(flags)
|| ',cluster');

remove a flag:
update foo set flags =
validation_flags_in(replace(validation_flags_out(flags), 'cluster',
''));

if you do the above a lot you can wrap the function so that your code becomes:
update foo set flags = validation_flags(flags, 'cluster', true/false);

it really depends on how much process/structure you want to wrap
around this to get it to your liking.  another completely separate
route to this problem is the hstore type:
http://www.postgresql.org/docs/9.0/static/hstore.html

which is a very good syntax generalization of key/value pairs, but it
maybe too loose if you want to reduce to specific flag set, but you
should probably take a look before writing a bunch of functions. If
you do go with flags/composite type route, and you had a lot of mysql
'set' types to convert, I would write a function in postgres to
generate the wrapper functions from list of strings -- ping back if
you need help with that.

merlin

Re: equivalent of mysql's SET type?

From

Dmitriy Igrishin

Date:

10 March 2011, 21:13:49

2011/3/9 John R Pierce <pierce@hogranch.com>

On 03/08/11 5:06 PM, Reece Hart wrote:
I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice about representing this type in PostgreSQL?

MySQL DDL excerpt:
CREATE TABLE `transcript_variation` (
`transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`transcript_stable_id` varchar(128) NOT NULL,
...
`consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;

why not just have a set of booleans in the table for these individual on/off attributes? wouldn't that be simplest?

Yes, it might be simplest at first sight.
But classical solution is relation N - N scales simpler than
any tricks with bytes.
Unfortunately, enums and composite types are not extensible. And
if you need to add yet another option (or remove some option) it
will be problematic.
In case of N - N relation you need just use INSERT/DELETE.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

--
// Dmitriy.

Re: equivalent of mysql's SET type?

From

Merlin Moncure

Date:

10 March 2011, 21:52:21

On Thu, Mar 10, 2011 at 4:13 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> 2011/3/9 John R Pierce <pierce@hogranch.com>
>>
>> On 03/08/11 5:06 PM, Reece Hart wrote:
>>>
>>> I'm considering porting a MySQL database to PostgreSQL. That database
>>> uses MySQL's SET type. Does anyone have advice about representing this type
>>> in PostgreSQL?
>>>
>>> MySQL DDL excerpt:
>>> CREATE TABLE `transcript_variation` (
>>>  `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>>>  `transcript_stable_id` varchar(128) NOT NULL,
>>>   ...
>>>  `consequence_type`
>>> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
>>> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>>>
>>>
>>
>> why not just have a set of booleans in the table for these individual
>> on/off attributes?   wouldn't that be simplest?
>
> Yes, it might be simplest at first sight.
> But classical solution is relation N - N scales simpler than
> any tricks with bytes.
> Unfortunately, enums and composite types are not extensible. And
> if you need to add yet another option (or remove some option) it
> will be problematic.
> In case of N - N relation you need just use INSERT/DELETE.

actually composite types are fairly workable if you use table instead
of a type (you can drop/add column, etc). in 9.1 you will be able to
do this with vanilla composite type
(http://developer.postgresql.org/pgdocs/postgres/sql-altertype.html).

in typical case I would agree that classic approach of separate
relation is typically the way to go, there are exceptions -- for
example enum gives you inline ordering -- or as in this case where OP
is looking to simplify porting large body of application code.

merlin

Re: equivalent of mysql's SET type?

From

Dmitriy Igrishin

Date:

11 March 2011, 07:56:29

2011/3/11 Merlin Moncure <mmoncure@gmail.com>

On Thu, Mar 10, 2011 at 4:13 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:
> 2011/3/9 John R Pierce <pierce@hogranch.com>
>>
>> On 03/08/11 5:06 PM, Reece Hart wrote:
>>>
>>> I'm considering porting a MySQL database to PostgreSQL. That database
>>> uses MySQL's SET type. Does anyone have advice about representing this type
>>> in PostgreSQL?
>>>
>>> MySQL DDL excerpt:
>>> CREATE TABLE `transcript_variation` (
>>> `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>>> `transcript_stable_id` varchar(128) NOT NULL,
>>> ...
>>> `consequence_type`
>>> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
>>> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>>>
>>>
>>
>> why not just have a set of booleans in the table for these individual
>> on/off attributes? wouldn't that be simplest?
>
> Yes, it might be simplest at first sight.
> But classical solution is relation N - N scales simpler than
> any tricks with bytes.
> Unfortunately, enums and composite types are not extensible. And
> if you need to add yet another option (or remove some option) it
> will be problematic.
> In case of N - N relation you need just use INSERT/DELETE.

actually composite types are fairly workable if you use table instead
of a type (you can drop/add column, etc). in 9.1 you will be able to
do this with vanilla composite type
(http://developer.postgresql.org/pgdocs/postgres/sql-altertype.html).

Good news! Thanks for pointing that.

in typical case I would agree that classic approach of separate
relation is typically the way to go, there are exceptions -- for
example enum gives you inline ordering -- or as in this case where OP
is looking to simplify porting large body of application code.

Agree.

merlin

--
// Dmitriy.

Re: equivalent of mysql's SET type?

From

Merlin Moncure

Date:

16 March 2011, 14:54:54

On Thu, Mar 10, 2011 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> On Thu, Mar 10, 2011 at 3:21 PM, Reece Hart <reece@harts.net> wrote:
>> On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>>>
>>> create type validation_flags as
>>> (
>>>  cluster bool,
>>>  freq bool
>>> );
>>
>> Wow. That solution is nearly sexy, and far and away better than any solution
>> that I would have come up with. Thanks, Merlin!
>
> thanks -- I do what I do. fyi another thing is this only works if no
> flags substrings of other flag -- no big deal to add a little guard
> against that in the 'in' function though if you need to.
> also one pain point with composite types is that you can't flip
> specific fields like this:
>
> update foo set (flags).freq = true;

Actually you *can* do this -- I just didn't know it.

update foo set flags.freq = true;

will work, so all that extra stuff to do that isn't needed.

merlin