Thread: equivalent of mysql's SET type?
I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice about representing this type in PostgreSQL?
MySQL DDL excerpt:
CREATE TABLE `transcript_variation` (
`transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`transcript_stable_id` varchar(128) NOT NULL,
...
`consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
I'm considering implementing this as a new type based on a bit vector, but I wonder if anyone has a more flexible general solution.
Thanks,
Reece
On Mar 8, 2011, at 5:06 PM, Reece Hart wrote: > I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice aboutrepresenting this type in PostgreSQL? > > MySQL DDL excerpt: > CREATE TABLE `transcript_variation` ( > `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT, > `transcript_stable_id` varchar(128) NOT NULL, > ... > `consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE') > ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1; > > > I'm considering implementing this as a new type based on a bit vector, but I wonder if anyone has a more flexible generalsolution. tsvector or intarray might be other options, depending on how you're going to query it and whether those are really stringsor more enum-ish. The more SQLy way of doing it would be to have that stored in a separate table, with a foreign key to this one. Cheers, Steve
Reece Hart wrote: > I'm considering porting a MySQL database to PostgreSQL. That database uses > MySQL's SET type. Does anyone have advice about representing this type in > PostgreSQL? > > MySQL DDL excerpt: > CREATE TABLE `transcript_variation` ( > `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT, > `transcript_stable_id` varchar(128) NOT NULL, > ... > `consequence_type` > set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE') > ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1; > > I'm considering implementing this as a new type based on a bit vector, but I > wonder if anyone has a more flexible general solution. Try starting with an enum type to define the possible values: CREATE TYPE Consequence_Type AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED', 'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE'); ... and then you could try using "ARRAY OF Consequence_Type" or some such. -- Darren Duncan
On Tue, Mar 8, 2011 at 5:52 PM, Darren Duncan <darren@darrenduncan.net> wrote:
Try starting with an enum type to define the possible values:
CREATE TYPE Consequence_Type
AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED',
'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE');
... and then you could try using "ARRAY OF Consequence_Type" or some such.
Arrays occurred to me, but they don't give a set (i.e., a consequence type can appear more than once) unless I write the code to dedupe the array. However, if I were going to put that much effort into it, I might as well represent the set directly and obviate the dedupe. Or, am I missing something from your suggestion?
-Reece
On Tue, Mar 8, 2011 at 5:51 PM, Steve Atkins <steve@blighty.com> wrote:
tsvector or intarray might be other options, depending on how you're going to query it and whether those are really strings or more enum-ish.
The more SQLy way of doing it would be to have that stored in a separate table, with a foreign key to this one.
Using a separate table is what I've got now in an experimental conversion. This works well enough and might save me the bother of an enum update. Perhaps I should just move on.
Thanks Steve.
-Reece
Reece Hart wrote: > On Tue, Mar 8, 2011 at 5:52 PM, Darren Duncan <darren@darrenduncan.net>wrote: > >> Try starting with an enum type to define the possible values: >> >> CREATE TYPE Consequence_Type >> AS ENUM ('ESSENTIAL_SPLICE_SITE','STOP_GAINED', >> 'STOP_LOST','COMPLEX_INDEL','SPLICE_SITE'); >> >> ... and then you could try using "ARRAY OF Consequence_Type" or some such. > > Arrays occurred to me, but they don't give a set (i.e., a consequence type > can appear more than once) unless I write the code to dedupe the array. > However, if I were going to put that much effort into it, I might as well > represent the set directly and obviate the dedupe. Or, am I missing > something from your suggestion? You also have another option, which is to split consequence_type into a second table, and have a record in there for each value in the set. You would maintain a lack of duplicates by having an ordinary unique/primary key constraint on the second table. This second-table approach is logically equivalent to your use of a set-valued field. -- Darren Duncan
On 03/08/11 5:06 PM, Reece Hart wrote: > I'm considering porting a MySQL database to PostgreSQL. That database > uses MySQL's SET type. Does anyone have advice about representing this > type in PostgreSQL? > > MySQL DDL excerpt: > CREATE TABLE `transcript_variation` ( > `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT, > `transcript_stable_id` varchar(128) NOT NULL, > ... > `consequence_type` > set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE') > ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1; > > why not just have a set of booleans in the table for these individual on/off attributes? wouldn't that be simplest?
On Tue, Mar 8, 2011 at 11:41 PM, John R Pierce <pierce@hogranch.com> wrote: > On 03/08/11 5:06 PM, Reece Hart wrote: >> >> I'm considering porting a MySQL database to PostgreSQL. That database uses >> MySQL's SET type. Does anyone have advice about representing this type in >> PostgreSQL? >> >> MySQL DDL excerpt: >> CREATE TABLE `transcript_variation` ( >> `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT, >> `transcript_stable_id` varchar(128) NOT NULL, >> ... >> `consequence_type` >> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE') >> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1; >> >> > > why not just have a set of booleans in the table for these individual on/off > attributes? wouldn't that be simplest? +1 merlin
On Tue, Mar 8, 2011 at 9:41 PM, John R Pierce <pierce@hogranch.com> wrote:
why not just have a set of booleans in the table for these individual on/off attributes? wouldn't that be simplest?
I like that approach, but I think it's unlikely to fly in this specific case for a couple reasons.
First, there are actually 8 factors (I edited for clarity... sorry about that).
The original database is actively developed (released apx quarterly). I will need an approach that minimizes my burden when they edit the set factors.
And, I'd like to be compatible with mysql syntax and semantics for sets. If you hold your nose for a moment, you'll be able to read the following without becoming ill: mysql uses comma delimited strings to assign and query set types (but stored internally as bit vectors). So, one does validation_status = 'cluster,freq' to set those bits or validation_status like '%freq%' to query. Much to my chagrin, emulating this interface will make migration easier. However, implementing this string interface to set/get boolean columns is just too offensive to whatever modest design sensibilities I have. (For more pleasure reading, see http://dev.mysql.com/doc/refman/5.0/en/set.html. I particularly like the *warning* issued when one tries to add a value that's not part of the set.)
-Reece
On Wed, Mar 9, 2011 at 10:59 AM, Reece Hart <reece@harts.net> wrote: > On Tue, Mar 8, 2011 at 9:41 PM, John R Pierce <pierce@hogranch.com> wrote: >> >> why not just have a set of booleans in the table for these individual >> on/off attributes? wouldn't that be simplest? > > I like that approach, but I think it's unlikely to fly in this specific case > for a couple reasons. > First, there are actually 8 factors (I edited for clarity... sorry about > that). > The original database is actively developed (released apx quarterly). I will > need an approach that minimizes my burden when they edit the set factors. > And, I'd like to be compatible with mysql syntax and semantics for sets. If > you hold your nose for a moment, you'll be able to read the following > without becoming ill: mysql uses comma delimited strings to assign and query > set types (but stored internally as bit vectors). So, one does > validation_status = 'cluster,freq' to set those bits or validation_status > like '%freq%' to query. Much to my chagrin, emulating this interface will > make migration easier. However, implementing this string interface to > set/get boolean columns is just too offensive to whatever modest design > sensibilities I have. (For more pleasure reading, see > http://dev.mysql.com/doc/refman/5.0/en/set.html. I particularly like the > *warning* issued when one tries to add a value that's not part of the set.) > -Reece create type validation_flags as ( cluster bool, freq bool ); create function validation_flags_in( flags text, flags out validation_flags) returns validation_flags as $$ select row($1 ~ 'cluster', $1 ~ 'freq')::validation_flags $$ language sql immutable; create table foo (flags validation_flags); insert into foo values (validation_flags_in('cluster')); select * from foo; select (flags).* from foo; merlin
On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
create type validation_flags as
(
cluster bool,
freq bool
);
Wow. That solution is nearly sexy, and far and away better than any solution that I would have come up with. Thanks, Merlin!
-Reece
On Thu, Mar 10, 2011 at 3:21 PM, Reece Hart <reece@harts.net> wrote: > On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >> create type validation_flags as >> ( >> cluster bool, >> freq bool >> ); > > Wow. That solution is nearly sexy, and far and away better than any solution > that I would have come up with. Thanks, Merlin! thanks -- I do what I do. fyi another thing is this only works if no flags substrings of other flag -- no big deal to add a little guard against that in the 'in' function though if you need to. also one pain point with composite types is that you can't flip specific fields like this: update foo set (flags).freq = true; you can work around that like this: create function validation_flags_out( flags validation_flags, flags out text) returns text as $$ select array_to_string(array( select 'freq' where ($1).freq union all select 'cluster' where ($1).cluster ), ',') $$ language sql immutable; set a flag: update foo set flags = validation_flags_in(validation_flags_out(flags) || ',cluster'); remove a flag: update foo set flags = validation_flags_in(replace(validation_flags_out(flags), 'cluster', '')); if you do the above a lot you can wrap the function so that your code becomes: update foo set flags = validation_flags(flags, 'cluster', true/false); it really depends on how much process/structure you want to wrap around this to get it to your liking. another completely separate route to this problem is the hstore type: http://www.postgresql.org/docs/9.0/static/hstore.html which is a very good syntax generalization of key/value pairs, but it maybe too loose if you want to reduce to specific flag set, but you should probably take a look before writing a bunch of functions. If you do go with flags/composite type route, and you had a lot of mysql 'set' types to convert, I would write a function in postgres to generate the wrapper functions from list of strings -- ping back if you need help with that. merlin
2011/3/9 John R Pierce <pierce@hogranch.com>
On 03/08/11 5:06 PM, Reece Hart wrote:why not just have a set of booleans in the table for these individual on/off attributes? wouldn't that be simplest?I'm considering porting a MySQL database to PostgreSQL. That database uses MySQL's SET type. Does anyone have advice about representing this type in PostgreSQL?
MySQL DDL excerpt:
CREATE TABLE `transcript_variation` (
`transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`transcript_stable_id` varchar(128) NOT NULL,
...
`consequence_type` set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
Yes, it might be simplest at first sight.
But classical solution is relation N - N scales simpler than
any tricks with bytes.
Unfortunately, enums and composite types are not extensible. And
if you need to add yet another option (or remove some option) it
will be problematic.
In case of N - N relation you need just use INSERT/DELETE.
But classical solution is relation N - N scales simpler than
any tricks with bytes.
Unfortunately, enums and composite types are not extensible. And
if you need to add yet another option (or remove some option) it
will be problematic.
In case of N - N relation you need just use INSERT/DELETE.
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
--
// Dmitriy.
On Thu, Mar 10, 2011 at 4:13 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote: > 2011/3/9 John R Pierce <pierce@hogranch.com> >> >> On 03/08/11 5:06 PM, Reece Hart wrote: >>> >>> I'm considering porting a MySQL database to PostgreSQL. That database >>> uses MySQL's SET type. Does anyone have advice about representing this type >>> in PostgreSQL? >>> >>> MySQL DDL excerpt: >>> CREATE TABLE `transcript_variation` ( >>> `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT, >>> `transcript_stable_id` varchar(128) NOT NULL, >>> ... >>> `consequence_type` >>> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE') >>> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1; >>> >>> >> >> why not just have a set of booleans in the table for these individual >> on/off attributes? wouldn't that be simplest? > > Yes, it might be simplest at first sight. > But classical solution is relation N - N scales simpler than > any tricks with bytes. > Unfortunately, enums and composite types are not extensible. And > if you need to add yet another option (or remove some option) it > will be problematic. > In case of N - N relation you need just use INSERT/DELETE. actually composite types are fairly workable if you use table instead of a type (you can drop/add column, etc). in 9.1 you will be able to do this with vanilla composite type (http://developer.postgresql.org/pgdocs/postgres/sql-altertype.html). in typical case I would agree that classic approach of separate relation is typically the way to go, there are exceptions -- for example enum gives you inline ordering -- or as in this case where OP is looking to simplify porting large body of application code. merlin
2011/3/11 Merlin Moncure <mmoncure@gmail.com>
On Thu, Mar 10, 2011 at 4:13 PM, Dmitriy Igrishin <dmitigr@gmail.com> wrote:actually composite types are fairly workable if you use table instead
> 2011/3/9 John R Pierce <pierce@hogranch.com>
>>
>> On 03/08/11 5:06 PM, Reece Hart wrote:
>>>
>>> I'm considering porting a MySQL database to PostgreSQL. That database
>>> uses MySQL's SET type. Does anyone have advice about representing this type
>>> in PostgreSQL?
>>>
>>> MySQL DDL excerpt:
>>> CREATE TABLE `transcript_variation` (
>>> `transcript_variation_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
>>> `transcript_stable_id` varchar(128) NOT NULL,
>>> ...
>>> `consequence_type`
>>> set('ESSENTIAL_SPLICE_SITE','STOP_GAINED','STOP_LOST','COMPLEX_INDEL','SPLICE_SITE')
>>> ) ENGINE=MyISAM AUTO_INCREMENT=174923212 DEFAULT CHARSET=latin1;
>>>
>>>
>>
>> why not just have a set of booleans in the table for these individual
>> on/off attributes? wouldn't that be simplest?
>
> Yes, it might be simplest at first sight.
> But classical solution is relation N - N scales simpler than
> any tricks with bytes.
> Unfortunately, enums and composite types are not extensible. And
> if you need to add yet another option (or remove some option) it
> will be problematic.
> In case of N - N relation you need just use INSERT/DELETE.
of a type (you can drop/add column, etc). in 9.1 you will be able to
do this with vanilla composite type
(http://developer.postgresql.org/pgdocs/postgres/sql-altertype.html).
Good news! Thanks for pointing that.
in typical case I would agree that classic approach of separate
relation is typically the way to go, there are exceptions -- for
example enum gives you inline ordering -- or as in this case where OP
is looking to simplify porting large body of application code.
Agree.
merlin
--
// Dmitriy.
On Thu, Mar 10, 2011 at 3:54 PM, Merlin Moncure <mmoncure@gmail.com> wrote: > On Thu, Mar 10, 2011 at 3:21 PM, Reece Hart <reece@harts.net> wrote: >> On Wed, Mar 9, 2011 at 9:16 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>> create type validation_flags as >>> ( >>> cluster bool, >>> freq bool >>> ); >> >> Wow. That solution is nearly sexy, and far and away better than any solution >> that I would have come up with. Thanks, Merlin! > > thanks -- I do what I do. fyi another thing is this only works if no > flags substrings of other flag -- no big deal to add a little guard > against that in the 'in' function though if you need to. > also one pain point with composite types is that you can't flip > specific fields like this: > > update foo set (flags).freq = true; Actually you *can* do this -- I just didn't know it. update foo set flags.freq = true; will work, so all that extra stuff to do that isn't needed. merlin