Thread: data modeling genes and alleles... help!

data modeling genes and alleles... help!

From
Modulok
Date:
List,

I have a data modeling problem. That much, I know. The question is how do I
model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic
many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who
carries gene 'x'?" as well as "what genes does person 'y' carry?" But then
things get thorny...

I also need to store the properties of the individual genes (the alleles). This
is akin to an instance of one of the many gene classes in my application code.
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene
table, but it points to what? I can't reference a column name because they're
all stored in different tables. I also can't store them in the same table, as
they all store different data. Do I store the *table name* itself in a column
of the gene_table? (Smells like a klude.)

A person might not carry all genes. The number of genes in existence is not
fixed. New ones are introduced infrequently. There may be genes that no one
carries. (I assume I just make a new table each time a new gene is introduced?)

I thought about just pickling/marshaling the instances of my various gene
classes and just having a single 'genes' table which has a blob column but I
hesitate to do that because I want to be able to do queries on aggregate allele
stats. Things like "how many persons have pupil type 1?", etc.

It's late and I've probably over complicated it. Any pointers or advice on how
to model this would be greatly appreciated.

Cheers!
-Modulok-


Re: data modeling genes and alleles... help!

From
Dann Corbit
Date:
Everything should have an id, and combination tables like person-gene should have the person id + the gene id.
You might look at this tool to make your life easier:
https://github.com/pgmodeler/pgmodeler

First, read up on this:
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
then, it will be obvious how to use the tool.

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Modulok
Sent: Friday, March 29, 2013 7:39 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] data modeling genes and alleles... help!

List,

I have a data modeling problem. That much, I know. The question is how do I model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who carries gene 'x'?" as well as "what genes does
person'y' carry?" But then things get thorny... 

I also need to store the properties of the individual genes (the alleles). This is akin to an instance of one of the
manygene classes in my application code. 
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene table, but it points to what? I can't
referencea column name because they're all stored in different tables. I also can't store them in the same table, as
theyall store different data. Do I store the *table name* itself in a column of the gene_table? (Smells like a klude.) 

A person might not carry all genes. The number of genes in existence is not fixed. New ones are introduced
infrequently.There may be genes that no one carries. (I assume I just make a new table each time a new gene is
introduced?)

I thought about just pickling/marshaling the instances of my various gene classes and just having a single 'genes'
tablewhich has a blob column but I hesitate to do that because I want to be able to do queries on aggregate allele
stats.Things like "how many persons have pupil type 1?", etc. 

It's late and I've probably over complicated it. Any pointers or advice on how to model this would be greatly
appreciated.

Cheers!
-Modulok-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: data modeling genes and alleles... help!

From
Dann Corbit
Date:
I like this one too:
http://mogwai.sourceforge.net/erdesignerng.html

It does reverse engineering, but if you are creating a new model, that capability is not a necessity.
Mogwai is not Postgresql specific, so you can use it with any db.

-----Original Message-----
From: Dann Corbit
Sent: Friday, March 29, 2013 11:45 PM
To: 'Modulok'; pgsql-general@postgresql.org
Subject: RE: [GENERAL] data modeling genes and alleles... help!

Everything should have an id, and combination tables like person-gene should have the person id + the gene id.
You might look at this tool to make your life easier:
https://github.com/pgmodeler/pgmodeler

First, read up on this:
http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
then, it will be obvious how to use the tool.

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Modulok
Sent: Friday, March 29, 2013 7:39 PM
To: pgsql-general@postgresql.org
Subject: [GENERAL] data modeling genes and alleles... help!

List,

I have a data modeling problem. That much, I know. The question is how do I model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who carries gene 'x'?" as well as "what genes does
person'y' carry?" But then things get thorny... 

I also need to store the properties of the individual genes (the alleles). This is akin to an instance of one of the
manygene classes in my application code. 
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene table, but it points to what? I can't
referencea column name because they're all stored in different tables. I also can't store them in the same table, as
theyall store different data. Do I store the *table name* itself in a column of the gene_table? (Smells like a klude.) 

A person might not carry all genes. The number of genes in existence is not fixed. New ones are introduced
infrequently.There may be genes that no one carries. (I assume I just make a new table each time a new gene is
introduced?)

I thought about just pickling/marshaling the instances of my various gene classes and just having a single 'genes'
tablewhich has a blob column but I hesitate to do that because I want to be able to do queries on aggregate allele
stats.Things like "how many persons have pupil type 1?", etc. 

It's late and I've probably over complicated it. Any pointers or advice on how to model this would be greatly
appreciated.

Cheers!
-Modulok-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: data modeling genes and alleles... help!

From
Modulok
Date:
> Everything should have an id, and combination tables like person-gene should
> have the person id + the gene id.
> You might look at this tool to make your life easier:
> https://github.com/pgmodeler/pgmodeler
>
> First, read up on this:
> http://en.wikipedia.org/wiki/Entity%E2%80%93relationship_model
> then, it will be obvious how to use the tool.

>> Everything should have an id, and combination tables like person-gene should
>> have the person id + the gene id.

Yes. The model provided is a minimal approximation of the actual model. The
real thing uses numeric id columns to link tables, as well as many constraints
to ensure valid input and about a hundred other tables. In the example I used
names as keys rather than numeric ID's to try and make it a little easier to
follow my intent. Perhaps this had the opposite effect.

-Modulok-


Re: data modeling genes and alleles... help!

From
Misa Simic
Date:
Hi,

If I have understood well....

You have two options:

1. Person_gene (person_id pk, gene_id int)
    Genes (gene_id serial pk, gene_type int)
    Then for each gene_type separate table with gene_id int what is pk and relates to genes... And other columns what describe that type...

    Person_gene (person_id, gene_id int)

Second option is, instead of separate table for each type... Table genes, has third column, with type hstore or json, where you can store different "properies" for each gene... What is not pure SQL - but postgres can handle that very well, just you need make decision what better fits to you... 

A) create new table every time new type shows up... (Gene type, not concrete gene)
B) store all gene info in one column (as hstore or json)

Kind Regards,

Misa




On Saturday, March 30, 2013, Modulok wrote:
List,

I have a data modeling problem. That much, I know. The question is how do I
model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic
many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who
carries gene 'x'?" as well as "what genes does person 'y' carry?" But then
things get thorny...

I also need to store the properties of the individual genes (the alleles). This
is akin to an instance of one of the many gene classes in my application code.
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene
table, but it points to what? I can't reference a column name because they're
all stored in different tables. I also can't store them in the same table, as
they all store different data. Do I store the *table name* itself in a column
of the gene_table? (Smells like a klude.)

A person might not carry all genes. The number of genes in existence is not
fixed. New ones are introduced infrequently. There may be genes that no one
carries. (I assume I just make a new table each time a new gene is introduced?)

I thought about just pickling/marshaling the instances of my various gene
classes and just having a single 'genes' table which has a blob column but I
hesitate to do that because I want to be able to do queries on aggregate allele
stats. Things like "how many persons have pupil type 1?", etc.

It's late and I've probably over complicated it. Any pointers or advice on how
to model this would be greatly appreciated.

Cheers!
-Modulok-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: data modeling genes and alleles... help!

From
Alban Hertroys
Date:
On Mar 30, 2013, at 3:38, Modulok <modulok@gmail.com> wrote:

List,

I have a data modeling problem. That much, I know. The question is how do I
model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic
many-to-many setup::

   create table person(
       name varchar(32) primary key
   );
   create table gene(
       name varchar(32) primary key
   );
   create table person_gene(
       person varchar(32) references person(name),
       gene varchar(32) references gene(name)
   );

If my vague memories of high-school biology are right, then I think you might be on the wrong track here.

Great. This is important as I need to be able to ask questions like "who
carries gene 'x'?" as well as "what genes does person 'y' carry?" But then
things get thorny...

I also need to store the properties of the individual genes (the alleles). This
is akin to an instance of one of the many gene classes in my application code.
So I make more tables::

   create table hair(
       id serial primary key,
       density float,
       thickness float
   );
   create table eye(
       id serial primary key,
       pupil_type int
   );

How do I store a reference to this data? I'd add a column to the person_gene
table, but it points to what? I can't reference a column name because they're
all stored in different tables. I also can't store them in the same table, as
they all store different data. Do I store the *table name* itself in a column
of the gene_table? (Smells like a klude.)

Aren't alleles related to genes? (Hence my previous remark)

If that's the case, you'd add a gene foreign key to each allele table and then store which alleles are related to a person instead of which genes are related to a person. The genes then follow from the alleles.

Is that closer to what you're after?

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.