Re: data modeling genes and alleles... help! - Mailing list pgsql-general

From Misa Simic
Subject Re: data modeling genes and alleles... help!
Date
Msg-id CAH3i69nqzg5YeF=r8=xxLbdQ9HQqAxpe0g8gyzY0K26bLp8+xQ@mail.gmail.com
Whole thread Raw
In response to data modeling genes and alleles... help!  (Modulok <modulok@gmail.com>)
List pgsql-general
Hi,

If I have understood well....

You have two options:

1. Person_gene (person_id pk, gene_id int)
    Genes (gene_id serial pk, gene_type int)
    Then for each gene_type separate table with gene_id int what is pk and relates to genes... And other columns what describe that type...

    Person_gene (person_id, gene_id int)

Second option is, instead of separate table for each type... Table genes, has third column, with type hstore or json, where you can store different "properies" for each gene... What is not pure SQL - but postgres can handle that very well, just you need make decision what better fits to you... 

A) create new table every time new type shows up... (Gene type, not concrete gene)
B) store all gene info in one column (as hstore or json)

Kind Regards,

Misa




On Saturday, March 30, 2013, Modulok wrote:
List,

I have a data modeling problem. That much, I know. The question is how do I
model this? (Below.)

I'm making a database which will store pseudo-genetic data. It's a basic
many-to-many setup::

    create table person(
        name varchar(32) primary key
    );
    create table gene(
        name varchar(32) primary key
    );
    create table person_gene(
        person varchar(32) references person(name),
        gene varchar(32) references gene(name)
    );

And I have data like::

    insert into person(name)
    values
        ('foo')
    ;
    insert into gene(name)
    values
        ('hair'),
        ('eye')
    ;
    insert into person_gene(person, gene)
    values
        ('foo', 'hair'),
        ('foo', 'eye')
    ;

Great. This is important as I need to be able to ask questions like "who
carries gene 'x'?" as well as "what genes does person 'y' carry?" But then
things get thorny...

I also need to store the properties of the individual genes (the alleles). This
is akin to an instance of one of the many gene classes in my application code.
So I make more tables::

    create table hair(
        id serial primary key,
        density float,
        thickness float
    );
    create table eye(
        id serial primary key,
        pupil_type int
    );

How do I store a reference to this data? I'd add a column to the person_gene
table, but it points to what? I can't reference a column name because they're
all stored in different tables. I also can't store them in the same table, as
they all store different data. Do I store the *table name* itself in a column
of the gene_table? (Smells like a klude.)

A person might not carry all genes. The number of genes in existence is not
fixed. New ones are introduced infrequently. There may be genes that no one
carries. (I assume I just make a new table each time a new gene is introduced?)

I thought about just pickling/marshaling the instances of my various gene
classes and just having a single 'genes' table which has a blob column but I
hesitate to do that because I want to be able to do queries on aggregate allele
stats. Things like "how many persons have pupil type 1?", etc.

It's late and I've probably over complicated it. Any pointers or advice on how
to model this would be greatly appreciated.

Cheers!
-Modulok-


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

pgsql-general by date:

Previous
From: Julian
Date:
Subject: Re: Money casting too liberal?
Next
From: Alban Hertroys
Date:
Subject: Re: data modeling genes and alleles... help!