Thread: Q: data modeling with inheritance

Q: data modeling with inheritance

From
Reece Hart
Date:
This is a question about data modeling with inheritance and a way to
circumvent the limitation that primary keys are not inherited.

I'm considering a project to model genomic variants and their associated
phenotypes. (Phenotype is a description of the observable trait, such as
disease or hair color.) There are many types of variation, many types of
phenotypes, and many types of association. By "type", I mean that they
have distinct structure (column names and inter-row dependencies).  The
abstract relations might look like this:

  variant              association                phenotype
  -------              -----------                ---------
  variant_id --------- variant_id        +------- phenotype_id
  genome_id            phenotype_id -----+        short_descr
  strand               origin_id (i.e., who)      long_descr
  start_coord          ts (timestamp)
  stop_coord


There are several types of variants, such as insertions, deletions,
inversions, copy-number variants, single nucleotide polymorphisms,
translocations, and unknowable future genomic shenanigans.

Phenotypes might come from ontologies or controlled vocabularies that
need a graph structure, others domains might be free text.  Each is
probably best-served by a subclass table.

Associations might be quantitative or qualitative, and would come from
multiple origins.

The problem that arises is the combinatorial nature of the schema design
coupled with the lack of inherited primary keys.  In the current state
of PG, one must (I think) make joining tables (association subclasses)
for every combination of referenced foreign keys (variant and phenotype
subclasses).

So, how would you model this data?  Do I ditch inheritance?

Thanks,
Reece





Re: Q: data modeling with inheritance

From
Nathan Boley
Date:
>
>  variant              association                phenotype
>  -------              -----------                ---------
>  variant_id --------- variant_id        +------- phenotype_id
>  genome_id            phenotype_id -----+        short_descr
>  strand               origin_id (i.e., who)      long_descr
>  start_coord          ts (timestamp)
>  stop_coord
>

Is an association, for example, an experiment that establishes a
dependent relationship? So could there be multiple associations
between variant and phenotype?

> The problem that arises is the combinatorial nature of the schema design
> coupled with the lack of inherited primary keys.  In the current state
> of PG, one must (I think) make joining tables (association subclasses)
> for every combination of referenced foreign keys (variant and phenotype
> subclasses).
>

Is your concern that the number of joins will grow exponentially in
the number of variants and phenotypes?

> So, how would you model this data?  Do I ditch inheritance?

I've put some thought into representing an ontology via table
inheritance, and I've never been able to figure out a good way ( of
course, that's not to say one doesn't exist... ) .

If I understand your problem correctly, I would use composite keys (
ie  ( variant type, id ) ) and then use an extra join to separate
ontology tables to restrict searches to specific branches. So all
variants would be stored in the variants table, all phenotypes are in
the phenotypes table, and you join through association. It's not as
elegant as inheritance, but it will be much more flexible in both the
types of queries that you can write and in case your requirements
change.

-Nathan

Re: Q: data modeling with inheritance

From
David Fetter
Date:
On Thu, Jul 02, 2009 at 01:54:04PM -0700, Reece Hart wrote:
> This is a question about data modeling with inheritance and a way to
> circumvent the limitation that primary keys are not inherited.

I'm missing what you're doing here that foreign keys don't cover.
Could you send along your DDL?

Just generally, I've only found table inheritance useful for
partitioning.  "Polymorphic" foreign key constraints can be handled
other ways such as the one sketched out below.

http://archives.postgresql.org/sfpug/2005-04/msg00022.php

Cheers,
David.

>
> I'm considering a project to model genomic variants and their associated
> phenotypes. (Phenotype is a description of the observable trait, such as
> disease or hair color.) There are many types of variation, many types of
> phenotypes, and many types of association. By "type", I mean that they
> have distinct structure (column names and inter-row dependencies).  The
> abstract relations might look like this:
>
>   variant              association                phenotype
>   -------              -----------                ---------
>   variant_id --------- variant_id        +------- phenotype_id
>   genome_id            phenotype_id -----+        short_descr
>   strand               origin_id (i.e., who)      long_descr
>   start_coord          ts (timestamp)
>   stop_coord
>
>
> There are several types of variants, such as insertions, deletions,
> inversions, copy-number variants, single nucleotide polymorphisms,
> translocations, and unknowable future genomic shenanigans.
>
> Phenotypes might come from ontologies or controlled vocabularies that
> need a graph structure, others domains might be free text.  Each is
> probably best-served by a subclass table.
>
> Associations might be quantitative or qualitative, and would come from
> multiple origins.
>
> The problem that arises is the combinatorial nature of the schema design
> coupled with the lack of inherited primary keys.  In the current state
> of PG, one must (I think) make joining tables (association subclasses)
> for every combination of referenced foreign keys (variant and phenotype
> subclasses).
>
> So, how would you model this data?  Do I ditch inheritance?
>
> Thanks,
> Reece
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Q: data modeling with inheritance

From
Reece Hart
Date:
On Thu, 2009-07-02 at 19:19 -0700, Nathan Boley wrote:
Is an association, for example, an experiment that establishes a
dependent relationship? So could there be multiple associations
between variant and phenotype?
Exactly. You might have one group say that allele X "causes" some trait, whereas another group might report a more precise increase in odds ratio (for example) for the same genotype/phenotype.

Is your concern that the number of joins will grow exponentially in
the number of variants and phenotypes?
Not the number of joins, but the number of association subclasses. If I have Nv variant subclasses and Np phenotype subclasses, I'd need Nv * Np association subclasses. Multiply that by the number of association subclasses.

So all variants would be stored in the variants table, all phenotypes are in
the phenotypes table, and you join through association.
Thanks. I had considered that too and that's probably what I'll end up using.


-Reece

Re: Q: data modeling with inheritance

From
Reece Hart
Date:
On Fri, 2009-07-03 at 11:29 -0700, David Fetter wrote:
I'm missing what you're doing here that foreign keys don't cover.
Could you send along your DDL?
No DDL yet... I'm just in the thinking stages. FKs technically would do it, but would become unwieldy. The intention was to have subclasses of each of the variant, association, and phenotype tables. That leads to the polymorphic key problem.

Just generally, I've only found table inheritance useful for
partitioning.  "Polymorphic" foreign key constraints can be handled
other ways such as the one sketched out below.
That answers the question -- I do want polymorphic foreign keys. Dang.

Thanks,
Reece

Re: Q: data modeling with inheritance

From
David Fetter
Date:
On Fri, Jul 03, 2009 at 05:37:20PM -0700, Reece Hart wrote:
> On Fri, 2009-07-03 at 11:29 -0700, David Fetter wrote:
>
> > I'm missing what you're doing here that foreign keys don't cover.
> > Could you send along your DDL?
>
> No DDL yet... I'm just in the thinking stages. FKs technically would
> do it, but would become unwieldy. The intention was to have
> subclasses of each of the variant, association, and phenotype
> tables. That leads to the polymorphic key problem.

How many (order of magnitude) are we talking about here?

> > Just generally, I've only found table inheritance useful for
> > partitioning.  "Polymorphic" foreign key constraints can be
> > handled other ways such as the one sketched out below.
>
> That answers the question -- I do want polymorphic foreign keys.
> Dang.

It solved some real-world problems I had at the time, mostly from the
game space.  My biology is a little rusty :/

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate