Thread: Q: data modeling with inheritance
This is a question about data modeling with inheritance and a way to circumvent the limitation that primary keys are not inherited. I'm considering a project to model genomic variants and their associated phenotypes. (Phenotype is a description of the observable trait, such as disease or hair color.) There are many types of variation, many types of phenotypes, and many types of association. By "type", I mean that they have distinct structure (column names and inter-row dependencies). The abstract relations might look like this: variant association phenotype ------- ----------- --------- variant_id --------- variant_id +------- phenotype_id genome_id phenotype_id -----+ short_descr strand origin_id (i.e., who) long_descr start_coord ts (timestamp) stop_coord There are several types of variants, such as insertions, deletions, inversions, copy-number variants, single nucleotide polymorphisms, translocations, and unknowable future genomic shenanigans. Phenotypes might come from ontologies or controlled vocabularies that need a graph structure, others domains might be free text. Each is probably best-served by a subclass table. Associations might be quantitative or qualitative, and would come from multiple origins. The problem that arises is the combinatorial nature of the schema design coupled with the lack of inherited primary keys. In the current state of PG, one must (I think) make joining tables (association subclasses) for every combination of referenced foreign keys (variant and phenotype subclasses). So, how would you model this data? Do I ditch inheritance? Thanks, Reece
> > variant association phenotype > ------- ----------- --------- > variant_id --------- variant_id +------- phenotype_id > genome_id phenotype_id -----+ short_descr > strand origin_id (i.e., who) long_descr > start_coord ts (timestamp) > stop_coord > Is an association, for example, an experiment that establishes a dependent relationship? So could there be multiple associations between variant and phenotype? > The problem that arises is the combinatorial nature of the schema design > coupled with the lack of inherited primary keys. In the current state > of PG, one must (I think) make joining tables (association subclasses) > for every combination of referenced foreign keys (variant and phenotype > subclasses). > Is your concern that the number of joins will grow exponentially in the number of variants and phenotypes? > So, how would you model this data? Do I ditch inheritance? I've put some thought into representing an ontology via table inheritance, and I've never been able to figure out a good way ( of course, that's not to say one doesn't exist... ) . If I understand your problem correctly, I would use composite keys ( ie ( variant type, id ) ) and then use an extra join to separate ontology tables to restrict searches to specific branches. So all variants would be stored in the variants table, all phenotypes are in the phenotypes table, and you join through association. It's not as elegant as inheritance, but it will be much more flexible in both the types of queries that you can write and in case your requirements change. -Nathan
On Thu, Jul 02, 2009 at 01:54:04PM -0700, Reece Hart wrote: > This is a question about data modeling with inheritance and a way to > circumvent the limitation that primary keys are not inherited. I'm missing what you're doing here that foreign keys don't cover. Could you send along your DDL? Just generally, I've only found table inheritance useful for partitioning. "Polymorphic" foreign key constraints can be handled other ways such as the one sketched out below. http://archives.postgresql.org/sfpug/2005-04/msg00022.php Cheers, David. > > I'm considering a project to model genomic variants and their associated > phenotypes. (Phenotype is a description of the observable trait, such as > disease or hair color.) There are many types of variation, many types of > phenotypes, and many types of association. By "type", I mean that they > have distinct structure (column names and inter-row dependencies). The > abstract relations might look like this: > > variant association phenotype > ------- ----------- --------- > variant_id --------- variant_id +------- phenotype_id > genome_id phenotype_id -----+ short_descr > strand origin_id (i.e., who) long_descr > start_coord ts (timestamp) > stop_coord > > > There are several types of variants, such as insertions, deletions, > inversions, copy-number variants, single nucleotide polymorphisms, > translocations, and unknowable future genomic shenanigans. > > Phenotypes might come from ontologies or controlled vocabularies that > need a graph structure, others domains might be free text. Each is > probably best-served by a subclass table. > > Associations might be quantitative or qualitative, and would come from > multiple origins. > > The problem that arises is the combinatorial nature of the schema design > coupled with the lack of inherited primary keys. In the current state > of PG, one must (I think) make joining tables (association subclasses) > for every combination of referenced foreign keys (variant and phenotype > subclasses). > > So, how would you model this data? Do I ditch inheritance? > > Thanks, > Reece > > > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Thu, 2009-07-02 at 19:19 -0700, Nathan Boley wrote:
-Reece
Exactly. You might have one group say that allele X "causes" some trait, whereas another group might report a more precise increase in odds ratio (for example) for the same genotype/phenotype.Is an association, for example, an experiment that establishes a dependent relationship? So could there be multiple associations between variant and phenotype?
Not the number of joins, but the number of association subclasses. If I have Nv variant subclasses and Np phenotype subclasses, I'd need Nv * Np association subclasses. Multiply that by the number of association subclasses.Is your concern that the number of joins will grow exponentially in the number of variants and phenotypes?
Thanks. I had considered that too and that's probably what I'll end up using.So all variants would be stored in the variants table, all phenotypes are in the phenotypes table, and you join through association.
-Reece
On Fri, 2009-07-03 at 11:29 -0700, David Fetter wrote:
Thanks,
Reece
I'm missing what you're doing here that foreign keys don't cover.No DDL yet... I'm just in the thinking stages. FKs technically would do it, but would become unwieldy. The intention was to have subclasses of each of the variant, association, and phenotype tables. That leads to the polymorphic key problem.
Could you send along your DDL?
Just generally, I've only found table inheritance useful forThat answers the question -- I do want polymorphic foreign keys. Dang.
partitioning. "Polymorphic" foreign key constraints can be handled
other ways such as the one sketched out below.
Thanks,
Reece
On Fri, Jul 03, 2009 at 05:37:20PM -0700, Reece Hart wrote: > On Fri, 2009-07-03 at 11:29 -0700, David Fetter wrote: > > > I'm missing what you're doing here that foreign keys don't cover. > > Could you send along your DDL? > > No DDL yet... I'm just in the thinking stages. FKs technically would > do it, but would become unwieldy. The intention was to have > subclasses of each of the variant, association, and phenotype > tables. That leads to the polymorphic key problem. How many (order of magnitude) are we talking about here? > > Just generally, I've only found table inheritance useful for > > partitioning. "Polymorphic" foreign key constraints can be > > handled other ways such as the one sketched out below. > > That answers the question -- I do want polymorphic foreign keys. > Dang. It solved some real-world problems I had at the time, mostly from the game space. My biology is a little rusty :/ Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate