This is a question about data modeling with inheritance and a way to
circumvent the limitation that primary keys are not inherited.
I'm considering a project to model genomic variants and their associated
phenotypes. (Phenotype is a description of the observable trait, such as
disease or hair color.) There are many types of variation, many types of
phenotypes, and many types of association. By "type", I mean that they
have distinct structure (column names and inter-row dependencies). The
abstract relations might look like this:

variant               association                phenotype
-------               -----------                ---------
variant_id ---------- variant_id         +------ phenotype_id
genome_id             phenotype_id ------+       short_descr
strand                origin_id (i.e., who)      long_descr
start_coord           ts (timestamp)
stop_coord

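In DDL terms, the abstract relations might be sketched roughly as
follows. The column types and NOT NULL choices are only my assumptions
for illustration; nothing here is settled.

```sql
-- Rough sketch of the abstract (parent) relations.
-- All column types are assumptions, not a settled design.
CREATE TABLE variant (
    variant_id  serial PRIMARY KEY,
    genome_id   integer NOT NULL,
    strand      char(1) NOT NULL,
    start_coord integer NOT NULL,
    stop_coord  integer NOT NULL
);

CREATE TABLE phenotype (
    phenotype_id serial PRIMARY KEY,
    short_descr  text NOT NULL,
    long_descr   text
);

CREATE TABLE association (
    variant_id   integer NOT NULL REFERENCES variant (variant_id),
    phenotype_id integer NOT NULL REFERENCES phenotype (phenotype_id),
    origin_id    integer NOT NULL,               -- i.e., who
    ts           timestamp NOT NULL DEFAULT now()
);
```
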
There are several types of variants, such as insertions, deletions,
inversions, copy-number variants, single nucleotide polymorphisms,
translocations, and unknowable future genomic shenanigans.
Phenotypes might come from ontologies or controlled vocabularies that
need a graph structure; other domains might be free text. Each is
probably best served by a subclass table.
Associations might be quantitative or qualitative, and would come from
multiple origins.
The problem that arises is the combinatorial nature of the schema design
coupled with the lack of inherited primary keys. A foreign key that
references a parent table does not see rows stored in its child tables,
so in the current state of PG, one must (I think) make joining tables
(association subclasses) for every combination of referenced foreign
keys (variant and phenotype subclasses).
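
To make the limitation concrete, here is a minimal standalone sketch of
the behavior as I understand it. The demo_ table names and the alt_base
column are made up purely for illustration.

```sql
-- Hypothetical parent and subclass tables (names are illustrative only).
CREATE TABLE demo_variant (
    variant_id integer PRIMARY KEY
);

CREATE TABLE demo_snp (
    alt_base char(1) NOT NULL
) INHERITS (demo_variant);

CREATE TABLE demo_assoc (
    variant_id integer NOT NULL REFERENCES demo_variant (variant_id)
);

-- The row is stored in the child table.
INSERT INTO demo_snp (variant_id, alt_base) VALUES (1, 'T');

-- It is visible when selecting from the parent ...
SELECT variant_id FROM demo_variant;

-- ... but this insert fails with a foreign key violation, because the
-- check looks only at rows stored in demo_variant itself.
INSERT INTO demo_assoc (variant_id) VALUES (1);

-- The primary key is not inherited either, so nothing stops a
-- duplicate variant_id from being inserted into the child table.
INSERT INTO demo_snp (variant_id, alt_base) VALUES (1, 'G');
```

Pointing the foreign key at demo_snp instead would work only if
demo_snp had its own unique constraint on variant_id, and it is that
per-subclass referencing that multiplies the association tables.
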
So, how would you model this data? Do I ditch inheritance?
Thanks,
Reece