Re: Recursive Arrays 101 - Mailing list pgsql-general

From David Blomstrom
Subject Re: Recursive Arrays 101
Date
Msg-id CAA54Z0iMJpHLzOL6NWZ5+kO8E+6kEtpmPkwLdvZQs+n3cHmn8Q@mail.gmail.com
Whole thread Raw
In response to Re: Recursive Arrays 101  (David Blomstrom <david.blomstrom@gmail.com>)
Responses Re: Recursive Arrays 101  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
List pgsql-general
I've created my first table in postgreSQL. I'd like to ask 1) if you see any errors, 2) do you have any suggestions for improving it, and 3) can you give me the code I need to paste into the shell (or whatever you call the command-line tool) to recreate it?

This is what the table's schema looks like in MySQL...

N - int(6) [Primary Key]
Taxon - varchar(50) [Unique Key]
Parent - varchar(50) [Index Key]
ParentID - tinyint(1) [Index Key]
Slug - varchar(50) [Index Key]
NameCommon - varchar(50)
Plural - varchar(50)
Extinct - tinyint(1)
Rank - tinyint(2)
Key - tinyint(1)

The table type is MyIsam, collation is latin1_general_ci

Slug, NameCommon and Plural are NULL.

All of my tables have a default first column named N or ID, which is simply a numerical key that begins with 1. It's always designated the primary key.

All the other columns in this table can be divided into two categories, text (varchar) and numerical (tinyint).

The values in the columns Taxon and Slug serve as URL's, so they can have no spaces, apostrophes, accents, etc. (Taxon handles scientific names, Slug common names, if any.) So a row focusing on the Steller's jay would have values like these:

NameCommmon - Steller&#8217;s jay
Plural - Steller&#8217;s jays
Taxon - Cyanocitta-stelleri
Slug - stellers-jay
Parent - Cyanocitta

The column ParentID - which I want to use for hierarchical relationships - has values ranging from 1 for Mammalia (the first row) to 5 for the species level. The column Extinct has the value 1 (not extinct) or 2, 3 or 4 for various categories of extinct taxons.

The column Rank has the value 25 for the first row (class Mammalia), 35 for each order (e.g. Carnivora), 45 for each family, 55 for each genus and 65 for each species. The value for Key is 1 (for every row), designating it a tetrapod. The bird, reptile and amphibian tables have the same key value, while fish, invertebrates and plants have their own unique keys.

I have Unique keys on N and Taxon, Index keys (not unique) on Parent, ParentID and Slug.

My PostgreSQL table is in a database named GeoZoo. When I go into pgAdmin3 > SQLPane, it looks like this:

CREATE TABLE public.gz_life_mammals
(
  "N" integer NOT NULL,
  "Taxon" character varying(50) NOT NULL,
  "Parent" character varying(50) NOT NULL,
  "ParentID" smallint NOT NULL,
  "Slug" character varying(50),
  "NameCommon" character varying(50),
  "Plural" character varying(50),
  "Extinct" smallint NOT NULL,
  "Rank" smallint NOT NULL,
  "Key" smallint NOT NULL,
  CONSTRAINT "Primary Key" PRIMARY KEY ("N"),
  CONSTRAINT "Unique Key" UNIQUE ("Taxon")
[I haven't added any non-unique keys yet.]
)
WITH (
  OIDS=FALSE
);
ALTER TABLE public.gz_life_mammals
  OWNER TO postgres;

I should also mention that Taxon is the column I use to UNION or JOIN this table with other tables.

P.S. If I decide to organize things by taxonomic levels (e.g. kingdom, class, etc.), then it should be easy to rename the table, delete a few columns, and refill it with data associated with a particular class.



On Mon, Oct 26, 2015 at 1:29 PM, David Blomstrom <david.blomstrom@gmail.com> wrote:
Sorry for the late response. I don't have Internet access at home, so I only post from the library or a WiFi cafe.

Anyway, where do I begin?

Regarding my "usage patterns," I use spreadsheets (Apple's Numbers program) to organize data. I then save it as a CSV file and import it into a database table. It would be very hard to break with that tradition, because I don't know of any other way to organize my data.

On the other hand, I have a column (Rank) that identifies different taxonomic levels (kingdom, class, etc.). So I can easily sort a table into specific taxonomic levels and save one level at a time for a database table.

There is one problem, though. I can easily put all the vertebrate orders and even families into a table. But genera might be harder, and species probably won't work; there are simply too many. My spreadsheet program is almost overwhelmed by fish species alone. The only solution would be if I could import Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But that might be kind of tedious, especially if I have to make multiple updates.

As for "attributes," I'll post my table's schema, with a description, next.

On Mon, Oct 26, 2015 at 10:44 AM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 10/26/2015 10:33 AM, Rob Sargent wrote:
On 10/26/2015 11:14 AM, Adrian Klaver wrote:
On 10/26/2015 08:32 AM, Rob Sargent wrote:
On 10/26/2015 09:22 AM, Adrian Klaver wrote:
On 10/26/2015 08:12 AM, Rob Sargent wrote:
On 10/26/2015 08:43 AM, Jim Nasby wrote:
On 10/25/15 8:10 PM, David Blomstrom wrote:
@ Adrian Klaver: Oh, so you're suggesting I make separate tables for
kingdoms, classes and on down to species. I'll research foreign
keys and
see what I can come up with. I hope I can make separate tables for
mammal species, bird species, fish species, etc. There are just so
many
species - especially fish - the spreadsheets I use to organize them
are
just about maxed out as it is.

The suggestion is simply to have 7 tables:

CREATE TABLE kingdom(
  kingdom_id serial PRIMARY KEY
  , kingdom_name text NOT NULL
  , ...
);
CREATE TABLE phylum(
  phylum_id serial PRIMARY KEY
  , kingdom_id int NOT NULL REFERENCES kingdom
  , ...
);
CREATE TABLE class(
...
);

and so-on.
Seems to me that if life boils down to four attributes one would
have a
single table with those four attributes on the particular life form.

Out of curiosity what are those four attributes? It would have made
memorizing all those organisms a lot easier when I was in school:)

kingdom phylum class genus as attributes in species table.  Talk about
your "natural key".  The hibernate boys would love it :)

Well in this classification system it would need to be:

kingdom phylum class order family genus
Sorry, wasn't tracking carefully: 6 attributes

What makes it complicated is that these are just the slots. How
organisms are slotted depends on attributes and there are a lot of
them. This means there is a constant rearrangement in the slotting.

But at the end of the day, is it not the intent to have those six filled
per species. Is your point that maintenance would be problematic?
Agreed.  Certainly not just a single pointer redirect in a recursive
structure.  All depends on OPs usage patterns.  I personally love 'with
recursion' but it's more complicated than for example
     select count(*) from species where class = '<some class name>'
if, and only if, all 6 attributes are always there.  Which highlights
your caveat "In this classification system".

This is the current system. If you want to be historically complete then you have to take into account the ways things where classified before. Granted this is running in the crawl, walk , run sequence but it cannot be entirely ignored. Then there  are the more detailed versions of the above:

http://www.itis.gov/servlet/SingleRpt/SingleRpt?search_topic=TSN&search_value=584927

It comes done to what view of taxonomy you want to support.


Now, the four attributes could be ids into definitional tables but I
suspect the querying will be done string/name so why complicate the
lookups: make the names a foreign key in the defs if necessary.

Personally I think the recursive structure is the way to go.

Jtbc, I'm not advocating this structure but it may suit the OP's usage
patterns.







--
Adrian Klaver
adrian.klaver@aklaver.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org



--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org

pgsql-general by date:

Previous
From: Rob Sargent
Date:
Subject: Re: Recursive Arrays 101
Next
From: David Blomstrom
Date:
Subject: Re: Recursive Arrays 101