Re: Recursive Arrays 101 - Mailing list pgsql-general

From David Blomstrom
Subject Re: Recursive Arrays 101
Date
Msg-id CAA54Z0gqWzu42MFAsET9=KS48=+tv8LyRHpS+zXcUOXe-o9FyA@mail.gmail.com
Whole thread Raw
In response to Re: Recursive Arrays 101  (Alban Hertroys <haramrae@gmail.com>)
Responses Re: Recursive Arrays 101
Re: Recursive Arrays 101
Re: Recursive Arrays 101
List pgsql-general
@ Adrian Klaver: Oh, so you're suggesting I make separate tables for kingdoms, classes and on down to species. I'll research foreign keys and see what I can come up with. I hope I can make separate tables for mammal species, bird species, fish species, etc. There are just so many species - especially fish - the spreadsheets I use to organize them are just about maxed out as it is.

I've been using the Catalogue of Life as a guide, but I'm limited because I can never get their downloads to work. So all I can do is go to their website and copy a bunch of genera and species at a time.

However, I did open up some of the tables I downloaded and was amazed at how apparently amateurish they are. Yet their site works just fine and is fast enough.

@ Alban Hertroys: What does EOL mean? It reminds me of Encyclopedia of Life, which is doing what I was attempting to do years ago.

On Sun, Oct 25, 2015 at 2:07 PM, Alban Hertroys <haramrae@gmail.com> wrote:

> On 25 Oct 2015, at 19:38, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
>
> On 10/25/2015 11:12 AM, David Blomstrom wrote:
>> I'm sorry, I don't know exactly what you mean by "definitions." The
>> fields Taxon and Parent are both varchar, with a 50-character limit.
>> ParentID is int(1).
>
> By definition I meant the schema, so from the below:
>
> CREATE TABLE t (
> N INT(6) default None auto_increment,
> Taxon varchar(50) default NULL,
> Parent varchar(25) default NULL,
> NameCommon varchar(50) default NULL,
> Rank smallint(2) default 0
> PRIMARY KEY (N)
> ) ENGINE=MyISAM

That can indeed be solved using a hierarchical query (provided you have a suitable table in PG); something akin to:

WITH RECURSIVE taxons AS (
--  Hierarchical root nodes
        SELECT N AS id, Taxon, Rank, 1 AS level, '' || N AS Path -- A useful addition explained further down
        FROM t
        WHERE ParentID IS NULL

-- Child nodes
        UNION ALL
        SELECT  N AS id, Taxon, Rank, taxons.level +1 AS level, taxons.Path || ':' || N AS Path
        FROM taxons
        JOIN t ON taxons.id = t.ParentID
)
SELECT id, Taxon, Rank, level
FROM taxons
ORDER BY Path
;

The Path-bit looks complicated, but basically that just appends ID's within the same hierarchy such that, when sorted on that field, you get the hierarchy in their hierarchical order. What the hierarchy would look like if it were shown as a file hierarchy with sub-directories expanded, for example. That's pretty much the only viable alternative (alternatives vary on the column used to create the hierarchy), which is why I added it to the example.

The fun thing with hierarchical queries is that you can add all kinds of extra information and make it trickle down to the child nodes, such as the items that make up the root of the hierarchy (pretty useful for grouping), for example or a field that calculates a string to prepend for indentation, etc. Or a computation that depends on values in parent items (I used this successfully in a bill of materials to calculate absolute quantities by volume, quantities by weight and cost of components in the end product where they were given relative to 1 kg of their parent, for example).

It's highly flexible and powerful (and standard SQL), but it takes a bit of time to get in the right mindset.

PS. I usually write my hierarchical queries in Oracle, which isn't quite as good at them as Postgres is, but it's what we have @work. Hence, I'm not sure I got the syntax 100% correct. We're working on getting PG in for a project upgrade (replacing RDB on OpenVMS, which will go EOL in <10 years!) - fingers crossed.

Cheers!

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.




--
David Blomstrom
Writer & Web Designer (Mac, M$ & Linux)
www.geobop.org

pgsql-general by date:

Previous
From: Jim Nasby
Date:
Subject: Re: Duplicate rows during pg_dump
Next
From: John R Pierce
Date:
Subject: Re: Recursive Arrays 101