Re: Recursive Arrays 101 - Mailing list pgsql-general

From Adrian Klaver
Subject Re: Recursive Arrays 101
Date
Msg-id 562EA0E6.5090500@aklaver.com
Whole thread Raw
In response to Re: Recursive Arrays 101  (David Blomstrom <david.blomstrom@gmail.com>)
Responses Re: Recursive Arrays 101  (David Blomstrom <david.blomstrom@gmail.com>)
List pgsql-general
On 10/26/2015 01:51 PM, David Blomstrom wrote:
> I'm focusing primarily on vertebrates at the moment, which have a total
> of (I think) about 60,000-70,000 rows for all taxons (species, families,
> etc.). My goal is to create a customized database that does a really
> good job of handling vertebrates first, manually adding a few key
> invertebrates and plants as needed.
>
> I couldn't possibly repeat the process with invertebrates or plants,
> which are simply overwhelming. So, if I ever figure out the Catalogue of
> Life's database, then I'm simply going to modify its tables so they work
> with my system. My vertebrates database will override their vertebrate
> rows (except for any extra information they have to offer).
>
> As for "hand-entry," I do almost all my work in spreadsheets. I spent a
> day or two copying scientific names from the Catalogue of Life into my
> spreadsheet. Common names and slugs (common names in a URL format) is a
> project that will probably take years. I might type a scientific name or
> common name into Google and see where it leads me. If a certain
> scientific name is associated with the common name "yellow birch," then
> its slug becomes yellow-birch. If two or more species are called yellow
> birch, then I enter yellow-birch in a different table ("Floaters"),
> which leads to a disambiguation page.
>
> For organisms with two or more popular common names - well, I haven't
> really figured that out yet. I'll probably have to make an extra table
> for additional names. Catalogue of Life has common names in its
> database, but they all have upper case first letters - like American
> Beaver. That works fine for a page title but in regular text I need to
> make beaver lowercase without changing American. So I'm just starting
> from square one and recreating all the common names from scratch.

I think there has to be a better way as this is just a formatting issue
Can't remember what programming language you are working in, but in Python:

In [13]: s = 'American Beaver'

In [14]: s.capitalize()
Out[14]: 'American beaver'

In [15]: s.lower()
Out[15]: 'american beaver'

>
> It gets still more complicated when you get into "specialist names." ;)
> But the system I've set up so far seems to be working pretty nicely.
>
> On Mon, Oct 26, 2015 at 1:41 PM, Rob Sargent <robjsargent@gmail.com
> <mailto:robjsargent@gmail.com>> wrote:
>
>     On 10/26/2015 02:29 PM, David Blomstrom wrote:
>
>         Sorry for the late response. I don't have Internet access at
>         home, so I only post from the library or a WiFi cafe.
>
>         Anyway, where do I begin?
>
>         Regarding my "usage patterns," I use spreadsheets (Apple's
>         Numbers program) to organize data. I then save it as a CSV file
>         and import it into a database table. It would be very hard to
>         break with that tradition, because I don't know of any other way
>         to organize my data.
>
>         On the other hand, I have a column (Rank) that identifies
>         different taxonomic levels (kingdom, class, etc.). So I can
>         easily sort a table into specific taxonomic levels and save one
>         level at a time for a database table.
>
>         There is one problem, though. I can easily put all the
>         vertebrate orders and even families into a table. But genera
>         might be harder, and species probably won't work; there are
>         simply too many. My spreadsheet program is almost overwhelmed by
>         fish species alone. The only solution would be if I could import
>         Mammals.csv, then import Birds.csv, Reptiles.csv, etc. But that
>         might be kind of tedious, especially if I have to make multiple
>         updates.
>
>     Yes I suspect you spreadsheet will be limited in rows, but of course
>     you can send all the spreadsheets to a single table in the database.
>     If that's what you want.  You don't have to, but you see mention of
>     tables millions of records routinely.  On the other hand, if
>     performance becomes an issue with the single table approach you
>     might want to look at "partitioning".  But I would be surprised if
>     you had to go there.
>
>     What is your data source?  How much hand-entry are you doing? There
>     are tools which (seriously) upgrade the basic 'COPY into <table>'
>     command.
>
>
>         As for "attributes," I'll post my table's schema, with a
>         description, next.
>
>
>
>
>
> --
> David Blomstrom
> Writer & Web Designer (Mac, M$ & Linux)
> www.geobop.org <http://www.geobop.org>


--
Adrian Klaver
adrian.klaver@aklaver.com


pgsql-general by date:

Previous
From: Gavin Flower
Date:
Subject: Re: Recursive Arrays 101
Next
From: Adrian Klaver
Date:
Subject: Re: Importing CSV File