Re: Aggregate functions on groups - Mailing list pgsql-general

From Morris de Oryx
Subject Re: Aggregate functions on groups
Date
Msg-id CAKqncciB=yzfCpcbANb20NBxxVuQcmWOkUi8=2fYuaKuP-w9fQ@mail.gmail.com
Whole thread Raw
In response to Aggregate functions on groups  (Rich Shepard <rshepard@appl-ecosys.com>)
Responses Re: Aggregate functions on groups
List pgsql-general
Your tributaries and fish master tables make sense. If I read your code right, you're grouping by too many columns. I flattened the data into a survey table for this simple example:

select tributary,
       common_name,
       scientific_name,
       sum(count_value) as fish_seen,
       count(count_value) as observations_made
       
   from survey
   
   group by 1,2,3 -- The GROUP BY clause can use positions on the select list, if you feel like typing less.


tributary                              common_name     scientific_name        fish_seen      observations_made
Anderson Creek trib to Nehalem River   Black crappie   Pomoxis nigromaculatus         3                      2
Anderson Creek trib to Nehalem River   Brook trout     Salvelinus fontinalis          3                      2
Anderson Creek trib to Nehalem River   Bluegill        Lepomis macrochirus            3                      2
Anderson Creek trib to Nehalem River   Brown bullhead  Ameiurus nebulosus             3                      2

But this is not why I'm answering. I'm responding as I wanted to make sure that you're aware of the pg-similarity extension:


This tool implements a *lot* of similarity measures for fuzzy cmparisons. Some are sting-oriented algorithms (Jaro-Winkler, Soundex, Levenshtein, etc.), and others derive from and/or apply to field population comparisons, like the Jaccard and Dice Coefficients. There's a lot of great stuff in the package.

On Sat, Aug 31, 2019 at 3:14 AM Rich Shepard <rshepard@appl-ecosys.com> wrote:
Tables hold data on fish counts by stream name, species, and (unreported)
collection dates. I'm trying to write a query that returns the total number
of each species in each stream.

The latest attempt is (lines wrapped by alpine; submitted as one line):

\copy (select f.stream_tribs, f.count_value, sum(f.count_value),
i.common_name, i.sci_name  from fish_counts as f, itis as i where
f.stream_tribs like '%Nehalem River%' group by f.stream_tribs,
i.common_name, i.sci_name, f.count_value  order by f.stream_tribs,
i.common_name, i.sci_name, f.count_value) to
'/home/rshepard/projects/oregon/mohler_sand/data/fish/fishes.dat';

The returned set starts this way:

Anderson Creek trib to Nehalem River    0       0       Black crappie   Pomoxis nigromaculatus
Anderson Creek trib to Nehalem River    3       3       Black crappie   Pomoxis nigromaculatus
Anderson Creek trib to Nehalem River    0       0       Bluegill        Lepomis macrochirus
Anderson Creek trib to Nehalem River    3       3       Bluegill        Lepomis macrochirus
Anderson Creek trib to Nehalem River    0       0       Brook trout     Salvelinus fontinalis
Anderson Creek trib to Nehalem River    3       3       Brook trout     Salvelinus fontinalis
Anderson Creek trib to Nehalem River    0       0       Brown bullhead  Ameiurus nebulosus
Anderson Creek trib to Nehalem River    3       3       Brown bullhead  Ameiurus nebulosus

What I want returned would look like this:

Anderson Creek trib to Nehalem River  Black crappie  Pomoxis nigromaculatus 3
Anderson Creek trib to Nehalem River  Bluegill       Lepomis macrochirus    3
Anderson Creek trib to Nehalem River  Brook trout    Salvelinus fontinalis  3
Anderson Creek trib to Nehalem River  Brown bullhead Ameiurus nebulosus     3

I've read the manual yet must have not seen the section explaining how to
apply aggregate functions to groups.

Thanks in advance,

Rich


pgsql-general by date:

Previous
From: Ken Tanzer
Date:
Subject: Re: Exclusion constraints on overlapping text arrays?
Next
From: Francisco Olarte
Date:
Subject: Re: SQL equivalint of #incude directive ?