Thread: how do i count() similar items

how do i count() similar items

From
jackassplus
Date:
lets say I hve the following in the 'fruits' table:

Round orange
Sunkist orange
navel orange
strawberry
blueberry
sunkist orange
apple

how would I get something like the following:

count as c | Fruit type
---------------------------------
4              | orange
2              | berry
1              | apple


Re: how do i count() similar items

From
Bill Moran
Date:
In response to jackassplus <jackassplus@gmail.com>:

> lets say I have the following in the 'fruits' table:
>
> Round orange
> Sunkist orange
> navel orange
> strawberry
> blueberry
> sunkist orange
> apple
>
> how would I get something like the following:
>
> count as c | Fruit type
> ---------------------------------
> 4              | orange
> 2              | berry
> 1              | apple

Organize your data better.

If you want to analyze data in the way you describe, then you need to store
the data in a way that makes in analyzable.  I'm 100% sure that someone can
provide you with some query magic that will make the example you provided
produce the results you're wanting.  I'm also 100% sure that a few weeks
or months down the line you find exceptions that will break that solution,
and that said solution will never really be reliable.

If you're going to analyze fruits by type, then you need a type column.
You can then group by type to get count()s.  To ensure data integrity,
you should probably create a fruit_type table with a unique column that
lists the possible types, and then foreign key the fruit_type column in
the fruits table to that to ensure nothing funky is entered.  An enum
for type is another possibility.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: how do i count() similar items

From
Darren Duncan
Date:
jackassplus wrote:
> lets say I hve the following in the 'fruits' table:
>
> Round orange
> Sunkist orange
> navel orange
> strawberry
> blueberry
> sunkist orange
> apple
>
> how would I get something like the following:
>
> count as c | Fruit type
> ---------------------------------
> 4              | orange
> 2              | berry
> 1              | apple

Your best solution is to have separate database fields for your 2 levels of
detail, say call them "fruit type" and "variety", or maybe other fields
depending on purpose.

This is because the organization of fruit is somewhat arbitrary and isn't easily
encoded into variety names without kludges.  Also, many fruit with similar names
are actually very different.  So simple textual analysis of the field you have
often won't be very useful.

For example, a kiwifruit is a berry, and a strawberry is very different from a
blueberry, the first not actually being a berry at all.  There are also various
citrus which are crosses between oranges and other citrus.  There are also stone
fruits which are crosses between plums and apricots.  Also, "Sunkist" isn't a
variety of orange but rather is a brand name used for multiple varieties.

(Yes, members of my family cultivate fruit trees, so I have a lot of second-hand
experience with this.)

If you still want to go by textual analysis as you suggest, it will be
nontrivial and involve pattern matching for common suffixes where some are
separate words and some aren't, and you can do this pattern matching in an extra
select-list item which you then group by.

-- Darren Duncan

Re: how do i count() similar items

From
jackassplus
Date:
<snip> 
> To ensure data integrity,
> you should probably create a fruit_type table with a unique column that
> lists the possible types, and then foreign key the fruit_type column in
> the fruits table to that to ensure nothing funky is entered.  An enum
> for type is another possibility.

In the real world, this column actiually holds Operating Systems.
I have 7 variants of Windows XP, even more of server, a dozen *nixes,
etc, etc and it is fed from an external app.
So I am looking for a magic query, or even a perl function to wrap up
insde a procedure, whatever.

Re: how do i count() similar items

From
bricklen
Date:
On Wed, Sep 8, 2010 at 12:22 PM, jackassplus <jackassplus@gmail.com> wrote:
>
> <snip>
>> To ensure data integrity,
>> you should probably create a fruit_type table with a unique column that
>> lists the possible types, and then foreign key the fruit_type column in
>> the fruits table to that to ensure nothing funky is entered.  An enum
>> for type is another possibility.
>
> In the real world, this column actiually holds Operating Systems.
> I have 7 variants of Windows XP, even more of server, a dozen *nixes,
> etc, etc and it is fed from an external app.
> So I am looking for a magic query, or even a perl function to wrap up
> insde a procedure, whatever.

If your column values can be grouped by regexp, you might be able to
get away with using a CASE statement.
eg.
select
sum(case when val ~* 'windows xp' then 1 else 0 end) as winxp,
sum(case when val ~* 'nix' then 1 else 0 end) as nix,
sum(case when val ~* 'redhat|rhel' then 1 else 0 end) as rh
....

(I don't have your original mail handy to reference the example values).

Re: how do i count() similar items

From
Susan Cassidy
Date:
Use a CASE statement?

something like:
select  case WHEN os ~* E'^windows' then 'windows'
   WHEN os ~* E'server' then 'server'
   WHEN os ~* E'nix$' then '*nix'
   else 'other' end
   as osval, count(*) from os_tbl  group by osval order by osval;

The hard part is making sure your regexes cover all the bases, without duplication.

It still sounds like the value should be a reference to a unique value in a small table of operating system entries,
thenstore the value, rather than the string, in the main table. 

Susan

-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of jackassplus
Sent: Wednesday, September 08, 2010 12:22 PM
To: pgsql-general@postgresql.org
Subject: Re: [GENERAL] how do i count() similar items


<snip> 
> To ensure data integrity,
> you should probably create a fruit_type table with a unique column that
> lists the possible types, and then foreign key the fruit_type column in
> the fruits table to that to ensure nothing funky is entered.  An enum
> for type is another possibility.

In the real world, this column actiually holds Operating Systems.
I have 7 variants of Windows XP, even more of server, a dozen *nixes,
etc, etc and it is fed from an external app.
So I am looking for a magic query, or even a perl function to wrap up
insde a procedure, whatever.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: how do i count() similar items

From
jackassplus
Date:
Use a CASE statement?

something like:
select  case WHEN os ~* E'^windows' then 'windows'
  WHEN os ~* E'server' then 'server'
  WHEN os ~* E'nix$' then '*nix'
  else 'other' end
  as osval, count(*) from os_tbl  group by osval order by osval;

The hard part is making sure your regexes cover all the bases, without duplication.

It still sounds like the value should be a reference to a unique value in a small table of operating system entries, then store the value, rather than the string, in the main table.

Susan

I'll give this a shot. looks like it will work well. Regexes, I don't mind, SQL, I'm new at. Much less the pg functions.
thanks to both you and bricklin.