Thread: how do i count() similar items
lets say I hve the following in the 'fruits' table: Round orange Sunkist orange navel orange strawberry blueberry sunkist orange apple how would I get something like the following: count as c | Fruit type --------------------------------- 4 | orange 2 | berry 1 | apple
In response to jackassplus <jackassplus@gmail.com>: > lets say I have the following in the 'fruits' table: > > Round orange > Sunkist orange > navel orange > strawberry > blueberry > sunkist orange > apple > > how would I get something like the following: > > count as c | Fruit type > --------------------------------- > 4 | orange > 2 | berry > 1 | apple Organize your data better. If you want to analyze data in the way you describe, then you need to store the data in a way that makes in analyzable. I'm 100% sure that someone can provide you with some query magic that will make the example you provided produce the results you're wanting. I'm also 100% sure that a few weeks or months down the line you find exceptions that will break that solution, and that said solution will never really be reliable. If you're going to analyze fruits by type, then you need a type column. You can then group by type to get count()s. To ensure data integrity, you should probably create a fruit_type table with a unique column that lists the possible types, and then foreign key the fruit_type column in the fruits table to that to ensure nothing funky is entered. An enum for type is another possibility. -- Bill Moran http://www.potentialtech.com http://people.collaborativefusion.com/~wmoran/
jackassplus wrote: > lets say I hve the following in the 'fruits' table: > > Round orange > Sunkist orange > navel orange > strawberry > blueberry > sunkist orange > apple > > how would I get something like the following: > > count as c | Fruit type > --------------------------------- > 4 | orange > 2 | berry > 1 | apple Your best solution is to have separate database fields for your 2 levels of detail, say call them "fruit type" and "variety", or maybe other fields depending on purpose. This is because the organization of fruit is somewhat arbitrary and isn't easily encoded into variety names without kludges. Also, many fruit with similar names are actually very different. So simple textual analysis of the field you have often won't be very useful. For example, a kiwifruit is a berry, and a strawberry is very different from a blueberry, the first not actually being a berry at all. There are also various citrus which are crosses between oranges and other citrus. There are also stone fruits which are crosses between plums and apricots. Also, "Sunkist" isn't a variety of orange but rather is a brand name used for multiple varieties. (Yes, members of my family cultivate fruit trees, so I have a lot of second-hand experience with this.) If you still want to go by textual analysis as you suggest, it will be nontrivial and involve pattern matching for common suffixes where some are separate words and some aren't, and you can do this pattern matching in an extra select-list item which you then group by. -- Darren Duncan
<snip> > To ensure data integrity, > you should probably create a fruit_type table with a unique column that > lists the possible types, and then foreign key the fruit_type column in > the fruits table to that to ensure nothing funky is entered. An enum > for type is another possibility. In the real world, this column actiually holds Operating Systems. I have 7 variants of Windows XP, even more of server, a dozen *nixes, etc, etc and it is fed from an external app. So I am looking for a magic query, or even a perl function to wrap up insde a procedure, whatever.
On Wed, Sep 8, 2010 at 12:22 PM, jackassplus <jackassplus@gmail.com> wrote: > > <snip> >> To ensure data integrity, >> you should probably create a fruit_type table with a unique column that >> lists the possible types, and then foreign key the fruit_type column in >> the fruits table to that to ensure nothing funky is entered. An enum >> for type is another possibility. > > In the real world, this column actiually holds Operating Systems. > I have 7 variants of Windows XP, even more of server, a dozen *nixes, > etc, etc and it is fed from an external app. > So I am looking for a magic query, or even a perl function to wrap up > insde a procedure, whatever. If your column values can be grouped by regexp, you might be able to get away with using a CASE statement. eg. select sum(case when val ~* 'windows xp' then 1 else 0 end) as winxp, sum(case when val ~* 'nix' then 1 else 0 end) as nix, sum(case when val ~* 'redhat|rhel' then 1 else 0 end) as rh .... (I don't have your original mail handy to reference the example values).
Use a CASE statement? something like: select case WHEN os ~* E'^windows' then 'windows' WHEN os ~* E'server' then 'server' WHEN os ~* E'nix$' then '*nix' else 'other' end as osval, count(*) from os_tbl group by osval order by osval; The hard part is making sure your regexes cover all the bases, without duplication. It still sounds like the value should be a reference to a unique value in a small table of operating system entries, thenstore the value, rather than the string, in the main table. Susan -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of jackassplus Sent: Wednesday, September 08, 2010 12:22 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] how do i count() similar items <snip> > To ensure data integrity, > you should probably create a fruit_type table with a unique column that > lists the possible types, and then foreign key the fruit_type column in > the fruits table to that to ensure nothing funky is entered. An enum > for type is another possibility. In the real world, this column actiually holds Operating Systems. I have 7 variants of Windows XP, even more of server, a dozen *nixes, etc, etc and it is fed from an external app. So I am looking for a magic query, or even a perl function to wrap up insde a procedure, whatever. -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
Use a CASE statement?
something like:
select case WHEN os ~* E'^windows' then 'windows'
WHEN os ~* E'server' then 'server'
WHEN os ~* E'nix$' then '*nix'
else 'other' end
as osval, count(*) from os_tbl group by osval order by osval;
The hard part is making sure your regexes cover all the bases, without duplication.
It still sounds like the value should be a reference to a unique value in a small table of operating system entries, then store the value, rather than the string, in the main table.
Susan
I'll give this a shot. looks like it will work well. Regexes, I don't mind, SQL, I'm new at. Much less the pg functions.
thanks to both you and bricklin.