Re: select top_countries and for each country, select top_cities in that country, in 1 query - Mailing list pgsql-general

From John McKown
Subject Re: select top_countries and for each country, select top_cities in that country, in 1 query
Date
Msg-id CAAJSdjgRdi3Wn2AP6U1hftc1wpq32fY+swckF1P9q6hwu6a1Qg@mail.gmail.com
Whole thread Raw
In response to select top_countries and for each country, select top_cities in that country, in 1 query  (Dorian Hoxha <dorian.hoxha@gmail.com>)
Responses Re: select top_countries and for each country, select top_cities in that country, in 1 query  (John McKown <john.archie.mckown@gmail.com>)
List pgsql-general
On Mon, Aug 18, 2014 at 9:28 AM, Dorian Hoxha <dorian.hoxha@gmail.com> wrote:
I have CREATE TABLE t (country text, city text);

I want to get with 1 query,

select count(country),country GROUP BY country ORDER BY count(country) DESC


And for each country, to get the same for cities.

Is it possible ?

Thanks

Just to be sure that I understand what you want. Will you have a given country, city combination occur multiple times? And, if so, you want to know how many times such a combination occurs (count city by country)? So you want a result which has one row for each country, city combination which lists the city name, country name, count(city within country), count(total times country occurs,regardless  of city). It also depends on what you mean by "one query". I would hope you mean "sending a single SELECT command to the database server and getting the result set'. I ask because you could possibly do something like:

SELECT 'city',count(city),city,country FROM t GROUP BY city, country ORDER BY 2 DESC
UNION
SELECT 'country',count(country), country, NULL FROM t GROUP BY country ORDER BY 2 DESC

But I'm thinking that's not what you want.

Note: ORDER BY 2 means to order by the second output column, which is just a shorter way of saying "count(....)". I'm lazy.

===
Now, assuming that what I think you want really is what you want, I have some "hairy" SQL for you to try. The SQL query below seemed to work for me on some test data that I just "faked up" on my own.

SELECT avg(b.countcountry)::int as "CountryCount", b.country, a.city, count(a.city) as "CityCount"
FROM t AS a
INNER JOIN
(SELECT COUNT(country) AS countcountry, country FROM t GROUP BY country) AS b
ON a.country = b.country
GROUP BY b.country, a.city
ORDER BY 1 DESC,4 DESC;

You may well wonder about that "avg(b.countrycount)::int" in the top line. That is _funky_, I will admit. Why does it work? Because I group by b.country. And for each given value of b.country, b.countrycount is the same value (think about it). So if you add up "n" copies of a specific integer number, then divide by "n", you get back the same number. This is to satisfy the SQL requirement that the column either be in the GROUP BY (which I don't want) or be in an aggregate funciton. So I picked avg() as an aggregate function that "just happens" (by design) to have the original value that I need. I'm sneaky! The ::int at the end casts the result from a floating point number back to an integer. This, to me, is a nicer display and is, again, guaranteed to be correct due to the way things are being calculated. Sorry if I'm being a bit "pushy" on this point. This point is what could be the most confusing to someone else, so I want to try to explain my thought process as well as I can.

You can rearrange the columns in the SELECT, but if you do be sure to change the values in the ORDER BY to still be the same column values. Again, I'm too lazy to type in the column names.

I am not saying the above is the best possible SQL. But it did work for me in my testing. 

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown

pgsql-general by date:

Previous
From:
Date:
Subject: Re: logfile character encoding
Next
From:
Date:
Subject: FW: logfile character encoding