Thread: count (DISTINCT field) OVER ()

count (DISTINCT field) OVER ()

From
Tarlika Elisabeth Schmitz
Date:
I would like to implement the equivalent of "count (DISTINCT field) OVER
()":


SELECT
  id, name, similarity(name, 'Tooneyvara') as delta,
  count (id) OVER() AS cnt
  FROM vtown
  WHERE
      similarity(name, 'Tooneyvara') > 0.1
  ORDER BY  delta DESC

produces result:
1787    Toomyvara    0.5    4
1787    Toomevara    0.4    4
1700    Ardcroney    0.105    4
1788    Townsfield    0.1    4

What I would like is a "3" in the cnt column (ignoring id duplicates).


How can I achieve this?


The best I can come up with is:

SELECT id, name,delta,  count (*) OVER()
FROM (
SELECT DISTINCT ON (id)
  id, name, similarity(name, 'Tooneyvara') as delta
  FROM vtown
  WHERE
    similarity(name, 'Tooneyvara') > 0.1
    ORDER BY id, delta DESC
) AS x
ORDER by delta DESC

--

Best Regards,
Tarlika Elisabeth Schmitz

Re: count (DISTINCT field) OVER ()

From
David Johnston
Date:
On Nov 9, 2011, at 18:52, Tarlika Elisabeth Schmitz <postgresql6@numerixtechnology.de> wrote:

> I would like to implement the equivalent of "count (DISTINCT field) OVER
> ()":
>
>
> SELECT
>  id, name, similarity(name, 'Tooneyvara') as delta,
>  count (id) OVER() AS cnt
>  FROM vtown
>  WHERE
>      similarity(name, 'Tooneyvara') > 0.1
>  ORDER BY  delta DESC
>
> produces result:
> 1787    Toomyvara    0.5    4
> 1787    Toomevara    0.4    4
> 1700    Ardcroney    0.105    4
> 1788    Townsfield    0.1    4
>
> What I would like is a "3" in the cnt column (ignoring id duplicates).
>
>
> How can I achieve this?
>
>
> The best I can come up with is:
>
> SELECT id, name,delta,  count (*) OVER()
> FROM (
> SELECT DISTINCT ON (id)
>  id, name, similarity(name, 'Tooneyvara') as delta
>  FROM vtown
>  WHERE
>    similarity(name, 'Tooneyvara') > 0.1
>    ORDER BY id, delta DESC
> ) AS x
> ORDER by delta DESC
>
> --
>
> Best Regards,
> Tarlika Elisabeth Schmitz
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

1.  Write your main query in a WITH (CTE)
2.  Query #1 with appropriate GROUP BY clause (CTE)
3.  In the main statement JOIN 1 and 2

David J.


Re: count (DISTINCT field) OVER ()

From
Thomas Kellerer
Date:
Tarlika Elisabeth Schmitz, 10.11.2011 00:52:
> I would like to implement the equivalent of "count (DISTINCT field) OVER ()":
>
>
> SELECT
>    id, name, similarity(name, 'Tooneyvara') as delta,
>    count (id) OVER() AS cnt
>    FROM vtown
>    WHERE
>        similarity(name, 'Tooneyvara')>  0.1
>    ORDER BY  delta DESC
>
> produces result:
> 1787    Toomyvara    0.5    4
> 1787    Toomevara    0.4    4
> 1700    Ardcroney    0.105    4
> 1788    Townsfield    0.1    4
>
> What I would like is a "3" in the cnt column (ignoring id duplicates).
>
>

This should do it:

SELECT id,
        name,
        delta,
        sum(case when rn = 1 then rn else null end)  over() as distinct_id_count
FROM (
     SELECT
       id, name, similarity(name, 'Tooneyvara') as delta,
       row_number() OVER(partition by id) AS rn
       FROM vtown
       WHERE
           similarity(name, 'Tooneyvara') > 0.1
) t
ORDER BY delta DESC


Re: count (DISTINCT field) OVER ()

From
Tarlika Elisabeth Schmitz
Date:
On Thu, 10 Nov 2011 10:02:36 +0100
Thomas Kellerer <spam_eater@gmx.net> wrote:

>Tarlika Elisabeth Schmitz, 10.11.2011 00:52:
>> I would like to implement the equivalent of "count (DISTINCT id)
>> OVER ()":
>>
>>[...]
>>
>> produces result:
>> id, name, delta, cnt
>> 1787    Toomyvara    0.5    4
>> 1787    Toomevara    0.4    4
>> 1700    Ardcroney    0.105    4
>> 1788    Townsfield    0.1    4
>>
>
>This should do it:
>
>SELECT id,
>        name,
>        delta,
>        sum(case when rn = 1 then rn else null end)  over() as
> distinct_id_count
>FROM (
>     SELECT
>       id, name, similarity(name, 'Tooneyvara') as delta,
>       row_number() OVER(partition by id) AS rn
>       FROM vtown
>       WHERE
>           similarity(name, 'Tooneyvara') > 0.1
>) t
>ORDER BY delta DESC
>


I like you suggestion, Thomas. It is not that dissimilar from but
cleaner than my original SELECT ... FROM (SELECT DISTINCT ON(id)
attempt.
It's also very slightly faster.

Here's another, slightly shorter, variation of your suggestion:

SELECT id, name, delta,
       max(rank)  OVER() as cnt
FROM (
     SELECT
       id, name, similarity(name, 'Tooneyvara') as delta,
       dense_rank() OVER(ORDER BY id) AS rank
       FROM vtown
       WHERE
           similarity(name, 'Tooneyvara') > 0.1
) t
ORDER BY delta DESC






Re: count (DISTINCT field) OVER ()

From
Thomas Kellerer
Date:
Tarlika Elisabeth Schmitz, 10.11.2011 11:24:
>> SELECT id,
>>         name,
>>         delta,
>>         sum(case when rn = 1 then rn else null end)  over() as
>> distinct_id_count
>>FROM (
>>      SELECT
>>        id, name, similarity(name, 'Tooneyvara') as delta,
>>        row_number() OVER(partition by id) AS rn
>>        FROM vtown
>>        WHERE
>>            similarity(name, 'Tooneyvara')>  0.1
>> ) t
>> ORDER BY delta DESC
>>
>
>
> I like you suggestion, Thomas. It is not that dissimilar from but
> cleaner than my original SELECT ... FROM (SELECT DISTINCT ON(id)
> attempt.
> It's also very slightly faster.
>
> Here's another, slightly shorter, variation of your suggestion:
>
> SELECT id, name, delta,
>         max(rank)  OVER() as cnt
> FROM (
>       SELECT
>         id, name, similarity(name, 'Tooneyvara') as delta,
>         dense_rank() OVER(ORDER BY id) AS rank
>         FROM vtown
>         WHERE
>             similarity(name, 'Tooneyvara')>  0.1
> ) t
> ORDER BY delta DESC

Nice trick with the dense_rank(), never thought of that.

Regards
Thomas