Home > mailing lists

Re: Count of records in a row - Mailing list pgsql-general

From	Elliot
Subject	Re: Count of records in a row
Date	October 25, 2013 12:26:26
Msg-id	526A6365.4000108@gmail.com Whole thread Raw
In response to	Re: Count of records in a row (Robert James <srobertjames@gmail.com>)
Responses	Re: Count of records in a row
List	pgsql-general

Tree view

On 2013-10-24 17:09, Robert James wrote:
> On 10/22/13, Elliot <yields.falsehood@gmail.com> wrote:
>> It looks like you already found a solution, but here's one with a CTE. I
>> cobbled this together from an older query I had for doing something
>> similar, for which I unfortunately lost the original source of this
>> approach. Also, this implies that there is something that gives an
>> ordering to these rows (in this case, the field "i").
>>
>> create temp table data (i int, val char);
>>
>> insert into data (val, i)
>> values
>> ('A',1),
>> ('A',2),
>> ('A',3),
>> ('B',4),
>> ('C',5),
>>
>> with x
>> as
>> (
>>     select i,
>>            row_number() over () as xxx,
>>            val,
>>            row_number() over (partition by val order by i asc)
>>              - row_number() over () as d
>>     from data
>>     order by i
>> )
>> select val,
>>          count(*)
>> from x
>> group by d,
>>            val
>> order by min(i)
>> ;
> Elliot - Thanks for this great solution; I've tested in on my data and
> it gives great results.
>
> I'd like to understand your code.  I believe I understand most of it.
> Can you explain what 'd' is?
>
> And this clause "row_number() over (partition by val order by i asc) -
> row_number() over () as d"?
>
> (Hey, while I'm at it, is there a descriptive name for "x" too?)
>
> Thanks
Glad I could help. It's easier to understand if you break apart the CTE.
I'm also moving around the order by i to clean this up a little. Sorry
for the formatting.

Running this:
    select i,
           val,
           row_number() over (partition by val order by i asc) as class_i,
           row_number() over (order by i asc) as overall_i,
           row_number() over (partition by val order by i asc)
             - row_number() over () as d
    from data

Yields this:
i    val    class_i    overall_i    d
1    A    1    1    0
2    A    2    2    0
3    A    3    3    0
4    B    1    4    -3
5    C    1    5    -4
6    A    4    6    -2
7    D    1    7    -6
8    A    5    8    -3
9    A    6    9    -3
10    D    2    10    -8
11    D    3    11    -8
12    B    2    12    -10
13    C    2    13    -11
14    C    3    14    -11

class_i counts the row number within a class and overall_i counts the
overall row number in the sequence. Here's just one class extracted to
emphasize that:

i    val    class_i    overall_i    d
1    A    1    1    0
2    A    2    2    0
3    A    3    3    0
6    A    4    6    -2
8    A    5    8    -3
9    A    6    9    -3

Within a given consecutive run of a particular class the difference
between class_i and overall_i will always be the same (because they're
both increasing by the same amount) but that difference between runs
will always be different (because each run starts the sequences at
different offsets). "d" is the difference of the two. Because that value
segments the runs, all that needs to be done is group by it and count
the items in the group to get the length of the runs.

The xxx column was junk left over from copying and pasting and
verifying. Apologies :). This is a cleaned up version:

with x
as
(
   select i,
          val,
          row_number() over (partition by val order by i asc)
            - row_number() over (order by i asc) as d
   from data
)
select val,
        count(*)
from x
group by d,
          val
order by min(i)
;

pgsql-general by date:

From: Marc Mamin
Date: 25 October 2013, 11:20:31
Subject: pg_upgrade 9.1.9 ->9.3.1

From: Peter Eisentraut
Date: 25 October 2013, 12:30:47
Subject: Re: pg_upgrade 9.1.9 ->9.3.1

Re: Count of records in a row - Mailing list pgsql-general

Previous

Next