Re: Count of records in a row - Mailing list pgsql-general

From Robert James
Subject Re: Count of records in a row
Date
Msg-id CAGYyBggJjXfDUQv10D=PV8PHKF7Ovff5+fh6w0UhQ7AQRHuzig@mail.gmail.com
Whole thread Raw
In response to Re: Count of records in a row  (Elliot <yields.falsehood@gmail.com>)
List pgsql-general
Ingenious!

I actually think, however, there was a subtle bug in, though I see you fixed it.

The line:
              - row_number() over () as d
needs to be:
              - row_number() over (order by i asc) as d

I discovered this when working your code into my application.  I got
very, very weird results - with one order of columns in the select, I
got the correct answer, but with another one I didn't.  After much
debugging, I realized that the original version ("- row_number over
()") wasn't defined! So, depending on how I wrote the select
statement, Postgres could pick different orders!

But I see your cleaned up version already fixed this!

On 10/25/13, Elliot <yields.falsehood@gmail.com> wrote:
> Glad I could help. It's easier to understand if you break apart the CTE.
> I'm also moving around the order by i to clean this up a little. Sorry
> for the formatting.
>
> Running this:
>     select i,
>            val,
>            row_number() over (partition by val order by i asc) as class_i,
>            row_number() over (order by i asc) as overall_i,
>            row_number() over (partition by val order by i asc)
>              - row_number() over () as d
>     from data
>
> Yields this:
> i    val    class_i    overall_i    d
> 1    A    1    1    0
> 2    A    2    2    0
> 3    A    3    3    0
> 4    B    1    4    -3
> 5    C    1    5    -4
> 6    A    4    6    -2
> 7    D    1    7    -6
> 8    A    5    8    -3
> 9    A    6    9    -3
> 10    D    2    10    -8
> 11    D    3    11    -8
> 12    B    2    12    -10
> 13    C    2    13    -11
> 14    C    3    14    -11
>
> class_i counts the row number within a class and overall_i counts the
> overall row number in the sequence. Here's just one class extracted to
> emphasize that:
>
> i    val    class_i    overall_i    d
> 1    A    1    1    0
> 2    A    2    2    0
> 3    A    3    3    0
> 6    A    4    6    -2
> 8    A    5    8    -3
> 9    A    6    9    -3
>
> Within a given consecutive run of a particular class the difference
> between class_i and overall_i will always be the same (because they're
> both increasing by the same amount) but that difference between runs
> will always be different (because each run starts the sequences at
> different offsets). "d" is the difference of the two. Because that value
> segments the runs, all that needs to be done is group by it and count
> the items in the group to get the length of the runs.
>
> The xxx column was junk left over from copying and pasting and
> verifying. Apologies :). This is a cleaned up version:
>
> with x
> as
> (
>    select i,
>           val,
>           row_number() over (partition by val order by i asc)
>             - row_number() over (order by i asc) as d
>    from data
> )
> select val,
>         count(*)
> from x
> group by d,
>           val
> order by min(i)
> ;
>
>


pgsql-general by date:

Previous
From: Alan Hodgson
Date:
Subject: Re: PostgreSQL Point In Time Recovery
Next
From: Robert James
Date:
Subject: Detecting change in event properties