Re: Count of records in a row - Mailing list pgsql-general
| From | Robert James |
|---|---|
| Subject | Re: Count of records in a row |
| Date | |
| Msg-id | CAGYyBggJjXfDUQv10D=PV8PHKF7Ovff5+fh6w0UhQ7AQRHuzig@mail.gmail.com Whole thread Raw |
| In response to | Re: Count of records in a row (Elliot <yields.falsehood@gmail.com>) |
| List | pgsql-general |
Ingenious!
I actually think, however, there was a subtle bug in, though I see you fixed it.
The line:
- row_number() over () as d
needs to be:
- row_number() over (order by i asc) as d
I discovered this when working your code into my application. I got
very, very weird results - with one order of columns in the select, I
got the correct answer, but with another one I didn't. After much
debugging, I realized that the original version ("- row_number over
()") wasn't defined! So, depending on how I wrote the select
statement, Postgres could pick different orders!
But I see your cleaned up version already fixed this!
On 10/25/13, Elliot <yields.falsehood@gmail.com> wrote:
> Glad I could help. It's easier to understand if you break apart the CTE.
> I'm also moving around the order by i to clean this up a little. Sorry
> for the formatting.
>
> Running this:
> select i,
> val,
> row_number() over (partition by val order by i asc) as class_i,
> row_number() over (order by i asc) as overall_i,
> row_number() over (partition by val order by i asc)
> - row_number() over () as d
> from data
>
> Yields this:
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 4 B 1 4 -3
> 5 C 1 5 -4
> 6 A 4 6 -2
> 7 D 1 7 -6
> 8 A 5 8 -3
> 9 A 6 9 -3
> 10 D 2 10 -8
> 11 D 3 11 -8
> 12 B 2 12 -10
> 13 C 2 13 -11
> 14 C 3 14 -11
>
> class_i counts the row number within a class and overall_i counts the
> overall row number in the sequence. Here's just one class extracted to
> emphasize that:
>
> i val class_i overall_i d
> 1 A 1 1 0
> 2 A 2 2 0
> 3 A 3 3 0
> 6 A 4 6 -2
> 8 A 5 8 -3
> 9 A 6 9 -3
>
> Within a given consecutive run of a particular class the difference
> between class_i and overall_i will always be the same (because they're
> both increasing by the same amount) but that difference between runs
> will always be different (because each run starts the sequences at
> different offsets). "d" is the difference of the two. Because that value
> segments the runs, all that needs to be done is group by it and count
> the items in the group to get the length of the runs.
>
> The xxx column was junk left over from copying and pasting and
> verifying. Apologies :). This is a cleaned up version:
>
> with x
> as
> (
> select i,
> val,
> row_number() over (partition by val order by i asc)
> - row_number() over (order by i asc) as d
> from data
> )
> select val,
> count(*)
> from x
> group by d,
> val
> order by min(i)
> ;
>
>
pgsql-general by date: