Re: Count of records in a row - Mailing list pgsql-general
From | Robert James |
---|---|
Subject | Re: Count of records in a row |
Date | |
Msg-id | CAGYyBggJjXfDUQv10D=PV8PHKF7Ovff5+fh6w0UhQ7AQRHuzig@mail.gmail.com Whole thread Raw |
In response to | Re: Count of records in a row (Elliot <yields.falsehood@gmail.com>) |
List | pgsql-general |
Ingenious! I actually think, however, there was a subtle bug in, though I see you fixed it. The line: - row_number() over () as d needs to be: - row_number() over (order by i asc) as d I discovered this when working your code into my application. I got very, very weird results - with one order of columns in the select, I got the correct answer, but with another one I didn't. After much debugging, I realized that the original version ("- row_number over ()") wasn't defined! So, depending on how I wrote the select statement, Postgres could pick different orders! But I see your cleaned up version already fixed this! On 10/25/13, Elliot <yields.falsehood@gmail.com> wrote: > Glad I could help. It's easier to understand if you break apart the CTE. > I'm also moving around the order by i to clean this up a little. Sorry > for the formatting. > > Running this: > select i, > val, > row_number() over (partition by val order by i asc) as class_i, > row_number() over (order by i asc) as overall_i, > row_number() over (partition by val order by i asc) > - row_number() over () as d > from data > > Yields this: > i val class_i overall_i d > 1 A 1 1 0 > 2 A 2 2 0 > 3 A 3 3 0 > 4 B 1 4 -3 > 5 C 1 5 -4 > 6 A 4 6 -2 > 7 D 1 7 -6 > 8 A 5 8 -3 > 9 A 6 9 -3 > 10 D 2 10 -8 > 11 D 3 11 -8 > 12 B 2 12 -10 > 13 C 2 13 -11 > 14 C 3 14 -11 > > class_i counts the row number within a class and overall_i counts the > overall row number in the sequence. Here's just one class extracted to > emphasize that: > > i val class_i overall_i d > 1 A 1 1 0 > 2 A 2 2 0 > 3 A 3 3 0 > 6 A 4 6 -2 > 8 A 5 8 -3 > 9 A 6 9 -3 > > Within a given consecutive run of a particular class the difference > between class_i and overall_i will always be the same (because they're > both increasing by the same amount) but that difference between runs > will always be different (because each run starts the sequences at > different offsets). "d" is the difference of the two. Because that value > segments the runs, all that needs to be done is group by it and count > the items in the group to get the length of the runs. > > The xxx column was junk left over from copying and pasting and > verifying. Apologies :). This is a cleaned up version: > > with x > as > ( > select i, > val, > row_number() over (partition by val order by i asc) > - row_number() over (order by i asc) as d > from data > ) > select val, > count(*) > from x > group by d, > val > order by min(i) > ; > >
pgsql-general by date: