Home > mailing lists

grouping consecutive records - Mailing list pgsql-general

From	Morus Walter
Subject	grouping consecutive records
Date	February 4, 2013 12:35:01
Msg-id	20130204103454.0b3c6b23@tucholsky.experteer.muc Whole thread Raw
Responses	Re: grouping consecutive records
List	pgsql-general

Tree view

Hallo,

I have a question regarding a selection.

I'd like to group and merge certain records having the same values in
some columns, but only if they are contiguous with regard to some sort
order.

So for a table
create table foo (
       id int,
       user_id int,
       key varchar,
       sort int );

and values e.g.
insert into foo
       values ( 1, 1, 'foo', 1),
              ( 2, 1, 'foo', 2),
              ( 3, 1, 'bar', 3),
              ( 4, 1, 'foo', 4),
              ( 5, 1, 'foo', 5),
              ( 6, 1, 'foo', 6),
              ( 7, 1, 'bla', 7),
              ( 8, 2, 'bar', 1),
              ( 9, 2, 'foo', 2),
              (10, 2, 'foo', 3),
              (11, 2, 'bla', 4);

I'd like to merge all consecutive records (ordered by sort, user_id)
having the same value in user_id and key and keep the first/last
value of sort of the merged records (and probably some more values
from the first or last merged record).

So the result should be something like
user_id, key, sort_first, sort_last
1, 'foo', 1, 2
1, 'bar', 3, 3
1, 'foo', 4, 6
1, 'bla', 7, 7
2, 'bar', 1, 1
2, 'foo', 2, 3
2, 'bla', 4, 4

I was trying to do that using window functions, which works great -
except it merges non consecutive occurences (key foo for user_id 1 in
my sample) as well.

select user_id, key, sort_first, sort_last
from (
  select user_id,
         key,
         first_value(sort) over w as sort_first,
         last_value(sort) over w as sort_last,
         lead(key) over w as next_key
  from foo
  window w as (partition by user_id, key order by sort
               range between unbounded preceding and unbounded following)
) as foo
where next_key is null
order by user_id, sort_first;

user_id | key | sort_first | sort_last
---------+-----+------------+-----------
       1 | foo |          1 |         6     <-- would like to have two records
                                                1/2 and 4/6 here
       1 | bar |          3 |         3
       1 | bla |          7 |         7
       2 | bar |          1 |         1
       2 | foo |          2 |         3
       2 | bla |          4 |         4

Introducing another window on user_id only allows me to keep two records for
1/foo but I still cannot determine the intended sort_first/sort_last.

select user_id, key, sort_first, sort_last
from (
  select user_id,
         key,
         first_value(sort) over w as sort_first,
         last_value(sort) over w as sort_last,
         lead(key) over u as next_key
  from foo
  window u as (partition by user_id order by sort),
         w as (partition by user_id, key order by sort
               range between unbounded preceding and unbounded following)
) as foo
where next_key is null or key != next_key
order by user_id, sort_first;

 user_id | key | sort_first | sort_last
---------+-----+------------+-----------
       1 | foo |          1 |         6
       1 | foo |          1 |         6
       1 | bar |          3 |         3
       1 | bla |          7 |         7
       2 | bar |          1 |         1
       2 | foo |          2 |         3
       2 | bla |          4 |         4

So the question is: is this doable with a selection?
Can I use window functions for this type of grouping?
Are there other options?

I do have an alternative plan to select records into a temporary table first,
and then do updates merging two consecutive records and repeat that until
all groups are completely merged, but I'd still like to know if I miss
something regarding selection options.

best
    Morus

PS: the alternative plan is something like

select id, user_id,
         key,
         sort,
         sort as sort_last,
         lead(key) over u as next_key,
         lead(id) over u as next_id,
         lag(key) over u as prev_key
into temp table footmp
  from foo
  window u as (partition by user_id order by sort);


update footmp set sort = f2.sort, prev_key = f2.prev_key
from footmp f2
where footmp.id = f2.next_id and
      footmp.key = f2.key and
      f2.key = f2.next_key and
      ( f2.prev_key is null or f2.prev_key != f2.key );

delete from footmp
where id in (
  select id from (
    select first_value(id) over w as id,
           count(*) over w as cnt
    from footmp
    window w as ( partition by user_id, sort )
  ) as foo where cnt > 1
);

(repeat update/delete until no row is affected)

select user_id,
       key,
       sort as sort_first,
       sort_last
from footmp
order by user_id, sort_first;


pretty ugly and complicated but at least gives me what I want...

pgsql-general by date:

From: wd
Date: 04 February 2013, 11:35:57
Subject: Re: Weird explain output

From: zeljko
Date: 04 February 2013, 13:01:01
Subject: Re: Diferences between IN and EXISTS?

grouping consecutive records - Mailing list pgsql-general

Previous

Next