Re: Query to find contiguous ranges on a column - Mailing list pgsql-general

From Thomas Kellerer
Subject Re: Query to find contiguous ranges on a column
Date
Msg-id hb2uqn$8qj$1@ger.gmane.org
Whole thread Raw
In response to Query to find contiguous ranges on a column  (Peter Hunsberger <peter.hunsberger@gmail.com>)
List pgsql-general
Peter Hunsberger wrote on 13.10.2009 23:23:
> I need a query to find the contiguous ranges within this column, in
> this case returning the result set:
>
> start, end
> 2, 5
> 11, 19
> 23, 23
> 32, 37
>
> I have one solution that joins the table against itself and does
> (among other things) a subselect looking "not exists col +1" and "not
> exists col -1" on the two instances of the table to find the start and
> end.  This is, as you might guess, is not very efficient (my actual
> data is some 6 million+ rows) and I'm guessing there has to be
> something more efficient with windowing or possibly grouping on min
> and max (though I can't see how to make sure they are part of a
> contiguous set).  Anyone have any ideas?

This is the best I can put together right now.

Not very nice, but currently I can't think of a better solution:

select *
from
(
  select soi as start_of_interval,
         case
           when soi is not null and eoi is null then lead(eoi) over()
           when soi is not null and eoi is not null then eoi
           else null
         end as end_of_interval
  from (
      select case
               when col - (lag(col,1,col + 1) over (order by col)) - 1 <> 0 then col
               else null
             end as soi,
             case
               when col - (lead(col,1,col + 1) over (order by col)) + 1 <> 0 then col
               else null
             end as eoi
      from numbers
  ) t1
  where t1.soi is not null
     or t1.eoi is not null
) t2
where t2.start_of_interval is not null
  and t2.end_of_interval is not null;

The outer-most select is needed to get rid of the "empty" rows. I couldn't find a way to push that into one of the
sub-queries.

The execution plan doesn't look too bad (probably better than your plan with a self join and a subselect), but it does
sortthe whole table which might be a problem.  

Regards
Thomas





pgsql-general by date:

Previous
From: Tim Landscheidt
Date:
Subject: Re: Query to find contiguous ranges on a column
Next
From: Tim Landscheidt
Date:
Subject: Re: Procedure for feature requests?