Peter Hunsberger <peter.hunsberger@gmail.com> wrote:
> [...]
> I have one solution that joins the table against itself and does
> (among other things) a subselect looking "not exists col +1" and "not
> exists col -1" on the two instances of the table to find the start and
> end. This is, as you might guess, is not very efficient (my actual
> data is some 6 million+ rows) and I'm guessing there has to be
> something more efficient with windowing or possibly grouping on min
> and max (though I can't see how to make sure they are part of a
> contiguous set). Anyone have any ideas?
You can either use a PL/pgSQL function ("SETOF TEXT" just
for the convenience of the example):
| CREATE FUNCTION SummarizeRanges () RETURNS SETOF TEXT AS $$
| DECLARE
| CurrentFirst INT;
| CurrentLast INT;
| CurrentRecord RECORD;
| BEGIN
| FOR CurrentRecord IN SELECT col FROM t ORDER BY col LOOP
| IF CurrentFirst IS NULL THEN
| CurrentFirst := CurrentRecord.col;
| CurrentLast := CurrentRecord.col;
| ELSIF CurrentRecord.col = CurrentLast + 1 THEN
| CurrentLast := CurrentRecord.col;
| ELSE
| RETURN NEXT CurrentFirst || ', ' || CurrentLast;
| CurrentFirst := CurrentRecord.col;
| CurrentLast := CurrentRecord.col;
| END IF;
| END LOOP;
| IF CurrentFirst IS NOT NULL THEN
| RETURN NEXT CurrentFirst || ', ' || CurrentLast;
| END IF;
| RETURN;
| END;
| $$ LANGUAGE plpgsql;
or a recursive query (which I always find very hard to com-
prehend):
| WITH RECURSIVE RecCols (LeftBoundary, Value) AS
| (SELECT col, col FROM t WHERE (col - 1) NOT IN (SELECT col FROM t)
| UNION ALL SELECT p.LeftBoundary, c.col FROM RecCols AS p, t AS c WHERE c.col = p.Value + 1)
| SELECT LeftBoundary, MAX(Value) AS RightBoundary FROM RecCols
| GROUP BY LeftBoundary
| ORDER BY LeftBoundary;
Could you run both against your data set and find out which
one is faster for your six million rows?
Tim