Thread: Number Sequence Query

Number Sequence Query

From
John Nix
Date:
I have a field with case numbers in it that I need to find out what
numbers are left out in the sequence.  The numbers can be formatted fairly
strangely, so let me give you an example:

 case_num
----------
 020018
 020020
 020021
 02002201
 020024

I'm only really looking at the numbers that start with 02 and the number
sequence is the 4 numbers right after the 02.  As you can see by my
example, 020023 is missing.  I would like to be able to run a query to
see all the missing numbers in sequence.  I'm not even sure where to start
looking for information.  I did searches from Google and in the groups
(which normally answers my questions), but I can't find anything.  Can
someone help me with this?  Thanks...

John


Re: Number Sequence Query

From
"Devinder K Rajput"
Date:
What you could do is create another table (t_range) that has one column
containing numbers from 020000 to 029999.  Then run the query

select num
from t_range
where num not in (select case_num from t_case_num where case_num like
'num%');

what the above does is that it takes each entry from the t_range table and
checks if it exists in the t_case_num table.  The like clause is used to
take care of cases like "02002201", where there are extra numbers.  If
there are no extra numbers, you could use "where case_num=num".  The issues
with this method is that you have to create an extra table and also it will
give all numbers don't exist in the t_case_num; so if you table entries
ended at 020024, it will give you numbers between 020025 and 029999.
probably not what you want. to fix the latter, you would have recreate the
t_range table with the last value being the largest case_num.  That's
pretty painful...scratch that solution...so do it in programming by writing
a perl script or something.

regards,

Devinder Rajput
Stores Division Corporate Offices
Chicago, IL
(773) 442-6474



    
                    "John Nix"
    
                    <maximum@shreve.net>          To:     pgsql-novice@postgresql.org
    
                    Sent by:                      cc:
    
                    pgsql-novice-owner@post       Subject:     [NOVICE] Number Sequence Query
    
                    gresql.org
    

    

    
                    10/09/2002 04:14 PM
    

    

    





I have a field with case numbers in it that I need to find out what
numbers are left out in the sequence.  The numbers can be formatted fairly
strangely, so let me give you an example:

 case_num
----------
 020018
 020020
 020021
 02002201
 020024

I'm only really looking at the numbers that start with 02 and the number
sequence is the 4 numbers right after the 02.  As you can see by my
example, 020023 is missing.  I would like to be able to run a query to
see all the missing numbers in sequence.  I'm not even sure where to start
looking for information.  I did searches from Google and in the groups
(which normally answers my questions), but I can't find anything.  Can
someone help me with this?  Thanks...

John


---------------------------(end of broadcast)---------------------------
TIP 4: Don't 'kill -9' the postmaster






Re: Number Sequence Query

From
Josh Berkus
Date:
John,

>  case_num
> ----------
>  020018
>  020020
>  020021
>  02002201
>  020024
>
> I'm only really looking at the numbers that start with 02 and the number
> sequence is the 4 numbers right after the 02.  As you can see by my
> example, 020023 is missing.  I would like to be able to run a query to
> see all the missing numbers in sequence.  I'm not even sure where to start
> looking for information.  I did searches from Google and in the groups
> (which normally answers my questions), but I can't find anything.  Can
> someone help me with this?  Thanks...

A classic SQL problem.   Buy Celko's "SQL for Smarties"; you won't regret it.

Setp 1: Copy the first 6 digits of each number into a temp table and convert
them to INT.  For the rest of the example, we will call that table
"case_numbers" and the colum "cnum".

Step2: Index "cnum" and VACUUM ANALYZE it.

Step 3:  Run this:

SELECT (cbefore.cnum + 1) as begin_gap, (cafter.cnum - 1) as end_gap
FROM (SELECT cnum FROM case_numbers c1
    WHERE NOT EXISTS (select cnum FROM case_numbers c2
        WHERE c2.cnum = c1.cnum +1)
    AND c1.cnum < (SELECT max(cnum) FROM case_numbers)) cbefore,
(SELECT cnum FROM case_numbers c3
    WHERE NOT EXISTS (select cnum FROM case_numbers c4
        WHERE c3.cnum = c4.cnum -1)
    AND c1.cnum > (SELECT min(cnum) FROM case_numbers)) cafter
WHERE NOT EXISTS (SELECT cnum FROM case_numbers c5
    WHERE c5.cnum BETWEEN cbefore.cnum AND cafter.cnum);

(check above for typos!  This is off-the-cuff)

This should give you a list of all gaps in the sequence, in the form of:

begin_gap        end_gap
20023            20023
20037            20041
20079            20079
etc.

What the query does is search for all sequence numbers that do not have a
number immediately following, and then all sequence numbers that do not have
a number immediately preceeding, and matches them up by testing if the gap is
continuous.

As you can imagine with all the sub-selects, it is a RAM-intensive query on
any large data set.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco