Thread: Guidance needed on an alternative take on common prefix SQL

Guidance needed on an alternative take on common prefix SQL

From

Laura Smith

Date:

06 August 2019, 23:25:09

Hi,

I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234
showme the longest match"). 

I'm in need of a bit of guidance on how best to implement an alternative take.  Frankly I don't quite know where to
startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. 

So, without further ado, here's the scenario:

Given an SQL filtering query output that includes the following column:
87973891
87973970
87973971
87973972
87973973
87973975
87973976
87973977
87973978
87973979
8797400

The final output should be further filtered down to:
87973891
8797397
8797400

i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. in
thiscase, when $last_digit[0-9] is removed, 8797397 is the same). 

So, coming back to the example above:
8797397[0-9] is present
so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the
preceedingdigit. 

The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9].

Hope this question makes sense !

Laura

Re: Guidance needed on an alternative take on common prefix SQL

From

Andy Colson

Date:

07 August 2019, 01:01:54

On 8/6/19 6:25 PM, Laura Smith wrote:
> Hi,
> 
> I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234
showme the longest match").
 
> 
> I'm in need of a bit of guidance on how best to implement an alternative take.  Frankly I don't quite know where to
startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on.
 
> 
> So, without further ado, here's the scenario:
> 
> Given an SQL filtering query output that includes the following column:
> 87973891
> 87973970
> 87973971
> 87973972
> 87973973
> 87973975
> 87973976
> 87973977
> 87973978
> 87973979
> 8797400
> 
> The final output should be further filtered down to:
> 87973891
> 8797397
> 8797400
> 
> i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. in
thiscase, when $last_digit[0-9] is removed, 8797397 is the same).
 
> 
> So, coming back to the example above:
> 8797397[0-9] is present
> so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the
preceedingdigit.
 
> 
> The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9].
> 
> Hope this question makes sense !
> 
> Laura
> 
> 


Hows this?

select distinct
   case cc
      when 1 then num
      else left(num,-1)
   end
from (
    select
       num,
       (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1))
    from numbers
) as tmpx ;


-Andy

Re: Guidance needed on an alternative take on common prefix SQL

From

Laura Smith

Date:

07 August 2019, 09:36:26

On Wednesday, August 7, 2019 2:01 AM, Andy Colson <andy@squeakycode.net> wrote:

> On 8/6/19 6:25 PM, Laura Smith wrote:
>
> > Hi,
> > I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given
1234show me the longest match"). 
> > I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to
startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. 
> > So, without further ado, here's the scenario:
> > Given an SQL filtering query output that includes the following column:
> > 87973891
> > 87973970
> > 87973971
> > 87973972
> > 87973973
> > 87973975
> > 87973976
> > 87973977
> > 87973978
> > 87973979
> > 8797400
> > The final output should be further filtered down to:
> > 87973891
> > 8797397
> > 8797400
> > i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e.
inthis case, when $last_digit[0-9] is removed, 8797397 is the same). 
> > So, coming back to the example above:
> > 8797397[0-9] is present
> > so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the
preceedingdigit. 
> > The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9].
> > Hope this question makes sense !
> > Laura
>
> Hows this?
>
> select distinct
> case cc
> when 1 then num
> else left(num,-1)
> end
> from (
> select
> num,
> (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1))
> from numbers
> ) as tmpx ;
>
> -Andy



Hi Andy,

That looks supremely clever !

I have just done a quick test and looks like it works as intended. Will do some more thorough testing with a larger
datasetin due course. 

Thank you very much indeed

Laura

Re: Guidance needed on an alternative take on common prefix SQL

From

Rob Sargent

Date:

08 August 2019, 00:36:35

On 8/7/19 3:36 AM, Laura Smith wrote:
> On Wednesday, August 7, 2019 2:01 AM, Andy Colson <andy@squeakycode.net> wrote:
>
>> On 8/6/19 6:25 PM, Laura Smith wrote:
>>
>>> Hi,
>>> I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given
1234show me the longest match").
 
>>> I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to
startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on.
 
>>> So, without further ado, here's the scenario:
>>> Given an SQL filtering query output that includes the following column:
>>> 87973891
>>> 87973970
>>> 87973971
>>> 87973972
>>> 87973973
>>> 87973975
>>> 87973976
>>> 87973977
>>> 87973978
>>> 87973979
>>> 8797400
>>> The final output should be further filtered down to:
>>> 87973891
>>> 8797397
>>> 8797400
>>> i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e.
inthis case, when $last_digit[0-9] is removed, 8797397 is the same).
 
>>> So, coming back to the example above:
>>> 8797397[0-9] is present
>>> so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the
preceedingdigit.
 
>>> The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9].
>>> Hope this question makes sense !
>>> Laura
>> Hows this?
>>
>> select distinct
>> case cc
>> when 1 then num
>> else left(num,-1)
>> end
>> from (
>> select
>> num,
>> (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1))
>> from numbers
>> ) as tmpx ;
>>
>> -Andy
>
>
> Hi Andy,
>
> That looks supremely clever !
>
> I have just done a quick test and looks like it works as intended. Will do some more thorough testing with a larger
datasetin due course.
 
>
> Thank you very much indeed
>
> Laura
>
>

If the target field is really an integer type and you have lots of rows 
you might be better off with arithmetic functions.


create table short as select id/10 as base, array_agg(mod(id,10)) as 
odds from head group by base;

select * from short;
   base   |        odds
---------+---------------------
   879740 | {0}
  8797389 | {1}
  8797397 | {0,1,2,3,5,6,7,8,9}
(3 rows)

  select case when array_length(odds,1) = 1 then 10*base + odds[1] else 
base end from short;
    base
----------
   8797400
  87973891
   8797397
(3 rows)