Thread: Guidance needed on an alternative take on common prefix SQL
Hi, I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234 showme the longest match"). I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. So, without further ado, here's the scenario: Given an SQL filtering query output that includes the following column: 87973891 87973970 87973971 87973972 87973973 87973975 87973976 87973977 87973978 87973979 8797400 The final output should be further filtered down to: 87973891 8797397 8797400 i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. in thiscase, when $last_digit[0-9] is removed, 8797397 is the same). So, coming back to the example above: 8797397[0-9] is present so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the preceedingdigit. The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9]. Hope this question makes sense ! Laura
On 8/6/19 6:25 PM, Laura Smith wrote: > Hi, > > I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234 showme the longest match"). > > I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. > > So, without further ado, here's the scenario: > > Given an SQL filtering query output that includes the following column: > 87973891 > 87973970 > 87973971 > 87973972 > 87973973 > 87973975 > 87973976 > 87973977 > 87973978 > 87973979 > 8797400 > > The final output should be further filtered down to: > 87973891 > 8797397 > 8797400 > > i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. in thiscase, when $last_digit[0-9] is removed, 8797397 is the same). > > So, coming back to the example above: > 8797397[0-9] is present > so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the preceedingdigit. > > The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9]. > > Hope this question makes sense ! > > Laura > > Hows this? select distinct case cc when 1 then num else left(num,-1) end from ( select num, (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1)) from numbers ) as tmpx ; -Andy
On Wednesday, August 7, 2019 2:01 AM, Andy Colson <andy@squeakycode.net> wrote: > On 8/6/19 6:25 PM, Laura Smith wrote: > > > Hi, > > I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234show me the longest match"). > > I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. > > So, without further ado, here's the scenario: > > Given an SQL filtering query output that includes the following column: > > 87973891 > > 87973970 > > 87973971 > > 87973972 > > 87973973 > > 87973975 > > 87973976 > > 87973977 > > 87973978 > > 87973979 > > 8797400 > > The final output should be further filtered down to: > > 87973891 > > 8797397 > > 8797400 > > i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. inthis case, when $last_digit[0-9] is removed, 8797397 is the same). > > So, coming back to the example above: > > 8797397[0-9] is present > > so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the preceedingdigit. > > The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9]. > > Hope this question makes sense ! > > Laura > > Hows this? > > select distinct > case cc > when 1 then num > else left(num,-1) > end > from ( > select > num, > (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1)) > from numbers > ) as tmpx ; > > -Andy Hi Andy, That looks supremely clever ! I have just done a quick test and looks like it works as intended. Will do some more thorough testing with a larger datasetin due course. Thank you very much indeed Laura
On 8/7/19 3:36 AM, Laura Smith wrote: > On Wednesday, August 7, 2019 2:01 AM, Andy Colson <andy@squeakycode.net> wrote: > >> On 8/6/19 6:25 PM, Laura Smith wrote: >> >>> Hi, >>> I've seen various Postgres examples here and elsewhere that deal with the old common-prefix problem (i.e. "given 1234show me the longest match"). >>> I'm in need of a bit of guidance on how best to implement an alternative take. Frankly I don't quite know where to startbut I'm guessing it will probably involve CTEs, which is an area I'm very weak on. >>> So, without further ado, here's the scenario: >>> Given an SQL filtering query output that includes the following column: >>> 87973891 >>> 87973970 >>> 87973971 >>> 87973972 >>> 87973973 >>> 87973975 >>> 87973976 >>> 87973977 >>> 87973978 >>> 87973979 >>> 8797400 >>> The final output should be further filtered down to: >>> 87973891 >>> 8797397 >>> 8797400 >>> i.e. if $last_digit is present 0–9 inclusive, recursively filter until the remaining string is all the same (i.e. inthis case, when $last_digit[0-9] is removed, 8797397 is the same). >>> So, coming back to the example above: >>> 8797397[0-9] is present >>> so the "nearest common" I would be looking for is 8797397 because once [0-9] is removed, the 7 is the same on the preceedingdigit. >>> The other two rows ( 87973891 and 8797400) are left untouched because $last_digit is not present in [0-9]. >>> Hope this question makes sense ! >>> Laura >> Hows this? >> >> select distinct >> case cc >> when 1 then num >> else left(num,-1) >> end >> from ( >> select >> num, >> (select count(*) as cc from numbers n2 where left(n2.num, -1) = left(numbers.num, -1)) >> from numbers >> ) as tmpx ; >> >> -Andy > > > Hi Andy, > > That looks supremely clever ! > > I have just done a quick test and looks like it works as intended. Will do some more thorough testing with a larger datasetin due course. > > Thank you very much indeed > > Laura > > If the target field is really an integer type and you have lots of rows you might be better off with arithmetic functions. create table short as select id/10 as base, array_agg(mod(id,10)) as odds from head group by base; select * from short; base | odds ---------+--------------------- 879740 | {0} 8797389 | {1} 8797397 | {0,1,2,3,5,6,7,8,9} (3 rows) select case when array_length(odds,1) = 1 then 10*base + odds[1] else base end from short; base ---------- 8797400 87973891 8797397 (3 rows)