Thread: finding columns that have three or fewer distinct characters

finding columns that have three or fewer distinct characters

From
Jeff Frost
Date:
I've got an interesting one...I'm trying to find columns that have three or 
fewer distinct characters (for example, "aaaaaaaaaa").  Wondering if I need to 
write a function or if someone has an idea how to do it with built in 
functions and/or pattern matching?

I think the thing to do would be to lowercase everything, then remove all 
duplicate chars and spaces, then use length() on that, but it's not obvious to 
me how I might remove the duplicate chars with the pattern matching support 
in the docs.

-- 
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954


Re: finding columns that have three or fewer distinct characters

From
Colin Wetherbee
Date:
Jeff Frost wrote:
> I've got an interesting one...I'm trying to find columns that have three 
> or fewer distinct characters (for example, "aaaaaaaaaa").  Wondering if 
> I need to write a function or if someone has an idea how to do it with 
> built in functions and/or pattern matching?
> 
> I think the thing to do would be to lowercase everything, then remove 
> all duplicate chars and spaces, then use length() on that, but it's not 
> obvious to me how I might remove the duplicate chars with the pattern 
> matching support in the docs.

It's interesting, indeed.

Here's how you might do it with a PL/Perl function. :)


CREATE OR REPLACE FUNCTION remove_duplicates(TEXT) RETURNS TEXT AS
$$  my ($text) = @_;  while ($text =~ s/(.)(.*)\1/$1$2/g != 0) {};  return $text;
$$ LANGUAGE plperl;


cww=# SELECT 
remove_duplicates('ffffoooooooobbbbaaaarrrr.!@#$.foobar-baz-qux'); remove_duplicates
------------------- fobar.!@#$-zqux
(1 row)


Colin


Re: finding columns that have three or fewer distinct characters

From
Osvaldo Rosario Kussama
Date:
Jeff Frost escreveu:
> I've got an interesting one...I'm trying to find columns that have three 
> or fewer distinct characters (for example, "aaaaaaaaaa").  Wondering if 
> I need to write a function or if someone has an idea how to do it with 
> built in functions and/or pattern matching?
> 
> I think the thing to do would be to lowercase everything, then remove 
> all duplicate chars and spaces, then use length() on that, but it's not 
> obvious to me how I might remove the duplicate chars with the pattern 
> matching support in the docs.
> 

Try:
SELECT  array_to_string(array(    SELECT DISTINCT lower(substr('aabbcdddef  AB',i,1))           FROM
generate_series(1,length('aabbcdddef AB'))                s(i))                  , '');
 

Osvaldo


Re: finding columns that have three or fewer distinct characters

From
Jeff Frost
Date:
On Wed, 5 Mar 2008, Osvaldo Rosario Kussama wrote:

> Jeff Frost escreveu:
>> I've got an interesting one...I'm trying to find columns that have three or 
>> fewer distinct characters (for example, "aaaaaaaaaa").  Wondering if I need 
>> to write a function or if someone has an idea how to do it with built in 
>> functions and/or pattern matching?
>> 
>> I think the thing to do would be to lowercase everything, then remove all 
>> duplicate chars and spaces, then use length() on that, but it's not obvious 
>> to me how I might remove the duplicate chars with the pattern matching 
>> support in the docs.
>> 
>
> Try:
> SELECT
>  array_to_string(array(
>    SELECT DISTINCT lower(substr('aabbcdddef  AB',i,1))
>           FROM generate_series(1,length('aabbcdddef  AB'))
>                s(i))
>                  , '');
>

Osvaldo, that appears to work perfectly!  Thank you much!

-- 
Jeff Frost, Owner     <jeff@frostconsultingllc.com>
Frost Consulting, LLC     http://www.frostconsultingllc.com/
Phone: 650-780-7908    FAX: 650-649-1954