Thread: finding columns that have three or fewer distinct characters
I've got an interesting one...I'm trying to find columns that have three or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if I need to write a function or if someone has an idea how to do it with built in functions and/or pattern matching? I think the thing to do would be to lowercase everything, then remove all duplicate chars and spaces, then use length() on that, but it's not obvious to me how I might remove the duplicate chars with the pattern matching support in the docs. -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954
Jeff Frost wrote: > I've got an interesting one...I'm trying to find columns that have three > or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if > I need to write a function or if someone has an idea how to do it with > built in functions and/or pattern matching? > > I think the thing to do would be to lowercase everything, then remove > all duplicate chars and spaces, then use length() on that, but it's not > obvious to me how I might remove the duplicate chars with the pattern > matching support in the docs. It's interesting, indeed. Here's how you might do it with a PL/Perl function. :) CREATE OR REPLACE FUNCTION remove_duplicates(TEXT) RETURNS TEXT AS $$ my ($text) = @_; while ($text =~ s/(.)(.*)\1/$1$2/g != 0) {}; return $text; $$ LANGUAGE plperl; cww=# SELECT remove_duplicates('ffffoooooooobbbbaaaarrrr.!@#$.foobar-baz-qux'); remove_duplicates ------------------- fobar.!@#$-zqux (1 row) Colin
Jeff Frost escreveu: > I've got an interesting one...I'm trying to find columns that have three > or fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if > I need to write a function or if someone has an idea how to do it with > built in functions and/or pattern matching? > > I think the thing to do would be to lowercase everything, then remove > all duplicate chars and spaces, then use length() on that, but it's not > obvious to me how I might remove the duplicate chars with the pattern > matching support in the docs. > Try: SELECT array_to_string(array( SELECT DISTINCT lower(substr('aabbcdddef AB',i,1)) FROM generate_series(1,length('aabbcdddef AB')) s(i)) , ''); Osvaldo
On Wed, 5 Mar 2008, Osvaldo Rosario Kussama wrote: > Jeff Frost escreveu: >> I've got an interesting one...I'm trying to find columns that have three or >> fewer distinct characters (for example, "aaaaaaaaaa"). Wondering if I need >> to write a function or if someone has an idea how to do it with built in >> functions and/or pattern matching? >> >> I think the thing to do would be to lowercase everything, then remove all >> duplicate chars and spaces, then use length() on that, but it's not obvious >> to me how I might remove the duplicate chars with the pattern matching >> support in the docs. >> > > Try: > SELECT > array_to_string(array( > SELECT DISTINCT lower(substr('aabbcdddef AB',i,1)) > FROM generate_series(1,length('aabbcdddef AB')) > s(i)) > , ''); > Osvaldo, that appears to work perfectly! Thank you much! -- Jeff Frost, Owner <jeff@frostconsultingllc.com> Frost Consulting, LLC http://www.frostconsultingllc.com/ Phone: 650-780-7908 FAX: 650-649-1954