Thread: Method to detect certain characters in column?
So I have a column that contains usernames that have characters such as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names with non A-Za-z0-9? Thanks in advance! - Ian
Ian Meyer wrote: > So I have a column that contains usernames that have characters such > as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names > with non A-Za-z0-9? ...WHERE col ~ '[^a-zA-Z0-9]'; Someone with a bit more regex fu can probably condense down the regex. -- Tommy Gildseth
use this regular expression ~ regular expression - case insensitive ~* example SELECT name FROM table1 where name ~* '*Ã*' check here http://www.postgresql.org/docs/current/static/functions-matching.html --- El lun 23-jun-08, Ian Meyer <ianmmeyer@gmail.com> escribió: De: Ian Meyer <ianmmeyer@gmail.com> |
Antonio Perez wrote: > example > > SELECT name FROM table1 where name > ~* '*Ã*' Actually this regex is flawed. It looks like a common shell "glob" pattern (I don't know the real name of these things), which is a very different and simpler animal from a regex. -- Alvaro Herrera http://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support
"Ian Meyer" <ianmmeyer@gmail.com> writes: > So I have a column that contains usernames that have characters such > as �(c)(R), for example: fuch�(c)r.. is there any way to find names > with non A-Za-z0-9? Hmm, none of the responses so far look right to me. How about WHERE NOT (col ~ '^[A-Za-z0-9]*$') regards, tom lane
On Jun 23, 2008, at 1:58 PM, Ian Meyer wrote: > So I have a column that contains usernames that have characters such > as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names > with non A-Za-z0-9? ... WHERE column ~* '[^a-z0-9]' Cheers, Steve
Ah, so I forgot to mention the one caveat to this (sorry!) was there was a ton of punctuation/spaces and other ilk.. so this is what I came up with: bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:] ]*$'); name ---------------------- Señorita Lolita Long Pig täkäurgh blåbärsöl fuchér MkII fuchér ver2.0 Gûm-ishi Ashi Gurum kängnäve Fuchér-version 2.1 fuchÃ(c)r Thank you everyone for your help.. that looks to be the correct amount I was looking for. Ian On Mon, Jun 23, 2008 at 7:28 PM, Steve Atkins <steve@blighty.com> wrote: > > On Jun 23, 2008, at 1:58 PM, Ian Meyer wrote: > >> So I have a column that contains usernames that have characters such >> as Ã(c)(R), for example: fuchÃ(c)r.. is there any way to find names >> with non A-Za-z0-9? > > ... WHERE column ~* '[^a-z0-9]' > > Cheers, > Steve > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general >
Hi Ian, Ian Meyer wrote: > Ah, so I forgot to mention the one caveat to this (sorry!) was there > was a ton of punctuation/spaces and other ilk.. so this is what I came > up with: > > bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:] ]*$'); > name > ---------------------- > Señorita Lolita > Long Pig > täkäurgh > blåbärsöl > fuchér MkII > fuchér ver2.0 > Gûm-ishi Ashi Gurum > kängnäve > Fuchér-version 2.1 > fuchÃ(c)r Uh, is that really the name as it should be? To me it looks much more like UTF-8 stored in SQL-Ascii. Maybe converting it correctly would help? Cheers Tino
Attachment
That's entirely possible.. which is the reason for cleanup.. we're moving to a model where members can be queried by name, and UTF-8 isn't allowed in URLs, so we need to rename/remove users with those types of names. A lot of these members are from years ago where we were on mysql with not enough experience to sanity check everything, or do things as we're doing now. And yeah, that's how the names are. People got crafty. *shrugs* thanks again for all the help everyone! On Mon, Jun 23, 2008 at 11:54 PM, Tino Wildenhain <tino@wildenhain.de> wrote: > Hi Ian, > > Ian Meyer wrote: >> >> Ah, so I forgot to mention the one caveat to this (sorry!) was there >> was a ton of punctuation/spaces and other ilk.. so this is what I came >> up with: >> >> bco=# select name from member where not (name ~ '^[A-Za-z0-9[:punct:] >> ]*$'); >> name >> ---------------------- >> Señorita Lolita >> Long Pig >> täkäurgh >> blåbärsöl >> fuchér MkII >> fuchér ver2.0 >> Gûm-ishi Ashi Gurum >> kängnäve >> Fuchér-version 2.1 >> fuchÃ(c)r > > Uh, is that really the name as it should be? To me it > looks much more like UTF-8 stored in SQL-Ascii. Maybe > converting it correctly would help? > > Cheers > Tino > >
Ian Meyer wrote: > That's entirely possible.. which is the reason for cleanup.. we're > moving to a model where members can be queried by name, and UTF-8 > isn't allowed in URLs, so we need to rename/remove users with those > types of names. Depending on your webserver, Unicode characters should be possible in URLs. Certainly some non-7-bit characters can be represented in URLs with % encoding. On my Apache system, for example, the string: áüÜć in latin-1 encoding is represented in a URL as: %c3%a1%c3%bc%c3%9c%c4%87 If that same byte sequence is decoded as UTF-8 instead, it is: áüÜć (a-acute u-umlaut U-umlaut c-acute) In other words, you can send an arbitrary byte sequence in a URL with % escapes. How that byte sequence is decoded into a sequence of characters depends on your web server and its configuration - or, if the web server sends the % encoded URL directly to your client, on how your client chooses to interpret it. The main trouble will be ensuring that clients and the server agree on the character encoding used in the URL. -- Craig Ringer