Thread: regexp_replace to remove sql comments
Hi, I am trying to clean up the query field returned by the pg_stat_statements extension and remove all comments. Some of the queries in the query field contain comments like '-- some comment' and also '/* c style comments */' I have managed to strip off the '--' comments and also white space but after trying numerous regex for this via google but I am stuck. WITH to_clean AS ( SELECT regexp_replace( regexp_replace(trim(query), '--[^\r\n]*', '') --clear up comments like this one <-- this is ok , '\s+', ' ', 'g') as q --clear up white space <-- this is ok FROM public.pg_stat_statements WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database()) ) SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like this */ <-- cannot get a regex to do this FROM to_clean ORDER BY q Im now thinking it may be better to do in a pgsql function as I think if the comments are in queries then they need to be ignored. Has anyone done anything like this? Thanks, Mike.
>________________________________________ >Von: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org]" im Auftrag von "Mike [mike@wolman.co.uk] >Gesendet: Mittwoch, 28. Oktober 2015 20:04 >An: pgsql-general@postgresql.org >Betreff: [GENERAL] regexp_replace to remove sql comments > >Hi, > >I am trying to clean up the query field returned by the >pg_stat_statements extension and remove all comments. > >Some of the queries in the query field contain comments like '-- some >comment' and also '/* c style comments */' > >I have managed to strip off the '--' comments and also white space but >after trying numerous regex for this via google but I am stuck. > >WITH to_clean AS ( > SELECT > regexp_replace( > regexp_replace(trim(query), '--[^\r\n]*', '') --clear up >comments like this one <-- this is ok > , '\s+', ' ', 'g') as q --clear up white space <-- this is ok > FROM public.pg_stat_statements > WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = >current_database()) >) > >SELECT regexp_replace(q,'/\*.*\*/','') as q /* strip off comments like >this */ <-- cannot get a regex to do this >FROM to_clean ORDER BY q Hi, Does this help ? select regexp_replace(' aaa /* x y z */ foo', '\/\*.+\*\/','','g'): regards, Marc Mamin > >Im now thinking it may be better to do in a pgsql function as I think if >the comments are in queries then they need to be ignored. > >Has anyone done anything like this? > >Thanks, > >Mike. > > > >-- >Sent via pgsql-general mailing list (pgsql-general@postgresql.org) >To make changes to your subscription: >http://www.postgresql.org/mailpref/pgsql-general >
Thanks with a bit of moving stuff about I think thats sorted it - in case anyone every needs it: SELECT query, trim(regexp_replace( regexp_replace( regexp_replace(query,'\/\*.+\*\/','','g'), '--[^\r\n]*', ' ', 'g') , '\s+', ' ', 'g')) as q FROM public.pg_stat_statements WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = current_database()) order by query Thanks again, Mike. On 28/10/2015 22:43, Marc Mamin wrote: > ', '\/\*.+\*\/','','g'):
Mike <mike@wolman.co.uk> writes: > Thanks with a bit of moving stuff about I think thats sorted it - in > case anyone every needs it: > SELECT > query, > trim(regexp_replace( > regexp_replace( > regexp_replace(query,'\/\*.+\*\/','','g'), > '--[^\r\n]*', ' ', 'g') > , '\s+', ' ', 'g')) as q > FROM public.pg_stat_statements > WHERE dbid IN (SELECT oid FROM pg_database WHERE datname = > current_database()) This doesn't look too reliable from here: 1. Doesn't handle multiline /* comments. 2. Does wrong thing if more than one /* comment appears on one line. (You could improve that by using .*? instead of .+, but then it'd do the wrong thing with nested /* comments.) 3. Breaks things if either -- or /* appear inside a string literal, double-quoted identifier, or $$ literal. I'm not at all sure that it's possible to handle this requirement 100% correctly with regexes; they're unable to do context-sensitive processing. But so far as pg_stat_statements is concerned, why would you need to do this at all? The duplicate-query elimination it does should be insensitive to comments already. regards, tom lane