Thread: regexp_replace to remove sql comments

regexp_replace to remove sql comments

From
Mike
Date:
Hi,

I am trying to clean up the query field returned by the
pg_stat_statements extension and remove all comments.

Some of the queries in the query field contain comments like '-- some
comment' and also '/* c style comments */'

I have managed to strip off the '--' comments and also white space but
after trying numerous regex for this via google but I am stuck.

WITH to_clean AS (
   SELECT
     regexp_replace(
        regexp_replace(trim(query), '--[^\r\n]*', '')  --clear up
comments like this one <-- this is ok
     , '\s+', ' ', 'g') as q    --clear up white space  <-- this is ok
   FROM public.pg_stat_statements
   WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
current_database())
)

SELECT  regexp_replace(q,'/\*.*\*/','') as q  /* strip off comments like
this */ <-- cannot get a regex to do this
FROM to_clean ORDER BY q


Im now thinking it may be better to do in a pgsql function as I think if
the comments are in queries then they need to be ignored.

Has anyone done anything like this?

Thanks,

Mike.



Re: regexp_replace to remove sql comments

From
Marc Mamin
Date:
>________________________________________
>Von: pgsql-general-owner@postgresql.org [pgsql-general-owner@postgresql.org]" im Auftrag von "Mike
[mike@wolman.co.uk]
>Gesendet: Mittwoch, 28. Oktober 2015 20:04
>An: pgsql-general@postgresql.org
>Betreff: [GENERAL] regexp_replace to remove sql comments
>
>Hi,
>
>I am trying to clean up the query field returned by the
>pg_stat_statements extension and remove all comments.
>
>Some of the queries in the query field contain comments like '-- some
>comment' and also '/* c style comments */'
>
>I have managed to strip off the '--' comments and also white space but
>after trying numerous regex for this via google but I am stuck.
>
>WITH to_clean AS (
>   SELECT
>     regexp_replace(
>        regexp_replace(trim(query), '--[^\r\n]*', '')  --clear up
>comments like this one <-- this is ok
>     , '\s+', ' ', 'g') as q    --clear up white space  <-- this is ok
>   FROM public.pg_stat_statements
>   WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
>current_database())
>)
>
>SELECT  regexp_replace(q,'/\*.*\*/','') as q  /* strip off comments like
>this */ <-- cannot get a regex to do this
>FROM to_clean ORDER BY q

Hi,
Does this help ?

select regexp_replace(' aaa /*
x
y
z
*/ foo', '\/\*.+\*\/','','g'):

regards,

Marc Mamin

>
>Im now thinking it may be better to do in a pgsql function as I think if
>the comments are in queries then they need to be ignored.
>
>Has anyone done anything like this?
>
>Thanks,
>
>Mike.
>
>
>
>--
>Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
>To make changes to your subscription:
>http://www.postgresql.org/mailpref/pgsql-general
>

Re: regexp_replace to remove sql comments

From
Mike
Date:
Thanks with a bit of moving stuff about I think thats sorted it - in
case anyone every needs it:

   SELECT
    query,
      trim(regexp_replace(
         regexp_replace(
            regexp_replace(query,'\/\*.+\*\/','','g'),
         '--[^\r\n]*', ' ', 'g')
      , '\s+', ' ', 'g')) as q
    FROM public.pg_stat_statements
    WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
  current_database())

  order by query

Thanks again,

Mike.

On 28/10/2015 22:43, Marc Mamin wrote:
> ', '\/\*.+\*\/','','g'):



Re: regexp_replace to remove sql comments

From
Tom Lane
Date:
Mike <mike@wolman.co.uk> writes:
> Thanks with a bit of moving stuff about I think thats sorted it - in
> case anyone every needs it:
>    SELECT
>     query,
>       trim(regexp_replace(
>          regexp_replace(
>             regexp_replace(query,'\/\*.+\*\/','','g'),
>          '--[^\r\n]*', ' ', 'g')
>       , '\s+', ' ', 'g')) as q
>     FROM public.pg_stat_statements
>     WHERE dbid IN (SELECT oid FROM pg_database WHERE datname =
>   current_database())

This doesn't look too reliable from here:

1. Doesn't handle multiline /* comments.

2. Does wrong thing if more than one /* comment appears on one line.
(You could improve that by using .*? instead of .+, but then it'd
do the wrong thing with nested /* comments.)

3. Breaks things if either -- or /* appear inside a string literal,
double-quoted identifier, or $$ literal.

I'm not at all sure that it's possible to handle this requirement 100%
correctly with regexes; they're unable to do context-sensitive processing.

But so far as pg_stat_statements is concerned, why would you need to
do this at all?  The duplicate-query elimination it does should be
insensitive to comments already.

            regards, tom lane