Re: remove duplicated words in comments .. across lines - Mailing list pgsql-hackers

From	Antonin Houska
Subject	Re: remove duplicated words in comments .. across lines
Date	September 9, 2018 12:11:21
Msg-id	13012.1536484281@localhost
In response to	remove duplicated words in comments .. across lines (Justin Pryzby <pryzby@telsasoft.com>)
List	pgsql-hackers

Justin Pryzby <pryzby@telsasoft.com> wrote:

> Resending to -hackers as I realized this isn't a documentation issue so not
> appropriate or apparently interesting to readers of -doc.
>
> Inspired by David's patch [0], find attached fixing words duplicated, across
> line boundaries.
>
> I should probably just call the algorithm proprietary, but if you really
> wanted to know, I've suffered again through sed's black/slashes.
>
> time find . -name '*.c' -o -name '*.h' |xargs sed -srn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g; /(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d;s//>>&<</; p'
>
> Alternately:
> time for f in `find . -name '*.c' -o -name '*.h'`; do x=`<"$f" sed -rn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g;/(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d; s//>>&<</; p'`; [ -n "$x" ] && echo "$f:" && echo "$x"; done |less

Alternatively, you could have used awk, since it can keep variables across
lines. This is a script I used to find those duplicates in a single file
(just for fun; I know that your findings have already been processed):

BEGIN{prev_line_last_token = NULL}
{
    if (NF > 1 && $1 == "*" && length(prev_line_last_token) > 0)
    {
        if ($2 == prev_line_last_token &&
            # Characters used in ASCII charts are not duplicate words.
            $2 != "|" && $2 != "}")
            # Found a duplicate.
            printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2);
    }

    if (NF > 1 && ($1 == "*" || $1 == "/*"))
        prev_line_last_token = $NF;
    else
    {
        # Empty line or not a comment line. Start a new search.
        prev_line_last_token = NULL;
    }
}
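
The script above handles one file at a time. As a rough sketch only, the same
check could be wrapped in a shell function and run over every .c and .h file
below a directory. The function name find_dup_words, the shortened variable
name prev, and the reset on FNR == 1 (so a token does not carry over from one
file to the next) are assumptions added for illustration; they are not part of
the script above.

```shell
# Hypothetical helper: report words repeated across comment line breaks
# in every .c and .h file below the given directory (default: .).
find_dup_words() {
    find "${1:-.}" \( -name '*.c' -o -name '*.h' \) -type f -exec awk '
        # Assumption: forget the remembered token at the start of each file.
        FNR == 1 { prev = "" }
        {
            # A comment continuation line whose first word equals the last
            # word of the previous comment line. "|" and "}" are skipped
            # because they show up in ASCII charts.
            if (NF > 1 && $1 == "*" && length(prev) > 0 &&
                $2 == prev && $2 != "|" && $2 != "}")
            {
                printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2)
            }

            # Remember the last token of comment lines only.
            if (NF > 1 && ($1 == "*" || $1 == "/*"))
            {
                prev = $NF
            }
            else
            {
                prev = ""
            }
        }
    ' {} +
}
```

Calling find_dup_words src/backend would then print one line per hit in the
same "file:line, duplicate token: word" form as the script above. Like that
script, it only compares the first word after the leading "*" with the last
word of the previous comment line.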

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com
