Justin Pryzby <pryzby@telsasoft.com> wrote:
> Resending to -hackers as I realized this isn't a documentation issue so not
> appropriate or apparently interesting to readers of -doc.
>
> Inspired by David's patch [0], find attached fixing words duplicated, across
> line boundaries.
>
> I should probably just call the algorithm proprietary, but if you really
> wanted to know, I've suffered again through sed's black/slashes.
>
> time find . -name '*.c' -o -name '*.h' |xargs sed -srn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g; /(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d;s//>>&<</; p'
>
> Alternately:
> time for f in `find . -name '*.c' -o -name '*.h'`; do x=`<"$f" sed -rn '/\/\*/!d; :l; /\*\//!{N; b l}; s/\n[[:space:]]*\*/\n/g;/(\<[[:alpha:]]{1,})\>\n[[:space:]]*\<\1\>/!d; s//>>&<</; p'`; [ -n "$x" ] && echo "$f:" && echo "$x"; done |less

Alternatively, you could have used awk, since it can keep variables across
lines. Here is a script I used to find such duplicates in a single file
(just for fun; I know that your findings have already been processed):

BEGIN{prev_line_last_token = NULL}
{
    if (NF > 1 && $1 == "*" && length(prev_line_last_token) > 0)
    {
        if ($2 == prev_line_last_token &&
            # Characters used in ASCII charts are not duplicate words.
            $2 != "|" && $2 != "}")
            # Found a duplicate.
            printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2);
    }
    if (NF > 1 && ($1 == "*" || $1 == "/*"))
        prev_line_last_token = $NF;
    else
    {
        # Empty line or not a comment line. Start a new search.
        prev_line_last_token = NULL;
    }
}
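
To run the same check over several files in one go, something along these
lines might work. It is only a sketch: the function name find_dup_tokens and
the shortened variable name prev are made up for illustration, and the
FNR == 1 reset is an addition so that the last token of one file is not
compared with the first comment line of the next.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the script above): apply the same
# duplicate-token check to every file passed as an argument.
find_dup_tokens() {
    awk '
    # Assumption: reset the remembered token at the start of each file.
    FNR == 1 { prev = "" }
    {
        # A comment continuation line whose first word repeats the last
        # word of the previous comment line is reported as a duplicate.
        # "|" and "}" show up in ASCII charts and are skipped.
        if (NF > 1 && $1 == "*" && length(prev) > 0 &&
            $2 == prev && $2 != "|" && $2 != "}")
            printf("%s:%s, duplicate token: %s\n", FILENAME, FNR, $2)

        # Remember the last token of comment lines only; anything else
        # (empty line, code) starts a new search.
        if (NF > 1 && ($1 == "*" || $1 == "/*"))
            prev = $NF
        else
            prev = ""
    }' "$@"
}
```

It could then be called as find_dup_tokens `find . -name '*.c' -o -name '*.h'`,
with the same caveat as the loop quoted above: file names containing
whitespace would be split.
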
--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26, A-2700 Wiener Neustadt
Web: https://www.cybertec-postgresql.com