Thread: Detecting repeated phrase in a string
Does anyone know how to detect a repeated phrase in a string?
Is there a function for this?
Regards,
David
On 2021-12-09 12:38:15 +0000, Shaozhong SHI wrote:
> Does anyone know how to detect repeated phrase in a string?
Use regular expressions with backreferences:
bayes=> select regexp_match('foo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {o}          ║
╚══════════════╝
(1 row)
"o" is repeated in "foo".
bayes=> select regexp_match('fo wikiwiki bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ {wiki}       ║
╚══════════════╝
(1 row)
"wiki" is repeated in "wikiwiki".
bayes=> select regexp_match('fo wikiwi bar', '(.+)\1');
╔══════════════╗
║ regexp_match ║
╟──────────────╢
║ (∅)          ║
╚══════════════╝
(1 row)
nothing is repeated.
Adjust the expression within parentheses if you want to match something
more specific than any sequence of one or more characters.
hp
--
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@hjp.at         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"
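A small sketch of the last suggestion in the message above. The patterns and the three-letter minimum are illustrative assumptions, not part of the original reply: a bracket expression in place of `.+` narrows what counts as a repeat, and `regexp_matches` with the `g` flag lists every repeat rather than only the leftmost one.

```sql
-- Only runs of three or more letters count as a candidate phrase,
-- so the "oo" in "foo" is no longer reported.
select regexp_match('foo wikiwiki bar', '([[:alpha:]]{3,})\1');

-- Return one row per repeat found in the string,
-- instead of stopping at the leftmost match.
select regexp_matches('foo wikiwiki bar', '(.+)\1', 'g');
```

With this sample string, the first query should return {wiki}, and the second should return two rows, {o} and {wiki}.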
Hi, Peter,
How can I define a word boundary as any of
^, space, or $,
so that the following can be done:
"fox fox" is a repeat;
"foxfox" is not a repeat, just one word.
Regards,
David
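One way to read this requirement literally, as a hedged sketch: the pattern below treats a word as a run of non-whitespace characters delimited by the start of the string, whitespace, or the end of the string. The sample strings and the column alias `repeated` are made up for illustration.

```sql
-- (?:^|\s)  boundary before the word: start of string or whitespace
-- (\S+)     the word itself, captured as group 1
-- \s+\1     whitespace followed by the same word again
-- (?:\s|$)  boundary after the repeat: whitespace or end of string
select s,
       regexp_match(s, '(?:^|\s)(\S+)\s+\1(?:\s|$)') as repeated
from (values ('fox fox'), ('foxfox'), ('a fox fox here')) as t(s);
```

For 'fox fox' and 'a fox fox here' this should report {fox}; for 'foxfox' it should return NULL, because no whitespace separates the two halves.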
On Thursday, 9 December 2021 at 15:46:05, Shaozhong SHI <shishaozhong@gmail.com> wrote:
> Hi, Peter,
> How to define word boundary as either by using
> ^ , space, or $
> So that the following can be done
> fox fox is a repeat
> foxfox is not a repeat but just one word.
Do you want to detect a repeated phrase (a list of words) or repeated words?
For repeated words (including Unicode characters) you can do:
(\b\p{L}+\b)(?:\s+\1)+
I'm not quite sure how to translate this to PG, but it works in Java.
--
Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
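A possible PostgreSQL translation of the Java pattern above, offered as a sketch rather than a verified equivalent. In PostgreSQL's advanced regular expressions `\b` means backspace, not a word boundary, and there is no `\p{L}`, so this version substitutes `\y` (start or end of a word) and the POSIX class `[[:alpha:]]`. Whether `[[:alpha:]]` covers non-ASCII letters depends on the database encoding and locale, which is an assumption here.

```sql
-- Java:       (\b\p{L}+\b)(?:\s+\1)+
-- PostgreSQL: \y stands in for \b, [[:alpha:]] stands in for \p{L}
select regexp_match('the fox fox jumps', '(\y[[:alpha:]]+\y)(?:\s+\1)+');
```

With this sample string the query should return {fox}.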
On 2021-12-09 16:11:31 +0100, Andreas Joseph Krogh wrote:
> For repeated words (including unicode-chars) you can do:
>
> (\b\p{L}+\b)(?:\s+\1)+
>
> I'm not quite sure how to translate this to PG, but in JAVA it works.
See https://www.postgresql.org/docs/11/functions-matching.html#POSIX-CONSTRAINT-ESCAPES-TABLE
hp
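The constraint escapes in the linked table include `\m` (matches only at the beginning of a word) and `\M` (matches only at the end of a word). A minimal sketch of how they might be applied to the "fox fox" versus "foxfox" cases from earlier in the thread follows. The third sample string and the column alias `has_repeated_word` are illustrative assumptions.

```sql
-- \m     start of a word
-- (\w+)  the word, captured as group 1
-- \s+\1  whitespace, then the same word again
-- \M     end of a word, so "fox foxes" does not count as a repeat
select s,
       s ~ '\m(\w+)\s+\1\M' as has_repeated_word
from (values ('fox fox'), ('foxfox'), ('fox foxes')) as t(s);
```

This should yield true for 'fox fox' and false for both 'foxfox' and 'fox foxes'. Dropping the trailing `\M` would let 'fox foxes' match as well.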