Re: Counting the number of repeated phrases in a column - Mailing list pgsql-general

From Shaozhong SHI
Subject Re: Counting the number of repeated phrases in a column
Date
Msg-id CA+i5JwYRPVhZcpEJ2r2WQEL7t8tPE_4zVeTckU+gDgnrt_9wyw@mail.gmail.com
In response to Counting the number of repeated phrases in a column  (Shaozhong SHI <shishaozhong@gmail.com>)
Responses Counting the number of repeated phrases in a column  ("David G. Johnston" <david.g.johnston@gmail.com>)
Re: Counting the number of repeated phrases in a column  (Rob Sargent <robjsargent@gmail.com>)
Re: Counting the number of repeated phrases in a column  (Karsten Hilbert <Karsten.Hilbert@gmx.net>)
List pgsql-general


On Tue, 25 Jan 2022 at 17:10, Shaozhong SHI <shishaozhong@gmail.com> wrote:
Standard Postgres seems to lack a function to do the following:

It is easy to count the number of occurrences of words, but it is rather difficult to count the number of occurrences of phrases.

For instance:

A cell of value: 'Hello World' means 1 occurrence of a phrase.

A cell of value: 'Hello World World Hello' means no occurrence of any repeated phrase.

But a cell of value: 'Hello World World Hello Hello World' means 2 occurrences of 'Hello World'.

'The City of London, London' also has no occurrences of any repeated phrase.

Does anyone have a function that counts the number of occurrences of any repeated phrases?
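To make the four examples above concrete, here is a minimal sketch in Python rather than SQL. It assumes that a "phrase" is a run of two consecutive words, that punctuation is ignored, and that matching is case-sensitive. The function name repeated_phrases and the parameter n are made up for illustration and are not part of Postgres.

```python
import re
from collections import Counter


def repeated_phrases(text, n=2):
    """Return {phrase: count} for every n-word phrase seen more than once.

    Assumptions (not from the original post): a phrase is n consecutive
    words, punctuation is dropped, and comparison is case-sensitive.
    """
    # Split the cell value into words, discarding commas and other punctuation.
    words = re.findall(r"\w+", text)
    # Build every run of n consecutive words (overlapping windows).
    ngrams = (" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    counts = Counter(ngrams)
    # Keep only the phrases that actually repeat.
    return {phrase: c for phrase, c in counts.items() if c > 1}


print(repeated_phrases("Hello World World Hello Hello World"))
print(repeated_phrases("The City of London, London"))
```

Under these assumptions the third example yields 'Hello World' with a count of 2, and the other three examples yield no repeated phrase. Because the windows overlap, a value such as 'a a a' would count 'a a' twice, which may or may not be what is wanted.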

Regards,

David

Hi, All Friends,

Whatever. Can we try to build a regex for 'The City of London London Great London UK '?

It could be something like '[\w\s]+[\s-]+[a-z]+[\s-][\s\w]+'. The [\s-]+[a-z]+[\s-] part caters for people who write 'City of London' as 'City-of-London'.
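As a quick check of what this pattern accepts, here is a small sketch using Python's re module instead of Postgres. It assumes the whole value must match, and the helper name matches is hypothetical. Postgres regex behaviour may differ in details, so treat this only as a rough experiment.

```python
import re

# The pattern proposed above, unchanged.
PATTERN = re.compile(r"[\w\s]+[\s-]+[a-z]+[\s-][\s\w]+")


def matches(text):
    """Return True if the whole string fits the proposed pattern."""
    return PATTERN.fullmatch(text) is not None


for value in (
    "The City of London London Great London UK ",
    "City-of-London",
    "City of York",
    "Hello World",
):
    print(repr(value), matches(value))
```

In this experiment the pattern accepts the first three values and rejects 'Hello World', because it needs a lowercase word such as 'of' between separators. It also accepts 'City of York', which has no repetition at all, so the pattern describes the shape of the text but does not by itself detect or count repeated phrases.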

Regards,

David
