Thread: Shrinking TSvectors
Hi,

does anyone have any pointers for shrinking tsvectors?

I have looked at the contents of some of these fields and they contain many details that are not needed. For example...

"'+1':935,942 '-0500':72 '-0578':932 '-0667':938 '-266':937 '-873':944 '-9972':945 '/partners/application.html':222 '/partners/program/program-agreement.pdf':271 '/partners/reseller.html':181,1073 '01756':50,1083 '07767':54,1087 '1':753,771 '12':366 '14':66 (...)"

I am not interested in keeping the numbers or URLs in the indexes.

Thanks,

Howard.
On Tue, Apr 5, 2016 at 2:37 PM, Howard News <howardnews@selestial.com> wrote:
> does anyone have any pointers for shrinking tsvectors

select strip('asd:23');
 strip
-------
 'asd'
(1 row)
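
For context: strip() drops the position and weight information from a tsvector but keeps every lexeme, so it makes the stored value smaller without removing the numbers or URLs themselves. A minimal sketch, assuming the built-in 'english' configuration; the sample sentence is made up for illustration:

```sql
-- Unstripped: each lexeme carries its position(s).
SELECT to_tsvector('english', 'The quick brown fox');
-- 'brown':3 'fox':4 'quick':2

-- Stripped: same lexemes, no positions or weights.
SELECT strip(to_tsvector('english', 'The quick brown fox'));
-- 'brown' 'fox' 'quick'

-- Compare the sizes in bytes (the values depend on the input,
-- so none are shown here).
SELECT pg_column_size(to_tsvector('english', 'The quick brown fox'))        AS with_positions,
       pg_column_size(strip(to_tsvector('english', 'The quick brown fox'))) AS stripped;
```

A stripped vector still works with @@ matching, but it is much less useful for relevance ranking, because the ranking functions rely on positions.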
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
On 05.04.2016 14:37, Howard News wrote:
> I am not interested in keeping the numbers or urls in the indexes.

Hello,

You need to create a new text search configuration. Here is an example of the commands:

CREATE TEXT SEARCH CONFIGURATION public.english_cfg (
    PARSER = default
);
ALTER TEXT SEARCH CONFIGURATION public.english_cfg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pg_catalog.english_stem;

Instead of "pg_catalog.english_stem" you can use your own dictionary.

Let's compare the new configuration with the built-in configuration "pg_catalog.english":

postgres=# select to_tsvector('english_cfg', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
 to_tsvector
-------------
 'home':1
(1 row)

postgres=# select to_tsvector('english', 'home -9972 /partners/application.html /partners/program/program-agreement.pdf');
                                          to_tsvector
-----------------------------------------------------------------------------------------------
 '-9972':2 '/partners/application.html':3 '/partners/program/program-agreement.pdf':4 'home':1
(1 row)

You can get some additional information about configurations using \dF+:

postgres=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem

postgres=# \dF+ english_cfg
Text search configuration "public.english_cfg"
Parser: "pg_catalog.default"
      Token      | Dictionaries
-----------------+--------------
 asciihword      | english_stem
 asciiword       | english_stem
 hword           | english_stem
 hword_asciipart | english_stem
 hword_part      | english_stem
 word            | english_stem

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
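
Two follow-ups that may help when applying this, both only sketches. ts_debug shows which token type the parser assigns to each piece of text, which is one way to decide which mappings to leave out. As an alternative to building a configuration from scratch, one could copy the built-in configuration and drop the unwanted mappings. The configuration name english_nonum, the table docs, and its columns body and tsv are hypothetical names used for illustration.

```sql
-- Show the token type (alias) the default parser assigns to each token.
SELECT alias, token, lexemes
FROM ts_debug('english', 'home -9972 /partners/application.html');

-- Alternative sketch: start from the built-in configuration and
-- drop the mappings for numeric and URL-like token types.
CREATE TEXT SEARCH CONFIGURATION public.english_nonum ( COPY = pg_catalog.english );
ALTER TEXT SEARCH CONFIGURATION public.english_nonum
    DROP MAPPING FOR int, uint, float, sfloat,
                     url, url_path, host, file;

-- Creating a configuration does not change tsvector values that are
-- already stored; they have to be recomputed.
-- (docs, body and tsv are hypothetical table and column names.)
UPDATE docs SET tsv = to_tsvector('public.english_nonum', body);
```

Tokens whose type has no mapping are simply discarded by to_tsvector. If mixed tokens, version strings, or e-mail addresses should go as well, numword, numhword, hword_numpart, version, and email can be added to the DROP MAPPING list.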
On 05/04/2016 14:44, Oleg Bartunov wrote:
> select strip ('asd:23');
>  strip
> -------
>  'asd'
> (1 row)

Hi Oleg,

Is this function documented anywhere?

Howard.
On 04/05/2016 07:37 AM, Howard News wrote:
> Hi Oleg,
>
> Is this function documented anywhere?

http://www.postgresql.org/docs/9.5/static/functions-textsearch.html

--
Adrian Klaver
adrian.klaver@aklaver.com
On 05/04/2016 15:15, Artur Zakirov wrote:
> You need to create a new text search configuration.

Thanks Artur,

That's amazing! Postgres never ceases to amaze me. And the same goes for the contributors to this list.