Thread: tsvector not giving expected results on one host
Under PostgreSQL 12, I have a table using tsvector to search the column pkgmessage. It looks like this (not all columns are shown). Table "public.ports" Column | Type | Collation | Nullable | Default pkgmessage | text | | | pkgmessage_textsearchable | tsvector | | | generated always as (to_tsvector('english'::regconfig,pkgmessage)) stored On several servers, it works fine, like this: freshports.devgit=# SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); port_id | element_pathname ---------+---------------------------------------------------------------------- 100421 | /ports/branches/2022Q1/dns/dnsmasq 100428 | /ports/branches/2022Q1/emulators/xsystem35 14686 | /ports/head/sysutils/lmon ... etc On the problem server, production, we get nothing. Nada. freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); port_id | element_pathname ---------+------------------ (0 rows) freshports.org=> However, ilike on the same database does find the matches: freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage ilike '%example%'; port_id | element_pathname ---------+---------------------------------------------------------------------- 34126 | /ports/head/security/pond 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 60310 | /ports/branches/2020Q4/www/gitlab-ce 38345 | /ports/head/www/gitlab ... etc Both database are SQL_ASCII with the same Collate and Ctype settings. Before composing this email, I ran "reindex table concurrently ports;" on the problem database. It did not change the results. prod, has problem List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ----------------+----------+-----------+-------------+-------------+----------------------- freshports.org | postgres | SQL_ASCII | C | C | dev, no issues: [pg02 dan ~] % psql -l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------------------------+--------------+-----------+---------+-------+----------------------- freshports.devgit | postgres | SQL_ASCII | C | C | Any ideas as to what to search please? Oh, one difference. All the working-as-expected databases are self-hosted on FreeBSD. The problem database is on AWS RDS. -- Dan Langille dan@langille.org
On Sat, Dec 17, 2022, at 1:53 PM, Dan Langille wrote: > Under PostgreSQL 12, I have a table using tsvector to search the column > pkgmessage. > > It looks like this (not all columns are shown). > > > Table "public.ports" > Column | Type | Collation | > Nullable | Default > > pkgmessage | text | | > | > pkgmessage_textsearchable | tsvector | | > | generated always as (to_tsvector('english'::regconfig, > pkgmessage)) stored pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored I see the above should have been included as well. > > On several servers, it works fine, like this: > > freshports.devgit=# SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); > port_id | element_pathname > ---------+---------------------------------------------------------------------- > 100421 | /ports/branches/2022Q1/dns/dnsmasq > 100428 | /ports/branches/2022Q1/emulators/xsystem35 > 14686 | /ports/head/sysutils/lmon > ... etc > > On the problem server, production, we get nothing. Nada. > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('example'); > port_id | element_pathname > ---------+------------------ > (0 rows) > > freshports.org=> > > However, ilike on the same database does find the matches: > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage ilike '%example%'; > port_id | element_pathname > ---------+---------------------------------------------------------------------- > 34126 | /ports/head/security/pond > 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 > 60310 | /ports/branches/2020Q4/www/gitlab-ce > 38345 | /ports/head/www/gitlab > ... etc Let's look at that first entry on the problem database: freshports.org=> select pkgmessage_textsearchable from ports where id = 34126; pkgmessage_textsearchable ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.org=> select pkgmessage_textsearchable2 from ports where id = 34126; pkgmessage_textsearchable2 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.org=> From a database which runs this query with expected results: freshports.devgit=# select pkgmessage_textsearchable2 from ports where id = 34126; pkgmessage_textsearchable2 ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19 'loader.conf':16'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmp':32 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.devgit=# select pkgmessage_textsearchable from ports where id = 34126; pkgmessage_textsearchable ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- '/tmp':32 '1':10 '2':21 '3':33 '5':17 'client':6,36 'directori':25 'exampl':27 'follow':13,38 'instruct':40 'line':14 'load':19'loader.conf':16 'mount':22,28 'place':11 'pond':3,35,41 'requir':8 'run':34 'tmpfs':9,18,24,30,31 'use':2 'yes':20 (1 row) freshports.devgit=# If I run my query with 'exampl', it finds what I expected, including 'pond' from above freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable2 @@ websearch_to_tsquery('exampl'); port_id | element_pathname ---------+---------------------------------------------------------------------- 34126 | /ports/head/security/pond 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 60310 | /ports/branches/2020Q4/www/gitlab-ce 38345 | /ports/head/www/gitlab 46842 | /ports/branches/2018Q1/mail/postfix-sasl This is the same on both hosts: freshports.org=> show default_text_search_config ; default_text_search_config ---------------------------- pg_catalog.simple -- Dan Langille dan@langille.org
"Dan Langille" <dan@langille.org> writes: > pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored That is not likely to play well with this: > freshports.org=> show default_text_search_config ; > default_text_search_config > ---------------------------- > pg_catalog.simple because "english" and "simple" will stem words differently. regression=# select websearch_to_tsquery('english', 'example'); websearch_to_tsquery ---------------------- 'exampl' (1 row) regression=# select websearch_to_tsquery('simple', 'example'); websearch_to_tsquery ---------------------- 'example' (1 row) If what is in your tsvector is 'exampl', then only the first of these will match. So IMO the question is not "why is it failing on prod?", it's "how the heck did it work on the other machine?". You won't get nice results if websearch_to_tsquery is using a different TS configuration than to_tsvector did. regards, tom lane
On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: > "Dan Langille" <dan@langille.org> writes: >> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored > > That is not likely to play well with this: > >> freshports.org=> show default_text_search_config ; >> default_text_search_config >> ---------------------------- >> pg_catalog.simple > > because "english" and "simple" will stem words differently. > > regression=# select websearch_to_tsquery('english', 'example'); > websearch_to_tsquery > ---------------------- > 'exampl' > (1 row) > > regression=# select websearch_to_tsquery('simple', 'example'); > websearch_to_tsquery > ---------------------- > 'example' > (1 row) > > If what is in your tsvector is 'exampl', then only the first of > these will match. So IMO the question is not "why is it failing > on prod?", it's "how the heck did it work on the other machine?". > You won't get nice results if websearch_to_tsquery is using a > different TS configuration than to_tsvector did. I think this shows why we are getting the results we see. Credit to ch on IRC for asking this question. The problem host: freshports.org=> select websearch_to_tsquery('example'); websearch_to_tsquery ---------------------- 'example' (1 row) The hosts on which this search works freshports.devgit=# select websearch_to_tsquery('example'); websearch_to_tsquery ---------------------- 'exampl' (1 row) On that same host: [pg02 dan ~] % sudo grep -i default_text_search_config /var/db/postgres/data12/postgresql.conf default_text_search_config = 'pg_catalog.english' -- Dan Langille dan@langille.org
On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: > On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: >> "Dan Langille" <dan@langille.org> writes: >>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored >> >> That is not likely to play well with this: >> >>> freshports.org=> show default_text_search_config ; >>> default_text_search_config >>> ---------------------------- >>> pg_catalog.simple >> >> because "english" and "simple" will stem words differently. >> >> regression=# select websearch_to_tsquery('english', 'example'); >> websearch_to_tsquery >> ---------------------- >> 'exampl' >> (1 row) >> >> regression=# select websearch_to_tsquery('simple', 'example'); >> websearch_to_tsquery >> ---------------------- >> 'example' >> (1 row) >> >> If what is in your tsvector is 'exampl', then only the first of >> these will match. So IMO the question is not "why is it failing >> on prod?", it's "how the heck did it work on the other machine?". >> You won't get nice results if websearch_to_tsquery is using a >> different TS configuration than to_tsvector did. > > I think this shows why we are getting the results we see. Credit to ch > on IRC for asking this question. > > The problem host: > > freshports.org=> select websearch_to_tsquery('example'); > websearch_to_tsquery > ---------------------- > 'example' > (1 row) Ahh, this explains the differences and as to why it works where it shouldn't? freshports.org=> select setting, source from pg_settings where name = 'default_text_search_config'; setting | source -------------------+--------- pg_catalog.simple | default (1 row) > > The hosts on which this search works > > freshports.devgit=# select websearch_to_tsquery('example'); > websearch_to_tsquery > ---------------------- > 'exampl' > (1 row) freshports.devgit=# select setting, source from pg_settings where name = 'default_text_search_config'; setting | source --------------------+-------------------- pg_catalog.english | configuration file (1 row) At least now I know what I can play with to get all hosts in sync. -- Dan Langille dan@langille.org
On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote: > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: >>> "Dan Langille" <dan@langille.org> writes: >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored >>> >>> That is not likely to play well with this: >>> >>>> freshports.org=> show default_text_search_config ; >>>> default_text_search_config >>>> ---------------------------- >>>> pg_catalog.simple >>> >>> because "english" and "simple" will stem words differently. >>> >>> regression=# select websearch_to_tsquery('english', 'example'); >>> websearch_to_tsquery >>> ---------------------- >>> 'exampl' >>> (1 row) >>> >>> regression=# select websearch_to_tsquery('simple', 'example'); >>> websearch_to_tsquery >>> ---------------------- >>> 'example' >>> (1 row) >>> >>> If what is in your tsvector is 'exampl', then only the first of >>> these will match. So IMO the question is not "why is it failing >>> on prod?", it's "how the heck did it work on the other machine?". >>> You won't get nice results if websearch_to_tsquery is using a >>> different TS configuration than to_tsvector did. >> >> I think this shows why we are getting the results we see. Credit to ch >> on IRC for asking this question. >> >> The problem host: >> >> freshports.org=> select websearch_to_tsquery('example'); >> websearch_to_tsquery >> ---------------------- >> 'example' >> (1 row) > > Ahh, this explains the differences and as to why it works where it shouldn't? > > freshports.org=> select setting, source from pg_settings where name = > 'default_text_search_config'; > setting | source > -------------------+--------- > pg_catalog.simple | default > (1 row) > > >> >> The hosts on which this search works >> >> freshports.devgit=# select websearch_to_tsquery('example'); >> websearch_to_tsquery >> ---------------------- >> 'exampl' >> (1 row) > > > freshports.devgit=# select setting, source from pg_settings where name > = 'default_text_search_config'; > setting | source > --------------------+-------------------- > pg_catalog.english | configuration file > (1 row) > > > At least now I know what I can play with to get all hosts in sync. Here we go, on the problem database, create a new field, based on simple, not english. ALTER TABLE public.ports ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage,'/'::text, ' '::text))) stored; Index it: CREATE INDEX ports_pkgmessage_textsearchable3_idx ON public.ports USING gin (pkgmessage_textsearchable3) TABLESPACE pg_default; CREATE INDEX query it: freshports.org=> SELECT id as port_id, element_pathname(element_id) FROM ports WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example'); port_id | element_pathname ---------+---------------------------------------------------------------------- 34126 | /ports/head/security/pond 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 60310 | /ports/branches/2020Q4/www/gitlab-ce 38345 | /ports/head/www/gitlab 46842 | /ports/branches/2018Q1/mail/postfix-sasl 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod Success. Thank you Mr Lane. -- Dan Langille dan@langille.org
Dan, it's always good to specify configuration name in a query to avoid recheck, since websearch_to_tsquery(regconfig, text) is immutable, while websearch_to_tsquery(text) is stable. See the difference: [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('simple','galaxies') @@ fts; QUERY PLAN --------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1) Recheck Cond: ('''galaxies'''::tsquery @@ fts) -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (fts @@ '''galaxies'''::tsquery) Planning Time: 0.134 ms Execution Time: 0.022 ms (6 rows) Time: 0.369 ms [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('galaxies') @@ fts; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1) Filter: (websearch_to_tsquery('galaxies'::text) @@ fts) Heap Blocks: exact=276 -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059 rows=493 loops=1) Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) Planning Time: 0.125 ms Execution Time: 1.518 ms (7 rows) On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote: > > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote: > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: > >>> "Dan Langille" <dan@langille.org> writes: > >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored > >>> > >>> That is not likely to play well with this: > >>> > >>>> freshports.org=> show default_text_search_config ; > >>>> default_text_search_config > >>>> ---------------------------- > >>>> pg_catalog.simple > >>> > >>> because "english" and "simple" will stem words differently. > >>> > >>> regression=# select websearch_to_tsquery('english', 'example'); > >>> websearch_to_tsquery > >>> ---------------------- > >>> 'exampl' > >>> (1 row) > >>> > >>> regression=# select websearch_to_tsquery('simple', 'example'); > >>> websearch_to_tsquery > >>> ---------------------- > >>> 'example' > >>> (1 row) > >>> > >>> If what is in your tsvector is 'exampl', then only the first of > >>> these will match. So IMO the question is not "why is it failing > >>> on prod?", it's "how the heck did it work on the other machine?". > >>> You won't get nice results if websearch_to_tsquery is using a > >>> different TS configuration than to_tsvector did. > >> > >> I think this shows why we are getting the results we see. Credit to ch > >> on IRC for asking this question. > >> > >> The problem host: > >> > >> freshports.org=> select websearch_to_tsquery('example'); > >> websearch_to_tsquery > >> ---------------------- > >> 'example' > >> (1 row) > > > > Ahh, this explains the differences and as to why it works where it shouldn't? > > > > freshports.org=> select setting, source from pg_settings where name = > > 'default_text_search_config'; > > setting | source > > -------------------+--------- > > pg_catalog.simple | default > > (1 row) > > > > > >> > >> The hosts on which this search works > >> > >> freshports.devgit=# select websearch_to_tsquery('example'); > >> websearch_to_tsquery > >> ---------------------- > >> 'exampl' > >> (1 row) > > > > > > freshports.devgit=# select setting, source from pg_settings where name > > = 'default_text_search_config'; > > setting | source > > --------------------+-------------------- > > pg_catalog.english | configuration file > > (1 row) > > > > > > At least now I know what I can play with to get all hosts in sync. > > Here we go, on the problem database, create a new field, based on simple, not english. > > ALTER TABLE public.ports > ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage,'/'::text, ' '::text))) stored; > > Index it: > > CREATE INDEX ports_pkgmessage_textsearchable3_idx > ON public.ports USING gin > (pkgmessage_textsearchable3) > TABLESPACE pg_default; > CREATE INDEX > > query it: > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > FROM ports > WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example'); > port_id | element_pathname > ---------+---------------------------------------------------------------------- > 34126 | /ports/head/security/pond > 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 > 60310 | /ports/branches/2020Q4/www/gitlab-ce > 38345 | /ports/head/www/gitlab > 46842 | /ports/branches/2018Q1/mail/postfix-sasl > 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod > > Success. Thank you Mr Lane. > > -- > Dan Langille > dan@langille.org > > -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
I On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote: > > Dan, > > it's always good to specify configuration name in a query to avoid > recheck, since > websearch_to_tsquery(regconfig, text) is immutable, while > websearch_to_tsquery(text) is stable. immutable function calculates once in planning time, but stable function calculates during running time, so the difference may be very big depending on how many tuples found. > > See the difference: > > [local]:5433 oleg@oleg=# explain (analyze,costs off) select title > from apod where websearch_to_tsquery('simple','galaxies') @@ fts; > QUERY PLAN > --------------------------------------------------------------------------------------- > Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1) > Recheck Cond: ('''galaxies'''::tsquery @@ fts) > -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007 > rows=0 loops=1) > Index Cond: (fts @@ '''galaxies'''::tsquery) > Planning Time: 0.134 ms > Execution Time: 0.022 ms > (6 rows) > > Time: 0.369 ms > [local]:5433 oleg@oleg=# explain (analyze,costs off) select title > from apod where websearch_to_tsquery('galaxies') @@ fts; > QUERY PLAN > ----------------------------------------------------------------------------------------- > Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1) > Filter: (websearch_to_tsquery('galaxies'::text) @@ fts) > Heap Blocks: exact=276 > -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059 > rows=493 loops=1) > Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) > Planning Time: 0.125 ms > Execution Time: 1.518 ms > (7 rows) > > On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote: > > > > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote: > > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: > > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: > > >>> "Dan Langille" <dan@langille.org> writes: > > >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored > > >>> > > >>> That is not likely to play well with this: > > >>> > > >>>> freshports.org=> show default_text_search_config ; > > >>>> default_text_search_config > > >>>> ---------------------------- > > >>>> pg_catalog.simple > > >>> > > >>> because "english" and "simple" will stem words differently. > > >>> > > >>> regression=# select websearch_to_tsquery('english', 'example'); > > >>> websearch_to_tsquery > > >>> ---------------------- > > >>> 'exampl' > > >>> (1 row) > > >>> > > >>> regression=# select websearch_to_tsquery('simple', 'example'); > > >>> websearch_to_tsquery > > >>> ---------------------- > > >>> 'example' > > >>> (1 row) > > >>> > > >>> If what is in your tsvector is 'exampl', then only the first of > > >>> these will match. So IMO the question is not "why is it failing > > >>> on prod?", it's "how the heck did it work on the other machine?". > > >>> You won't get nice results if websearch_to_tsquery is using a > > >>> different TS configuration than to_tsvector did. > > >> > > >> I think this shows why we are getting the results we see. Credit to ch > > >> on IRC for asking this question. > > >> > > >> The problem host: > > >> > > >> freshports.org=> select websearch_to_tsquery('example'); > > >> websearch_to_tsquery > > >> ---------------------- > > >> 'example' > > >> (1 row) > > > > > > Ahh, this explains the differences and as to why it works where it shouldn't? > > > > > > freshports.org=> select setting, source from pg_settings where name = > > > 'default_text_search_config'; > > > setting | source > > > -------------------+--------- > > > pg_catalog.simple | default > > > (1 row) > > > > > > > > >> > > >> The hosts on which this search works > > >> > > >> freshports.devgit=# select websearch_to_tsquery('example'); > > >> websearch_to_tsquery > > >> ---------------------- > > >> 'exampl' > > >> (1 row) > > > > > > > > > freshports.devgit=# select setting, source from pg_settings where name > > > = 'default_text_search_config'; > > > setting | source > > > --------------------+-------------------- > > > pg_catalog.english | configuration file > > > (1 row) > > > > > > > > > At least now I know what I can play with to get all hosts in sync. > > > > Here we go, on the problem database, create a new field, based on simple, not english. > > > > ALTER TABLE public.ports > > ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage,'/'::text, ' '::text))) stored; > > > > Index it: > > > > CREATE INDEX ports_pkgmessage_textsearchable3_idx > > ON public.ports USING gin > > (pkgmessage_textsearchable3) > > TABLESPACE pg_default; > > CREATE INDEX > > > > query it: > > > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > > FROM ports > > WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example'); > > port_id | element_pathname > > ---------+---------------------------------------------------------------------- > > 34126 | /ports/head/security/pond > > 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 > > 60310 | /ports/branches/2020Q4/www/gitlab-ce > > 38345 | /ports/head/www/gitlab > > 46842 | /ports/branches/2018Q1/mail/postfix-sasl > > 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod > > > > Success. Thank you Mr Lane. > > > > -- > > Dan Langille > > dan@langille.org > > > > > > > -- > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Oleg Bartunov wrote on 12/21/22 12:31 PM:
I am glad to hear this because one of my decisions was:
* change the query to include configuration name
* change the default configuration name
I think I'm going to start using websearch_to_tsquery(regconfig, text)
Looking at my data, the largest tsvector collection has 453 elements. About 40 have more
than 170. Total number of rows is about 2400.
Another issue discussed on IRC: why store my tsvector values? Why not just index them instead?
At present: I have this column:
pkgmessage_textsearchable | tsvector | | | generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored
with this index: ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)
Instead, I could replace that column and index with this index:
"testing" gin (to_tsvector('english'::regconfig, pkgmessage))
Simple testing showed it was comparable if not slightly faster.
The plan now: implement the index on to_tsvector, not a column, and start specifying the configuration. That's in the near future.
I've written up this journey at https://news.freshports.org/2022/12/18/when-tsvector-was-working-as-expected-on-most-hosts-but-not-one/
thank you
I On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:Dan, it's always good to specify configuration name in a query to avoid recheck, since websearch_to_tsquery(regconfig, text) is immutable, while websearch_to_tsquery(text) is stable.
I am glad to hear this because one of my decisions was:
* change the query to include configuration name
* change the default configuration name
I think I'm going to start using websearch_to_tsquery(regconfig, text)
immutable function calculates once in planning time, but stable function calculates during running time, so the difference may be very big depending on how many tuples found.
Looking at my data, the largest tsvector collection has 453 elements. About 40 have more
than 170. Total number of rows is about 2400.
Another issue discussed on IRC: why store my tsvector values? Why not just index them instead?
At present: I have this column:
pkgmessage_textsearchable | tsvector | | | generated always as (to_tsvector('english'::regconfig, pkgmessage)) stored
with this index: ports_pkgmessage_textsearchable_idx" gin (pkgmessage_textsearchable)
Instead, I could replace that column and index with this index:
"testing" gin (to_tsvector('english'::regconfig, pkgmessage))
Simple testing showed it was comparable if not slightly faster.
The plan now: implement the index on to_tsvector, not a column, and start specifying the configuration. That's in the near future.
I've written up this journey at https://news.freshports.org/2022/12/18/when-tsvector-was-working-as-expected-on-most-hosts-but-not-one/
thank you
See the difference: [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('simple','galaxies') @@ fts; QUERY PLAN --------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1) Recheck Cond: ('''galaxies'''::tsquery @@ fts) -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007 rows=0 loops=1) Index Cond: (fts @@ '''galaxies'''::tsquery) Planning Time: 0.134 ms Execution Time: 0.022 ms (6 rows) Time: 0.369 ms [local]:5433 oleg@oleg=# explain (analyze,costs off) select title from apod where websearch_to_tsquery('galaxies') @@ fts; QUERY PLAN ----------------------------------------------------------------------------------------- Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1) Filter: (websearch_to_tsquery('galaxies'::text) @@ fts) Heap Blocks: exact=276 -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059 rows=493 loops=1) Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) Planning Time: 0.125 ms Execution Time: 1.518 ms (7 rows)