Re: tsvector not giving expected results on one host - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: tsvector not giving expected results on one host |
Date | |
Msg-id | CAF4Au4zY9nA27diCUgg_=e+R5y5Xj0hLiTzh9yLSzw4kTpyuOA@mail.gmail.com Whole thread Raw |
In response to | Re: tsvector not giving expected results on one host (Oleg Bartunov <obartunov@postgrespro.ru>) |
Responses |
Re: tsvector not giving expected results on one host
|
List | pgsql-general |
I On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote: > > Dan, > > it's always good to specify configuration name in a query to avoid > recheck, since > websearch_to_tsquery(regconfig, text) is immutable, while > websearch_to_tsquery(text) is stable. immutable function calculates once in planning time, but stable function calculates during running time, so the difference may be very big depending on how many tuples found. > > See the difference: > > [local]:5433 oleg@oleg=# explain (analyze,costs off) select title > from apod where websearch_to_tsquery('simple','galaxies') @@ fts; > QUERY PLAN > --------------------------------------------------------------------------------------- > Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1) > Recheck Cond: ('''galaxies'''::tsquery @@ fts) > -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007 > rows=0 loops=1) > Index Cond: (fts @@ '''galaxies'''::tsquery) > Planning Time: 0.134 ms > Execution Time: 0.022 ms > (6 rows) > > Time: 0.369 ms > [local]:5433 oleg@oleg=# explain (analyze,costs off) select title > from apod where websearch_to_tsquery('galaxies') @@ fts; > QUERY PLAN > ----------------------------------------------------------------------------------------- > Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1) > Filter: (websearch_to_tsquery('galaxies'::text) @@ fts) > Heap Blocks: exact=276 > -> Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059 > rows=493 loops=1) > Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text)) > Planning Time: 0.125 ms > Execution Time: 1.518 ms > (7 rows) > > On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote: > > > > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote: > > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote: > > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote: > > >>> "Dan Langille" <dan@langille.org> writes: > > >>>> pkgmessage_textsearchable2 | tsvector | | | generated always as (to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored > > >>> > > >>> That is not likely to play well with this: > > >>> > > >>>> freshports.org=> show default_text_search_config ; > > >>>> default_text_search_config > > >>>> ---------------------------- > > >>>> pg_catalog.simple > > >>> > > >>> because "english" and "simple" will stem words differently. > > >>> > > >>> regression=# select websearch_to_tsquery('english', 'example'); > > >>> websearch_to_tsquery > > >>> ---------------------- > > >>> 'exampl' > > >>> (1 row) > > >>> > > >>> regression=# select websearch_to_tsquery('simple', 'example'); > > >>> websearch_to_tsquery > > >>> ---------------------- > > >>> 'example' > > >>> (1 row) > > >>> > > >>> If what is in your tsvector is 'exampl', then only the first of > > >>> these will match. So IMO the question is not "why is it failing > > >>> on prod?", it's "how the heck did it work on the other machine?". > > >>> You won't get nice results if websearch_to_tsquery is using a > > >>> different TS configuration than to_tsvector did. > > >> > > >> I think this shows why we are getting the results we see. Credit to ch > > >> on IRC for asking this question. > > >> > > >> The problem host: > > >> > > >> freshports.org=> select websearch_to_tsquery('example'); > > >> websearch_to_tsquery > > >> ---------------------- > > >> 'example' > > >> (1 row) > > > > > > Ahh, this explains the differences and as to why it works where it shouldn't? > > > > > > freshports.org=> select setting, source from pg_settings where name = > > > 'default_text_search_config'; > > > setting | source > > > -------------------+--------- > > > pg_catalog.simple | default > > > (1 row) > > > > > > > > >> > > >> The hosts on which this search works > > >> > > >> freshports.devgit=# select websearch_to_tsquery('example'); > > >> websearch_to_tsquery > > >> ---------------------- > > >> 'exampl' > > >> (1 row) > > > > > > > > > freshports.devgit=# select setting, source from pg_settings where name > > > = 'default_text_search_config'; > > > setting | source > > > --------------------+-------------------- > > > pg_catalog.english | configuration file > > > (1 row) > > > > > > > > > At least now I know what I can play with to get all hosts in sync. > > > > Here we go, on the problem database, create a new field, based on simple, not english. > > > > ALTER TABLE public.ports > > ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig, translate(pkgmessage,'/'::text, ' '::text))) stored; > > > > Index it: > > > > CREATE INDEX ports_pkgmessage_textsearchable3_idx > > ON public.ports USING gin > > (pkgmessage_textsearchable3) > > TABLESPACE pg_default; > > CREATE INDEX > > > > query it: > > > > freshports.org=> SELECT id as port_id, element_pathname(element_id) > > FROM ports > > WHERE pkgmessage_textsearchable3 @@ websearch_to_tsquery('example'); > > port_id | element_pathname > > ---------+---------------------------------------------------------------------- > > 34126 | /ports/head/security/pond > > 74559 | /ports/branches/2015Q3/emulators/linux_base-c6 > > 60310 | /ports/branches/2020Q4/www/gitlab-ce > > 38345 | /ports/head/www/gitlab > > 46842 | /ports/branches/2018Q1/mail/postfix-sasl > > 51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod > > > > Success. Thank you Mr Lane. > > > > -- > > Dan Langille > > dan@langille.org > > > > > > > -- > Postgres Professional: http://www.postgrespro.com > The Russian Postgres Company -- Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-general by date: