Re: tsvector not giving expected results on one host - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: tsvector not giving expected results on one host
Date
Msg-id CAF4Au4zY9nA27diCUgg_=e+R5y5Xj0hLiTzh9yLSzw4kTpyuOA@mail.gmail.com
Whole thread Raw
In response to Re: tsvector not giving expected results on one host  (Oleg Bartunov <obartunov@postgrespro.ru>)
Responses Re: tsvector not giving expected results on one host
List pgsql-general
I

On Wed, Dec 21, 2022 at 1:12 PM Oleg Bartunov <obartunov@postgrespro.ru> wrote:
>
> Dan,
>
> it's always good to specify configuration name in a query to avoid
> recheck,  since
> websearch_to_tsquery(regconfig, text) is immutable, while
> websearch_to_tsquery(text) is stable.

immutable function calculates once in planning time, but stable
function calculates during running time,
so the difference may be very big depending on how many tuples found.

>
> See the difference:
>
> [local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
> from apod where  websearch_to_tsquery('simple','galaxies') @@ fts;
>                                       QUERY PLAN
> ---------------------------------------------------------------------------------------
>  Bitmap Heap Scan on apod (actual time=0.008..0.008 rows=0 loops=1)
>    Recheck Cond: ('''galaxies'''::tsquery @@ fts)
>    ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.007..0.007
> rows=0 loops=1)
>          Index Cond: (fts @@ '''galaxies'''::tsquery)
>  Planning Time: 0.134 ms
>  Execution Time: 0.022 ms
> (6 rows)
>
> Time: 0.369 ms
> [local]:5433 oleg@oleg=# explain (analyze,costs off)  select title
> from apod where  websearch_to_tsquery('galaxies') @@ fts;
>                                        QUERY PLAN
> -----------------------------------------------------------------------------------------
>  Bitmap Heap Scan on apod (actual time=0.107..1.463 rows=493 loops=1)
>    Filter: (websearch_to_tsquery('galaxies'::text) @@ fts)
>    Heap Blocks: exact=276
>    ->  Bitmap Index Scan on gin_apod_fts_idx (actual time=0.059..0.059
> rows=493 loops=1)
>          Index Cond: (fts @@ websearch_to_tsquery('galaxies'::text))
>  Planning Time: 0.125 ms
>  Execution Time: 1.518 ms
> (7 rows)
>
> On Sat, Dec 17, 2022 at 11:34 PM Dan Langille <dan@langille.org> wrote:
> >
> > On Sat, Dec 17, 2022, at 3:22 PM, Dan Langille wrote:
> > > On Sat, Dec 17, 2022, at 3:14 PM, Dan Langille wrote:
> > >> On Sat, Dec 17, 2022, at 2:55 PM, Tom Lane wrote:
> > >>> "Dan Langille" <dan@langille.org> writes:
> > >>>>  pkgmessage_textsearchable2 | tsvector                 |           |          | generated always as
(to_tsvector('english'::regconfig,translate(pkgmessage, '/'::text, ' '::text))) stored
 
> > >>>
> > >>> That is not likely to play well with this:
> > >>>
> > >>>> freshports.org=> show  default_text_search_config ;
> > >>>>  default_text_search_config
> > >>>> ----------------------------
> > >>>>  pg_catalog.simple
> > >>>
> > >>> because "english" and "simple" will stem words differently.
> > >>>
> > >>> regression=# select websearch_to_tsquery('english', 'example');
> > >>>  websearch_to_tsquery
> > >>> ----------------------
> > >>>  'exampl'
> > >>> (1 row)
> > >>>
> > >>> regression=# select websearch_to_tsquery('simple', 'example');
> > >>>  websearch_to_tsquery
> > >>> ----------------------
> > >>>  'example'
> > >>> (1 row)
> > >>>
> > >>> If what is in your tsvector is 'exampl', then only the first of
> > >>> these will match.  So IMO the question is not "why is it failing
> > >>> on prod?", it's "how the heck did it work on the other machine?".
> > >>> You won't get nice results if websearch_to_tsquery is using a
> > >>> different TS configuration than to_tsvector did.
> > >>
> > >> I think this shows why we are getting the results we see.  Credit to ch
> > >> on IRC for asking this question.
> > >>
> > >> The problem host:
> > >>
> > >> freshports.org=> select websearch_to_tsquery('example');
> > >>  websearch_to_tsquery
> > >> ----------------------
> > >>  'example'
> > >> (1 row)
> > >
> > > Ahh, this explains the differences and as to why it works where it shouldn't?
> > >
> > > freshports.org=> select setting, source from pg_settings where name =
> > > 'default_text_search_config';
> > >       setting      | source
> > > -------------------+---------
> > >  pg_catalog.simple | default
> > > (1 row)
> > >
> > >
> > >>
> > >> The hosts on which this search works
> > >>
> > >> freshports.devgit=# select websearch_to_tsquery('example');
> > >>  websearch_to_tsquery
> > >> ----------------------
> > >>  'exampl'
> > >> (1 row)
> > >
> > >
> > > freshports.devgit=# select setting, source from pg_settings where name
> > > = 'default_text_search_config';
> > >       setting       |       source
> > > --------------------+--------------------
> > >  pg_catalog.english | configuration file
> > > (1 row)
> > >
> > >
> > > At least now I know what I can play with to get all hosts in sync.
> >
> > Here we go, on the problem database, create a new field, based on simple, not english.
> >
> > ALTER TABLE public.ports
> >     ADD COLUMN pkgmessage_textsearchable3 tsvector generated always as (to_tsvector('simple'::regconfig,
translate(pkgmessage,'/'::text, ' '::text))) stored;
 
> >
> > Index it:
> >
> > CREATE INDEX ports_pkgmessage_textsearchable3_idx
> >     ON public.ports USING gin
> >     (pkgmessage_textsearchable3)
> >     TABLESPACE pg_default;
> > CREATE INDEX
> >
> > query it:
> >
> > freshports.org=> SELECT id as port_id, element_pathname(element_id)
> >   FROM ports
> >  WHERE pkgmessage_textsearchable3  @@ websearch_to_tsquery('example');
> >  port_id |                           element_pathname
> > ---------+----------------------------------------------------------------------
> >    34126 | /ports/head/security/pond
> >    74559 | /ports/branches/2015Q3/emulators/linux_base-c6
> >    60310 | /ports/branches/2020Q4/www/gitlab-ce
> >    38345 | /ports/head/www/gitlab
> >    46842 | /ports/branches/2018Q1/mail/postfix-sasl
> >    51532 | /ports/branches/2019Q1/graphics/drm-legacy-kmod
> >
> > Success. Thank you Mr Lane.
> >
> > --
> >   Dan Langille
> >   dan@langille.org
> >
> >
>
>
> --
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



pgsql-general by date:

Previous
From: Tom Lane
Date:
Subject: Re: dropped default locale
Next
From: Alicja Kucharczyk
Date:
Subject: Re: Regd. the Implementation of Wallet (in Oracle) config equivalent in postgreSQL whilst the database migration