On Tue, May 11, 2021 at 5:34 AM Bruce Momjian <bruce@momjian.us> wrote:
> On Mon, May 10, 2021 at 04:02:27PM +0300, Alexander Korotkov wrote:
> > Hi, Bruce!
> >
> > On Mon, May 10, 2021 at 9:03 AM Bruce Momjian <bruce@momjian.us> wrote:
> > > I have committed the first draft of the PG 14 release notes. You can
> > > see the most current build of them here:
> > >
> > > https://momjian.us/pgsql_docs/release-14.html
> > >
> > > I need clarification on many items, and the document still needs its
> > > items properly ordered, and markup added. I also expect a lot of
> > > feedback.
> > >
> > > I plan to work on completing this document this coming week in
> > > preparation for beta next week.
> >
> > Thank you very much for your work!
> >
> > Let me provide the missing descriptions for the items related to me.
> >
> > * Improve handling of compound words in to_tsquery() and
> > websearch_to_tsquery() (Alexander Korotkov)
> > Compound words are now transformed into parts connected with phrase
> > search operators. For example, to_tsquery('pg_class') becomes 'pg <->
> > class' instead of 'pg & class'. This fixes a bug in handling
> > compound words connected with the phrase operator and makes the search
> > for compound words stricter.
>
> OK, what symbols trigger this change? Underscore? What else?
Any symbol that the full-text parser recognizes as a separator but the
tsquery parser does not. Full-text search is extensible and allows
pluggable parsers. In principle, we could dig out the exact set of
symbols, but I'm not sure it's worth the effort.
> You are
> saying the previous code allowed 'pg' and 'class' anywhere in the
> string, while the new code requires them to be adjacent, which more
> closely matches the pattern.
Yes, that's it.
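A toy Python model of the difference, for anyone reading along. This is a sketch under stated assumptions, not the tsearch code: the separator set (any non-alphanumeric ASCII character) and the helpers compound_to_query() and matches() are hypothetical, since the real separator set depends on the pluggable parser in use.

```python
import re

# Hypothetical separator set: anything that is not an ASCII letter or digit.
# The real set depends on the full-text parser in use (parsers are pluggable).
SEPARATOR = re.compile(r"[^0-9A-Za-z]+")


def compound_to_query(word: str, phrase: bool = True) -> str:
    """Split a compound word into parts and join them with a tsquery operator.

    phrase=True models the new behavior (<->: parts must be adjacent);
    phrase=False models the old behavior (&: parts may appear anywhere).
    """
    parts = [p.lower() for p in SEPARATOR.split(word) if p]
    operator = " <-> " if phrase else " & "
    return operator.join(f"'{p}'" for p in parts)


def matches(word: str, text: str, phrase: bool = True) -> bool:
    """Check whether text matches the query built from word (toy model)."""
    parts = [p.lower() for p in SEPARATOR.split(word) if p]
    tokens = [t.lower() for t in SEPARATOR.split(text) if t]
    if not phrase:
        # '&': every part appears somewhere in the text
        return all(p in tokens for p in parts)
    # '<->': the parts appear consecutively and in order
    n = len(parts)
    return any(tokens[i:i + n] == parts for i in range(len(tokens) - n + 1))
```

With this model, matches("pg_class", "the pg catalog has a class column") is true with phrase=False but false with the default phrase=True, which is the stricter matching described above.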
> > * Fix extra distance in phrase operators for quoted text in
> > websearch_to_tsquery() (Alexander Korotkov)
> > For example, websearch_to_tsquery('english', '"aaa: bbb"') becomes
> > 'aaa <-> bbb' instead of 'aaa <2> bbb'.
>
> So colon and space were considered to be two tokens between 'aaa' and
> 'bbb', while the distance is really only one because both tokens are
> discarded? Is this true of any discarded tokens, e.g. '"aaa ?:, bbb"'?
Yes, that's true for any discarded tokens.
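The fixed rule can be sketched the same way. This is a toy model under an assumed tokenizer (a word is a run of ASCII letters or digits; every other run of characters is a separator token the parser discards), and phrase_query() is a hypothetical helper, not part of PostgreSQL. Because only kept words take positions, any number of discarded tokens between two words leaves them at distance 1.

```python
import re

# Assumed tokenizer: a word is a run of ASCII letters or digits; every other
# run of characters (spaces, ':', ',', '?', ...) is a discarded separator.
WORD = re.compile(r"[0-9A-Za-z]+")


def phrase_query(quoted: str) -> str:
    """Build a phrase query from the text between double quotes (toy model).

    Only kept words receive positions, so discarded separator tokens never
    widen the gap: consecutive words are always joined with <->.
    """
    words = [w.lower() for w in WORD.findall(quoted)]
    return " <-> ".join(f"'{w}'" for w in words)
```

In this model both "aaa: bbb" and "aaa ?:, bbb" yield 'aaa' <-> 'bbb'.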
------
Regards,
Alexander Korotkov