Re: Text search lexer's handling of hyphens and negatives - Mailing list pgsql-general

From	Alan Hodgson
Subject	Re: Text search lexer's handling of hyphens and negatives
Date	October 16, 2019 15:50:59
Msg-id	45772fae97692f848fa102b0ddb92ad0c6d6b5bc.camel@lists.simkin.ca
In response to	Re: Text search lexer's handling of hyphens and negatives (raylu <lurayl@gmail.com>)
List	pgsql-general

On Tue, 2019-10-15 at 20:34 -0700, raylu wrote:

On Tue, Oct 15, 2019 at 3:35 PM Alan Hodgson <ahodgson@lists.simkin.ca> wrote:

My company has found the pg_trgm extension to be more useful for partial text searches than the full text search functions. I don't know specifically how it might help with your hyphens, but it would be worth testing. The docs actually suggest using the two in conjunction in some cases.

We actually do use pg_trgm already for the names/titles of things.

Indexing the content with a trigram index and then doing LOWER(content) LIKE '%789-xyz%' would certainly work, but

1. we'd have to do a little bit of finagling if we wanted to match on word boundaries (don't match '6789-xyza' in the above example)

2. trigram indexes are pretty huge for long documents, which is why we currently only use them for names/titles

We may give up and just use pg_trgm for contents if nothing else works out but it feels like the text search lexer is _so_ close to what we want.
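
For illustration, the trigram approach with a word-boundary check might look roughly like the sketch below. The table documents(id, content) and the index name are hypothetical, and the regular expression is only one possible way to do the "finagling" mentioned in point 1; it has not been tested against the poster's data.

```sql
-- Hypothetical schema: documents(id, content). Names are assumptions.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram GIN index on the lower-cased content, matching the expression
-- used in the query below so the planner can consider it.
CREATE INDEX documents_content_trgm_idx
    ON documents USING gin (LOWER(content) gin_trgm_ops);

-- The LIKE clause narrows candidates via the trigram index; the regex
-- then rejects hits embedded in a longer token such as '6789-xyza'.
SELECT id
FROM documents
WHERE LOWER(content) LIKE '%789-xyz%'
  AND LOWER(content) ~ '(^|[^a-z0-9])789-xyz([^a-z0-9]|$)';
```

This does nothing about point 2: the index still covers every trigram in the full document text.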

Maybe you could have a trigger pull out those specific hyphenated references into a separate column when the document is added or updated, and store/index those separately?
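
A minimal sketch of that idea, assuming a hypothetical documents(id, content) table and assuming a "reference" is any run of letters and digits joined by hyphens (the real pattern would depend on your data):

```sql
-- Hypothetical column holding the extracted references.
ALTER TABLE documents ADD COLUMN refs text[] NOT NULL DEFAULT '{}';

CREATE FUNCTION documents_extract_refs() RETURNS trigger AS $$
BEGIN
    -- Collect every hyphenated token, e.g. '789-xyz', from the content.
    -- ARRAY(...) yields an empty array when nothing matches.
    NEW.refs := ARRAY(
        SELECT DISTINCT m[1]
        FROM regexp_matches(LOWER(NEW.content),
                            '([a-z0-9]+(?:-[a-z0-9]+)+)', 'g') AS m
    );
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11+; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER documents_refs_trg
    BEFORE INSERT OR UPDATE OF content ON documents
    FOR EACH ROW EXECUTE FUNCTION documents_extract_refs();

-- A GIN index on the array supports exact-reference lookups.
CREATE INDEX documents_refs_idx ON documents USING gin (refs);

SELECT id FROM documents WHERE refs @> ARRAY['789-xyz'];
```

Because whole tokens are stored, a search for '789-xyz' would not match a document that only contains '6789-xyza', and the index stays small relative to a trigram index over the full content. Existing rows would need a one-off update to populate the column.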
