Full text search tsv column aproach vs concat confusion - Mailing list pgsql-general

From cen
Subject Full text search tsv column aproach vs concat confusion
Date
Msg-id b2bdd937-1827-e8d2-7da4-a866ef7ed646@gmail.com
Whole thread Raw
Responses Re: Full text search tsv column aproach vs concat confusion  (Artur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-general

Hi

I am seeking some clarification in regard to full text search across multiple tables and what the best approach is. Documentation talks about two approaches when it comes to building a document: on-the-fly concat of columns and a dedicated tsv column approach. Let's say I want to perform a search for "txt1 & txt2 & txt3" on columns table1.col1, table1.col2 and table2.col1. I see the following solutions:

1. Concat all three into a document and perform a FTS.

SELECT * FROM (

SELECT to_tsvector(table1.col1) ||       to_tsvector(table1.col2) ||      to_tsvector(table2.col1) as document FROM table1 LEFT JOIN table2 ON table1.table2_id=table2.id

) subquery WHERE subquery.document @@ to_tsquery(unaccent(?));

2. Create a tsv column in each table, concat tsv columns and perform FTS on that.

SELECT * FROM table1 LEFT JOIN table2 ON table1.table2_id=table2.id WHERE table1.tsv || tale2.tsv @@ to_tsquery(unaccent(?));

3. Have a tsv column only in table1 and insert table2.col1 to the tsv via triggers. Works but seems very hacky.


It seems to me that option #2 is fast and easy to implement but I am not sure what the concat of tsvs really means from index usage and performance standpoint. Option #1 is the most flexible and I'd use that all the time if it was not THAT much slower than tsv column approacj. Documentation on TSV columns states: "Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches."

The question is, how much faster are tsv columns really? Are there any benchmarks about this? If the performance difference is negligible I'd advocate that using tsv columns is a waste of time and space in most general cases. But since there is no information on how much faster it's hard to decide.


Best regards,
Klemen




pgsql-general by date:

Previous
From: "Charles Clavadetscher"
Date:
Subject: Re: Request to share information regarding errors
Next
From: Merlin Moncure
Date:
Subject: Re: Change column type from int to bigint - quickest way