Thread: Full text search tsv column aproach vs concat confusion

Full text search tsv column aproach vs concat confusion

From
cen
Date:

Hi

I am seeking some clarification in regard to full text search across multiple tables and what the best approach is. Documentation talks about two approaches when it comes to building a document: on-the-fly concat of columns and a dedicated tsv column approach. Let's say I want to perform a search for "txt1 & txt2 & txt3" on columns table1.col1, table1.col2 and table2.col1. I see the following solutions:

1. Concat all three into a document and perform a FTS.

SELECT * FROM (

SELECT to_tsvector(table1.col1) ||       to_tsvector(table1.col2) ||      to_tsvector(table2.col1) as document FROM table1 LEFT JOIN table2 ON table1.table2_id=table2.id

) subquery WHERE subquery.document @@ to_tsquery(unaccent(?));

2. Create a tsv column in each table, concat tsv columns and perform FTS on that.

SELECT * FROM table1 LEFT JOIN table2 ON table1.table2_id=table2.id WHERE table1.tsv || tale2.tsv @@ to_tsquery(unaccent(?));

3. Have a tsv column only in table1 and insert table2.col1 to the tsv via triggers. Works but seems very hacky.


It seems to me that option #2 is fast and easy to implement but I am not sure what the concat of tsvs really means from index usage and performance standpoint. Option #1 is the most flexible and I'd use that all the time if it was not THAT much slower than tsv column approacj. Documentation on TSV columns states: "Another advantage is that searches will be faster, since it will not be necessary to redo the to_tsvector calls to verify index matches."

The question is, how much faster are tsv columns really? Are there any benchmarks about this? If the performance difference is negligible I'd advocate that using tsv columns is a waste of time and space in most general cases. But since there is no information on how much faster it's hard to decide.


Best regards,
Klemen




Re: Full text search tsv column aproach vs concat confusion

From
Artur Zakirov
Date:
Hi,

On 16.11.2016 11:54, cen wrote:
> Hi
>
> I am seeking some clarification in regard to full text search across
> multiple tables and what the best approach is. Documentation talks about
> two approaches when it comes to building a document: on-the-fly concat
> of columns and a dedicated tsv column approach. Let's say I want to
> perform a search for |"txt1 & txt2 & txt3" on columns table1.col1,
> table1.col2 and table2.col1. I see the following solutions:|
>
> |1. Concat all three into a document and perform a FTS.|
>
> |SELECT * FROM (
> |
>
> |SELECTto_tsvector(table1.col1)||to_tsvector(table1.col2)||to_tsvector(table2.col1)asdocument
> F|||ROM table1 LEFTJOINtable2 ONtable1.table2_id=table2.id| ) subquery
|||WHEREsubquery.document@@to_tsquery(unaccent(?));|| 
>
> |2. Create a tsv column in each table, concat tsv columns and perform
> FTS on that.|
>
> |SELECT*FROMtable1 LEFTJOINtable2 ONtable1.table2_id=table2.id
> WHEREtable1.tsv ||tale2.tsv @@to_tsquery(unaccent(?));|
>
> |3. Have a tsv column only in table1 and insert table2.col1 to the tsv
> via triggers. Works but seems very hacky.|
>
> |
> |
>
> |It seems to me that option #2 is fast and easy to implement but I am
> not sure what the concat of tsvs really means from index usage and
> performance standpoint. Option #1 is the most flexible and I'd use that
> all the time if it was not THAT much slower than tsv column approacj.
> Documentation on TSV columns states: "||Another advantage is that
> searches will be faster, since it will not be necessary to redo the
> to_tsvector calls to verify index matches."
> |
>
> The question is, how much faster are tsv columns really? Are there any
> benchmarks about this? If the performance difference is negligible I'd
> advocate that using tsv columns is a waste of time and space in most
> general cases. But since there is no information on how much faster it's
> hard to decide.
>

I haven't any such benchmarks. But if you have a real database, you can
perform tests using it on your solutions. Because it depends on your
task and what you need.

By the way, I suppose it is better to use COALESCE() function if your
columns could have NULL value:

SELECT * FROM (

SELECT to_tsvector(coalesce(table1.col1,'')) ||
        to_tsvector(coalesce(table1.col2,'')) ||
        to_tsvector(coalesce(table2.col1,'')) as document FROM table1
LEFT JOIN table2 ON table1.table2_id=table2.id

) subquery WHERE subquery.document @@ to_tsquery(unaccent(?));

And specifying a text search configuration makes queries a little bit
faster:

... to_tsvector('english', coalesce(table1.col1,'')) ...

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company