Re: Find similar records (compare tsvectors) - Mailing list pgsql-general

From Patrick Dung
Subject Re: Find similar records (compare tsvectors)
Date
Msg-id 484316264.2758984.1425650759221.JavaMail.yahoo@mail.yahoo.com
Whole thread Raw
In response to Find similar records (compare tsvectors)  (Patrick Dung <patrick_dkt@yahoo.com.hk>)
Responses Re: Find similar records (compare tsvectors)  (Oleg Bartunov <obartunov@gmail.com>)
List pgsql-general
Resend.

How to quickly compare the similarity of two tsvector?


On Monday, March 2, 2015 11:01 PM, Patrick Dung <patrick_dkt@yahoo.com.hk> wrote:


Hello,

I had a database with articles or attachment stored in bytea format.
I also had a trigger: it insert/update the tsv column when a record is added/updated.
The tsv column had a GIN index.
With this setting, I can do very fast keyword search on the tsv.

Suppose I had a specific record (id=100000).
How to list similar records based on ranking?
In that case, I had to compare a tsvector with another tsvector.

I had this SQL which make the original tsv as a text and then to tsquery, Then I can compare a tsv and a tsquery.
SELECT ts_rank(i.tsv, replace(strip(original.tsv)::text, ' ', '|')::tsquery) as similarity, i.company, i.industry, i.post_timestamp, i.id FROM items i, (SELECT tsv, id FROM items WHERE id=100000) AS original WHERE i.id != original.id ORDER BY similarity;

items table:
id bigint
company varchar
industry varchar
description varchar
post_timestamp timestamp
attachment bytea
tsv tsvector

The problem is that this is very slow.
Any comment?

Thank and regards,
Patrick


pgsql-general by date:

Previous
From: Kevin Grittner
Date:
Subject: Re: #PERSONAL# Reg: date going as 01/01/0001
Next
From: Alvaro Herrera
Date:
Subject: Re: VACUUM FULL doesn't reduce table size