
Re: Similarity search for sentences - Mailing list pgsql-general

From	Rémi Cura
Subject	Re: Similarity search for sentences
Date	December 5, 2013 12:13:01
Msg-id	CAJvUf_tb_bdk4nCMHMfRB2XvFTUxzYW0ho6kL5ALGp2y3nKxvg@mail.gmail.com
In response to	Similarity search for sentences ("Janek Sendrowski" <janek12@web.de>)
List	pgsql-general


This may be a totally bad idea:

Explode each sentence into (sentence_number, one_word) rows, one row per word (this makes a big table, so you may want to partition it).

Then put a classic index on the sentence number and another on the word (a btree if you compare with =, something more subtle if you do "like 'word'").
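
A minimal sketch of that idea, written in Python rather than SQL so it runs without a database. The names `explode`, `build_word_index`, `lookup`, and `rank` are hypothetical, the sorted list only stands in for a btree index on the word column, and lowercasing plus whitespace splitting is an assumed tokenization (no stemming).

```python
from bisect import bisect_left
from collections import Counter


def explode(sentences):
    """Return one (sentence_number, one_word) row per word of each sentence."""
    rows = []
    for sentence_number, sentence in enumerate(sentences):
        for word in sentence.lower().split():
            rows.append((sentence_number, word))
    return rows


def build_word_index(rows):
    """Stand-in for a btree on one_word: entries sorted by (word, sentence_number)."""
    return sorted((word, sentence_number) for sentence_number, word in rows)


def lookup(word_index, word):
    """Equality lookup (one_word = word): seek to the first match, then scan."""
    position = bisect_left(word_index, (word,))
    found = set()
    while position < len(word_index) and word_index[position][0] == word:
        found.add(word_index[position][1])
        position += 1
    return found


def rank(word_index, query):
    """Count, per sentence, how many distinct query words it contains."""
    hits = Counter()
    for word in set(query.lower().split()):
        for sentence_number in lookup(word_index, word):
            hits[sentence_number] += 1
    return hits.most_common()


sentences = [
    "the cat sat on the mat",
    "a dog sat on the rug",
    "completely unrelated text",
]
word_index = build_word_index(explode(sentences))
ranked = rank(word_index, "the cat sat")
```

Here `ranked` lists the sentences sharing at least one word with the query, the best match first. Counting shared words is only one possible score, chosen to keep the sketch short.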

Depending on performance, it could be worth regrouping by word:

sentence_number[], one_word

Then you could try an array or hstore for sentence_number[]?
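
The regrouped layout can be sketched the same way, again in Python as a stand-in for the table. `regroup_by_word` and `sentences_with_all` are hypothetical names, and the sorted lists play the role of the sentence_number[] column; how an array or hstore column would actually perform is not something this sketch shows.

```python
from collections import defaultdict


def regroup_by_word(rows):
    """Collapse (sentence_number, one_word) rows into word -> sorted sentence numbers."""
    grouped = defaultdict(set)
    for sentence_number, word in rows:
        grouped[word].add(sentence_number)
    return {word: sorted(numbers) for word, numbers in grouped.items()}


def sentences_with_all(grouped, words):
    """Intersect the per-word arrays: sentences that contain every given word."""
    arrays = [set(grouped.get(word, ())) for word in words]
    if not arrays:
        return []
    return sorted(set.intersection(*arrays))


# Rows as they would come out of the exploded table.
rows = [
    (0, "the"), (0, "cat"), (0, "sat"),
    (1, "the"), (1, "dog"), (1, "sat"),
    (2, "unrelated"),
]
grouped = regroup_by_word(rows)
both = sentences_with_all(grouped, ["the", "sat"])
```

With one row per word, a query touches as many rows as it has distinct words, and the matching work becomes set operations on the arrays.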

Cheers,

Rémi-C

2013/12/5 Janek Sendrowski <janek12@web.de>

Hi,

I have tables with millions of sentences. Each row contains one sentence. The text is natural language and any language is possible, but all sentences in one table are in the same language.
I have to do a similarity search on them. It has to be very fast, because I have to search for a few hundred sentences many times.
The search shouldn't be context-based. It should just return sentences with similar words (maybe stemmed).

I have already tried a GiST/GIN-index-based trigram search (pg_trgm extension), full-text search (tsearch2 extension) and pivot-based indexing (Fixed Query Array), but they are all too slow or not suitable.
Soundex and Metaphone aren't suitable either.

I have been working on this project for a long time, but without any success.
Do any of you have an idea?

I would be very thankful for help.

Janek Sendrowski

