Re: Combine Top-k with similarity search extensions - Mailing list pgsql-novice

From 马修/Матвей/Mateo/Matt Buse
Subject Re: Combine Top-k with similarity search extensions
Date
Msg-id 20151120190010.1b404da78c54173e561a97484ed23969.a018702cca.wbe@email14.secureserver.net
Whole thread Raw
In response to Combine Top-k with similarity search extensions  (Shmagi Kavtaradze <kavtaradze.s@gmail.com>)
List pgsql-novice
Dump to file, run shell script or C program to sort (sort -u).  Searches and
comparisons work much better on sorted sets.

Matt
-------- Original Message --------
Subject: Re: [NOVICE] Combine Top-k with similarity search extensions
From: Shmagi Kavtaradze <kavtaradze.s@gmail.com>
Date: Fri, November 20, 2015 8:13 am
To: tim.child@comcast.net
Cc: pgsql-novice@postgresql.org

It will add complexity and also no idea how to do it. Is there any alternative?

On Fri, Nov 20, 2015 at 5:00 PM, <tim.child@comcast.net> wrote:
Shmagi,

Take the first 20  text characters and compute and store the CRC32 or MD5  of that value.  That value acts as a signature. You can then find all distinct signatures,  or all rows with duplicate signatures for further analysis  You could event try building a signature on the full text string.




From: "Shmagi Kavtaradze" <kavtaradze.s@gmail.com>
To: pgsql-novice@postgresql.org
Sent: Friday, November 20, 2015 2:21:36 AM
Subject: [NOVICE] Combine Top-k with similarity search extensions

I am performing similarity check over a column in a table with about 3500 entries. Column is populated with text data from text file. Performing a check results in 3500 * 3500 rows and it takes forever to calculate for my virtual machine. Is there any way to calculate for top-k results, to decrease amount and time needed? What I mean is that, for example when checking two sentences, if first several words does not match, to stop checking that sentences and move on. 


pgsql-novice by date:

Previous
From: tim.child@comcast.net
Date:
Subject: Re: Combine Top-k with similarity search extensions
Next
From: JORGE MALDONADO
Date:
Subject: Advice about a parent-child relation design