Re: Combine Top-k with similarity search extensions - Mailing list pgsql-novice

From 马修/Матвей/Mateo/Matt Buse
Subject Re: Combine Top-k with similarity search extensions
Date
Msg-id 20151120190010.1b404da78c54173e561a97484ed23969.a018702cca.wbe@email14.secureserver.net
Whole thread Raw
In response to Combine Top-k with similarity search extensions  (Shmagi Kavtaradze <kavtaradze.s@gmail.com>)
List pgsql-novice
<span style="font-family:Verdana; color:#000000; font-size:12pt;"><div>Dump to file, run shell script or C program to
sort(sort -u).  Searches and</div><div>comparisons work much better on sorted sets.</div><div><br /></div><div>Matt<br
/></div><blockquoteid="replyBlockquote" style="border-left: 2px solid blue; margin-left: 8px; padding-left: 8px;
font-size:10pt;color:black; font-family:verdana;" webmail="1"><div id="wmQuoteWrapper"> -------- Original Message
--------<br/> Subject: Re: [NOVICE] Combine Top-k with similarity search extensions<br /> From: Shmagi Kavtaradze
<<ahref="mailto:kavtaradze.s@gmail.com">kavtaradze.s@gmail.com</a>><br /> Date: Fri, November 20, 2015 8:13 am<br
/>To: <a href="mailto:tim.child@comcast.net">tim.child@comcast.net</a><br /> Cc: <a
href="mailto:pgsql-novice@postgresql.org">pgsql-novice@postgresql.org</a><br/><br /><div dir="ltr">It will add
complexityand also no idea how to do it. Is there any alternative?</div><div class="gmail_extra"><br /><div
class="gmail_quote">OnFri, Nov 20, 2015 at 5:00 PM, <span dir="ltr"><<a href="mailto:tim.child@comcast.net"
target="_blank">tim.child@comcast.net</a>></span>wrote:<br /><blockquote class="gmail_quote" style="margin:0 0 0
.8ex;border-left:1px#ccc solid;padding-left:1ex"><div><div
style="font-family:Arial;font-size:12pt;color:#000000"><div>Shmagi,<br/></div><div><br /></div><div>Take the first 20 
textcharacters and compute and store the CRC32 or MD5  of that value.  That value acts as a signature. You can then
findall distinct signatures,  or all rows with duplicate signatures for further analysis  You could event try building
asignature on the full text string.<br /></div><div><br /></div><div><br /></div><div><br /></div><hr /><div
style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt"><b>From:
</b>"ShmagiKavtaradze" <<a href="mailto:kavtaradze.s@gmail.com" target="_blank">kavtaradze.s@gmail.com</a>><br
/><b>To:</b><a href="mailto:pgsql-novice@postgresql.org" target="_blank">pgsql-novice@postgresql.org</a><br /><b>Sent:
</b>Friday,November 20, 2015 2:21:36 AM<br /><b>Subject: </b>[NOVICE] Combine Top-k with similarity search
extensions<spanclass=""><br /><div><br /></div><div dir="ltr">I am performing similarity check over a column in a table
withabout 3500 entries. Column is populated with text data from text file. Performing a check results in 3500 * 3500
rowsand it takes forever to calculate for my virtual machine. Is there any way to calculate for top-k results, to
decreaseamount and time needed? What I mean is that, for example when checking two sentences, if first several words
doesnot match, to stop checking that sentences and move on. </div></span></div><div><br
/></div></div></div></blockquote></div><br/></div></div></blockquote></span> 

pgsql-novice by date:

Previous
From: tim.child@comcast.net
Date:
Subject: Re: Combine Top-k with similarity search extensions
Next
From: JORGE MALDONADO
Date:
Subject: Advice about a parent-child relation design