Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support) - Mailing list pgsql-hackers

From	Gordon Mohr
Subject	Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support)
Date	October 27, 2013 06:34:20
Msg-id	526C3EB7.3080002@xavvy.com
In response to	Re: high-dimensional knn-GIST tests (was Re: Cube extension kNN support) (Alvaro Herrera <alvherre@2ndquadrant.com>)
List	pgsql-hackers

On 10/23/13 9:05 PM, Alvaro Herrera wrote:
> Gordon Mohr wrote:
>
>> Thanks for this! I decided to give the patch a try at the bleeding
>> edge with some high-dimensional vectors, specifically the 1.4
>> million 1000-dimensional Freebase entity vectors from the Google
>> 'word2vec' project:
>>
>> https://code.google.com/p/word2vec/#Pre-trained_entity_vectors_with_Freebase_naming
>>
>> Unfortunately, here's what I found:
>
> I wonder if these results would improve with this patch:
> http://www.postgresql.org/message-id/EFEDC2BF-AB35-4E2C-911F-FC88DA6473D7@gmail.com

Thanks for the pointer; I'd missed that relevant update from Stas 
Kelvich. I applied that patch, and reindexed.

On the 100-dimension, 850K vector set:

indexing:  1137s (vs. 1344s)
DATA size: 4.7G (vs. 5.0G)
top-11-nearest-neighbor query: 32s (vs ~57s)

On the 500-dimension, 100K vector set:

indexing: 756s (vs. 977s)
DATA size: 4.5G (vs. 4.8G)
top-11-nearest-neighbor query: 18s (vs ~46s)

So, moderate (5-20%) improvements in indexing time and size, and larger 
(40-60%) speedups in index-assisted (<->) queries... but those 
index-assisted queries are still ~10X+ slower than the sequential-scan 
(distance_euclid()) queries, so the existence of the knn-GIST index is 
still hurting rather than helping performance.
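
As a quick check on the percentages above, here is a minimal Python sketch that recomputes the relative reductions from the figures quoted in this message. The helper name `percent_reduction` and the dictionary layout are illustrative assumptions, and the unpatched query times (~57s, ~46s) are approximate, so those results are rough.

```python
# Figures quoted in this message as (with patch, without patch).
# The unpatched query timings were reported as approximate (~57s, ~46s).
results = {
    "100-dim indexing time (s)": (1137, 1344),
    "100-dim data size (G)": (4.7, 5.0),
    "100-dim top-11 query (s)": (32, 57),
    "500-dim indexing time (s)": (756, 977),
    "500-dim data size (G)": (4.5, 4.8),
    "500-dim top-11 query (s)": (18, 46),
}


def percent_reduction(new, old):
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return 100.0 * (old - new) / old


for label, (new, old) in results.items():
    print(f"{label}: {percent_reduction(new, old):.1f}% lower with the patch")
```

Run against the numbers above, the size reductions land near 6%, the indexing-time reductions at roughly 15% and 23%, and the query-time reductions at roughly 44% and 61%.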

Will update if my understanding changes; still interested to hear if 
I've missed a key factor/switch needed for these indexes to work well.

- Gordon Mohr
