Re: CUBE_MAX_DIM - Mailing list pgsql-hackers

From Alastair McKinley
Subject Re: CUBE_MAX_DIM
Date
Msg-id PR1PR02MB534067DDB48CCDC51456CB69E3920@PR1PR02MB5340.eurprd02.prod.outlook.com
In response to Re: CUBE_MAX_DIM  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
> From: Tom Lane <tgl@sss.pgh.pa.us>
> Sent: 25 June 2020 17:43
>
> Alastair McKinley <a.mckinley@analyticsengines.com> writes:
> > I know that Cube in its current form isn't suitable for nearest-neighbour searching these vectors in their raw
> > form (I have tried recompilation with higher CUBE_MAX_DIM myself), but conceptually kNN GiST searches using Cubes can be
> > useful for these applications.  There are other pre-processing techniques that can be used to improve the speed of the
> > search, but it still ends up with a kNN search in a high-ish dimensional space.
>
> Is there a way to fix the numerical instability involved?  If we could do
> that, then we'd definitely have a use-case justifying the work to make
> cube toastable.

I am not that familiar with the nature of the numerical instability, but it might be worth noting for additional
context that for the NN use case:

- The value of each dimension is likely to be between 0 and 1
- The L1 distance is meaningful for high numbers of dimensions, which *possibly* suffers less from the numeric issues
than Euclidean distance.
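
To make the two metrics concrete, here is a minimal Python sketch, purely illustrative and not taken from the cube extension. It computes L1 (taxicab) and Euclidean distances over vectors whose components lie between 0 and 1, and runs a brute-force kNN with each. The function names, the 200-dimension figure and the random data are all assumptions for the example. In cube itself these metrics correspond to the `<#>` and `<->` operators respectively.

```python
import math
import random


def l1_distance(a, b):
    # Taxicab distance: sum of absolute per-dimension differences.
    return sum(abs(x - y) for x, y in zip(a, b))


def l2_distance(a, b):
    # Euclidean distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_brute_force(query, points, k, dist):
    # Hypothetical helper: indices of the k points closest to the query,
    # found by a full scan (the sequential-scan baseline).
    order = sorted(range(len(points)), key=lambda i: dist(query, points[i]))
    return order[:k]


# Assumed example data: 1000 vectors, 200 dimensions, values in [0, 1).
random.seed(0)
dims = 200
points = [[random.random() for _ in range(dims)] for _ in range(1000)]
query = [random.random() for _ in range(dims)]

nearest_l1 = knn_brute_force(query, points, 5, l1_distance)
nearest_l2 = knn_brute_force(query, points, 5, l2_distance)
```

The L1 version needs no squaring or square root, which is one reason it might be less exposed to floating-point issues, though this sketch does not measure that.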

Numerical stability isn't the only issue for high-dimensional kNN: GiST search performance currently degrades
towards sequential scan performance with increasing N, although maybe the two are related?

>                         regards, tom lane

Best regards,
Alastair

