Re: Could use some advice on search architecture - Mailing list pgsql-general

From Marc Mamin
Subject Re: Could use some advice on search architecture
Date
Msg-id B6F6FD62F2624C4C9916AC0175D56D8828A66565@jenmbs01.ad.intershop.net
Whole thread Raw
In response to Re: Could use some advice on search architecture  (Andy Colson <andy@squeakycode.net>)
List pgsql-general
>On 04/19/2014 06:26 AM, Ron Pasch wrote:
>>  > - It should be possible to search for products and provide properties
>>  > that the product SHOULD have, not must have.
>>>
>>> I don't understand this.  Say you have a sprocket in red and green. Do you want to search for:
>>>
>>> select * from product where name = 'sprocket' and (color = 'red' or color = 'green')
>>>
>>> Or do you want something else?  Does the user say they'd "prefer" blue, but will take whatever else you have?
>>>
>>> Do you search for some properties exactly and some "preferred"?
>>>
>>> Perhaps you could describe a little more how you want to query the database?  Or, maybe, what your user's are
searchingfor? 
>>>
>>> -Andy
>>>
>>
>> Yes, the user can prefer certain properties and the products that match most of the properties should be in the top
ofthe results, but if a product doesn't match all of them but just some of them, they should still be returned, but
lowerin the results. 
>>
>> I'm seriously wondering if doing this solely with postgres is even possible without having long execution times.
I'vedone some tests with 5 million records and just doing the "or" construction you mentioned above, which resulted in
600to 900 ms queries and returning only those records of which all properties match at least one selected value. 

 I don't think that OR clauses are the right way as the aim is to count how many attributes do match the search.
 basically a standard approach would look like:

     SELECT pID, sum(match) as matches FROM
     (
     selct pID, 1 as match from Products where color ='pink'
     UNION ALL
     selct pID, 1 as match from Products where size ='XXL'
     ...
     )foo
    GROUP BY pID order by matches DESC

How many distinct attributes are involved ? ( 15 colors + 9 sizes + ....)

Marc

>>
>> I was thinking that perhaps using a search engine like lucene or sphinx would be more appropriate, but then I wonder
whatI would exactly be indexing and how I would be querying that, but that's a question for a different mailing list
;-)
>
>Please keep the list cc'd, so others can help as well.
>
>Yeah, doing a bunch of or's is gonna have to test all 5 million products.
>
>I wonder if there is a way we can treat this like a two step process.
>
>1) cut down the number of products
>
>2) sort them by #matches, popularity, etc
>
>You've talked about #2, but how about #1.  Is there any way to either include or exclude a product?  Users don't just
askfor red, they ask for "tires (maybe red)".  Not all 5 million products are tires, right? 
>
>-Andy


pgsql-general by date:

Previous
From: Andy Colson
Date:
Subject: Re: Could use some advice on search architecture
Next
From: Torsten Förtsch
Date:
Subject: Disable an index temporarily