Re: Could use some advice on search architecture - Mailing list pgsql-general

From Andy Colson
Subject Re: Could use some advice on search architecture
Date
Msg-id 53529070.3000307@squeakycode.net
Whole thread Raw
In response to Could use some advice on search architecture  (Ron Pasch <postgresql@ronpasch.nl>)
Responses Re: Could use some advice on search architecture  (Marc Mamin <M.Mamin@intershop.de>)
List pgsql-general
On 04/19/2014 06:26 AM, Ron Pasch wrote:
>  > - It should be possible to search for products and provide properties
>  > that the product SHOULD have, not must have.
>>
>> I don't understand this.  Say you have a sprocket in red and green. Do you want to search for:
>>
>> select * from product where name = 'sprocket' and (color = 'red' or color = 'green')
>>
>> Or do you want something else?  Does the user say they'd "prefer" blue, but will take whatever else you have?
>>
>> Do you search for some properties exactly and some "preferred"?
>>
>> Perhaps you could describe a little more how you want to query the database?  Or, maybe, what your user's are
searchingfor? 
>>
>> -Andy
>>
>>
>
> Yes, the user can prefer certain properties and the products that match most of the properties should be in the top
ofthe results, but if a product doesn't match all of them but just some of them, they should still be returned, but
lowerin the results. 
>
> I'm seriously wondering if doing this solely with postgres is even possible without having long execution times. I've
donesome tests with 5 million records and just doing the "or" construction you mentioned above, which resulted in 600
to900 ms queries and returning only those records of which all properties match at least one selected value. 
>
> I was thinking that perhaps using a search engine like lucene or sphinx would be more appropriate, but then I wonder
whatI would exactly be indexing and how I would be querying that, but that's a question for a different mailing list
;-)

Please keep the list cc'd, so others can help as well.

Yeah, doing a bunch of or's is gonna have to test all 5 million products.

I wonder if there is a way we can treat this like a two step process.

1) cut down the number of products

2) sort them by #matches, popularity, etc

You've talked about #2, but how about #1.  Is there any way to either include or exclude a product?  Users don't just
askfor red, they ask for "tires (maybe red)".  Not all 5 million products are tires, right? 

-Andy



pgsql-general by date:

Previous
From: Dorian Hoxha
Date:
Subject: Re: Could use some advice on search architecture
Next
From: Marc Mamin
Date:
Subject: Re: Could use some advice on search architecture