Re: combined indexes with Gist - planner issues? - Mailing list pgsql-hackers

From Hans-Juergen Schoenig -- PostgreSQL
Subject Re: combined indexes with Gist - planner issues?
Date
Msg-id 4A9BE0D5.7060106@cybertec.at
Whole thread Raw
In response to Re: combined indexes with Gist - planner issues?  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: combined indexes with Gist - planner issues?  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
List pgsql-hackers
Martijn van Oosterhout wrote:
> On Mon, Aug 31, 2009 at 04:06:22PM +0200, Hans-Juergen Schoenig -- PostgreSQL wrote:
>   
>> ok, i thought it would be something gist specific i was not aware of.
>> the golden question now is: i am looking for the cheapest products given  
>> a certain text in an insane amount of data.
>> how to do it? other quals which could narrow down the amount of data  
>> would not help.
>>
>> i cannot see an option with regular "weapons" ...
>> maybe you can an idea how to fix core to make it work? maybe there is a  
>> mechanism we could need.
>> we really have to make this work - no matter what it takes.
>> we are willing to put effort into that.
>>     
>
> The way I usually attack such a problem is to think of a data
> structure+algorithm that could produce the output you want. Once you've
> got that it's usually clear how you can make postgres do it and what
> changes would need to be made.
>
> At first glance I don't see any nice data structure specific for your
> problem. But it occurs to me that maybe you could just have a (btree)
> index on the price and just scan in asceding order until you have
> enough records. Expensive if the first record is expensive.
>
> Another possibility is to change your query to use the price in the
> GiST index: execute multiple queries of the form:
>
> ... AND display_price >= 0.01 and display_price < 1;
> ... AND display_price >= 1 and display_price < 10;
>
>   

hello ...

i had a similar idea here but the problem is: prices will pretty much 
depends on products.
to get to some critical example: "book" is a horribly frequent word and 
you will find just too many in a too narrow price range.
using a price index is alone is not a good idea. how many products which 
cost USD 9.95 do you know and how many of them are books? :(
i did some experiments which PL/proxy to scale out a little and i wrote 
some C code to explicitly cache data from the start and so on.
this is all shit, however - it is too much data and I have too many request.
i don't want to fallback to some java-based stuff such as solr. it would 
totally ruin my credibility and the stand postgres has at this customer.
whatever it takes - a PG based solution has to be found and implemented.

my knowledge of how gist works internally is not too extensive. any 
"kickstart" idea would be appreciated.
   many thanks,
      hans

-- 
Cybertec Schoenig & Schoenig GmbH
Reyergasse 9 / 2
A-2700 Wiener Neustadt
Web: www.postgresql-support.de



pgsql-hackers by date:

Previous
From: Martijn van Oosterhout
Date:
Subject: Re: combined indexes with Gist - planner issues?
Next
From: Andrew Dunstan
Date:
Subject: Re: Feature request : add REMAP_SCHEMA-like option to pg_restore