Re: to pickle or not to pickle - Mailing list pgsql-general

From: Lincoln Yeoh
Subject: Re: to pickle or not to pickle
Date: 05 June 2000 16:28:41
Msg-id: 3.0.5.32.20000605162841.00864100@pop.mecomb.po.my
In response to: to pickle or not to pickle (Marc Tardif <intmktg@CAM.ORG>)
Responses: Re: to pickle or not to pickle (Richard Moon <richard@dcs.co.uk>)
List: pgsql-general
At 11:56 AM 31-05-2000 -0400, Marc Tardif wrote:
>I'm writing a search engine using Python and PostgreSQL which requires
>storing a temporary list of results in an SQL table for each request. This
>list will contain at least 50 records and could grow to about 300. My
>options are either to pickle the list and store a single entry, or to use
>the PostgreSQL COPY command (as opposed to INSERT, which would be too
>slow) to store each of the temporary records.

Are you trying to do something like:

"Showing 20 results", with next/previous links for the next/previous 20?

Whatever it is, I don't think you should use COPY.

The way I did it was to just run the query again each time and only display
the relevant results, using offset and window values.
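
For example, something like this (the table and column names here are just
made up for illustration):

    -- Show the third page of 20 results: skip the first 40 matches.
    SELECT id, title
      FROM documents
     WHERE title LIKE '%widget%'
     ORDER BY title
     LIMIT 20 OFFSET 40;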

Not as efficient, but:
1) I wanted to know how many rows there were, so if I used SELECT ... LIMIT
I'd have to do a SELECT count first. But AFAIK PostgreSQL has no special
optimizations for SELECT count (and I'm not even sure other databases would
be faster for _my_ SELECT count). See the sketch after this list.

2) I didn't want to deal with cleaning up the cache/pickles. My app was
web-based, so I don't know when users have left. Say I expire the
cache/pickles after 15 minutes: with 100 searches per minute, I'd end up
with 1500 pickles at any one time 8*). Not really a big problem nowadays,
but I didn't think it was worth dealing with.

3) It wasn't really a search engine: different results for different users,
different ways of sorting things, etc.
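
For point 1, the pattern ends up being something like this (again, the
names are illustrative only):

    -- No shortcut here: PostgreSQL walks the matching rows to count them.
    SELECT count(*)
      FROM documents
     WHERE title LIKE '%widget%';

    -- ...then fetch just the rows for the page being displayed.
    SELECT id, title
      FROM documents
     WHERE title LIKE '%widget%'
     ORDER BY title
     LIMIT 20 OFFSET 0;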

But if your search engine returns the same results for the same query no
matter who the user is, the cache approach could work well. It may mean a
redesign: have a cache table storing queries and results (and expiry
times). You will probably need regular vacuuming, since the cache table
will be changing quite often (there's a sketch of this below).

e.g. each row:
query string, result1, result2, sequence, total results, expiry time.

By storing the total results, you can use PostgreSQL's LIMIT feature more
intelligently. You can probably afford to waste the 4 bytes per row, and
keep everything in one table for speed.
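
As a rough sketch of what such a cache table might look like (all names
and types here are assumptions, adjust to taste):

    CREATE TABLE search_cache (
        query_string  text,       -- the query this row belongs to
        result1       text,       -- cached result columns
        result2       text,
        sequence      int4,       -- position of this row in the result set
        total_results int4,       -- repeated on every row of the same query
        expiry_time   timestamp   -- when this entry may be thrown away
    );

    -- Fetch the third page of a cached query:
    SELECT result1, result2, total_results
      FROM search_cache
     WHERE query_string = 'widget'
     ORDER BY sequence
     LIMIT 20 OFFSET 40;

    -- Periodically expire old entries, then vacuum to reclaim dead rows:
    DELETE FROM search_cache WHERE expiry_time < now();
    VACUUM search_cache;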

Cheerio,

Link.

