Re: to pickle or not to pickle - Mailing list pgsql-general

From: Lincoln Yeoh
Subject: Re: to pickle or not to pickle
Date: 05 June 2000 16:28:41
Msg-id: 3.0.5.32.20000605162841.00864100@pop.mecomb.po.my
In response to: to pickle or not to pickle (Marc Tardif <intmktg@CAM.ORG>)
Responses: Re: to pickle or not to pickle (Richard Moon <richard@dcs.co.uk>)
List: pgsql-general
At 11:56 AM 31-05-2000 -0400, Marc Tardif wrote:
>I'm writing a search engine using Python and PostgreSQL which requires
>storing a temporary list of results in an SQL table for each request. This
>list will contain at least 50 records and could grow to about 300. My
>options are either to pickle the list and store a single entry, or to use
>the PostgreSQL COPY command (as opposed to INSERT, which would be too
>slow) to store each of the temporary records.

Are you trying to do something like:

"Showing 20 results", with next/previous links for the next/previous 20?

Whatever it is, I don't think you should use COPY.

The way I did it was to just run the query again each time and only display
the relevant results, using offset and window values.
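
For example, something like this (the table and column names here are just
made up for illustration):

    -- Show the third page of 20 results: skip the first 40 matches.
    SELECT id, title
      FROM documents
     WHERE title LIKE '%widget%'
     ORDER BY title
     LIMIT 20 OFFSET 40;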

Not as efficient, but:
1) I wanted to know how many rows there were, so if I used SELECT ... LIMIT
I'd have to do a SELECT count first. But AFAIK PostgreSQL has no special
optimizations for SELECT count (and I'm not even sure other databases would
be faster for _my_ SELECT count). See the sketch after this list.

2) I didn't want to deal with cleaning up the cache/pickles. My app was
web-based, so I don't know when users have left. Say I expire the
cache/pickles after 15 minutes: with 100 searches per minute, I'd end up
with 1500 pickles at any one time 8*). Not really a big problem nowadays,
but I didn't think it was worth dealing with.

3) It wasn't really a search engine: different results for different users,
different ways of sorting things, etc.
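
For point 1, the pattern ends up being something like this (again, the
names are illustrative only):

    -- No shortcut here: PostgreSQL walks the matching rows to count them.
    SELECT count(*)
      FROM documents
     WHERE title LIKE '%widget%';

    -- ...then fetch just the rows for the page being displayed.
    SELECT id, title
      FROM documents
     WHERE title LIKE '%widget%'
     ORDER BY title
     LIMIT 20 OFFSET 0;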

But if your search engine returns the same results for the same query no
matter who the user is, the cache approach could work well. It may mean a
redesign: have a cache table storing queries and results (and expiry
times). You will probably need regular vacuuming, since the cache table
will be changing quite often (there's a sketch of this below).

e.g. each row:
query string, result1, result2, sequence, total results, expiry time.

By storing the total results, you can use PostgreSQL's LIMIT feature more
intelligently. You can probably afford to waste the 4 bytes per row, and
keep everything in one table for speed.
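
As a rough sketch of what such a cache table might look like (all names
and types here are assumptions, adjust to taste):

    CREATE TABLE search_cache (
        query_string  text,       -- the query this row belongs to
        result1       text,       -- cached result columns
        result2       text,
        sequence      int4,       -- position of this row in the result set
        total_results int4,       -- repeated on every row of the same query
        expiry_time   timestamp   -- when this entry may be thrown away
    );

    -- Fetch the third page of a cached query:
    SELECT result1, result2, total_results
      FROM search_cache
     WHERE query_string = 'widget'
     ORDER BY sequence
     LIMIT 20 OFFSET 40;

    -- Periodically expire old entries, then vacuum to reclaim dead rows:
    DELETE FROM search_cache WHERE expiry_time < now();
    VACUUM search_cache;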

Cheerio,

Link.

