Re: to pickle or not to pickle - Mailing list pgsql-general

From Jurgen Defurne
Subject Re: to pickle or not to pickle
Date
Msg-id 3938A2A4.69CA09AC@glo.be
In response to to pickle or not to pickle  (Marc Tardif <intmktg@CAM.ORG>)
List pgsql-general
Marc Tardif wrote:

> I'm writing a search engine using Python and PostgreSQL that needs to
> store a temporary list of results in an SQL table for each request. This
> list will contain at least 50 records and could grow to about 300. My
> options are either to pickle the list and store a single entry or use the
> postgresql COPY command (as opposed to INSERT which would be too slow) to
> store each of the temporary records.
>

> You are writing a search engine: does that mean that you need to search
> the web and that you want to store your temporary results in a table, OR
> does that mean that you are writing a QUERY screen, from which you
> generate a SELECT statement to query your POSTGRES database?
>
> Also, what size are your tuples?
>
> Do you need these temporary results within the same program, or do you
> need to pass them on to another program?
>

>
> Question is, how can I make an educated decision on which option to
> select? What kind of questions should I be asking myself? Should I
> actually go through the trouble of implementing both alternatives and
> profiling each separately? If so, how can I predict what will happen under
> a heavy load which is hard to simulate when benchmarking each option?
>

Always go for the simple solution. This may (paradoxically) need some more
study. One of the first questions you should ask yourself: is it really
necessary to store this temporary result? If so, then why take the pickle
option? Pickling is meant for persistent data; it is really more a mechanism
to store data between sessions. Maybe you should consider the option which
is used in traditional IT: just store your data in a sequential file. That has
much less overhead, because your OS handles it directly.
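
As a rough illustration of the sequential-file option, here is a minimal
Python sketch. The function names, the tab-separated layout and the sample
rows are assumptions made for the example, not something taken from your
program; note also that the values come back as strings when read.

```python
import csv
import os
import tempfile


def save_results(rows):
    """Write result rows to a temporary tab-separated file; return its path."""
    # mkstemp creates the file securely and returns an open file descriptor.
    fd, path = tempfile.mkstemp(prefix="results_", suffix=".tsv")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerows(rows)
    return path


def load_results(path):
    """Read the rows back in the order they were written (as strings)."""
    with open(path, newline="") as handle:
        return [row for row in csv.reader(handle, delimiter="\t")]


if __name__ == "__main__":
    # Hypothetical result list of 300 (id, url) pairs.
    rows = [(i, "http://example.org/doc%d" % i) for i in range(300)]
    path = save_results(rows)
    try:
        print(len(load_results(path)), "rows read back")
    finally:
        os.remove(path)  # the result is temporary, so clean up afterwards
```

Whether this beats a table depends on the answers to the questions above,
in particular on whether another program needs to read the results.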

Concerning the benchmarking, it seems the only way to do this is to
automatically start scripts which do what needs to be done and then
measure what happens: number of processes, CPU load and I/O load.
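
A driver script for such a measurement could look roughly like the sketch
below. Everything in it is an assumption for illustration: the names
run_load and store_pickled, the request and worker counts, and the use of
threads inside one process, which only approximates separate client
processes. It reports wall-clock and CPU time only; process counts and I/O
load still have to be watched with the tools of your OS while it runs.

```python
import pickle
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def store_pickled(rows):
    """One candidate operation: pickle the whole list into a temporary file."""
    with tempfile.TemporaryFile() as handle:
        pickle.dump(rows, handle)


def run_load(operation, rows, requests=200, workers=8):
    """Run `operation` `requests` times across `workers` threads and time it."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces all tasks to finish and re-raises any worker error.
        list(pool.map(lambda _: operation(rows), range(requests)))
    return {
        "requests": requests,
        "wall_seconds": time.perf_counter() - wall_start,
        "cpu_seconds": time.process_time() - cpu_start,
    }


if __name__ == "__main__":
    rows = [(i, "doc%d" % i) for i in range(300)]
    print(run_load(store_pickled, rows))
```

The same run_load call can be given a second function that implements the
other alternative, so that both are measured under the same simulated load.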

Jurgen Defurne
defurnj@glo.be


