Home > mailing lists

Re: performance issue using DBI - Mailing list pgsql-general

From	Joshua b. Jore
Subject	Re: performance issue using DBI
Date	June 6, 2002 12:38:34
Msg-id	Pine.BSO.4.44.0206060832210.2703-100000@kitten.greentechnologist.org Whole thread Raw
In response to	performance issue using DBI ("Nicolas Nolst" <nnolst@hotmail.com>)
Responses	Re: performance issue using DBI
List	pgsql-general

Tree view

How much data is 20,000 lines? 20KB? 100MB? You might do well to just
process this in memory and then just COPY the right data to the table.
This gets away from doing the SELECT/INSERT/SELECT/INSERT thing which is
going to be painful for batch jobs.

See, the thing is that on every insert the indexes have to be updated.
They won't be used until after you VACUUM the tables so it does no good
inside your transaction. Drop the indexes.

Also, leave off with your use of currval/nextval. The point of a serial
type is that the column increments each time automagically. For what you
are doing you could turn that serial into an integer, create your sequence
separately, exec nextval('yourseq') and then just use the value you
retrieved. Don't use currval since some other process might alter the
sequence between the time you call nextval and currval. Just store the
value.

I think this problem is better solved on the perl side than on the
PostgreSQL side. Consider using data structures like so. You can probably
create a better structure since you know your data and I don't.

sessions
{ remote_ip => { phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               },
                 phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               }
               },
  remote_ip => { phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               },
                 phone_type => { phone_number => session_id,
                                 phone_number => session_id
                               }
               }
}

actions - now session_id is the array offset.
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],
[ [ url, timestamp ],
  [ url, timestamp ],
  [ url, timestamp ] ],


> If the session already exists I add a line in the table actions with a INSERT
>
> If the session doesn't exist or if the criteria is true, I add a line in the
> table sessions with an INSERT and then add  a line with a INSERT in the table actions (I use nextval and currval).
>
> I have put indexes on sessions(session_id), sessions(msisdn),
> actions(session_id) and actions(timestamp). I process one log file of about 20000 lines every day. All the lines are
> processed in one transaction (autocommit set to 0).
>
> My problem is that populating my database is slower  when the data gets bigger
> and the performance falls dramatically. I thought that is would be improve with
> my indexes but the problem still persists.
>
> Could you please give me some clues that could solve this issue.
>
> Thanks.
>
>
>
> Nicolas Nolst
> [belgium_gs.gif]
>
>
______________________________________________________________________________________________________________________________
> MSN Photos is the easiest way to share and print your photos: Click Here
>
>

pgsql-general by date:

From: "Kasper (swebase)"
Date: 06 June 2002, 11:02:21
Subject: pg_shadow

From: Stephan Szabo
Date: 06 June 2002, 14:06:04
Subject: Re: pg_shadow

Re: performance issue using DBI - Mailing list pgsql-general

Previous

Next