Re: performance issue using DBI - Mailing list pgsql-general
From | Joshua b. Jore |
---|---|
Subject | Re: performance issue using DBI |
Date | |
Msg-id | Pine.BSO.4.44.0206060832210.2703-100000@kitten.greentechnologist.org Whole thread Raw |
In response to | performance issue using DBI ("Nicolas Nolst" <nnolst@hotmail.com>) |
Responses |
Re: performance issue using DBI
|
List | pgsql-general |
How much data is 20,000 lines? 20KB? 100MB? You might do well to just process this in memory and then just COPY the right data to the table. This gets away from doing the SELECT/INSERT/SELECT/INSERT thing which is going to be painful for batch jobs. See, the thing is that on every insert the indexes have to be updated. They won't be used until after you VACUUM the tables so it does no good inside your transaction. Drop the indexes. Also, leave off with your use of currval/nextval. The point of a serial type is that the column increments each time automagically. For what you are doing you could turn that serial into an integer, create your sequence separately, exec nextval('yourseq') and then just use the value you retrieved. Don't use currval since some other process might alter the sequence between the time you call nextval and currval. Just store the value. I think this problem is better solved on the perl side than on the PostgreSQL side. Consider using data structures like so. You can probably create a better structure since you know your data and I don't. sessions { remote_ip => { phone_type => { phone_number => session_id, phone_number => session_id }, phone_type => { phone_number => session_id, phone_number => session_id } }, remote_ip => { phone_type => { phone_number => session_id, phone_number => session_id }, phone_type => { phone_number => session_id, phone_number => session_id } } } actions - now session_id is the array offset. [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ], [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ], [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ], > If the session already exists I add a line in the table actions with a INSERT > > If the session doesn't exist or if the criteria is true, I add a line in the > table sessions with an INSERT and then add a line with a INSERT in the table actions (I use nextval and currval). > > I have put indexes on sessions(session_id), sessions(msisdn), > actions(session_id) and actions(timestamp). I process one log file of about 20000 lines every day. All the lines are > processed in one transaction (autocommit set to 0). > > My problem is that populating my database is slower when the data gets bigger > and the performance falls dramatically. I thought that is would be improve with > my indexes but the problem still persists. > > Could you please give me some clues that could solve this issue. > > Thanks. > > > > Nicolas Nolst > [belgium_gs.gif] > > ______________________________________________________________________________________________________________________________ > MSN Photos is the easiest way to share and print your photos: Click Here > >
pgsql-general by date: