Thread: performance issue using DBI
<div style="background-color:"><div></div><div></div>Hi all,<br /><br />I have developped a perl script to populate a databasewith two tables: sessions<br />and actions.<br /><br />the table actions contains the following columns: session_id,url, timestamp.<br />The column session_id references to the table sessions.<br /><br />the table sessions containsthe following columns: session_id, remote_ip,<br />phone_type, phone_number. The column session_id is serial.<br/><br />The lines of the table actions which are part of the same session have the same<br />session_id.<br /><br/>There are then more lines in the table actions than in the table session.<br /><br /><br />To fill the two tables,I first need to know if the session already exists for a<br />certain phone_type, a certain remote_ip and a certainphone_number:<br /><br />SELECT session_id FROM sessions WHERE (phone_number = ?) AND (remote_ip = ?)<br />AND (phone_type= ?) ORDER BY session_id;<br /><br />I also need to apply a criteria to know if I have to add a new entry in the<br/>table sessions or not:<br /><br />SELECT (max(timestamp) + ?)<? FROM actions WHERE (session_id = ?);<br /><br/><br /><br />If the session already exists I add a line in the table actions with a INSERT<br /><br />If the sessiondoesn't exist or if the criteria is true, I add a line in the<br />table sessions with an INSERT and then add a linewith a INSERT in the table actions (I use nextval and currval).<br /><br />I have put indexes on sessions(session_id),sessions(msisdn),<br />actions(session_id) and actions(timestamp). I process one log file of about 20000lines every day. All the lines are processed in one transaction (autocommit set to 0).<br /><br />My problem is thatpopulating my database is slower when the data gets bigger<br />and the performance falls dramatically. I thought thatis would be improve with<br />my indexes but the problem still persists.<br /><br />Could you please give me some cluesthat could solve this issue.<br /><br />Thanks.<br /><br /><br /><br />Nicolas Nolst <div></div><div></div><img src="http://www.3dflags/World/Gif/belgium_gs.gif"/><div></div></div><br clear="all" /><hr />MSN Photos is the easiest wayto share and print your photos: <a href="http://g.msn.com/1HM500901/157">Click Here</a><br />
How much data is 20,000 lines? 20KB? 100MB? You might do well to just process this in memory and then just COPY the right data to the table. This gets away from doing the SELECT/INSERT/SELECT/INSERT thing, which is going to be painful for batch jobs.

See, the thing is that on every insert the indexes have to be updated. They won't be used until after you VACUUM the tables, so it does no good inside your transaction. Drop the indexes.

Also, leave off with your use of currval/nextval. The point of a serial type is that the column increments each time automagically. For what you are doing you could turn that serial into an integer, create your sequence separately, exec nextval('yourseq') and then just use the value you retrieved. Don't use currval since some other process might alter the sequence between the time you call nextval and currval. Just store the value.

I think this problem is better solved on the Perl side than on the PostgreSQL side. Consider using data structures like so. You can probably create a better structure since you know your data and I don't.

sessions:

    { remote_ip => { phone_type => { phone_number => session_id,
                                     phone_number => session_id },
                     phone_type => { phone_number => session_id,
                                     phone_number => session_id } },
      remote_ip => { phone_type => { phone_number => session_id,
                                     phone_number => session_id },
                     phone_type => { phone_number => session_id,
                                     phone_number => session_id } } }

actions - now session_id is the array offset:

    [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ],
    [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ],
    [ [ url, timestamp ], [ url, timestamp ], [ url, timestamp ] ],
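A rough Perl sketch of that approach, under a few assumptions: a tab-separated log, a separately created sequence called session_id_seq, and DBD::Pg's COPY support (pg_putcopydata/pg_putcopyend in recent releases; older ones use pg_putline/pg_endcopy). The session-timeout rule from the original post is left out to keep it short.

    #!/usr/bin/perl -w
    use strict;
    use DBI;

    my $dbh = DBI->connect('dbi:Pg:dbname=logs', 'user', 'password',
                           { AutoCommit => 0, RaiseError => 1 });

    my %sessions;   # remote_ip => phone_type => phone_number => session_id
    my @actions;    # [ session_id, url, timestamp ]

    while (my $line = <>) {
        chomp $line;
        # Assumed tab-separated log layout.
        my ($remote_ip, $phone_type, $phone_number, $url, $ts) = split /\t/, $line;

        my $id = $sessions{$remote_ip}{$phone_type}{$phone_number};
        unless (defined $id) {
            # New session: grab the next id from the sequence and remember it,
            # as suggested above -- no currval() involved.
            ($id) = $dbh->selectrow_array("SELECT nextval('session_id_seq')");
            $sessions{$remote_ip}{$phone_type}{$phone_number} = $id;
        }
        push @actions, [ $id, $url, $ts ];
    }

    # Bulk-load both tables with COPY (fields are assumed to contain no tabs).
    $dbh->do('COPY sessions (session_id, remote_ip, phone_type, phone_number) FROM STDIN');
    for my $ip (keys %sessions) {
        for my $type (keys %{ $sessions{$ip} }) {
            for my $number (keys %{ $sessions{$ip}{$type} }) {
                $dbh->pg_putcopydata(join("\t", $sessions{$ip}{$type}{$number},
                                          $ip, $type, $number) . "\n");
            }
        }
    }
    $dbh->pg_putcopyend;

    $dbh->do('COPY actions (session_id, url, timestamp) FROM STDIN');
    $dbh->pg_putcopydata(join("\t", @$_) . "\n") for @actions;
    $dbh->pg_putcopyend;

    $dbh->commit;
    $dbh->disconnect;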
On Thu, 2002-06-06 at 14:55, Joshua b. Jore wrote:
> Don't use currval since some other process might alter the
> sequence between the time you call nextval and currval.

This is wrong. currval() will always return the last serial assigned in *the*same*session*. It is not affected by other users at all.

--
Oliver Elphick                          Oliver.Elphick@lfix.co.uk
Isle of Wight                           http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839 932A 614D 4C34 3E1D 0C1C

     "Therefore being justified by faith, we have peace with
      God through our Lord Jesus Christ."        Romans 5:1
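In DBI terms: as long as both statements go through the same $dbh, they run in the same backend session, so the pattern is safe. A small fragment illustrating this, assuming PostgreSQL's default sequence name for the serial column and reusing $dbh and the column values from the original script:

    # Both statements use the same $dbh, i.e. the same backend session,
    # so currval() here cannot see other clients' nextval() calls.
    $dbh->do('INSERT INTO sessions (phone_type, remote_ip, phone_number)
              VALUES (?, ?, ?)',
             undef, $phone_type, $remote_ip, $phone_number);

    # 'sessions_session_id_seq' is the name PostgreSQL generates for the
    # implicit sequence behind a serial column; adjust if the schema differs.
    my ($session_id) = $dbh->selectrow_array(
        "SELECT currval('sessions_session_id_seq')");

    $dbh->do('INSERT INTO actions (session_id, url, timestamp) VALUES (?, ?, ?)',
             undef, $session_id, $url, $ts);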
Oh ok. So I was thinking of the way that sequences cut across transactions and I misinterpreted it. Thanks for the clarification.

Joshua b. Jore ; http://www.greentechnologist.org ; 1121 1233 1311 200 1201 1302 1211 200 1201 1303 200 1300 1233 1313 1211 1302 1212 1311 1230 200 1201 1303 200 1321 1233 1311 1302 200 1211 1232 1211 1231 1321 200 1310 1220 1221 1232 1223 1303 200 1321 1233 1311 200 1201 1302 1211 232 200 1112 1233 1310 1211 200 1013 1302 1211 1211 1232 201 22

On 6 Jun 2002, Oliver Elphick wrote:
> On Thu, 2002-06-06 at 14:55, Joshua b. Jore wrote:
> > Don't use currval since some other process might alter the
> > sequence between the time you call nextval and currval.
>
> This is wrong. currval() will always return the last serial assigned in
> *the*same*session*. It is not affected by other users at all.
On 6 Jun 2002 at 17:14, Oliver Elphick wrote:
> On Thu, 2002-06-06 at 14:55, Joshua b. Jore wrote:
> > Don't use currval since some other process might alter the
> > sequence between the time you call nextval and currval.
>
> This is wrong. currval() will always return the last serial assigned in
> *the*same*session*. It is not affected by other users at all.

Folks, here's a short test which might help. Note the BEGIN.

    $ psql testing
    testing=# create sequence test;
    CREATE
    testing=# select currval('test');
    ERROR:  test.currval is not yet defined in this session
    testing=# select setval('test', 1);
     setval
    --------
          1
    (1 row)

    testing=# select currval('test');
     currval
    ---------
           1
    (1 row)

    testing=# select currval('test');
     currval
    ---------
           1
    (1 row)

    testing=# select currval('test');
     currval
    ---------
           1
    (1 row)

    testing=# select currval('test');
     currval
    ---------
           1
    (1 row)

Then, in another window, I did this:

    $ psql testing
    testing=# select nextval('test');
     nextval
    ---------
           2
    (1 row)

    testing=# select nextval('test');
     nextval
    ---------
           3
    (1 row)

    testing=# select nextval('test');
     nextval
    ---------
           4
    (1 row)

Then back to the other window:

    testing=# select currval('test');
     currval
    ---------
           1
    (1 row)

    testing=# select nextval('test');
     nextval
    ---------
           5
    (1 row)

cheers

FWIW: I always use nextval when looking for a new ID.

--
Dan Langille
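The same idea from Perl, fetching the id with nextval() up front so currval() never enters the picture (a fragment reusing $dbh and the parsed values; sequence name assumed as above):

    # Fetch the id first, then use it explicitly in both INSERTs.
    my ($session_id) = $dbh->selectrow_array(
        "SELECT nextval('sessions_session_id_seq')");

    $dbh->do('INSERT INTO sessions (session_id, phone_type, remote_ip, phone_number)
              VALUES (?, ?, ?, ?)',
             undef, $session_id, $phone_type, $remote_ip, $phone_number);

    $dbh->do('INSERT INTO actions (session_id, url, timestamp) VALUES (?, ?, ?)',
             undef, $session_id, $url, $ts);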
I'm not a DB expert, but I think that a transaction with more than 20,000 operations can't be quick anyway.
I think you have to work another way: maybe read one line at a time, do the operations, commit, and delete the line from the file.
To keep track of all these operations in one transaction, I think Postgres must swap more than a 386 with 4MB of RAM!
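A minimal sketch of that per-line approach; process_line() is a hypothetical stand-in for the SELECT/INSERT logic, and rewriting the log file to drop processed lines is omitted:

    # One small transaction per log line instead of one big one.
    # Assumes AutoCommit => 0 on the connection, as in the original script.
    open my $log, '<', $logfile or die "cannot open $logfile: $!";
    while (my $line = <$log>) {
        process_line($dbh, $line);   # hypothetical: the SELECT/INSERT logic
        $dbh->commit;                # commit after every line
    }
    close $log;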
Bye
Giorgio