Thread: Insert performance
Dear all,

After many tests and doc reading, i finally try to get help from you...

Here is my problem. With some heavy insert into a simple BD (one
table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
testing it using a simple C software which use libpq and which use:
- Insert prepared statement (to avoid too many request parsing on the
server)
- transaction of 100000 inserts

My server has the following config:
- 3G RAM
- Pentium D, 64 bits, 3GHz
- database data on hardware RAID 0 disks
- pg_xlog (WAL) on another single hard drive

The server only uses 30% of the CPU, 10% of disk access and not much RAM...
So i'm wondering where the bottleneck could be and why i can't get better
performance. I really need to use inserts and i can't change it to use
COPY...

Any advice is welcome. Sorry in advance for my poor understanding of
databases!

Thanks in advance.

Regards,
Joël.W
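For concreteness, the setup described above (one prepared INSERT reused
inside a large transaction, driven from C with libpq) might look roughly
like the sketch below. The table logdata(msg text), the connection string
and the batch size are placeholders, and error handling is trimmed to a
minimum.

/* Minimal sketch of a prepared-statement insert loop inside one large
 * transaction. Compile with: cc insert_bench.c -lpq
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Parse/plan the INSERT once on the server, reuse it for every row. */
    PGresult *res = PQprepare(conn, "ins",
                              "INSERT INTO logdata (msg) VALUES ($1)", 1, NULL);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "prepare failed: %s", PQerrorMessage(conn));
    PQclear(res);

    /* One big transaction: WAL is flushed at COMMIT, not per row. */
    PQclear(PQexec(conn, "BEGIN"));

    for (int i = 0; i < 100000; i++) {
        char buf[32];
        const char *params[1] = { buf };
        snprintf(buf, sizeof(buf), "row %d", i);

        res = PQexecPrepared(conn, "ins", 1, params, NULL, NULL, 0);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}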
hatman wrote:
> Dear all,
>
> After many tests and doc reading, i finally try to get help from
> you...
>
> Here is my problem. With some heavy insert into a simple BD (one
> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
> testing it using a simple C software which use libpq and which use:
> - Insert prepared statement (to avoid too many request parsing on the
> server)
> - transaction of 100000 inserts

Are each of the INSERTs in their own transaction? If so, you'll be limited
by the speed of the disk the WAL is running on.

That means you have two main options:
1. Have multiple connections inserting simultaneously.
2. Batch your inserts together, from 10 to 10,000 per transaction.

Are either of those possible?

--
  Richard Huxton
  Archonet Ltd
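Option 1 above (several connections inserting at the same time) can be
sketched like this: each worker thread opens its own PGconn, since a libpq
connection must not be shared between threads without locking, and commits
its own batch. The table logdata(msg text), the connection string and the
worker count are assumptions for illustration.

/* Hedged sketch of parallel inserts over several connections.
 * Compile with: cc parallel_insert.c -lpq -lpthread
 */
#include <stdio.h>
#include <pthread.h>
#include <libpq-fe.h>

#define WORKERS 4
#define ROWS_PER_WORKER 25000

static void *worker(void *arg)
{
    long id = (long) arg;

    /* One private connection per worker. */
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "worker %ld: %s", id, PQerrorMessage(conn));
        return NULL;
    }

    PQclear(PQprepare(conn, "ins",
                      "INSERT INTO logdata (msg) VALUES ($1)", 1, NULL));
    PQclear(PQexec(conn, "BEGIN"));

    for (int i = 0; i < ROWS_PER_WORKER; i++) {
        char buf[48];
        const char *params[1] = { buf };
        snprintf(buf, sizeof(buf), "worker %ld row %d", id, i);
        PQclear(PQexecPrepared(conn, "ins", 1, params, NULL, NULL, 0));
    }

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return NULL;
}

int main(void)
{
    pthread_t tid[WORKERS];
    for (long i = 0; i < WORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *) i);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}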
joël Winteregg wrote:
> Hi Richard,
>
>>> Here is my problem. With some heavy insert into a simple BD (one
>>> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
>>> testing it using a simple C software which use libpq and which use:
>>> - Insert prepared statement (to avoid too many request parsing on the
>>> server)
>>> - transaction of 100000 inserts
>> Are each of the INSERTs in their own transaction?
>>
>
> No, as said above transactions are made of 100000 inserts...

Hmm - I read that as just meaning "inserted 100000 rows". You might find
that smaller batches provide peak performance.

>> If so, you'll be limited by the speed of the disk the WAL is running on.
>>
>> That means you have two main options:
>> 1. Have multiple connections inserting simultaneously.
>
> Yes, you're right. That what i have been testing and what provide the
> best performance ! I saw that postgresql frontend was using a lot of CPU
> and not both of them (i'm using a pentium D, dual core). To the opposit,
> the postmaster process use not much resources. Using several client,
> both CPU are used and i saw an increase of performance (about 18000
> inserts/sec).
>
> So i think my bottle neck is more the CPU speed than the disk speed,
> what do you think ?

Well, I think it's fair to say it's not disk. Let's see - the original
figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
plausible to me for a round-trip to process a simple command - are you
running the client app on the same machine, or is it over the network?

Two other things to bear in mind:
1. If you're running 8.2 you can have multiple sets of values in an INSERT
   http://www.postgresql.org/docs/8.2/static/sql-insert.html
2. You can do a COPY from libpq - is it really not possible?

--
  Richard Huxton
  Archonet Ltd
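Point 1 above (one INSERT carrying several VALUES lists, available from
8.2) could be driven from libpq roughly as follows: the statement text is
built once per batch and sent with PQexecParams. The table
logdata(msg text) and the batch size of 100 are illustrative assumptions,
and error handling is kept minimal.

/* Hedged sketch of a multi-row INSERT built as
 *   INSERT INTO logdata (msg) VALUES ($1),($2),...,($100)
 */
#include <stdio.h>
#include <libpq-fe.h>

#define BATCH 100

static void insert_batch(PGconn *conn, const char *values[BATCH])
{
    char sql[4096];
    int len = snprintf(sql, sizeof(sql), "INSERT INTO logdata (msg) VALUES ");

    /* Append one parameter placeholder per row. */
    for (int i = 0; i < BATCH; i++) {
        if (i)
            len += snprintf(sql + len, sizeof(sql) - len, ",");
        len += snprintf(sql + len, sizeof(sql) - len, "($%d)", i + 1);
    }

    PGresult *res = PQexecParams(conn, sql, BATCH, NULL,
                                 values, NULL, NULL, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "batch insert failed: %s", PQerrorMessage(conn));
    PQclear(res);
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Build 100 dummy rows and send them in a single INSERT. */
    const char *rows[BATCH];
    char bufs[BATCH][32];
    for (int i = 0; i < BATCH; i++) {
        snprintf(bufs[i], sizeof(bufs[i]), "row %d", i);
        rows[i] = bufs[i];
    }
    insert_batch(conn, rows);

    PQfinish(conn);
    return 0;
}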
joël Winteregg wrote:
>
>>> No, as said above transactions are made of 100000 inserts...
>> Hmm - I read that as just meaning "inserted 100000 rows". You might find
>> that smaller batches provide peak performance.
>
> Ahh ok ;-) sorry for my bad english... (yeah, i have been testing
> several transaction size 10000, 20000 and 100000)

Not your bad English, my poor reading :-)

>>>> If so, you'll be limited by the speed of the disk the WAL is running on.
>>>>
>>>> That means you have two main options:
>>>> 1. Have multiple connections inserting simultaneously.
>>> Yes, you're right. That what i have been testing and what provide the
>>> best performance ! I saw that postgresql frontend was using a lot of CPU
>>> and not both of them (i'm using a pentium D, dual core). To the opposit,
>>> the postmaster process use not much resources. Using several client,
>>> both CPU are used and i saw an increase of performance (about 18000
>>> inserts/sec).
>>>
>>> So i think my bottle neck is more the CPU speed than the disk speed,
>>> what do you think ?
>> Well, I think it's fair to say it's not disk. Let's see - the original
>> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
>> plausible to me for a round-trip to process a simple command - are you
>> running the client app on the same machine, or is it over the network?
>
> I did both test. On the local machine (using UNIX sockets) i can reach
> 18000 insert/sec with 10 clients and prepared statements. The same test
> using clients on the remote machine provide me 13000 inserts/sec.

OK, so we know what the overhead for network connections is.

> Now, with multiple client (multi-threaded inserts) my both CPU are quite
> well used (both arround 90%) so i maybe think that disk speeds are now
> my bottleneck. What do you think ? or maybe i will need a better CPU ?
>
>> Two other things to bear in mind:
>> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
>>    http://www.postgresql.org/docs/8.2/static/sql-insert.html
>
> Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
> inserts sets ! Thanks for the tip ;-)

Ah-ha! Give it a go, it's designed for this sort of situation. Not sure
it'll manage thousands of value clauses, but working up from 10 perhaps.
I've not tested it for performance, so I'd be interesting in knowing how
it compares to your other results.

>> 2. You can do a COPY from libpq - is it really not possible?
>>
>
> Not really but i have been testing it and inserts are flying (about
> 100000 inserts/sec) !!

What's the problem with the COPY? Could you COPY into one table then
insert from that to your target table?

--
  Richard Huxton
  Archonet Ltd
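For reference, COPY driven from libpq, as mentioned above, looks roughly
like this: the connection is switched into COPY-IN mode with a
COPY ... FROM STDIN command and rows are streamed with PQputCopyData in
the text COPY format (one line per row, tab-separated columns). The table
logdata(msg text) and the connection string are again placeholders.

/* Hedged sketch of streaming rows with COPY from libpq. */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* Put the connection into COPY IN mode. */
    PGresult *res = PQexec(conn, "COPY logdata (msg) FROM STDIN");
    if (PQresultStatus(res) != PGRES_COPY_IN) {
        fprintf(stderr, "COPY failed to start: %s", PQerrorMessage(conn));
        PQclear(res);
        return 1;
    }
    PQclear(res);

    /* Stream the rows; each line is one row in COPY text format. */
    for (int i = 0; i < 100000; i++) {
        char line[64];
        int n = snprintf(line, sizeof(line), "row %d\n", i);
        if (PQputCopyData(conn, line, n) != 1) {
            fprintf(stderr, "PQputCopyData: %s", PQerrorMessage(conn));
            break;
        }
    }

    /* Finish the COPY and check the server's verdict. */
    if (PQputCopyEnd(conn, NULL) == 1) {
        res = PQgetResult(conn);
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
            fprintf(stderr, "COPY failed: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}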
* Richard Huxton <dev@archonet.com> [070306 12:22]:
> >> 2. You can do a COPY from libpq - is it really not possible?
> >>
> > Not really but i have been testing it and inserts are flying (about
> > 100000 inserts/sec) !!
>
> What's the problem with the COPY? Could you COPY into one table then
> insert from that to your target table?

Well, there are some issues. First your client needs to support it.
E.g. psycopg2 supports only some specific CSV formatting in it's
methods. (plus I had sometimes random psycopg2 crashes, but guarding
against these is cheap compared to the speedup from COPY versus INSERT)

Plus you need to be sure that your data will apply cleanly (which in
my app was not the case), or you need to code a fallback that
localizes the row that doesn't work.

And the worst thing is, that it ignores RULES on the tables, which
sucks if you use them ;) (e.g. table partitioning).

Andreas
Andreas Kostyrka wrote:
> * Richard Huxton <dev@archonet.com> [070306 12:22]:
>>>> 2. You can do a COPY from libpq - is it really not possible?
>>>>
>>> Not really but i have been testing it and inserts are flying (about
>>> 100000 inserts/sec) !!
>> What's the problem with the COPY? Could you COPY into one table then
>> insert from that to your target table?
> Well, there are some issues. First your client needs to support it.
> E.g. psycopg2 supports only some specific CSV formatting in it's
> methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> these is cheap compared to the speedup from COPY versus INSERT)
> Plus you need to be sure that your data will apply cleanly (which in
> my app was not the case), or you need to code a fallback that
> localizes the row that doesn't work.
>
> And the worst thing is, that it ignores RULES on the tables, which
> sucks if you use them ;) (e.g. table partitioning).

Ah, but two things deal with these issues:
1. Joel is using libpq
2. COPY into a holding table, tidy data and INSERT ... SELECT

--
  Richard Huxton
  Archonet Ltd
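A rough sketch of point 2 above, under assumed names: bulk-load into a
constraint-free holding table, tidy the data in SQL, then move it across
with a single INSERT ... SELECT. "staging", "target" and the column "msg"
are placeholders, and the COPY step itself would be streamed with
PQputCopyData as in the earlier sketch.

/* Hedged sketch of the holding-table pattern. */
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    PQclear(PQexec(conn, "BEGIN"));

    /* Holding table with no constraints, so the raw load cannot fail on
     * FK or NOT NULL problems. ON COMMIT DROP cleans it up automatically. */
    PQclear(PQexec(conn,
        "CREATE TEMP TABLE staging (msg text) ON COMMIT DROP"));

    /* ... COPY staging FROM STDIN here, streamed with PQputCopyData ... */

    /* Keep only rows that will satisfy the target table's constraints. */
    PQclear(PQexec(conn,
        "INSERT INTO target (msg) "
        "SELECT msg FROM staging WHERE msg IS NOT NULL"));

    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}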
* Richard Huxton <dev@archonet.com> [070306 13:47]:
> Andreas Kostyrka wrote:
> > * Richard Huxton <dev@archonet.com> [070306 12:22]:
> >>>> 2. You can do a COPY from libpq - is it really not possible?
> >>>>
> >>> Not really but i have been testing it and inserts are flying (about
> >>> 100000 inserts/sec) !!
> >> What's the problem with the COPY? Could you COPY into one table then
> >> insert from that to your target table?
> > Well, there are some issues. First your client needs to support it.
> > E.g. psycopg2 supports only some specific CSV formatting in it's
> > methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> > these is cheap compared to the speedup from COPY versus INSERT)
> > Plus you need to be sure that your data will apply cleanly (which in
> > my app was not the case), or you need to code a fallback that
> > localizes the row that doesn't work.
> > And the worst thing is, that it ignores RULES on the tables, which
> > sucks if you use them ;) (e.g. table partitioning).
>
> Ah, but two things deal with these issues:
> 1. Joel is using libpq
> 2. COPY into a holding table, tidy data and INSERT ... SELECT

Clearly COPY is the way for bulk loading data, BUT you asked, so I
wanted to point out some problems and brittle points with COPY. (And
the copy into the holding table doesn't completely solve the problem
with dirty, inconsistent data.)

Andreas
>>> 1. If you're running 8.2 you can have multiple sets of values in an
>>> INSERT
>>> http://www.postgresql.org/docs/8.2/static/sql-insert.html
>>
>> Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
>> inserts sets ! Thanks for the tip ;-)

No kidding --- thanks for the tip from me as well !!! I didn't know this
was possible (though I read in the docs that it is ANSI SQL standard),
and I'm also having a similar situation.

Two related questions:

1) What about atomicity? Is it strictly equivalent to having multiple
insert statements inside a transaction? (I assume it should be)

2) What about the issue with excessive locking for foreign keys when
inside a transaction? Has that issue disappeared in 8.2? And if not,
would it affect similarly in the case of multiple-row inserts?

In case you have no clue what I'm referring to: Say that we have a table
A, with one foreign key constraint to table B --- last time I checked,
there was an issue that whenever inserting or updating table A (inside a
transaction), postgres sets an exclusive access lock on the referenced
row on table B --- this is overkill, and the correct thing to do would be
to set a read-only lock (so that no-one else can *modify or remove* the
referenced row while the transaction has not been finished).

This caused unnecessary deadlock situations --- even though no-one is
modifying table B (which is enough to guarantee that concurrent
transactions would be ok), a second transaction would fail to set the
exclusive access lock, since someone already locked it. My solution was
to sort the insert statements by the referenced value on table B. (I hope
the above explanation clarifies what I'm trying to say)

I wonder if I should still do the same if I go with a multiple-row insert
instead of multiple insert statements inside a transaction.

Thanks,

Carlos
--
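The workaround described at the end of this message (sorting rows by the
referenced key so that all writers take the parent-row locks in the same
order and cannot deadlock on them) might be sketched like this; the row
layout and table names a(fk_b, payload) referencing b(id) are assumptions
for illustration.

/* Hedged sketch: sort rows by the foreign key before inserting. */
#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

struct row { int fk_b; const char *payload; };

static int cmp_by_fk(const void *pa, const void *pb)
{
    const struct row *a = pa, *b = pb;
    return (a->fk_b > b->fk_b) - (a->fk_b < b->fk_b);
}

static void insert_sorted(PGconn *conn, struct row *rows, size_t n)
{
    /* Consistent lock-acquisition order across all concurrent writers. */
    qsort(rows, n, sizeof(rows[0]), cmp_by_fk);

    PQclear(PQexec(conn, "BEGIN"));
    for (size_t i = 0; i < n; i++) {
        char fk[16];
        const char *params[2] = { fk, rows[i].payload };
        snprintf(fk, sizeof(fk), "%d", rows[i].fk_b);
        PQclear(PQexecParams(conn,
            "INSERT INTO a (fk_b, payload) VALUES ($1, $2)",
            2, NULL, params, NULL, NULL, 0));
    }
    PQclear(PQexec(conn, "COMMIT"));
}

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    struct row rows[] = { { 3, "c" }, { 1, "a" }, { 2, "b" } };
    insert_sorted(conn, rows, sizeof(rows) / sizeof(rows[0]));

    PQfinish(conn);
    return 0;
}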
I only know to answer your no. 2:

> 2) What about the issue with excessive locking for foreign keys when
> inside a transaction? Has that issue disappeared in 8.2? And if not,
> would it affect similarly in the case of multiple-row inserts?

The exclusive lock is gone already starting with 8.0 IIRC, a
non-exclusive lock on the parent row is used instead. Thing is that this
is still too strong ;-)

The proper lock would be one which only prevents modification of the
parent key, other updates would be safe on the same row.

In any case, the current behavior is much better than what was before.

Cheers,
Csaba.
Csaba Nagy wrote:
> I only know to answer your no. 2:
>
>> 2) What about the issue with excessive locking for foreign keys when
>> inside a transaction? Has that issue disappeared in 8.2? And if not,
>> would it affect similarly in the case of multiple-row inserts?
>
> The exclusive lock is gone already starting with 8.0 IIRC, a
> non-exclusive lock on the parent row is used instead. Thing is that this
> is still too strong ;-)
>
> The proper lock would be one which only prevents modification of the
> parent key, other updates would be safe on the same row.
>
> In any case, the current behavior is much better than what was before.

*Much* better, I would say --- though you're still correct in that it is
still not the right thing to do. In particular, with the previous
approach, there was a serious performance hit when concurrent
transactions reference the same keys --- that is, after having taken
measures to avoid deadlocks, some transactions would have to *wait* (for
no good reason) until the other transaction is completed and the
exclusive-access lock is released. For high-traffic databases this can be
a quite severe performance hit.

I'm glad it has been fixed, even if only partially.

Thanks,

Carlos
--
Hi Andreas,

Thanks for the info about COPY !!

On Mar 6, 1:23 pm, andr...@kostyrka.org (Andreas Kostyrka) wrote:
> * Richard Huxton <d...@archonet.com> [070306 12:22]:
> > >> 2. You can do a COPY from libpq - is it really not possible?
> > >>
> > > Not really but i have been testing it and inserts are flying (about
> > > 100000 inserts/sec) !!
> >
> > What's the problem with the COPY? Could you COPY into one table then
> > insert from that to your target table?
>
> Well, there are some issues. First your client needs to support it.
> E.g. psycopg2 supports only some specific CSV formatting in it's
> methods. (plus I had sometimes random psycopg2 crashes, but guarding against
> these is cheap compared to the speedup from COPY versus INSERT)
> Plus you need to be sure that your data will apply cleanly (which in
> my app was not the case), or you need to code a fallback that
> localizes the row that doesn't work.
>
> And the worst thing is, that it ignores RULES on the tables, which
> sucks if you use them ;) (e.g. table partitioning).

Ok, but what about constraints (foreign keys and SERIAL id) using a copy
statement ? do we need to handle auto-generated id (SERIAL) manually ?

Thanks for your feedback.

Regards,
Joël
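On the SERIAL question: if the COPY names only the non-serial columns, the
id column picks up its sequence default just as it would for an INSERT,
and foreign key and CHECK constraints are still checked row by row during
COPY; it is rules, not constraints, that COPY bypasses. A minimal sketch,
with a hypothetical table events(id serial, msg text):

/* Hedged sketch: COPY with an explicit column list that omits the
 * SERIAL column, so the sequence default fills it in.
 */
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");
    if (PQstatus(conn) != CONNECTION_OK)
        return 1;

    /* The column list omits "id", so its default (nextval) is used. */
    PGresult *res = PQexec(conn, "COPY events (msg) FROM STDIN");
    if (PQresultStatus(res) == PGRES_COPY_IN) {
        PQclear(res);
        PQputCopyData(conn, "first event\n", 12);
        PQputCopyData(conn, "second event\n", 13);
        PQputCopyEnd(conn, NULL);
        PQclear(PQgetResult(conn));
    } else {
        fprintf(stderr, "COPY failed to start: %s", PQerrorMessage(conn));
        PQclear(res);
    }

    PQfinish(conn);
    return 0;
}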
Hi Richard,

> >>> No, as said above transactions are made of 100000 inserts...
> >> Hmm - I read that as just meaning "inserted 100000 rows". You might find
> >> that smaller batches provide peak performance.
>
> > Ahh ok ;-) sorry for my bad english... (yeah, i have been testing
> > several transaction size 10000, 20000 and 100000)
>
> Not your bad English, my poor reading :-)
>
> >>>> If so, you'll be limited by the speed of the disk the WAL is running on.
> >>>>
> >>>> That means you have two main options:
> >>>> 1. Have multiple connections inserting simultaneously.
> >>> Yes, you're right. That what i have been testing and what provide the
> >>> best performance ! I saw that postgresql frontend was using a lot of CPU
> >>> and not both of them (i'm using a pentium D, dual core). To the opposit,
> >>> the postmaster process use not much resources. Using several client,
> >>> both CPU are used and i saw an increase of performance (about 18000
> >>> inserts/sec).
>
> >>> So i think my bottle neck is more the CPU speed than the disk speed,
> >>> what do you think ?
> >> Well, I think it's fair to say it's not disk. Let's see - the original
> >> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
> >> plausible to me for a round-trip to process a simple command - are you
> >> running the client app on the same machine, or is it over the network?
>
> > I did both test. On the local machine (using UNIX sockets) i can reach
> > 18000 insert/sec with 10 clients and prepared statements. The same test
> > using clients on the remote machine provide me 13000 inserts/sec.
>
> OK, so we know what the overhead for network connections is.
>
> > Now, with multiple client (multi-threaded inserts) my both CPU are quite
> > well used (both arround 90%) so i maybe think that disk speeds are now
> > my bottleneck. What do you think ? or maybe i will need a better CPU ?
>
> >> Two other things to bear in mind:
> >> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
> >>    http://www.postgresql.org/docs/8.2/static/sql-insert.html
>
> > Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
> > inserts sets ! Thanks for the tip ;-)
>
> Ah-ha! Give it a go, it's designed for this sort of situation. Not sure
> it'll manage thousands of value clauses, but working up from 10 perhaps.
> I've not tested it for performance, so I'd be interesting in knowing how
> it compares to your other results.

Yeah, as soon as possible i will give it a try ! Thanks for the
feedback ;-)

> >> 2. You can do a COPY from libpq - is it really not possible?
>
> > Not really but i have been testing it and inserts are flying (about
> > 100000 inserts/sec) !!
>
> What's the problem with the COPY? Could you COPY into one table then
> insert from that to your target table?

The main problem comes from our "real time" needs. We are getting
information as a data flow from several application and we need to store
them in the DB without buffering them too much...

I have been testing the COPY using several statement (i mean using copy
to add only a few rows to a specific table and then using it on an other
table to add a few rows, etc...) and the perf are as bad as an insert !
COPY seems to be designed to add many many rows to the same table and not
a few rows to several tables... So that's my main problem.

Regards,
Joël
Hi and thanks for your quick answer :-)

> >>> Here is my problem. With some heavy insert into a simple BD (one
> >>> table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
> >>> testing it using a simple C software which use libpq and which use:
> >>> - Insert prepared statement (to avoid too many request parsing on the
> >>> server)
> >>> - transaction of 100000 inserts
> >> Are each of the INSERTs in their own transaction?
> >>
> >
> > No, as said above transactions are made of 100000 inserts...
>
> Hmm - I read that as just meaning "inserted 100000 rows". You might find
> that smaller batches provide peak performance.

Ahh ok ;-) sorry for my bad english... (yeah, i have been testing several
transaction size 10000, 20000 and 100000)

> >> If so, you'll be limited by the speed of the disk the WAL is running on.
> >>
> >> That means you have two main options:
> >> 1. Have multiple connections inserting simultaneously.
> >
> > Yes, you're right. That what i have been testing and what provide the
> > best performance ! I saw that postgresql frontend was using a lot of CPU
> > and not both of them (i'm using a pentium D, dual core). To the opposit,
> > the postmaster process use not much resources. Using several client,
> > both CPU are used and i saw an increase of performance (about 18000
> > inserts/sec).
> >
> > So i think my bottle neck is more the CPU speed than the disk speed,
> > what do you think ?
>
> Well, I think it's fair to say it's not disk. Let's see - the original
> figure was 8000 inserts/sec, which is 0.125ms per insert. That sounds
> plausible to me for a round-trip to process a simple command - are you
> running the client app on the same machine, or is it over the network?

I did both test. On the local machine (using UNIX sockets) i can reach
18000 insert/sec with 10 clients and prepared statements. The same test
using clients on the remote machine provide me 13000 inserts/sec.

Now, with multiple client (multi-threaded inserts) my both CPU are quite
well used (both arround 90%) so i maybe think that disk speeds are now my
bottleneck. What do you think ? or maybe i will need a better CPU ?

> Two other things to bear in mind:
> 1. If you're running 8.2 you can have multiple sets of values in an INSERT
>    http://www.postgresql.org/docs/8.2/static/sql-insert.html

Yeah, i'm running the 8.2.3 version ! i didn't know about multiple
inserts sets ! Thanks for the tip ;-)

> 2. You can do a COPY from libpq - is it really not possible?

Not really but i have been testing it and inserts are flying (about
100000 inserts/sec) !!
Hi Richard,

> > Here is my problem. With some heavy insert into a simple BD (one
> > table, no indexes) i can't get better perf than 8000 inserts/sec. I'm
> > testing it using a simple C software which use libpq and which use:
> > - Insert prepared statement (to avoid too many request parsing on the
> > server)
> > - transaction of 100000 inserts
>
> Are each of the INSERTs in their own transaction?

No, as said above transactions are made of 100000 inserts...

> If so, you'll be limited by the speed of the disk the WAL is running on.
>
> That means you have two main options:
> 1. Have multiple connections inserting simultaneously.

Yes, you're right. That what i have been testing and what provide the
best performance ! I saw that postgresql frontend was using a lot of CPU
and not both of them (i'm using a pentium D, dual core). To the opposit,
the postmaster process use not much resources. Using several client, both
CPU are used and i saw an increase of performance (about 18000
inserts/sec).

So i think my bottle neck is more the CPU speed than the disk speed, what
do you think ? I use 2 disks (raid 0) for the data and a single disk for
pg_xlog.

> 2. Batch your inserts together, from 10 to 10,000 per transaction.

Yes, that's what i'm doing.

Thanks a lot for the advices !

regards,
Joël