Thread: PostgreSQL Write Performance
I am not sure whether I am benchmarking the correct way. I have the following table:

CREATE TABLE measurement_type
(
  measurement_type_id bigserial NOT NULL,
  measurement_type_name text NOT NULL,
  CONSTRAINT pk_measurement_type_id PRIMARY KEY (measurement_type_id),
  CONSTRAINT measurement_type_measurement_type_name_key UNIQUE (measurement_type_name)
);

I make the following single write operation through pgAdmin:

INSERT INTO measurement_type (measurement_type_name)
VALUES ('Width');

It takes 16 ms to write a single row according to the "Query Editor" (bottom right corner).

Am I benchmarking the correct way? Is this the expected performance? I was expecting the time measurement to be in nanoseconds :p

Thanks and Regards
Yan Cheng CHEOK
On Mon, Jan 4, 2010 at 8:36 PM, Yan Cheng Cheok <yccheok@yahoo.com> wrote:
> It takes 16 ms to write a single row according to the "Query Editor" (bottom right corner).
>
> Am I benchmarking the correct way? Is this the expected performance? I was expecting the time measurement to be in nanoseconds :p

It would be great if a hard drive could seek, write, acknowledge the write, and the OS could tell the db about it in nanoseconds. However, some part of that chain would have to be lying to do that. It takes at LEAST a full rotation of a hard drive to commit a single change to a database, usually more. Given that the fastest HDs are 15k RPM right now, you're looking at 250 revolutions per second, or 1/250th of a second minimum to commit a transaction.

Now, the good news is that if you make a bunch of inserts in the same transaction, a lot of them can get committed together to the disk at the same time, and the aggregate speed, per insert, will be much faster.
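For example, the batching Scott describes might look like this (a minimal, untested sketch using the measurement_type table from the first post):

BEGIN;
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
INSERT INTO measurement_type (measurement_type_name) VALUES ('Height');
-- ... hundreds or thousands more inserts ...
COMMIT;  -- one WAL flush covers the whole batch, so the per-insert cost drops sharply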
Instead of sending 1000+ INSERT statements in one shot, which would require my application to keep track of the INSERT statements:

Is it possible to tell PostgreSQL,

"OK. I am sending you INSERT statements. But do not perform any actual write operation. Only perform the actual write operation when the pending statements have reached 1000."

Thanks and Regards
Yan Cheng CHEOK

--- On Tue, 1/5/10, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> Now, the good news is that if you make a bunch of inserts in the same
> transaction, a lot of them can get committed together to the disk at
> the same time, and the aggregate speed, per insert, will be much faster.
Yan Cheng Cheok wrote:
> Is it possible to tell PostgreSQL,
>
> "OK. I am sending you INSERT statements. But do not perform any actual write operation. Only perform the actual write operation when the pending statements have reached 1000."

You can turn off synchronous_commit to get something like that:
http://www.postgresql.org/docs/current/static/runtime-config-wal.html

This should make your single-record INSERT time drop dramatically. Note that you'll be introducing the possibility of some data loss from the latest insert(s) if the server crashes in this situation.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
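As a rough illustration of what Greg suggests (set at the session level here; it can also go in postgresql.conf):

SET synchronous_commit = off;  -- commits return before the WAL record is flushed to disk
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
-- A crash may lose the most recent commits (roughly up to three times wal_writer_delay),
-- but the database itself stays consistent.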
> -----Original Message-----
> From: Yan Cheng Cheok
> Sent: Monday, January 04, 2010 9:05 PM
>
> Is it possible to tell PostgreSQL,
>
> "OK. I am sending you INSERT statements. But do not perform any actual
> write operation. Only perform the actual write operation when the pending
> statements have reached 1000."

You might use the COPY command instead of INSERT, which is far faster. If you want the fastest possible inserts, then COPY is probably the way to go instead of INSERT.
Here is the copy command via the API:
http://www.postgresql.org/docs/current/static/libpq-copy.html
Here is the copy command via SQL:
http://www.postgresql.org/docs/8.4/static/sql-copy.html

You might (instead) use this sequence:

1. Begin transaction
2. Prepare
3. Insert 1000 times
4. Commit

If the commit fails, you will have to redo the entire set of 1000 or otherwise handle the error.

Or something along those lines. Of course, when information is not written to disk, what will happen on power failure? If you do something cheesy like turning fsync off to speed up inserts, then you will have trouble if you lose power.

What is the actual problem you are trying to solve?
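A sketch of the prepare-then-insert sequence Dann outlines (untested; the values are just examples):

BEGIN;
PREPARE ins (text) AS
    INSERT INTO measurement_type (measurement_type_name) VALUES ($1);
EXECUTE ins('Width');
EXECUTE ins('Height');
-- ... repeat up to 1000 times ...
COMMIT;            -- if this fails, the whole batch must be redone
DEALLOCATE ins;    -- prepared statements otherwise live for the rest of the session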
Dann Corbit wrote:
>> What is the actual problem you are trying to solve?

I am currently developing a database system for a high speed measurement machine.

The time taken to perform measurement per unit is on the order of ~30 milliseconds. We need to record the measurement result for every single unit. Hence, the time taken to record the measurement result must be far less than that, so that it has nearly zero impact on the machine speed. (If not, the machine needs to wait for the database to finish writing before performing the measurement on the next unit.)

Previously, we were using a flat file. However, using a flat file is quite a mess when it comes to generating reports for customers.

Thanks and Regards
Yan Cheng CHEOK
> -----Original Message-----
> From: Yan Cheng Cheok [mailto:yccheok@yahoo.com]
> Sent: Monday, January 04, 2010 11:30 PM
>
> I am currently developing a database system for a high speed
> measurement machine.
>
> The time taken to perform measurement per unit is on the order of ~30
> milliseconds. We need to record the measurement result for every
> single unit.
>
> Previously, we were using a flat file. However, using a flat file is
> quite a mess when it comes to generating reports for customers.

Does the data volume build continuously so that the file becomes arbitrarily large, or can you archive data that is more than X days old?
What is the format of a record?
What is the format of the incoming data?
Do you need indexes on these records?
Is the stream of incoming data continuous around the clock, or are there periods when there is no incoming data?
On 5 Jan 2010, at 8:30, Yan Cheng Cheok wrote:
> Previously, we were using a flat file. However, using a flat file is quite a mess when it comes to generating reports for customers.

That flat file can help you with clustering INSERTs together and also means you'll have all your measurements ready for a single INSERT. The latter is useful if you're planning on using arrays to store your measurements, as updating array values requires the entire array to be rewritten to the database. I don't know how your measurements would arrive without the flat file, but I wouldn't be surprised if the measurements for a single unit would come out at different points in time, which would be a bit painful with arrays (not quite as much with a measurements table though).

A safe approach (with minimal risk of data loss) would be to split your flat file every n units (earlier in this thread a number of n=1000 was mentioned) and store that data using COPY in the format COPY expects. You will probably also want to keep a queue table (which is just a normal table, but it's used like a queue) with the names of the flat files that need processing.

I haven't done this kind of thing before, but I envision it something like this:

CREATE TABLE unit (
    id bigserial NOT NULL,
    date date NOT NULL DEFAULT CURRENT_DATE,
    measured text[],
    measurements numeric(4,3)[]
);

CREATE TABLE queue (
    file text NOT NULL,
    definitive boolean DEFAULT False
);

---file-2010-01-05-00000001---

/* Update in its own transaction so that we know we tried to process this file
 * even if the transaction rolls back.
 */
UPDATE queue SET definitive = True
 WHERE file = 'file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00000001';

/* Start work */
BEGIN;

COPY unit FROM STDIN;
1 {Width,Height} {0.001,0.021}
2 {Width,Height} {0.002,0.019}
...
999 {Width,Height} {0.000,0.018}
\.

/* This file was processed and can be removed from the queue */
DELETE FROM queue WHERE file='file-2010-01-05-00000001';

COMMIT;

/* This will probably be the name of the next flat file, but we don't know that
 * for sure yet. It needs to be outside the transaction as otherwise CURRENT_DATE
 * will have the date of the start of the transaction and we need to know what the
 * next batch will be regardless of whether this one succeeded.
 */
INSERT INTO queue (file, definitive)
VALUES ('file-' || to_char(CURRENT_DATE, 'YYYY-MM-DD') || '-00001000', False);

---end of file---

You'd need a little program (a script will probably work) to read that queue table and send the commands in those files to the database. Don't forget that at the start the queue table will be empty ;) I recall some of this list's members wrote up a webpage about how to implement queue tables reliably.

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.
On 5/01/2010 3:30 PM, Yan Cheng Cheok wrote:
>>> What is the actual problem you are trying to solve?
>
> The time taken to perform measurement per unit is on the order of ~30 milliseconds. We need to record the measurement result for every single unit. Hence, the time taken to record the measurement result must be far less than that, so that it has nearly zero impact on the machine speed.

The commit_delay and synchronous_commit parameters may help you if you want to do each insert as a separate transaction. Note that with these parameters there's some risk of very recently committed data being lost if the server OS crashes or the server hardware is powered off/power-cycled unexpectedly. PostgreSQL itself crashing shouldn't cause loss of the committed data, though.

Alternately, you can accumulate small batches of measurements in your app and do multi-valued INSERTs once you have a few (say 10) collected up. You'd have to be prepared to lose those if the app crashed, though.

Another option is to continue using your flat file, and have a "reader" process tailing the flat file and inserting new records into the database as they become available in the flat file. The reader could batch inserts intelligently, keep a record on disk of its progress, rotate the flat file periodically, etc.

--
Craig Ringer
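A multi-valued INSERT of the kind Craig mentions might look like this (a sketch; the accumulated values are invented):

INSERT INTO measurement_type (measurement_type_name) VALUES
    ('Width'), ('Height'), ('Thickness'), ('Weight'), ('Length'),
    ('Depth'), ('Diameter'), ('Radius'), ('Mass'), ('Density');
-- Ten rows, one statement, one commit: roughly one disk flush for the whole batch.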
Dann Corbit wrote:
> You might use the copy command instead of insert, which is far faster.
> If you want the fastest possible inserts, then probably copy is the way
> to go instead of insert.
> Here is copy command via API:
> http://www.postgresql.org/docs/current/static/libpq-copy.html
> Here is copy command via SQL:
> http://www.postgresql.org/docs/8.4/static/sql-copy.html

Is there a command like COPY which will insert the data but skip all triggers and, optionally, integrity checks?
Tim Uckun wrote:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks?

Nope, skipping integrity checks is MySQL talk.

When doing a bulk loading job, it may make sense to drop constraints and triggers though; there are more notes on this and related techniques at
http://www.postgresql.org/docs/current/interactive/populate.html

Another thing you can do is defer your constraints:
http://www.postgresql.org/docs/current/static/sql-set-constraints.html
so that they execute in a more efficient block.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
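For instance, deferring constraints inside a load transaction looks roughly like this (a sketch; the constraint must have been created as DEFERRABLE, and the name here is hypothetical):

BEGIN;
SET CONSTRAINTS ALL DEFERRED;   -- or name a specific constraint, e.g. SET CONSTRAINTS measurement_type_fk DEFERRED;
-- ... bulk INSERTs / COPY here; deferred checks are not run per statement ...
COMMIT;   -- all deferred checks run here; a failure rolls back the whole batch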
Hi,

On Tuesday 05 January 2010 04:36:10 Yan Cheng Cheok wrote:
> I make the following single write operation through pgAdmin :
...
> It takes 16ms to write a single row according to "Query Editor" (bottom
> right corner)

In my experience the times presented by pgAdmin vary wildly and seldom represent an accurate timing.

Andres
On 6/01/2010 6:21 AM, Tim Uckun wrote:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks?

No. If you don't want triggers and integrity checks to fire, don't define them in the first place.

Technically you *can* disable triggers, including RI checks, but it's VERY unwise and almost completely defeats the purpose of having the checks. In most such situations you're much better off dropping the constraints and then adding them again at the end of the load.

--
Craig Ringer
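Dropping and re-adding a constraint around a bulk load might look like this (a rough sketch; the measurement table, constraint name, and file path are hypothetical):

BEGIN;
ALTER TABLE measurement DROP CONSTRAINT measurement_type_fk;
COPY measurement FROM '/tmp/batch.dat';            -- server-side file, superuser only
ALTER TABLE measurement ADD CONSTRAINT measurement_type_fk
    FOREIGN KEY (measurement_type_id) REFERENCES measurement_type;
COMMIT;  -- ADD CONSTRAINT re-validates every row; bad data aborts the whole transaction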
> Technically you *can* disable triggers, including RI checks, but it's VERY
> unwise and almost completely defeats the purpose of having the checks. In
> most such situations you're much better off dropping the constraints and
> then adding them again at the end of the load.

I know that the SQL Server bulk loader defaults to not firing the triggers, and I was hoping there would be an option on the COPY command to accomplish the same thing.

pg_dump has a --disable-triggers option too.

It seems to me that COPY FROM should have an option that bypasses the triggers as a convenience. Both the SQL Server team and the Postgres team have obviously recognized that there will be situations where the DBA will want to bulk load data without firing off a set of triggers for every insert. It doesn't seem like an outrageous expectation that the COPY command or something similar should have that option.
On Jan 5, 2010, at 3:46 PM, Tim Uckun wrote:
> pg_dump has a --disable-triggers option too.
[...]
> It doesn't seem like an outrageous expectation that the COPY command
> or something similar should have that option.

Well, whether an expectation is "outrageous" or not is a matter of viewpoint. The principle is that pg_dump and COPY have fundamentally different use cases. pg_dump is intended to restore a backup of data that was already in the database, and presumably was already validated by the appropriate constraints and triggers. COPY is used to create new records in a database, from arbitrary data, which may not be valid based on the database's vision of data consistency.

--
Christophe Pettus
xof@thebuild.com
Tim Uckun wrote:
> I know that the SQL Server bulk loader defaults to not firing the
> triggers, and I was hoping there would be an option on the COPY command
> to accomplish the same thing.
>
> pg_dump has a --disable-triggers option too.

pg_restore? Yes, it does. However, it knows that the data you're loading came from a PostgreSQL database where those triggers have already fired when the data was originally inserted. It can trust that the data is OK.

I don't personally think that COPY should make it easy to disable triggers. You can do it if you want to (see the manual for how) but in almost all cases it's much wiser to drop triggers and constraints instead.

Rather than disabling RI constraints, it'd be desirable for COPY to have an option that *deferred* RI constraint checking, then re-checked the constraints for all rows at once and rolled back the COPY if any failed. That'd be a lot faster for many uses, but (importantly) would preserve referential integrity.

While COPY doesn't offer an easy way to do that, you can emulate the behavior by:
- Beginning a transaction
- Disabling RI constraints
- Running COPY
- Re-enabling RI constraints
- Re-checking the RI constraints and raising an exception to abort the transaction if the checks fail.

Unfortunately you can't (yet) do this with deferrable RI constraints, because Pg isn't clever enough to notice when large numbers of individual checks have built up and merge them into a single re-check query that verifies the whole constraint in one pass. So you have to do that yourself.

> It seems to me that COPY FROM should have an option that bypasses
> the triggers as a convenience.

I, for one, would loudly and firmly resist the addition of such a feature. Almost-as-fast options such as intelligent re-checking of deferred constraints would solve the problem better and much more safely. If you really want the dangerous way you can already get it; it's just a bit more work to disable the triggers yourself, and by doing so you're saying "I understand what I am doing and take responsibility for the dangers inherent in doing so".

If you really want to do that, look at the manual for how to disable triggers, but understand that you are throwing away the database's data integrity protection by doing it.

--
Craig Ringer
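A rough, untested sketch of the disable/re-check emulation Craig describes (the measurement table and its FK column are hypothetical; note that re-enabling triggers does not re-validate anything by itself):

BEGIN;
ALTER TABLE measurement DISABLE TRIGGER ALL;   -- also disables the RI triggers; superuser only
COPY measurement FROM STDIN;                   -- bulk data follows, terminated by \.
ALTER TABLE measurement ENABLE TRIGGER ALL;    -- does NOT re-check the rows just loaded
-- Manual re-check: any row returned here violates the FK, so roll back instead of committing.
SELECT m.measurement_type_id
  FROM measurement m
  LEFT JOIN measurement_type t USING (measurement_type_id)
 WHERE t.measurement_type_id IS NULL;
COMMIT;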
> I, for one, would loudly and firmly resist the addition of such a
> feature. Almost-as-fast options such as intelligent re-checking of

Even if it was not the default behavior?

> If you really want to do that, look at the manual for how to disable
> triggers, but understand that you are throwing away the database's data
> integrity protection by doing it.

I guess it's a matter of philosophy. I kind of think that, as the DBA, I should be the final authority in determining what is right and wrong. It's my data after all. Yes, I would expect pg to perform every check I specify and execute every trigger I write, but if I want I should be able to bypass those things "just this once".

As you point out, I can already do this by manually going through and disabling every trigger or even dropping the triggers. Many people have said I could drop the constraints and re-set them up. The fact that the COPY command does not have a convenient way for me to do this doesn't prevent me from "shooting myself in the foot" if I want to.

It would just be a flag. If you want, you can enable it; if you don't, then no harm, no foul.

Anyway, this is getting off-topic. I got my question answered. COPY does not do this. If I want to do it I have to manually iterate through all the triggers and disable them or drop them before running copy.
Tim Uckun wrote:
>> I, for one, would loudly and firmly resist the addition of such a
>> feature. Almost-as-fast options such as intelligent re-checking of
>
> Even if it was not the default behavior?

Even if it was called COPY (PLEASE BREAK MY DATABASE) FROM ...

... because there are *better* ways to do it that are safe for exposure to normal users, and existing ways to do it the dangerous way if you really want to.

> I guess it's a matter of philosophy. I kind of think that, as the DBA,
> I should be the final authority in determining what is right and wrong.
>
> As you point out, I can already do this by manually going through and
> disabling every trigger or even dropping the triggers.

You could alternately dynamically query pg_catalog and use PL/PgSQL (or SQL generated by your app) to issue the appropriate statements to control the triggers.

> Many people have said I could drop the constraints and re-set them up.
> The fact that the COPY command does not have a convenient way for me to
> do this doesn't prevent me from "shooting myself in the foot" if I want to.

I think that it would be desirable for COPY to provide a convenient way to drop and re-create constraints, or (preferably) just disable them while it ran and then re-check them before returning success. Think:

  COPY (BATCH_RI_CHECKS) FROM ...

The thing you might have missed is that dropping and re-creating constraints is different from disabling them (as you're requesting that COPY do). When the constraints are re-created, the creation will *fail* if the constraint is violated, aborting the whole operation if you're sensible enough to do it in a single transaction. At no point can you end up with a constraint in place promising RI that is in fact violated by the data.

By contrast, if you disable triggers and constraints, re-enabling them does *not* re-check any constraints. So it's much, MUCH more dangerous, since it can let bad data into the DB silently. So if you disable constraints you MUST re-check them after re-enabling them and abort the transaction if they're violated, or you must be utterly certain that the data you inserted/altered/deleted was really safe.

> It would just be a flag. If you want, you can enable it; if you don't,
> then no harm, no foul.

Unfortunately my experience has been that many users (a) often don't read documentation and (b) just try different things until something they do makes the error "go away". They then get five steps down the track and post a question about a query that doesn't work correctly (because they broke their data) and it takes forever to get to the bottom of it.

You're clearly a fairly experienced and responsible DBA who'd look something up before using it and would be careful to preserve RI manually in these situations. Some people who use PG aren't (an amazing number of them just installed it to run their poker card counting software!), and I don't think it's wise to make dangerous things *too* obvious. They need to be there and available, but not staring you in the face.

I don't advocate the GNOME philosophy of "make it impossible if it's not suitable for a user who's using a computer for the first time" ... but I don't like the "nuke my data" button to be on the default desktop either ;-)

--
Craig Ringer
Thanks for the information. I wrote a plain C program to test the performance. Its time measurement is very MUCH different from pgAdmin's.

Thanks and Regards
Yan Cheng CHEOK

--- On Wed, 1/6/10, Andres Freund <andres@anarazel.de> wrote:
> In my experience the times presented by pgAdmin vary wildly and seldom
> represent an accurate timing.
Thanks for the information. I performed benchmarking on a very simple table, on a local database (1 table, 2 fields: one is bigserial, the other is text).

====================================================================
INSERT INTO measurement_type(measurement_type_name) VALUES ('Hello')
====================================================================

I turned synchronous_commit off.

To write a single row (local database), the time taken is between 0.1 ms and 0.5 ms.

I tried to compare this with a flat text file.

To write a single row (file), the time taken is between 0.005 ms and 0.05 ms.

The difference is big. Is this the expected result? Are you guys also getting a similar result?

I know there will be some overhead in writing to the database compared to a flat text file (network communication, interpretation of the SQL statement...). However, is there any way to further improve things so that PostgreSQL write performance is near to the file?

If not, I need to plan another strategy to migrate my flat text file system into a PostgreSQL system smoothly.

Thanks and Regards
Yan Cheng CHEOK

--- On Tue, 1/5/10, Craig Ringer <craig@postnewspapers.com.au> wrote:
> The commit_delay and synchronous_commit parameters may help you if you
> want to do each insert as a separate transaction. [...]
> Another option is to continue using your flat file, and have a "reader"
> process tailing the flat file and inserting new records into the database
> as they become available in the flat file.
Tim Uckun wrote:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks?

I'm curious if it would be worth COPYing the data into dummy tables with no constraints, and then using INSERT INTO ... SELECT statements to feed from those tables into the real ones, which would check constraints and such but as a set. I've done it that way in SQL Server before, but I'm much less experienced with PostgreSQL.

--
Stephen Cook
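That staging-table pattern might look roughly like this in PostgreSQL (a sketch; the staging table name is made up):

CREATE TABLE measurement_type_stage (measurement_type_name text);   -- no constraints, no triggers
COPY measurement_type_stage FROM STDIN;                             -- fast, unchecked bulk load
INSERT INTO measurement_type (measurement_type_name)
SELECT DISTINCT measurement_type_name
  FROM measurement_type_stage;          -- constraints and triggers fire here, as one set
DROP TABLE measurement_type_stage;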
> -----Original Message-----
> From: Yan Cheng Cheok [mailto:yccheok@yahoo.com]
> Sent: Tuesday, January 05, 2010 10:30 PM
>
> I turned synchronous_commit off.
>
> To write a single row (local database), the time taken is between
> 0.1 ms and 0.5 ms.
>
> To write a single row (file), the time taken is between 0.005 ms and
> 0.05 ms.
>
> The difference is big. Is this the expected result? Are you guys also
> getting a similar result?

Ten to one is not surprising. You will save quite a bit by doing a prepare and a large number of inserts in a single transaction.

> I know there will be some overhead in writing to the database compared
> to a flat text file (network communication, interpretation of the SQL
> statement...). However, is there any way to further improve things so
> that PostgreSQL write performance is near to the file?
>
> If not, I need to plan another strategy to migrate my flat text file
> system into a PostgreSQL system smoothly.

Did you try the copy command? It will be very much faster.
On Tue, 2010-01-05 at 22:29 -0800, Yan Cheng Cheok wrote:
> I turned synchronous_commit off.
>
> To write a single row (local database), the time taken is between 0.1 ms and 0.5 ms.

A 32x improvement sounds pretty good, IMHO.

> To write a single row (file), the time taken is between 0.005 ms and 0.05 ms.
>
> The difference is big. Is this the expected result? Are you guys also getting a similar result?
>
> I know there will be some overhead in writing to the database compared to a flat text file (network communication, interpretation of the SQL statement...). However, is there any way to further improve things so that PostgreSQL write performance is near to the file?

Postgres provides
* data validation on input
* foreign keys to cross-validate fields in your data input
* transactions to allow roll-back of failed data
* auto-generated timestamps, if required
* auto-generated, easily restartable generation of unique keys
* crash recovery of your database changes (apart from very recent data)
* multiple concurrent writers, so multiple measurement machines can record to just a single database
* readers do not block writers
* SQL interface to run analytical queries against data
* allows you to index data to allow fast retrieval of results
* triggers to allow you to augment your data collection application with additional features over time
...

If you don't want these features, then yes, they are overheads.

If you're interested in funding feature development, I'd be more than happy to investigate further optimising your specific case. I think an auto-batching mode for INSERT statements should be possible, so that we save up consecutive SQL statements and apply them as a single transaction without requiring any client changes.

--
Simon Riggs    www.2ndQuadrant.com
On Wed, 2010-01-06 at 15:30 +1300, Tim Uckun wrote:
> As you point out, I can already do this by manually going through and
> disabling every trigger or even dropping the triggers. Many people
> have said I could drop the constraints and re-set them up. The fact
> that the COPY command does not have a convenient way for me to do this
> doesn't prevent me from "shooting myself in the foot" if I want to.
> It would just be a flag. If you want, you can enable it; if you don't,
> then no harm, no foul.

The reason we don't do this is because COPY can be run concurrently with queries, which is different from most other load utilities. The only time it is safe to disable triggers is in a transaction in which we either create the table or truncate it. That path is already optimised.

We did discuss a version of COPY that locks the table to allow various performance optimisations, but that prevented running multiple COPYs concurrently, and the loss in performance from doing that was more than the expected gain from the optimisation, so we didn't bother.

> Anyway, this is getting off-topic. I got my question answered. COPY does
> not do this. If I want to do it I have to manually iterate through all
> the triggers and disable them or drop them before running copy.

There is a command to disable all triggers at once.

--
Simon Riggs    www.2ndQuadrant.com
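Presumably that means something along these lines (per table; disabling system triggers such as RI checks requires superuser):

ALTER TABLE measurement_type DISABLE TRIGGER ALL;
-- ... bulk load ...
ALTER TABLE measurement_type ENABLE TRIGGER ALL;   -- rows loaded in between are not re-checked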
Tim Uckun <timuckun@gmail.com> writes:
> Is there a command like COPY which will insert the data but skip all
> triggers and optionally integrity checks?

pg_bulkload does that AFAIK.
http://pgbulkload.projects.postgresql.org/

Regards,
--
dim
On Thu, Jan 7, 2010 at 3:13 AM, Dimitri Fontaine <dfontaine@hi-media.com> wrote:
> Tim Uckun <timuckun@gmail.com> writes:
>> Is there a command like COPY which will insert the data but skip all
>> triggers and optionally integrity checks?
>
> pg_bulkload does that AFAIK.

That's a great utility. Unfortunately, since it bypasses the WAL, I can't use it for this project, but thanks for the pointer.
I have a table with 1 million rows. I know there can be multiple "YanChengCHEOK" rows. But in certain situations, I will only be interested in 1 "YanChengCHEOK".

I try to perform a SELECT query:

SemiconductorInspection=# SELECT measurement_type_id FROM measurement_type WHERE measurement_type_name='YanChengCHEOK';
 measurement_type_id
---------------------
                   1
(1 row)

Time: 331.057 ms

I try to have it in a stored procedure:

DECLARE
    _measurement_type_id int8;
BEGIN
    SELECT measurement_type_id INTO _measurement_type_id FROM measurement_type WHERE measurement_type_name='YanChengCHEOK';
    RAISE NOTICE '%', _measurement_type_id;
    RETURN 1;
END;

It took only 1.018 ms. There is a high chance that PostgreSQL stops looking further when it finds that the int8 variable has been filled in with at most 1 value.

Without using a stored procedure, how can I send a SQL statement to PostgreSQL to tell it that I need only 1 measurement_type_id, to speed up the SELECT?

Thanks and Regards
Yan Cheng CHEOK
On Wed, 6 Jan 2010 17:45:31 -0800 (PST) Yan Cheng Cheok <yccheok@yahoo.com> wrote:
> situation, I will be only interested in 1 "YanChengCHEOK".

> SELECT measurement_type_id INTO _measurement_type_id FROM
> measurement_type WHERE measurement_type_name='YanChengCHEOK';

LIMIT 1

Is that what you were looking for?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
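i.e., something like this (a sketch of the suggestion):

SELECT measurement_type_id
  FROM measurement_type
 WHERE measurement_type_name = 'YanChengCHEOK'
 LIMIT 1;   -- the executor can stop at the first matching row
-- Note: if the UNIQUE constraint on measurement_type_name from the first post is still
-- in place, it already provides an index, which matters far more for the lookup speed.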
Yan Cheng Cheok wrote:
> The time taken to perform measurement per unit is on the order of ~30 milliseconds. We need to record the measurement result for every single unit. Hence, the time taken to record the measurement result must be far less than that, so that it has nearly zero impact on the machine speed.

Saving a piece of data to a hard disk permanently takes a few milliseconds. As pointed out already, exactly how many depends on the drive, but it's probably going to be 8 ms or longer on your system. There are a few options here:

1) Add a battery-backed write caching controller to your system. Then the battery will help make sure the data doesn't get lost even if the power goes out before the drive writes it out. This will cost you around $300.

2) Use some other type of faster storage, such as an SSD drive that has a battery on it to cache any unfinished writes. Probably also going to be around that price; the cheaper SSDs (and some of the expensive ones) don't take data integrity very seriously.

3) Write the data to a flat file. Periodically import the results into the database in a batch.

The thing you should realize is that using (3) is going to put you in a position where it's possible you've told the machine the measurement was saved, but if the system crashes it won't actually be in the database. If you're saving to a flat file now, you're already in this position--you can't write to a flat file and make sure the result is on disk in less than around 8 ms either, you just probably haven't tested that out yet. Just because the write has returned successfully, that doesn't mean it's really stored permanently. Power the system off in the window between that write and when the memory cache goes out to disk, and you'll discover the data missing from the file after the system comes back up.

If you're OK with the possibility of losing a measurement in the case of a system crash, then you should just write measurements to a series of flat files, then have another process altogether (one that isn't holding up the machine) load those files into the database. The fact that it takes a few ms to write to disk is a physical limitation you can't get around without using more expensive hardware to improve the situation. If you haven't been seeing that in your app already, I assure you it's just because you haven't looked for the issue before--this limitation on disk write speed has been there all along; the database is just forcing you to address it.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
Thanks for the valuable advice! I will take it into consideration seriously.

From my point of view, my current requirement is limited by the so-called "overhead" of communication with the database. See the following result from SQL Shell:

SemiconductorInspection=# \timing on
Timing is on.
SemiconductorInspection=# ;
Time: 0.660 ms
SemiconductorInspection=# ;
Time: 0.517 ms
SemiconductorInspection=# ;
Time: 2.249 ms
SemiconductorInspection=#

I assume there is no hard disc activity involved, as I am sending an "empty" SQL statement over.

Thanks and Regards
Yan Cheng CHEOK

--- On Thu, 1/7/10, Greg Smith <greg@2ndquadrant.com> wrote:
> If you're OK with the possibility of losing a measurement in the case of
> a system crash, then you should just write measurements to a series of
> flat files, then have another process altogether (one that isn't holding
> up the machine) load those files into the database.
Greg Smith <greg@2ndquadrant.com> writes:
> If you're OK with the possibility of losing a measurement in the case of a
> system crash

Then I'd say use synchronous_commit = off for the transactions doing that, trading durability (the 'D' of ACID) against write performance. That requires 8.3 at least, and will not fsync() before telling the client the commit is done.

Regards,
--
dim
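Per transaction, that might look like this (a minimal sketch):

BEGIN;
SET LOCAL synchronous_commit = off;   -- affects only this transaction
INSERT INTO measurement_type (measurement_type_name) VALUES ('Width');
COMMIT;   -- returns without waiting for the WAL flush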
On 01/06/2010 08:49 PM, Greg Smith wrote:
> 3) Write the data to a flat file. Periodically import the results into
> the database in a batch.

At my last company we built a system for use in semiconductor/flat-panel display equipment and faced a very similar issue -- namely, we needed to collect 40+ parameters at 6 kHz from one source, and another 200+ at 6 kHz from a second source (and then sync them so they could be properly analyzed).

Our solution was similar to #3, except we didn't bother with the flat file. We basically had a C++ process "catch" the incoming stream of data into a memory buffer, and periodically bulk copy it into the database using the libpq COPY functions (see: http://www.postgresql.org/docs/8.4/interactive/libpq-copy.html).

HTH,

Joe