Thread: How long should it take to insert 200,000 records?
I have a pl/pgsql function that is inserting 200,000 records for testing purposes. What is the expected time frame for this operation on a PC with 1/2 a gig of RAM and a 7200 RPM disk? The processor is a 2GHz CPU. So far I've been sitting here for about 2 million ms waiting for it to complete, and I'm not sure how many inserts postgres is doing per second.

regards, karen
"Karen Hill" <karen_hill22@yahoo.com> writes: > I have a pl/pgsql function that is inserting 200,000 records for > testing purposes. What is the expected time frame for this operation > on a pc with 1/2 a gig of ram and a 7200 RPM disk? I think you have omitted a bunch of relevant facts. Bare INSERT is reasonably quick: regression=# create table foo (f1 int); CREATE TABLE regression=# \timing Timing is on. regression=# insert into foo select x from generate_series(1,200000) x; INSERT 0 200000 Time: 5158.564 ms regression=# (this on a not-very-fast machine) but if you weigh it down with a ton of index updates, foreign key checks, etc, it could get slow ... also you haven't mentioned what else that plpgsql function is doing. regards, tom lane
On Tue, 2007-02-06 at 01:35, Karen Hill wrote:
> [snip] So far I've been sitting here for about 2 million ms
> waiting for it to complete, and I'm not sure how many inserts postgres
> is doing per second.

One way is to run ANALYZE VERBOSE on the target table to see how many pages it has, then do it again 1 minute later and check how many pages it grew. Multiply the page increase by the rows-per-page ratio you can get from the same ANALYZE output, and you have an estimated growth rate. Of course this only works if the table didn't have lots of free space to start with... if it did have lots of free space, you can still estimate the growth from the ANALYZE results, but it gets more complicated.

In any case, it would be very nice to have more tools to attach to running queries and see how they are doing... starting with what exactly they are doing (are they in RI checks maybe?), the actual execution plan they are using, and how much of their work they have completed... it would help a lot when debugging performance problems.

Cheers,
Csaba.
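As a rough illustration of Csaba's estimate (a sketch only; the table name "abc" is a placeholder, and the INFO line below only indicates the general shape of the output):

    ANALYZE VERBOSE abc;
    -- INFO:  "abc": scanned P of P pages, containing R live rows ...
    -- rows per page ~ R / P

    -- wait roughly a minute, then run it again:
    ANALYZE VERBOSE abc;
    -- inserts per second ~ (increase in pages) * (rows per page) / 60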
On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> I have a pl/pgsql function that is inserting 200,000 records for
> testing purposes. What is the expected time frame for this operation
> on a PC with 1/2 a gig of RAM and a 7200 RPM disk? The processor is
> a 2GHz CPU. So far I've been sitting here for about 2 million ms
> waiting for it to complete, and I'm not sure how many inserts postgres
> is doing per second.

That really depends. Doing 200,000 inserts as individual transactions will be fairly slow. Since PostgreSQL generally runs in autocommit mode, if you didn't expressly begin a transaction you are in fact inserting each row as its own transaction. i.e. this:

for (i=0;i<200000;i++){
    insert into abc values ('test',123);
}

is functionally equivalent to:

for (i=0;i<200000;i++){
    begin;
    insert into abc values ('test',123);
    commit;
}

However, you can put a single begin / commit pair outside the loop like this:

begin;
for (i=0;i<200000;i++){
    insert into abc values ('test',123);
}
commit;

and it should run much faster.
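In plain SQL terms, the batched version is just one explicit transaction wrapped around all of the inserts (a sketch, reusing the hypothetical table "abc" from the example above). With default settings that means one commit to wait on instead of 200,000:

    BEGIN;
    INSERT INTO abc VALUES ('test', 123);
    INSERT INTO abc VALUES ('test', 123);
    -- ... the rest of the 200,000 inserts ...
    COMMIT;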
On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > [snip] So far I've been sitting here for about 2 million ms
> > waiting for it to complete, and I'm not sure how many inserts postgres
> > is doing per second.
>
> That really depends. Doing 200,000 inserts as individual transactions
> will be fairly slow. Since PostgreSQL generally runs in autocommit
> mode, if you didn't expressly begin a transaction you are in fact
> inserting each row as its own transaction.

I think the OP is doing the insertion inside a pl/pgsql loop... a transaction is implied there.

For creating test data, generate_series or insert...select is obviously the way to go. If that's unsuitable for some reason, I would suggest a RAISE NOTICE every n records so you can monitor progress and make sure nothing is binding up in a lock or something like that. Be especially wary of performance degrading as the process runs.

Another common cause of poor insert performance is an RI check against an un-indexed column. In-transaction insert performance should be between 1k and 10k records/second in normal situations, meaning that if you haven't inserted 1 million records inside of an hour, something else is going on.

Generally, insertion performance from fastest to slowest is:

* insert ... select generate_series(...)
* insert ... select
* copy
* insert (),(),()[...] (at least 10, or preferably 100, rows per statement)
* begin, prepare, n prepared inserts executed, commit
* begin, n inserts, commit
* plpgsql loop, single inserts
* n inserts outside of a transaction

The exact ordering is not absolutely set in stone (copy might beat insert ... select, for example), but the top 4 methods will always be much faster than the bottom 4.

merlin
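For concreteness, here is roughly what the fastest few variants look like (sketches only; the single-column table "foo" is borrowed from Tom's example earlier in the thread, and the COPY file path is just a placeholder):

    -- insert ... select generate_series
    INSERT INTO foo SELECT x FROM generate_series(1, 200000) x;

    -- copy (here from a client-side file, via psql's \copy)
    \copy foo from 'testdata.txt'

    -- multi-row insert, new in 8.2
    INSERT INTO foo VALUES (1), (2), (3);   -- ideally batch ~100 rows per statement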
On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
> On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > [snip] That really depends. Doing 200,000 inserts as individual transactions
> > will be fairly slow. Since PostgreSQL generally runs in autocommit
> > mode, if you didn't expressly begin a transaction you are in fact
> > inserting each row as its own transaction.
>
> I think the OP is doing the insertion inside a pl/pgsql loop... a
> transaction is implied there.

Yeah, I noticed that about 10 seconds after hitting send... :)
On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
> > I think the OP is doing the insertion inside a pl/pgsql loop... a
> > transaction is implied there.
>
> Yeah, I noticed that about 10 seconds after hitting send... :)

Actually, I get the stupid award also, because an RI check to an unindexed column is not possible :) (this haunts deletes, not inserts).

merlin
On Tue, 2007-02-06 at 12:01 -0500, Merlin Moncure wrote:
> [snip] Actually, I get the stupid award also, because an RI check to an
> unindexed column is not possible :) (this haunts deletes, not inserts).

Sure it's possible:

CREATE TABLE parent (col1 int4);
-- insert many millions of rows into parent
CREATE TABLE child (col1 int4 REFERENCES parent(col1));
-- insert many millions of rows into child, very very slowly.

- Mark Lewis
On Tue, Feb 06, 2007 at 10:31:26 -0800, Mark Lewis <mark.lewis@mir3.com> wrote:
> Sure it's possible:
>
> CREATE TABLE parent (col1 int4);
> -- insert many millions of rows into parent
> CREATE TABLE child (col1 int4 REFERENCES parent(col1));
> -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index on the child table, but you do on the parent table. Quote from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html:

    The referenced columns must be the columns of a unique or primary key constraint in the referenced table.
On 2/6/07, Mark Lewis <mark.lewis@mir3.com> wrote:
> > Actually, I get the stupid award also, because an RI check to an
> > unindexed column is not possible :) (this haunts deletes, not inserts).
>
> Sure it's possible:
>
> CREATE TABLE parent (col1 int4);
> -- insert many millions of rows into parent
> CREATE TABLE child (col1 int4 REFERENCES parent(col1));
> -- insert many millions of rows into child, very very slowly.

The database will not allow you to create an RI link unless the parent table has a primary key or unique constraint, which the database backs with an index... and you can't even trick it afterwards by dropping the constraint.

It's the other direction, when you cascade forwards, where you can have a problem. This is most common with a delete, but it can also happen on an update of a table's primary key with child tables referencing it.

merlin
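A quick sketch of both points (table names are only illustrative, and the exact error text may vary by version):

    CREATE TABLE parent (col1 int4);
    CREATE TABLE child (col1 int4 REFERENCES parent(col1));
    -- ERROR:  there is no unique constraint matching given keys for referenced table "parent"

    -- The slow direction: the parent key is indexed, but the referencing column is not,
    -- so each DELETE on parent has to scan child2 to verify nothing still references the row.
    ALTER TABLE parent ADD PRIMARY KEY (col1);
    CREATE TABLE child2 (col1 int4 REFERENCES parent(col1));  -- no index on child2.col1
    DELETE FROM parent WHERE col1 = 42;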
On Tue, 2007-02-06 at 14:06 -0500, Merlin Moncure wrote:
> [snip] The database will not allow you to create an RI link unless the
> parent table has a primary key or unique constraint, which the database
> backs with an index... and you can't even trick it afterwards by
> dropping the constraint.

Hmmm, should check my SQL before hitting send I guess. Well, at least you no longer have to wear the stupid award, Merlin :)

-- Mark Lewis
On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
> "Karen Hill" <karen_hil...@yahoo.com> writes:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes. What is the expected time frame for this operation
> > on a PC with 1/2 a gig of RAM and a 7200 RPM disk?
>
> I think you have omitted a bunch of relevant facts. Bare INSERT is
> reasonably quick:
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# \timing
> Timing is on.
> regression=# insert into foo select x from generate_series(1,200000) x;
> INSERT 0 200000
> Time: 5158.564 ms
> regression=#
>
> (this on a not-very-fast machine) but if you weigh it down with a ton
> of index updates, foreign key checks, etc, it could get slow ...
> also you haven't mentioned what else that plpgsql function is doing.

The postgres version is 8.2.1 on Windows. The pl/pgsql function is inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(
);

CREATE TABLE foo2
(
);

CREATE VIEW viewfoo AS
(
);

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
    FOR i in 1..200000 LOOP
        INSERT INTO viewfoo (x) VALUES (x);
    END LOOP;
END;
$$ LANGUAGE plpgsql;
Karen Hill wrote:
> The postgres version is 8.2.1 on Windows. The pl/pgsql function is
> inserting to an updatable view (basically two tables).
> [snip schema sketch]

Sorry - but we probably need *still* more detail! In particular, the definition of viewfoo is likely to be critical. For instance, a simplified variant of your setup does 200000 inserts in about 5s on my PIII Tualatin machine:

CREATE TABLE foo1 (x INTEGER);

CREATE VIEW viewfoo AS SELECT * FROM foo1;

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(
    INSERT INTO foo1 VALUES (new.x);
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
    FOR i in 1..200000 LOOP
        INSERT INTO viewfoo (x) VALUES (i);
    END LOOP;
END;
$$ LANGUAGE plpgsql;

postgres=# \timing
postgres=# SELECT functionFoo();
 functionfoo
-------------

(1 row)

Time: 4659.477 ms

postgres=# SELECT count(*) FROM viewfoo;
 count
--------
 200000
(1 row)

Cheers

Mark
"Karen Hill" <karen_hill22@yahoo.com> writes: > On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote: >> I think you have omitted a bunch of relevant facts. > The postgres version is 8.2.1 on Windows. The pl/pgsql function is > inserting to an updatable view (basically two tables). > [ sketch of schema ] I think the problem is probably buried in the parts you left out. Can you show us the full schemas for those tables, as well as the rule definition? The plpgsql function itself can certainly go a lot faster than what you indicated. On my slowest active machine: regression=# create table viewfoo(x int); CREATE TABLE regression=# CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$ BEGIN FOR i in 1..200000 LOOP INSERT INTO viewfoo (x) VALUES (i); END LOOP; END; $$ LANGUAGE plpgsql; CREATE FUNCTION regression=# \timing Timing is on. regression=# select functionFoo(); functionfoo ------------- (1 row) Time: 16939.667 ms regression=# regards, tom lane
Unless you specify otherwise, every insert carries its own transaction begin/commit. That's a lot of overhead for a single insert, no? Why not use a single transaction for, say, each 1000 inserts? That would strike a nice balance of safety and efficiency. Pseudo code for the insert:

Begin Transaction;
FOR i in 1..200000 LOOP
    INSERT INTO viewfoo (x) VALUES (i);
    IF i % 1000 = 0 THEN
        Commit Transaction;
        Begin Transaction;
    END IF;
END LOOP;
Commit Transaction;
End

This approach should speed things up dramatically.

>>> "Karen Hill" <karen_hill22@yahoo.com> 2/6/2007 2:39 PM >>>
On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
> [snip] I think you have omitted a bunch of relevant facts. Bare INSERT is
> reasonably quick ... but if you weigh it down with a ton of index updates,
> foreign key checks, etc, it could get slow ... also you haven't mentioned
> what else that plpgsql function is doing.

The postgres version is 8.2.1 on Windows. The pl/pgsql function is inserting to an updatable view (basically two tables).
[snip schema and function, quoted in full above]