Thread: How long should it take to insert 200,000 records?

How long should it take to insert 200,000 records?

From: "Karen Hill"
I have a pl/pgsql function that is inserting 200,000 records for
testing purposes.  What is the expected time frame for this operation
on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
a 2ghz cpu.  So far I've been sitting here for about 2 million ms
waiting for it to complete, and I'm not sure how many inserts postgres
is doing per second.

regards,
karen


Re: How long should it take to insert 200,000 records?

From: Tom Lane
"Karen Hill" <karen_hill22@yahoo.com> writes:
> I have a pl/pgsql function that is inserting 200,000 records for
> testing purposes.  What is the expected time frame for this operation
> on a pc with 1/2 a gig of ram and a 7200 RPM disk?

I think you have omitted a bunch of relevant facts.  Bare INSERT is
reasonably quick:

regression=# create table foo (f1 int);
CREATE TABLE
regression=# \timing
Timing is on.
regression=# insert into foo select x from generate_series(1,200000) x;
INSERT 0 200000
Time: 5158.564 ms
regression=#

(this on a not-very-fast machine) but if you weigh it down with a ton
of index updates, foreign key checks, etc, it could get slow ...
also you haven't mentioned what else that plpgsql function is doing.

            regards, tom lane

Re: How long should it take to insert 200,000 records?

From: Csaba Nagy
On Tue, 2007-02-06 at 01:35, Karen Hill wrote:
> [snip] So far I've been sitting here for about 2 million ms
> waiting for it to complete, and I'm not sure how many inserts postgres
> is doing per second.

One way is to run analyze verbose on the target table and see how many
pages it has, then do it again 1 minute later and check by how many
pages it grew. Multiply the page increase by the records-per-page ratio
from the same analyze output, and you'll get an estimated growth rate.
Of course this will only work if you didn't have lots of free space in
the table to start with... if you do have lots of free space, you can
still estimate the growth from the analyze results, but it will be more
complicated.
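
For example, a rough sketch of this approach, assuming the target table
is named foo:

analyze verbose foo;   -- note the reported page count and live-row count
-- ... wait a minute ...
analyze verbose foo;   -- note the new page count
-- (new pages - old pages) * (live rows / pages) ~= rows inserted per minute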


In any case, it would be very nice to have more tools to attach to
running queries and see how they are doing... starting with what exactly
they are doing (are they in RI checks, maybe?), the actual execution
plan they are using, and how far along they are... it would help a lot
in debugging performance problems.

Cheers,
Csaba.



Re: How long should it take to insert 200,000 records?

From: Scott Marlowe
On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> I have a pl/pgsql function that is inserting 200,000 records for
> testing purposes.  What is the expected time frame for this operation
> on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> waiting for it to complete, and I'm not sure how many inserts postgres
> is doing per second.

That really depends.  Doing 200,000 inserts as individual transactions
will be fairly slow.  Since PostgreSQL generally runs in autocommit
mode, this means that if you didn't expressly begin a transaction, you
are in fact inserting each row as a transaction.  i.e. this:

for (i=0;i<200000;i++){
    insert into abc values ('test',123);
}

Is functionally equivalent to:

for (i=0;i<200000;i++){
    begin;
    insert into abc values ('test',123);
    commit;
}

However, you can add begin / commit pairs outside the loop like this:

begin;
for (i=0;i<200000;i++){
    insert into abc values ('test',123);
}
commit;

and it should run much faster.

Re: How long should it take to insert 200,000 records?

From: "Merlin Moncure"
On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes.  What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> > a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> > waiting for it to complete, and I'm not sure how many inserts postgres
> > is doing per second.
>
> That really depends.  Doing 200,000 inserts as individual transactions
> will be fairly slow.  Since PostgreSQL generally runs in autocommit
> mode, this means that if you didn't expressly begin a transaction, you
> are in fact inserting each row as a transaction.  i.e. this:

I think OP is doing insertion inside a pl/pgsql loop...transaction is
implied here.  For creating test data, generate_series or
insert...select is obviously the way to go.  If that's unsuitable for
some reason, I would suggest RAISE NOTICE every n records so you can
monitor the progress and make sure something is not binding up in a
lock or something like that.  Be especially wary of degrading
performance during the process.
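
A minimal sketch of that, assuming a loop like the OP's (the function
and table names here are made up):

CREATE OR REPLACE FUNCTION load_test() RETURNS void AS $$
BEGIN
    FOR i IN 1..200000 LOOP
        INSERT INTO foo (x) VALUES (i);
        -- report progress every 10000 rows
        IF i % 10000 = 0 THEN
            RAISE NOTICE '% rows inserted', i;
        END IF;
    END LOOP;
END;
$$ LANGUAGE plpgsql;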

Another common cause of poor insert performance is an RI check against
an un-indexed column.  In-transaction insert performance should be
between 1k and 10k records/second in normal situations, meaning that if
you haven't inserted 1 million records inside of an hour, something else
is going on.

Generally, insertion performance from fastest to slowest is:
* insert...select generate_series
* insert...select
* copy
* insert (),(),()[...] (at least 10 or preferably 100 insertions)
* begin, prepare, n prepared inserts executed, commit
* begin, n inserts, commit
* plpgsql loop, single inserts
* n inserts outside of a transaction

The exact order might not be absolutely set in stone (copy might beat
insert...select, for example), but the top 4 methods will always be
much faster than the bottom 4.
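
For reference, minimal sketches of the faster forms (table and file
names here are hypothetical):

CREATE TABLE bar (x int, y text);

-- insert...select generate_series
INSERT INTO bar SELECT g, 'test' FROM generate_series(1,200000) g;

-- copy from a server-side file
COPY bar (x, y) FROM '/tmp/bar.dat';

-- multi-row values (new in 8.2)
INSERT INTO bar (x, y) VALUES (1,'a'), (2,'b'), (3,'c');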

merlin

Re: How long should it take to insert 200,000 records?

From: Scott Marlowe
On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
> On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > > I have a pl/pgsql function that is inserting 200,000 records for
> > > testing purposes.  What is the expected time frame for this operation
> > > on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> > > a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> > > waiting for it to complete, and I'm not sure how many inserts postgres
> > > is doing per second.
> >
> > That really depends.  Doing 200,000 inserts as individual transactions
> > will be fairly slow.  Since PostgreSQL generally runs in autocommit
> > mode, this means that if you didn't expressly begin a transaction, you
> > are in fact inserting each row as a transaction.  i.e. this:
>
> I think OP is doing insertion inside a pl/pgsql loop...transaction is
> implied here.

Yeah, I noticed that about 10 seconds after hitting send... :)

Re: How long should it take to insert 200,000 records?

From: "Merlin Moncure"
On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
> > On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > > On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > > > I have a pl/pgsql function that is inserting 200,000 records for
> > > > testing purposes.  What is the expected time frame for this operation
> > > > on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> > > > a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> > > > waiting for it to complete, and I'm not sure how many inserts postgres
> > > > is doing per second.
> > >
> > > That really depends.  Doing 200,000 inserts as individual transactions
> > > will be fairly slow.  Since PostgreSQL generally runs in autocommit
> > > mode, this means that if you didn't expressly begin a transaction, you
> > > are in fact inserting each row as a transaction.  i.e. this:
> >
> > I think OP is doing insertion inside a pl/pgsql loop...transaction is
> > implied here.
>
> Yeah, I noticed that about 10 seconds after hitting send... :)

actually, I get the stupid award also, because an RI check against an
unindexed column is not possible :)  (this haunts deletes, not inserts).

merlin

Re: How long should it take to insert 200,000 records?

From: Mark Lewis
On Tue, 2007-02-06 at 12:01 -0500, Merlin Moncure wrote:
> On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > On Tue, 2007-02-06 at 10:40, Merlin Moncure wrote:
> > > On 2/6/07, Scott Marlowe <smarlowe@g2switchworks.com> wrote:
> > > > On Mon, 2007-02-05 at 18:35, Karen Hill wrote:
> > > > > I have a pl/pgsql function that is inserting 200,000 records for
> > > > > testing purposes.  What is the expected time frame for this operation
> > > > > on a pc with 1/2 a gig of ram and a 7200 RPM disk?   The processor is
> > > > > a 2ghz cpu.  So far I've been sitting here for about 2 million ms
> > > > > waiting for it to complete, and I'm not sure how many inserts postgres
> > > > > is doing per second.
> > > >
> > > > That really depends.  Doing 200,000 inserts as individual transactions
> > > > will be fairly slow.  Since PostgreSQL generally runs in autocommit
> > > > mode, this means that if you didn't expressly begin a transaction, you
> > > > are in fact inserting each row as a transaction.  i.e. this:
> > >
> > > I think OP is doing insertion inside a pl/pgsql loop...transaction is
> > > implied here.
> >
> > Yeah, I noticed that about 10 seconds after hitting send... :)
>
> actually, I get the stupid award also, because an RI check against an
> unindexed column is not possible :)  (this haunts deletes, not inserts).

Sure it's possible:

CREATE TABLE parent (col1 int4);
-- insert many millions of rows into parent
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- insert many millions of rows into child, very very slowly.


- Mark Lewis



Re: How long should it take to insert 200,000 records?

From: Bruno Wolff III
On Tue, Feb 06, 2007 at 10:31:26 -0800,
  Mark Lewis <mark.lewis@mir3.com> wrote:
>
> Sure it's possible:
>
> CREATE TABLE parent (col1 int4);
> -- insert many millions of rows into parent
> CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
> -- insert many millions of rows into child, very very slowly.

I don't think Postgres allows this. You don't have to have an index on
the child table, but you do on the parent table.
Quote from http://developer.postgresql.org/pgdocs/postgres/sql-createtable.html:
The referenced columns must be the columns of a unique or primary key
constraint in the referenced table.
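
A quick test confirms it; the second statement fails with something
like:

CREATE TABLE parent (col1 int4);
CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
-- ERROR:  there is no unique constraint matching given keys for
-- referenced table "parent"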

Re: How long should it take to insert 200,000 records?

From: "Merlin Moncure"
On 2/6/07, Mark Lewis <mark.lewis@mir3.com> wrote:
> > actually, I get the stupid award also, because an RI check against
> > an unindexed column is not possible :)  (this haunts deletes, not inserts).
>
> Sure it's possible:
>
> CREATE TABLE parent (col1 int4);
> -- insert many millions of rows into parent
> CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
> -- insert many millions of rows into child, very very slowly.

the database will not allow you to create an RI link out unless the
parent table has a primary key/unique constraint, which the database
backs with an index....and you can't even trick it afterwards by
dropping the constraint.

it's the other direction, when you cascade forward, that you can have
a problem.  this is most common with a delete, but can also happen on
an update of a table's primary key with child tables referencing it.
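
A sketch of that problem case (names made up):

CREATE TABLE parent (id int4 PRIMARY KEY);
CREATE TABLE child  (parent_id int4 REFERENCES parent(id));
-- child.parent_id has no index, so each delete on parent has to scan
-- all of child looking for referencing rows:
DELETE FROM parent WHERE id = 42;
-- the usual fix:
CREATE INDEX child_parent_id_idx ON child (parent_id);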

merlin

Re: How long should it take to insert 200,000 records?

From: Mark Lewis
On Tue, 2007-02-06 at 14:06 -0500, Merlin Moncure wrote:
> On 2/6/07, Mark Lewis <mark.lewis@mir3.com> wrote:
> > > actually, I get the stupid award also, because an RI check against
> > > an unindexed column is not possible :)  (this haunts deletes, not inserts).
> >
> > Sure it's possible:
> >
> > CREATE TABLE parent (col1 int4);
> > -- insert many millions of rows into parent
> > CREATE TABLE child  (col1 int4 REFERENCES parent(col1));
> > -- insert many millions of rows into child, very very slowly.
>
> the database will not allow you to create a RI link out unless the
> parent table has a primary key/unique constraint, which the database
> backs with an index....and you can't even trick it afterwards by
> dropping the constraint.
>
> it's the other direction, when you cascade forwards when you can have
> a problem.  this is most common with a delete, but can also happen on
> an update of a table's primary key with child tables referencing it.
>

Hmmm, should check my SQL before hitting send I guess.  Well, at least
you no longer have to wear the stupid award, Merlin :)

-- Mark Lewis

Re: How long should it take to insert 200,000 records?

From: "Karen Hill"
On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
> "Karen Hill" <karen_hil...@yahoo.com> writes:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes.  What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk?
>
> I think you have omitted a bunch of relevant facts.  Bare INSERT is
> reasonably quick:
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# \timing
> Timing is on.
> regression=# insert into foo select x from generate_series(1,200000) x;
> INSERT 0 200000
> Time: 5158.564 ms
> regression=#
>
> (this on a not-very-fast machine) but if you weigh it down with a ton
> of index updates, foreign key checks, etc, it could get slow ...
> also you haven't mentioned what else that plpgsql function is doing.
>

The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;




Re: How long should it take to insert 200,000 records?

From: Mark Kirkwood
Karen Hill wrote:

>
> The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
> inserting to an updatable view (basically two tables).
>
> CREATE TABLE foo1
> (
>
>
> ) ;
>
> CREATE TABLE foo2
> (
>
> );
>
> CREATE VIEW viewfoo AS
> (
>
> );
> CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
> (
>
> );
>
> CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
> BEGIN
> FOR i in 1..200000 LOOP
> INSERT INTO viewfoo (x) VALUES (i);
> END LOOP;
> END;
> $$ LANGUAGE plpgsql;
>


Sorry - but we probably need *still* more detail! - the definition of
viewfoo is likely to be critical.  For instance, a simplified variant of
your setup does 200000 inserts in 5s on my PIII Tualatin machine:

CREATE TABLE foo1 (x INTEGER);

CREATE VIEW viewfoo AS SELECT * FROM foo1;

CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(
  INSERT INTO foo1 VALUES (new.x);
);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
     FOR i in 1..200000 LOOP
         INSERT INTO viewfoo (x) VALUES (i);
     END LOOP;
END;
$$ LANGUAGE plpgsql;


postgres=# \timing
postgres=# SELECT functionFoo() ;
  functionfoo
-------------

(1 row)

Time: 4659.477 ms

postgres=# SELECT count(*) FROM viewfoo;
  count
--------
  200000
(1 row)

Cheers

Mark

Re: How long should it take to insert 200,000 records?

From: Tom Lane
"Karen Hill" <karen_hill22@yahoo.com> writes:
> On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
>> I think you have omitted a bunch of relevant facts.

> The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
> inserting to an updatable view (basically two tables).
> [ sketch of schema ]

I think the problem is probably buried in the parts you left out.  Can
you show us the full schemas for those tables, as well as the rule
definition?  The plpgsql function itself can certainly go a lot faster
than what you indicated.  On my slowest active machine:

regression=# create table viewfoo(x int);
CREATE TABLE
regression=# CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;
CREATE FUNCTION
regression=# \timing
Timing is on.
regression=# select functionFoo();
 functionfoo
-------------

(1 row)

Time: 16939.667 ms
regression=#

            regards, tom lane

Re: How long should it take to insert 200,000 records?

From: "Lou O'Quin"
Unless you specify otherwise, every insert carries its own transaction
begin/commit.  That's a lot of overhead for a single insert, no?  Why
not use a single transaction for, say, every 1000 inserts?  That would
strike a nice balance of safety and efficiency.

pseudo code for the insert (note that a plpgsql function can't commit
mid-loop, so batching like this has to be driven from the client):

Begin Transaction;
FOR i in 1..200000 LOOP
  INSERT INTO viewfoo (x) VALUES (i);
  IF i % 1000 = 0 THEN
    Commit Transaction;
    Begin Transaction;
  END IF;
END LOOP;
Commit Transaction;


This approach should speed things up dramatically.



>>> "Karen Hill" <karen_hill22@yahoo.com> 2/6/2007 2:39 PM >>>
On Feb 5, 9:33 pm, t...@sss.pgh.pa.us (Tom Lane) wrote:
> "Karen Hill" <karen_hil...@yahoo.com> writes:
> > I have a pl/pgsql function that is inserting 200,000 records for
> > testing purposes.  What is the expected time frame for this operation
> > on a pc with 1/2 a gig of ram and a 7200 RPM disk?
>
> I think you have omitted a bunch of relevant facts.  Bare INSERT is
> reasonably quick:
>
> regression=# create table foo (f1 int);
> CREATE TABLE
> regression=# \timing
> Timing is on.
> regression=# insert into foo select x from generate_series(1,200000) x;
> INSERT 0 200000
> Time: 5158.564 ms
> regression=#
>
> (this on a not-very-fast machine) but if you weigh it down with a ton
> of index updates, foreign key checks, etc, it could get slow ...
> also you haven't mentioned what else that plpgsql function is doing.
>

The postgres version is 8.2.1 on Windows.   The pl/pgsql function is
inserting to an updatable view (basically two tables).

CREATE TABLE foo1
(


) ;

CREATE TABLE foo2
(

);

CREATE VIEW viewfoo AS
(

);
CREATE RULE ruleFoo AS ON INSERT TO viewfoo DO INSTEAD
(

);

CREATE OR REPLACE FUNCTION functionFoo() RETURNS VOID AS $$
BEGIN
FOR i in 1..200000 LOOP
INSERT INTO viewfoo (x) VALUES (i);
END LOOP;
END;
$$ LANGUAGE plpgsql;


