Thread: Optimizing huge inserts/copies

Optimizing huge inserts/copies

From: Webb Sprague
Hi all,

Does anybody have any thoughts on optimizing a huge
insert, involving something like 3 million records all
at once?  Should I drop my indices before doing the
copy, and then create them after?  I keep a
tab-delimited file as a buffer, copy it, then do it
again about 400 times.  Each separate buffer is a few
thousand records. 
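
For reference, each buffer load looks something like this (the path and
table name are placeholders):

    -- One of the ~400 buffer loads; COPY's default text format expects
    -- tab-delimited columns, matching the buffer files described above.
    COPY my_table FROM '/tmp/buffer_0001.tsv';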

We do this at night, so it's not the end of the world
if it takes 8 hours, but I would be very grateful for
some good ideas...

Thanks
W



Re: Optimizing huge inserts/copies

From: Jie Liang
Date: Tue, 29 Aug 2000
Hi, there,

1. Use COPY ... FROM '.....';
2. Write a PL/pgSQL function and pass multiple records as an array.

However, if your table has a foreign key constraint, the load cannot be
sped up this way.
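
For example, something along these lines (all names here are made up for
illustration, and the dollar-quoted body assumes a reasonably modern
PL/pgSQL):

    -- Sketch of suggestion 2: pass a whole batch as an array and insert
    -- the rows inside one function call (and thus one transaction).
    CREATE FUNCTION insert_batch(ids integer[]) RETURNS void AS $$
    BEGIN
        FOR i IN array_lower(ids, 1) .. array_upper(ids, 1) LOOP
            INSERT INTO target_table (id) VALUES (ids[i]);
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;

    -- Usage: one call carries an entire batch of records.
    SELECT insert_batch(ARRAY[1, 2, 3]);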

I have the same question as you: my tables involve 9-13 million rows, and
I don't know how I can add a foreign key to them either.



Webb Sprague wrote:

> Does anybody have any thoughts on optimizing a huge insert, involving
> something like 3 million records all at once?  Should I drop my indices
> before doing the copy, and then create them after?

--
Jie LIANG

Internet Products Inc.

10350 Science Center Drive
Suite 100, San Diego, CA 92121
Office:(858)320-4873

jliang@ipinc.com
www.ipinc.com





Re: Optimizing huge inserts/copies

From: Stephan Szabo
On Tue, 29 Aug 2000, Jie Liang wrote:

> 1. Use COPY ... FROM '.....';
> 2. Write a PL/pgSQL function and pass multiple records as an array.
>
> However, if your table has a foreign key constraint, the load cannot be
> sped up this way. [...] I don't know how I can add a foreign key to
> them either.

I haven't tried it on really large tables, but does it turn out faster to
use ALTER TABLE ADD CONSTRAINT to add the foreign key constraint after the
data is loaded and the indexes are created?
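
For instance, using the tableA/tableB example from this thread (the file
path is hypothetical):

    -- Assumes tableA already has its primary key on (id).
    COPY tableB FROM '/tmp/tableB.tsv';          -- bulk-load with no FK in place
    CREATE INDEX tableB_id_idx ON tableB (id);   -- index the referencing column
    ALTER TABLE tableB
        ADD CONSTRAINT distfk FOREIGN KEY (id)
        REFERENCES tableA (id) ON DELETE CASCADE;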



Re: Optimizing huge inserts/copies

From: Jie Liang
Hi, there,

I tried different ways, including vacuuming the table and making sure the
index is used; it is still as slow as ~100 rows per minute.


Stephan Szabo wrote:

> I haven't tried it on really large tables, but does it turn out faster to
> use ALTER TABLE ADD CONSTRAINT to add the foreign key constraint after the
> data is loaded and the indexes are created?






Re: Optimizing huge inserts/copies

From: Webb Sprague
I am experimenting with this too.  If I have any
indexes at all, the COPYs get VERY SLOW as the table
gets big.  Drop ALL your indexes, do your COPYs,
and then create your indexes again.
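
Something like (index, table, and file names are placeholders):

    DROP INDEX my_table_col_idx;                 -- drop before the bulk loads
    COPY my_table FROM '/tmp/buffer_0001.tsv';
    -- ... repeat the COPY for each of the ~400 buffer files ...
    CREATE INDEX my_table_col_idx ON my_table (col);  -- rebuild once at the end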

Good luck.
--- Jie Liang <jliang@ipinc.com> wrote:
> I tried different ways, including vacuuming the table and making sure
> the index is used; it is still as slow as ~100 rows per minute.




Re: Optimizing huge inserts/copies

From: Jie Liang
Date: Wed, 30 Aug 2000
Hi,

I know that with no constraint it populates very quickly. My question is:
after the two tables have been reloaded, I want to add a foreign key
constraint, say: tableA has a primary key column (id) and tableB has a
column (id) that references it, so I say:

ALTER TABLE tableB ADD CONSTRAINT distfk FOREIGN KEY (id)
    REFERENCES tableA (id) ON DELETE CASCADE;

It just seems to take forever.

Thanks anyway.


Webb Sprague wrote:

> If I have any indexes at all, the COPYs get VERY SLOW as the table
> gets big.  Drop ALL your indexes, do your COPYs, and then create your
> indexes again.






Re: Optimizing huge inserts/copies

From: Stephan Szabo
On Wed, 30 Aug 2000, Jie Liang wrote:

> I want to add a foreign key constraint, say: tableA has a primary key
> column (id) and tableB has a column (id) that references it, so I say:
>
> ALTER TABLE tableB ADD CONSTRAINT distfk FOREIGN KEY (id)
>     REFERENCES tableA (id) ON DELETE CASCADE;
>
> It just seems to take forever.

Yeah, the ALTER TABLE has to check that the constraint is valid.  There
might be a faster way than the current "scan through the table calling the
trigger function" mechanism, although most alternatives would start pulling
the constraint-checking logic into multiple places.
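
For illustration, the validation work is logically equivalent to a
set-oriented orphan check like the one below; this is not what ALTER TABLE
actually runs, just a way to see the scan involved:

    -- Hypothetical manual pre-check: count referencing rows with no match.
    SELECT count(*)
    FROM tableB b
    WHERE b.id IS NOT NULL
      AND NOT EXISTS (SELECT 1 FROM tableA a WHERE a.id = b.id);
    -- A result of 0 means the constraint would validate cleanly.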



Re: Optimizing huge inserts/copies

From: Tom Lane
Webb Sprague <wsprague100@yahoo.com> writes:
> I am experimenting with this too.  If I have any
> indexes at all, the COPYs get VERY SLOW as the table
> gets big.  Drop ALL your indexes, do your COPYs,
> and then create your indexes again.

Do you have a lot of equal index keys in the data you're inserting?
I've recently been swatting some performance problems in the btree
index code for the case of large numbers of equal keys.
        regards, tom lane


Re: Optimizing huge inserts/copies

From: Zlatko Calusic
Jie Liang <jliang@ipinc.com> writes:

> I tried different ways, including vacuuming the table and making sure
> the index is used; it is still as slow as ~100 rows per minute.

PGFSYNC=no in postmaster.init?

Well, this might be Debian Linux specific; pardon me if it is.  I have
just begun playing with Postgres and am still learning myself.

IOW, disable the fsync after every statement and your OS will do a much
better job of clustering writes.  That means more inserts/sec for you.
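
On a stock installation the equivalent knob is the server-wide fsync
setting; a sketch only, since the exact file and option names vary by
version and packaging:

    # postgresql.conf (the Debian packaging wraps this as PGFSYNC=no in
    # postmaster.init, as mentioned above)
    fsync = off    # faster bulk loads, but unsafe: a crash can corrupt data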

In one of my tests I was able to insert at a rate of ~1000 rows/sec.
Then, as an experiment, I enabled pgfsync _and_ indexes.  The insert
speed dropped to 10/sec.  Very interesting.

Regards,
-- 
Zlatko