Thread: optimal insert
Hello experts,

I have a database that contains three tables:

create table a (
  id serial primary key,
  ... -- some more fields not relevant for my question
);
create table b (
  id serial primary key,
  ... -- some more fields not relevant for my question
);
create table a_b (
  a int not null references a,
  b int not null references b
);

Tables a and b have already been filled with lots of rows. Now my
application needs to insert the relationship of a to b into table a_b
and is currently doing it with inserts like the following:

insert into a_b(a,b) values(1,100);
insert into a_b(a,b) values(1,200);
insert into a_b(a,b) values(1,54);
insert into a_b(a,b) values(1,4577);

So for a batch of inserts the value of a stays the same, while for b
arbitrary values are inserted. Now I wonder whether PostgreSQL offers
a smarter way to insert those values. A solution may contain some
plpgsql code.

--
---> Dirk Jagdmann ----> http://cubic.org/~doj -----> http://llg.cubic.org
On Sun, Oct 08, 2006 at 23:04:02 +0200,
  Dirk Jagdmann <jagdmann@gmail.com> wrote:
> insert into a_b(a,b) values(1,100);
> insert into a_b(a,b) values(1,200);
> insert into a_b(a,b) values(1,54);
> insert into a_b(a,b) values(1,4577);
>
> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may contain some
> plpgsql code.

I don't think you are going to be able to save anything from 'a' being
fixed unless there is a way to compute 'b', so that you can use a select
statement. Otherwise, just do the normal things you would do to speed up
inserts; the biggest gain will probably come from wrapping them all in
one transaction.
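For example, a minimal sketch of that advice, with the original batch
wrapped in a single transaction so all four rows share one commit:

begin;
insert into a_b(a,b) values(1,100);
insert into a_b(a,b) values(1,200);
insert into a_b(a,b) values(1,54);
insert into a_b(a,b) values(1,4577);
commit;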
On 10/8/06, Dirk Jagdmann <jagdmann@gmail.com> wrote:
> Hello experts,
>
> I have a database that contains three tables:
>
> create table a (
>   id serial primary key,
>   ... -- some more fields not relevant for my question
> );
> create table b (
>   id serial primary key,
>   ... -- some more fields not relevant for my question
> );
> create table a_b (
>   a int not null references a,
>   b int not null references b
> );
>
> Tables a and b have already been filled with lots of rows. Now my
> application needs to insert the relationship of a to b into table a_b
> and is currently doing it with inserts like the following:
>
> insert into a_b(a,b) values(1,100);
> insert into a_b(a,b) values(1,200);
> insert into a_b(a,b) values(1,54);
> insert into a_b(a,b) values(1,4577);
>
> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may contain some
> plpgsql code.
It depends on your logic. If you can write a query that selects out the a and b records, there is a smart way:
insert into a_b(a, b)
select 1, b.id from b where b.id in (100, 200, 54, 4577);
This is not really smarter because you already have the id values; the
select may hurt your performance rather than help. But if the select is
"smarter" then you don't even need to know what the b.id values are:
insert into a_b(a, b)
select 1, b.id from b where b.somecolumn = 'somevalue';
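As an aside, multi-row VALUES lists (added in PostgreSQL 8.2) express
the same fixed-a batch as a single statement without needing a SELECT:

insert into a_b(a, b) values
  (1, 100),
  (1, 200),
  (1, 54),
  (1, 4577);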
Hope this helps.
==================================================================
Aaron Bono
Aranya Software Technologies, Inc.
http://www.aranya.com
http://codeelixir.com
==================================================================
And don't forget that \COPY and especially COPY are usually much faster
(and, IMHO, easier to compose/maintain) than gobs of INSERTs.

> -----Original Message-----
> From: pgsql-sql-owner@postgresql.org
> [mailto:pgsql-sql-owner@postgresql.org] On Behalf Of Aaron Bono
> Sent: Tuesday, October 10, 2006 1:46 PM
> To: Dirk Jagdmann
> Cc: pgsql-sql@postgresql.org
> Subject: Re: [SQL] optimal insert
>
> [snip]
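As a sketch of the COPY approach, assuming a tab-separated file of
a/b pairs (the file names here are illustrative):

-- a_b.tsv holds one tab-separated pair per line, e.g.:
--   1  100
--   1  200
--   1  54
--   1  4577

-- server-side COPY reads the file on the database server:
COPY a_b (a, b) FROM '/tmp/a_b.tsv';

-- psql's \copy reads the same format on the client machine instead:
\copy a_b (a, b) from 'a_b.tsv'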
Hello Aaron,

thank you for your suggestion. I will have to think about whether
something similar would be of any benefit for my data.

--
---> Dirk Jagdmann ----> http://cubic.org/~doj -----> http://llg.cubic.org
Hello George,

> And don't forget that \COPY and especially COPY are usually much faster
> (and, IMHO, easier to compose/maintain) than gobs of INSERTs.

I did not forget, but my application uses embedded SQL (with the ecpg
preprocessor) and I don't think it can handle COPYs :(

--
---> Dirk Jagdmann ----> http://cubic.org/~doj -----> http://llg.cubic.org
Hi, Dirk,

Dirk Jagdmann wrote:
> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may contain some
> plpgsql code.

[local]:postgres=# insert into a_b (a,b)
  SELECT 1, (ARRAY[100,200,54,4577])[i] FROM generate_series(1,4) AS i;
INSERT 0 4
[local]:postgres=# select * from a_b;
 a |  b
---+------
 1 |  100
 1 |  200
 1 |   54
 1 | 4577

But I tend to agree with the others that packing the whole bunch of
INSERTs into a single transaction, or using COPY instead, will give you
bigger benefits.

HTH,
Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org
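Since the original question explicitly allowed plpgsql, the same array
trick can also be wrapped in a function, so the application issues one
call per batch. A minimal sketch, assuming PostgreSQL 8.0 or later; the
function and parameter names are illustrative, not an established API:

-- Hypothetical helper: pairs one fixed "a" value with every element of
-- an integer array and inserts all pairs into a_b in a single statement.
CREATE OR REPLACE FUNCTION insert_a_b(p_a int, p_bs int[]) RETURNS void AS $$
BEGIN
  INSERT INTO a_b (a, b)
    SELECT p_a, p_bs[i]
    FROM generate_series(array_lower(p_bs, 1), array_upper(p_bs, 1)) AS i;
END;
$$ LANGUAGE plpgsql;

-- Usage:
-- SELECT insert_a_b(1, ARRAY[100, 200, 54, 4577]);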