Thread: optimal insert

optimal insert

From
"Dirk Jagdmann"
Date:
Hello experts,

I have a database that contains three tables:

create table a (
  id serial primary key,
  ... -- some more fields not relevant for my question
);
create table b (
  id serial primary key,
  ... -- some more fields not relevant for my question
);
create table a_b (
  a int not null references a,
  b int not null references b
);

Tables a and b have already been filled with lots of rows. Now my
application needs to insert the relationship of a to b into table a_b
and is currently doing it with inserts like the following:

insert into a_b(a,b) values(1,100);
insert into a_b(a,b) values(1,200);
insert into a_b(a,b) values(1,54);
insert into a_b(a,b) values(1,4577);

So for a batch of inserts the value of a stays the same, while for b
arbitrary values are inserted. Now I wonder whether PostgreSQL offers
a smarter way to insert those values. A solution may involve some
plpgsql code.

-- 
---> Dirk Jagdmann
----> http://cubic.org/~doj
-----> http://llg.cubic.org


Re: optimal insert

From
Bruno Wolff III
Date:
On Sun, Oct 08, 2006 at 23:04:02 +0200, Dirk Jagdmann <jagdmann@gmail.com> wrote:
> 
> insert into a_b(a,b) values(1,100);
> insert into a_b(a,b) values(1,200);
> insert into a_b(a,b) values(1,54);
> insert into a_b(a,b) values(1,4577);
> 
> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may involve some
> plpgsql code.

I don't think you are going to save anything from 'a' being fixed,
unless there is a way to compute 'b' so that you can use a select statement.

Just do the normal things you would do to speed up inserts, the biggest
probably being wrapping them all in one transaction.
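
For example, the batch from the original message wrapped so that all four
inserts share a single commit:

begin;
insert into a_b(a,b) values(1,100);
insert into a_b(a,b) values(1,200);
insert into a_b(a,b) values(1,54);
insert into a_b(a,b) values(1,4577);
commit;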


Re: optimal insert

From
"Aaron Bono"
Date:
On 10/8/06, Dirk Jagdmann <jagdmann@gmail.com> wrote:
> insert into a_b(a,b) values(1,100);
> insert into a_b(a,b) values(1,200);
> insert into a_b(a,b) values(1,54);
> insert into a_b(a,b) values(1,4577);
>
> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may involve some
> plpgsql code.

It depends on your logic. If you can write a query that selects the a
and b values you need, there is a smarter way:
 
insert into a_b(a, b)
select 1, b.id from b where b.id in (100, 200, 54, 4577);
 
This first form is not really smarter, because you already have the id
values; the extra select may hurt performance rather than help. But if
the select itself is "smarter", you don't even need to know what the
b.id values are:
 
insert into a_b(a, b)
select 1, b.id from b where b.somecolumn = 'somevalue';
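
Since the original question allowed plpgsql, here is a minimal sketch of
a looping helper; the function name insert_a_b is made up for
illustration and is not part of the thread:

-- hypothetical helper: insert one a id against an arbitrary list of b ids
create function insert_a_b(a_id int, b_ids int[]) returns void as $$
declare
    i int;
begin
    -- walk the array and insert one a_b row per element
    for i in array_lower(b_ids, 1) .. array_upper(b_ids, 1) loop
        insert into a_b(a, b) values (a_id, b_ids[i]);
    end loop;
end;
$$ language plpgsql;

-- usage: select insert_a_b(1, array[100, 200, 54, 4577]);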
 
Hope this helps.
 
==================================================================
   Aaron Bono
   Aranya Software Technologies, Inc.
   http://www.aranya.com
   http://codeelixir.com
==================================================================

Re: optimal insert

From
"George Pavlov"
Date:
And don't forget that \COPY and especially COPY are usually much faster
(and, IMHO, easier to compose/maintain) than gobs of INSERTs.
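
For example, a minimal sketch of the same batch fed to COPY on standard
input (psql's \copy takes the same form with a client-side file; columns
are tab-separated):

copy a_b (a, b) from stdin;
1	100
1	200
1	54
1	4577
\.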


Re: optimal insert

From
"Dirk Jagdmann"
Date:
Hello Aaron,

thank you for your suggestion. I will have to think about whether
something similar would be of any benefit for my data.

-- 
---> Dirk Jagdmann
----> http://cubic.org/~doj
-----> http://llg.cubic.org


Re: optimal insert

From
"Dirk Jagdmann"
Date:
Hello George,

> And don't forget that \COPY and especially COPY are usually much faster
> (and, IMHO, easier to compose/maintain) than gobs of INSERTs.

I did not forget, but my application uses embedded SQL (with the ecpg
preprocessor) and I don't think it can handle COPY :(

-- 
---> Dirk Jagdmann
----> http://cubic.org/~doj
-----> http://llg.cubic.org


Re: optimal insert

From
Markus Schaber
Date:
Hi, Dirk,

Dirk Jagdmann wrote:

> So for a batch of inserts the value of a stays the same, while for b
> arbitrary values are inserted. Now I wonder whether PostgreSQL offers
> a smarter way to insert those values. A solution may involve some
> plpgsql code.

[local]:postgres=# insert into a_b (a,b) SELECT 1,
(ARRAY[100,200,54,4577])[i] FROM generate_series(1,4) AS i;
INSERT 0 4

[local]:postgres=# select * from a_b;
 a |  b
---+------
 1 |  100
 1 |  200
 1 |   54
 1 | 4577
(4 rows)
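
For comparison, the same batch as one statement with a multi-row VALUES
list; this is a sketch that assumes a PostgreSQL version supporting that
syntax (8.2 or later):

insert into a_b(a, b) values (1, 100), (1, 200), (1, 54), (1, 4577);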

But I tend to agree with the others that packing the whole bunch of
INSERTs into a single transaction, or using COPY instead, will give you
more benefit.

HTH,
Markus
-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf.     | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org