Re: Bulk Inserts - Mailing list pgsql-general

From Souvik Bhattacherjee
Subject Re: Bulk Inserts
Date
Msg-id CAANrPSfx6D0qQdh6gEAQA4pcE-gk34EOS4Ac0b1ieaAKngmvZA@mail.gmail.com
Whole thread Raw
In response to Re: Bulk Inserts  (Adrian Klaver <adrian.klaver@aklaver.com>)
List pgsql-general
> All I have left is:

> select tab2 row_number () OVER (order by attr2, attr1 )AS id, attr1,
> attr2 into tab2 from tab1;

> That will not create a serial type in the id column though. You can
> attach a sequence to that column. Something like:

> 1) create sequence tab2_id start <max id + 1> owned by tab2.id;

> 2) alter table tab2 alter COLUMN id set default nextval('tab2_id');

Thanks. This is a bit indirect but works fine. Performance wise this turns
out to the best when inserting rows from one table to another (new) table
with a serial id column in the new table.

Best,
-SB

On Tue, Aug 13, 2019 at 11:08 AM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 8/13/19 6:34 AM, Souvik Bhattacherjee wrote:
>  > If the selects are returning more then one row then you are already
>  > doing bulk inserts. If they are returning single rows or you want to
>  > batch them then you need some sort of code to do that. Something
>  > like(pseudo Python like code):
>
>  > attr2_vals= [(10, 20, 30, 40), (50, 60, 70, 80)]
>
>  > for val_batch in attr2_vals:
>          BEGIN
>          for id in val_batch:
>                  insert into tab2 (attr1, attr2) (select attr1, attr2
>                   from tab1 where attr2 = id)
>           COMMIT
>
> For *EXP 1: inserts with multiple txn:*
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 10);
> insert into tab2 (attr1, attr2) (select attr1, attr2 from tab1 where
> attr2 = 20);
>
> tab1 has ~6M rows and there are only two values for the attribute attr2 in
> tab1 which are evenly distributed. So, yes, I guess I'm already doing
> batching
> here.
>
> Also, I ran the following two statements to see if their performances
> are comparable.
> While STMT 1 always runs faster in my machine but their performances
> seem to differ
> by a couple of seconds at most.
>
> STMT 1: select attr1, attr2 into tab2 from tab1;
> STMT 2: insert into tab2 (select attr1, attr2 from tab1);

All I have left is:

select tab2 row_number () OVER (order by attr2, attr1 )AS id, attr1,
attr2 into tab2 from tab1;

That will not create a serial type in the id column though. You can
attach a sequence to that column. Something like:

1) create sequence tab2_id start <max id + 1> owned by tab2.id;

2) alter table tab2 alter COLUMN id set default nextval('tab2_id');



>
> However adding the serial id column as an ALTER TABLE statement actually
> takes more time
> than inserting the tuples, so the combined total time is more than
> double the time taken to insert
> the tuples into tab2 without serial id column.
>
> Best,
> -SB
>
>
>



--
Adrian Klaver
adrian.klaver@aklaver.com

pgsql-general by date:

Previous
From: Luca Ferrari
Date:
Subject: Re: Changing work_mem
Next
From: rihad
Date:
Subject: Re: Changing work_mem