Thread: How many insert + update should one transaction handle?

How many insert + update should one transaction handle?

From: Yonatan Ben-Nes
Date: Fri, Sep 23, 2005 12:51:09 +0200
Hi all,

Every few days I need to DELETE all of the content of a few tables and
INSERT new data into them.
The new data amounts to about 5 million rows, and each row gets about 3
queries (INSERT + UPDATE).
Because I need the old data to stay visible until all of the new data
is available, I'm currently doing the whole process of deleting the old
content and inserting the new in one transaction.
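Roughly, the current process looks like this (the table and column names
here are made up):

BEGIN;
DELETE FROM items;
-- repeated for each of the ~5 million rows:
INSERT INTO items (id, name) VALUES (1, 'first');
UPDATE items SET name = 'second' WHERE id = 1;
UPDATE items SET name = 'third' WHERE id = 1;
COMMIT;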
Should I divide the insertion, so that I insert the new data into a
temporary table and commit the transaction every 100, 1,000, 10,000 or
so queries? Or does the size of the transaction not matter to the
server, so it's OK to handle such a process in one transaction?

Thanks a lot in advance,
   Yonatan Ben-Nes

Re: How many insert + update should one transaction handle?

From: Bruno Wolff III
On Fri, Sep 23, 2005 at 12:51:09 +0200,
  Yonatan Ben-Nes <da@canaan.co.il> wrote:
> Every few days I need to DELETE all of the content of a few tables and
> INSERT new data into them.
<snip>

Doesn't breaking this into multiple transactions defeat your stated intent
of keeping the old data visible until all of the new data is loaded?
You should be able to do this all in one transaction. If this is a one-shot
deal, you might want to use VACUUM FULL or CLUSTER to reclaim disk space.
If it is something you do regularly, then a plain VACUUM with an adequate
FSM setting should suffice. You will also want to run ANALYZE, which you
can piggyback on the VACUUM.
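For example, after the reload commits (the table name here is made up):

VACUUM ANALYZE items;  -- plain vacuum: marks dead tuples reusable, updates stats
-- or, for a one-shot cleanup that also returns the space to the OS:
VACUUM FULL ANALYZE items;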

Re: How many insert + update should one transaction handle?

From: Yonatan Ben-Nes
Bruno Wolff III wrote:
> On Fri, Sep 23, 2005 at 12:51:09 +0200,
>   Yonatan Ben-Nes <da@canaan.co.il> wrote:
>
>> Every few days I need to DELETE all of the content of a few tables and
>> INSERT new data into them.
>> <snip>
>
> Doesn't breaking this into multiple transactions defeat your stated intent
> of keeping the old data visible until all of the new data is loaded?
> You should be able to do this all in one transaction. If this is a one-shot
> deal, you might want to use VACUUM FULL or CLUSTER to reclaim disk space.
> If it is something you do regularly, then a plain VACUUM with an adequate
> FSM setting should suffice. You will also want to run ANALYZE, which you
> can piggyback on the VACUUM.


Well, if I use multiple transactions, then I'll insert the rows into a
temporary table as mentioned, and only when the whole process has ended
will I replace the old data with the new.
But that doesn't really matter anyway if it's OK to do one big
transaction.

Thanks Bruno,
   Yonatan Ben-Nes

Re: How many insert + update should one transaction handle?

From: Scott Marlowe
Date: Fri, Sep 23, 2005 10:49:00 -0500
On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> Every few days I need to DELETE all of the content of a few tables and
> INSERT new data into them.
<snip>

The only possible issues would be capacity, and possibly having a lot
of dead tuples lying about.

If you have 5 million rows and you update every one, then you now have
5 million live and 5 million dead tuples in your database. A VACUUM
FULL will take quite a while.

If your FSM is set large enough, then as long as you vacuum (a regular,
non-full vacuum) between these transactions, the 5 million dead tuples
should get reused. However, select performance will be as though it
were a 10-million-row database.

Given that you NEED to have all 10 million tuples in the database at the
same time, the use of a temp / holding table would allow you to truncate
the main table, move everything into the main table, and then drop /
truncate the temp / holding table.

If you truncate the main table and then start another transaction to
move the data into it, it shouldn't be so bloated, but the downside is
you'll have a period of time when the table appears empty to users.

So, the real question is whether or not you can afford to have an empty
table at some point in the process.

If you can't, then the two methods (running the whole transaction
against the one table, or using the temp / holding table) are
equivalent. If you can, there should be a noticeable gain from
truncating the main table outside the load transaction.

If you need the table to always contain either the old or the new
tuples (i.e. never be empty) and you can afford the very lengthy VACUUM
FULL on the 5 million dead rows, then that method will give you the
best select performance for the rest of the day.
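A minimal sketch of the holding-table variant (all names here are made
up; note that TRUNCATE rolls back cleanly inside a transaction, but it
holds an exclusive lock until COMMIT):

-- build the new data off to the side
CREATE TABLE holding (id integer, name text);   -- hypothetical schema
-- ...the ~5 million INSERTs and UPDATEs run against holding...

-- swap the contents in one transaction; the table is never seen empty,
-- but readers block on the TRUNCATE's exclusive lock until COMMIT
BEGIN;
TRUNCATE items;
INSERT INTO items SELECT * FROM holding;
COMMIT;
DROP TABLE holding;

-- moving the TRUNCATE out in front of the BEGIN avoids carrying the
-- lock through the copy, at the cost of a visibly empty table meanwhile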

Re: How many insert + update should one transaction handle?

From: Dawid Kuroczko
Date: Fri, Sep 23, 2005 14:53
On 9/23/05, Yonatan Ben-Nes <da@canaan.co.il> wrote:
> Every few days I need to DELETE all of the content of a few tables and
> INSERT new data into them.
> The new data amounts to about 5 million rows, and each row gets about 3
> queries (INSERT + UPDATE).

If I understand you well, in one transaction you do:
DELETE FROM table; -- remove all rows; you have a table with 5m dead tuples
then you:
INSERT into the table -- you have 5m dead tuples + 5m new tuples
then you:
UPDATE what you've inserted -- you have 10m dead tuples + 5m new tuples
and then you:
UPDATE once again what you've inserted -- you have 15m dead tuples + 5m new tuples
and then COMMIT;

Alternatively you want to do:
CREATE TEMPORARY TABLE temp_table (...) and UPDATE it as you please, then:
BEGIN;
DELETE FROM table;
INSERT INTO table SELECT * FROM temp_table;
COMMIT; -- which will leave you with 5m dead and 5m live.

Or you could even try (haven't tested it):
BEGIN;
SELECT * INTO new_table FROM temp_table; -- creates new_table
DROP TABLE table;
ALTER TABLE new_table RENAME TO table;
COMMIT; -- leaving you with a fresh table of 5m new tuples
...with the risk of losing all the changes made to the old table after BEGIN;

   Regards,
     Dawid

Re: How many insert + update should one transaction handle?

From: Robert Treat
On Fri, 2005-09-23 at 14:53, Dawid Kuroczko wrote:
> On 9/23/05, Yonatan Ben-Nes <da@canaan.co.il> wrote:
<snip>
> Or you could even try (haven't tested it):
> BEGIN;
> SELECT * INTO new_table FROM temp_table; -- creates new_table
> DROP TABLE table;
> ALTER TABLE new_table RENAME TO table;
> COMMIT; -- leaving you with a fresh table of 5m new tuples
> ...with the risk of losing all the changes made to the old table after BEGIN;
>

yeah, i was thinking

create table newtable (like oldtable);
-- load data into newtable

begin;
  drop table oldtable;
  alter table newtable rename to oldtable;
commit;

this separates the data-loading piece from the piece where you promote
the data to live data; plus, the time you have to hold the transaction
open covers only the drop and rename, which will be quite fast.

the only potential issue would be making sure you don't have FK/view
type dependencies, but it doesn't sound like that would apply here (see
the sketch below).
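for instance, a dependent view (hypothetical name here) makes the bare
drop fail, roughly like this:

create view live_items as select * from oldtable;

drop table oldtable;
-- ERROR:  cannot drop table oldtable because other objects depend on it
-- DETAIL: view live_items depends on table oldtable
-- (a DROP ... CASCADE would drop the view along with the table)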


Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL


Re: How many insert + update should one transaction handle?

From: "Jim C. Nasby"
Just remember the first rule of performance tuning: don't.

Unless you *know* having the dead rows will be an issue, you will almost
certainly be best off going the simple, straightforward route.

On Fri, Sep 23, 2005 at 10:49:00AM -0500, Scott Marlowe wrote:
> The only possible issues would be capacity, and possibly having a lot
> of dead tuples lying about.
<snip>

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: How many insert + update should one transaction handle?

From: Yonatan Ben-Nes
Date: Mon, Sep 26, 2005 20:41:03 +0200
Robert Treat wrote:
> On Fri, 2005-09-23 at 14:53, Dawid Kuroczko wrote:
<snip>
> this separates the data-loading piece from the piece where you promote
> the data to live data; plus, the time you have to hold the transaction
> open covers only the drop and rename, which will be quite fast.
>
> the only potential issue would be making sure you don't have FK/view
> type dependencies, but it doesn't sound like that would apply here.
>
> Robert Treat

Sorry everyone for not responding... I just didn't know that the
discussion had continued :)

Anyway, I saw the idea:
BEGIN;
SELECT * INTO new_table FROM temp_table;
DROP TABLE table;
ALTER TABLE new_table RENAME TO table;
COMMIT;

Where, if I understood correctly, "table" is the final table, "temp_table"
is the table that receives all the processing and at the end of it holds
10m dead tuples and 5m live ones, and finally "new_table" is the
recipient of all of the live tuples from "temp_table".

It looks quite promising to me, but I did a little check and saw that
between the DROP TABLE command and the COMMIT I get a lock on the table
(obvious, but problematic for a 24/7 site), so I'm wondering how much
time such a transaction will take from the DROP command onward.

If users won't be able to access the table for only some extremely small
amount of time (less than a second, obviously) then, though I don't like
it much, it is better than running a VACUUM FULL, which will slow my
whole server for a considerable amount of time...

So does anyone know how much time such a process takes? (I tried to
EXPLAIN ANALYZE it, with no success :)).
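By the way, EXPLAIN ANALYZE only works on optimizable statements, not on
DDL like DROP or ALTER; one rough way to watch the blocking instead is
to query pg_locks from another session while the swap transaction runs
("items" is a stand-in table name here):

SELECT locktype, relation::regclass AS rel, mode, granted
FROM pg_locks
WHERE relation = 'items'::regclass;
-- the swap transaction shows AccessExclusiveLock with granted = t;
-- any blocked readers show up with granted = f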

Thanks a lot everyone!
   Ben-Nes Yonatan

Re: How many insert + update should one transaction handle?

From: "Jim C. Nasby"
On Mon, Sep 26, 2005 at 08:41:03PM +0200, Yonatan Ben-Nes wrote:

> Anyway, I saw the idea:
> BEGIN;
> SELECT * INTO new_table FROM temp_table;
> DROP TABLE table;
> ALTER TABLE new_table RENAME TO table;
> COMMIT;
<snip>
> So does anyone know how much time such a process takes? (I tried to
> EXPLAIN ANALYZE it, with no success :)).

Based on http://lnk.nu/developer.postgresql.org/44b.c, line 1478 on,
there's not a lot that happens during the ALTER TABLE. Likewise, DROP
(line 517) doesn't do much either. So basically, anything trying to
access the old table will block for a while, waiting for the swap to
happen.

But keep in mind that 'a while' will depend on what's happening on the
system. Imagine:

Start a long transaction involving the table.
Run the code above; the DROP acquires a lock on the table.

Everything else against the table will now block, waiting for the DROP
to happen.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: How many insert + update should one transaction handle?

From: Gnanavel S

On 9/27/05, Jim C. Nasby <jnasby@pervasive.com> wrote:
> On Mon, Sep 26, 2005 at 08:41:03PM +0200, Yonatan Ben-Nes wrote:
>
>> Anyway, I saw the idea:
>> BEGIN;
>> SELECT * INTO new_table FROM temp_table;
>> DROP TABLE table;

Instead of dropping it here, just rename it to a different name and
then, after doing the next step, drop the old table.

>> ALTER TABLE new_table RENAME TO table;
>> COMMIT;

You can do it like this:

BEGIN;
SELECT * INTO new_table FROM temp_table;
ALTER TABLE table RENAME TO temp_table_orig;
ALTER TABLE new_table RENAME TO table;
COMMIT;
DROP TABLE temp_table_orig;
 




--
with regards,
S.Gnanavel
Satyam Computer Services Ltd.

Re: How many insert + update should one transaction handle?

From: Yonatan Ben-Nes
Date: Tue, Sep 27, 2005 13:34:37 +0200
Jim C. Nasby wrote:
<snip>
> But keep in mind that 'a while' will depend on what's happening on the
> system. Imagine:
>
> Start a long transaction involving the table.
> Run the code above; the DROP acquires a lock on the table.
>
> Everything else against the table will now block, waiting for the DROP
> to happen.

Jim, unless I didn't understand you, I think that in my case I won't need
any long transaction handling the DROP and renaming of tables.
I will actually have 2 transactions now:
1. One that handles the INSERT + UPDATE of the data into temp_table and
at the end moves all of the new data (without the dead tuples) into
new_table and creates its indexes.
2. A second transaction that only handles the drop and renaming of the 2
tables (new_table & table).

The question is: what's the expected running time of the second
transaction? Will it create problems for the constant availability of
the site? A rough sketch of the plan follows.
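Roughly, with "items" standing in for "table" (TABLE is a reserved word)
and an indexed id column assumed:

-- transaction 1: build the new table off to the side
BEGIN;
SELECT * INTO new_items FROM temp_table;         -- copies only the live rows
CREATE INDEX new_items_id_idx ON new_items (id);
COMMIT;

-- transaction 2: the swap; only this short part blocks readers
BEGIN;
DROP TABLE items;
ALTER TABLE new_items RENAME TO items;
COMMIT;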

S.Gnanavel, I tried your idea, but sadly it gives me the same block as
if I DROPped the table rather than RENAMEd it.

Thanks everyone again,
   Ben-Nes Yonatan

Re: How many insert + update should one transaction handle?

From: "Jim C. Nasby"
On Tue, Sep 27, 2005 at 01:34:37PM +0200, Yonatan Ben-Nes wrote:
> Jim, unless I didn't understand you, I think that in my case I won't
> need any long transaction handling the DROP and renaming of tables.
<snip>
> The question is: what's the expected running time of the second
> transaction? Will it create problems for the constant availability of
> the site?
>
> S.Gnanavel, I tried your idea, but sadly it gives me the same block as
> if I DROPped the table rather than RENAMEd it.

Like I said, neither the ALTER nor the DROP does much by itself. There is
a slight advantage to Gnanavel's plan in that the ALTER won't wait for
the filesystem to delete the files from the disk, but I *think* that
DROP will. So it will be slightly faster.

In either case, if a user is running a long transaction against the old
table when you try to rename/drop it, that is going to completely swamp
the effects of everything else. So, if you have some long-running
queries that use that table, there will be a noticeable delay to the
system. If you don't have any queries like that, then this should work
just fine.
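To sketch the bad case (two sessions shown as comments; "items" is a
made-up name):

-- session A: a long-running report
BEGIN;
SELECT count(*) FROM items;   -- takes AccessShareLock, held until COMMIT
-- ...still running...

-- session B: the swap
BEGIN;
DROP TABLE items;             -- wants AccessExclusiveLock: waits for session A

-- every new query against items now queues behind session B's request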
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: How many insert + update should one transaction handle?

From: Ben-Nes Yonatan
Jim C. Nasby wrote:
<snip>
> In either case, if a user is running a long transaction against the old
> table when you try to rename/drop it, that is going to completely swamp
> the effects of everything else. So, if you have some long-running
> queries that use that table, there will be a noticeable delay to the
> system. If you don't have any queries like that, then this should work
> just fine.

Thanks for the help; now I'll just have to think about which will be
better in my case.

Shana Tova everyone! (Happy New Year in Hebrew :))
  Ben-Nes Yonatan