Re: How many insert + update should one transaction handle? - Mailing list pgsql-general

From Scott Marlowe
Subject Re: How many insert + update should one transaction handle?
Date
Msg-id 1127490540.30825.99.camel@state.g2switchworks.com
In response to How many insert + update should one transaction handle?  (Yonatan Ben-Nes <da@canaan.co.il>)
Responses Re: How many insert + update should one transaction handle?
List pgsql-general
On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> Hi all,
>
> Every few days I need to DELETE all of the content of few tables and
> INSERT new data in them.
> The amount of new data is about 5 million rows and each row get about 3
> queries (INSERT + UPDATE).
> Now because I need the old data to be displayed till all of the new data
> will be available I'm doing all of the process of deleting the old
> content and inserting the new one in one transaction.
> Should I divide the insertion so ill insert the new data into a
> temporary table and the transaction should be commited every
> 100,1000,10000 whatever queries? or maybe it doesnt matter to the server
> whats the size of the transaction and its ok to handle such a process in
> one transaction?

The only possible issues would be capacity, and possibly having a
lot of dead tuples lying about.

If you have 5 million rows, and you update every one, then you now have
5 million live and 5 million dead tuples in your database.  A VACUUM
FULL will take quite a while.

If your FSM (free space map) is set large enough, then as long as you
vacuum (regular, non-FULL vacuum) between these transactions, the 5
million dead tuples should get reused.  However, the performance of your
database for selects and such will be like that of a 10 million row database.
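
As a minimal sketch of that between-loads maintenance (the table name
"items" is made up here, and the FSM settings shown are the ones from the
7.x/8.x postgresql.conf):

    -- Plain (non-FULL) vacuum between the big loads, so the dead tuples get
    -- recorded in the free space map and their space is reused by the next load:
    VACUUM ANALYZE items;

    -- The FSM has to be big enough to remember roughly 5 million dead tuples'
    -- worth of pages; check the current settings:
    SHOW max_fsm_pages;
    SHOW max_fsm_relations;

If max_fsm_pages is too small for the load, the freed space never gets
remembered and the table just keeps growing over time.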

Given that you NEED to have all 10 million tuples in the database at the
same time, the use of a temp / holding table would allow you to truncate
the main table, move everything into the main table, and then drop /
truncate the temp / holding table.

If you truncate the main table and then initiate another transaction to
move the data into it, it shouldn't be so bloated, but the downside is
you'll have a period of time when the table appears empty to users.
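
As a rough sketch of that flow (the table and column names here are made
up purely for illustration):

    -- Hypothetical main table standing in for the real one:
    CREATE TABLE items (id integer, price numeric, description text);

    -- 1. Build the new data set in a holding table; the old rows in "items"
    --    stay visible to users the whole time this runs.
    CREATE TABLE items_load AS SELECT * FROM items LIMIT 0;  -- same structure, no rows
    INSERT INTO items_load VALUES (1, 19.95, 'new widget');  -- the ~5 million INSERTs
    UPDATE items_load SET price = price * 1.10 WHERE id = 1; -- ...and their UPDATEs

    -- 2. Swap the contents.  The table appears empty between the TRUNCATE
    --    and the COMMIT of the INSERT.
    TRUNCATE items;
    BEGIN;
    INSERT INTO items SELECT * FROM items_load;
    COMMIT;

    -- 3. Clean up (or TRUNCATE items_load and reuse it next time).
    DROP TABLE items_load;
    ANALYZE items;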

So, the real question is whether or not you can afford to have an empty
table at some point in the process.

If you can't, then the two methods (running the whole transaction against
the one table or using the temp / holding table) are equivalent.  If you
can, there should be a noticeable gain from truncating the main table
outside the update transaction.

If you need that table to always have the old or new tuples (i.e. never
be empty) and you can afford the very lengthy VACUUM FULL on the 5
million dead rows, then that method will give you the best select
performance for the rest of the day.
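
That last variant, spelled out the same way (again with made-up names and
values), is just the original single-transaction approach followed by the
cleanup pass:

    -- The whole swap happens against the one table, so it is never empty:
    BEGIN;
    DELETE FROM items;                             -- old rows become dead tuples
    INSERT INTO items VALUES (1, 18.50, 'widget'); -- the ~5 million INSERTs
    UPDATE items SET price = price * 1.10
     WHERE id = 1;                                 -- ...and their UPDATEs
    COMMIT;

    -- Reclaim the ~5 million dead rows afterwards; this is the lengthy part,
    -- and it holds an exclusive lock on the table while it runs:
    VACUUM FULL items;
    ANALYZE items;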
