Re: How many insert + update should one transaction - Mailing list pgsql-general

From: Jim C. Nasby
Subject: Re: How many insert + update should one transaction
Date:
Msg-id: 20050924212222.GN7630@pervasive.com
In response to: Re: How many insert + update should one transaction (Scott Marlowe <smarlowe@g2switchworks.com>)
List: pgsql-general
Just remember the first rule of performance tuning: don't.

Unless you *know* having the dead rows will be an issue, you will almost
certainly be best off going the simple, straightforward route.

On Fri, Sep 23, 2005 at 10:49:00AM -0500, Scott Marlowe wrote:
> On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> > Hi all,
> >
> > Every few days I need to DELETE all of the content of a few tables and
> > INSERT new data into them.
> > The amount of new data is about 5 million rows, and each row takes about
> > 3 queries (INSERT + UPDATE).
> > Because I need the old data to stay visible until all of the new data is
> > available, I'm doing the whole process of deleting the old content and
> > inserting the new one in a single transaction.
> > Should I divide the insertion, so that I insert the new data into a
> > temporary table and commit the transaction every 100, 1,000, or 10,000
> > queries? Or does the size of the transaction not matter to the server,
> > so that it's OK to handle such a process in one transaction?
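
For reference, the single-transaction approach described above looks roughly
like this. This is just a sketch; the table "items" and its columns are made
up for illustration:

    BEGIN;
    DELETE FROM items;                    -- every old row becomes a dead tuple
    -- repeated for each of the ~5 million new rows, plus the UPDATE passes:
    INSERT INTO items (id, price) VALUES (1, 9.99);
    UPDATE items SET price = price * 1.1 WHERE id = 1;
    COMMIT;                               -- readers see the old data until here
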
>
> The only possible issue would be one of capacity, and possibly having a
> lot of dead tuples lying around.
>
> If you have 5 million rows, and you update every one, then you now have
> 5 million live and 5 million dead tuples in your database.  A VACUUM
> FULL will take quite a while.
>
> If your FSM (free space map) is set large enough, then as long as you
> vacuum (a regular, non-FULL vacuum) between these transactions, the
> space from the 5 million dead tuples should get reused.  However, the
> performance of your database for selects and such will be like that of
> a 10 million row database.
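
A minimal sketch of that maintenance, assuming the 8.x-era configuration
parameter names and the same hypothetical "items" table:

    -- postgresql.conf: the free space map has to be large enough to track
    -- the pages freed by the dead rows
    --   max_fsm_pages = 2000000
    --   max_fsm_relations = 1000

    -- a plain (non-FULL) vacuum between rewrite runs marks the dead tuples'
    -- space as reusable without taking an exclusive lock on the table
    VACUUM ANALYZE items;
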
>
> Given that you NEED to have all 10 million tuples in the database at the
> same time, a temp / holding table would let you load the new data there
> first, truncate the main table, move everything into the main table, and
> then drop / truncate the temp / holding table.
>
> If you truncate the main table, then initiate another transaction to
> move the data into it, it shouldn't be so bloated, but the downside is
> you'll have a period of time when it appears empty to users.
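
Sketched out, with made-up table names (and assuming CREATE TABLE ...
(LIKE ...) is available on your version), that sequence is roughly:

    -- 1. build the new data in a holding table, outside any big transaction
    CREATE TABLE items_holding (LIKE items);
    -- ... the ~5 million INSERTs and UPDATEs go here ...

    -- 2. empty the main table in its own short transaction; from here until
    --    step 3 commits, readers see an empty table
    TRUNCATE items;

    -- 3. repopulate the main table; this creates no dead tuples
    BEGIN;
    INSERT INTO items SELECT * FROM items_holding;
    COMMIT;

    DROP TABLE items_holding;
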
>
> So, the real question is whether or not you can afford to have an empty
> table at some point in the process.
>
> If you can't, then the two methods (running the whole transaction
> against the one table, or using the temp / holding table) are
> equivalent.  If you can, there should be a noticeable gain from
> truncating the main table outside the update transaction.
>
> If you need that table to always have the old or new tuples (i.e. never
> be empty) and you can afford the very lengthy VACUUM FULL on the 5
> million dead rows, then that method will give you the best select
> performance for the rest of the day.
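
In that case the cleanup after the big rewrite transaction would look
something like this (hypothetical table name again); note that VACUUM FULL
takes an exclusive lock on the table while it runs:

    -- reclaim the space held by the ~5 million dead rows; slow on a table
    -- this size, and the table is locked for the duration
    VACUUM FULL ANALYZE items;
    -- a REINDEX afterwards is often worthwhile, since VACUUM FULL of this
    -- era tends to leave indexes bloated
    REINDEX TABLE items;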

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
