Re: How many insert + update should one transaction - Mailing list pgsql-general
From | Jim C. Nasby
---|---
Subject | Re: How many insert + update should one transaction
Date |
Msg-id | 20050924212222.GN7630@pervasive.com
In response to | Re: How many insert + update should one transaction (Scott Marlowe <smarlowe@g2switchworks.com>)
List | pgsql-general
Just remember the first rule of performance tuning: don't. Unless you *know* having the dead rows will be an issue, you will almost certainly be best off going the simple, straightforward route.

On Fri, Sep 23, 2005 at 10:49:00AM -0500, Scott Marlowe wrote:
> On Fri, 2005-09-23 at 05:51, Yonatan Ben-Nes wrote:
> > Hi all,
> >
> > Every few days I need to DELETE all of the content of a few tables and
> > INSERT new data into them.
> > The amount of new data is about 5 million rows, and each row gets about
> > 3 queries (INSERT + UPDATE).
> > Now, because I need the old data to be displayed until all of the new
> > data is available, I'm doing the whole process of deleting the old
> > content and inserting the new one in one transaction.
> > Should I divide the insertion, so I'll insert the new data into a
> > temporary table and commit the transaction every 100, 1,000, 10,000,
> > whatever queries? Or maybe it doesn't matter to the server what the
> > size of the transaction is, and it's OK to handle such a process in
> > one transaction?
>
> The only possible issue would be one of capacity, and possibly having a
> lot of dead tuples lying about.
>
> If you have 5 million rows and you update every one, then you now have
> 5 million live and 5 million dead tuples in your database. A VACUUM
> FULL will take quite a while.
>
> If your fsm is set large enough, then as long as you vacuum (regular,
> non-full vacuum) between these transactions, the 5 million dead tuples
> should get reused. However, the performance of your database for
> selects and such will be as if it were a 10 million row database.
>
> Given that you NEED to have all 10 million tuples in the database at
> the same time, the use of a temp / holding table would allow you to
> truncate the main table, move everything into the main table, and then
> drop / truncate the temp / holding table.
>
> If you truncate the main table, then initiate another transaction to
> move the data into it, it shouldn't be so bloated, but the downside is
> you'll have a period of time when the table appears empty to users.
>
> So, the real question is whether or not you can afford to have an empty
> table at some point in the process.
>
> If you can't, then the two methods (running the whole transaction
> against the one table or using the temp / holding table) are
> equivalent. If you can, there should be a noticeable gain from
> truncating the main table outside the update transaction.
>
> If you need that table to always have the old or new tuples (i.e.,
> never be empty) and you can afford the very lengthy VACUUM FULL on the
> 5 million dead rows, then that method will give you the best select
> performance for the rest of the day.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
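For illustration, a minimal sketch of the holding-table approach Scott describes, assuming PostgreSQL; the table names (items, items_staging), columns, and sample rows are hypothetical stand-ins for the real schema and the ~5 million row load:

```sql
-- Assumed main table, standing in for the real schema:
CREATE TABLE items (id integer PRIMARY KEY, payload text);

-- Build the new data set in a staging table; the old rows stay
-- visible to readers in the main table while this runs.
CREATE TABLE items_staging (LIKE items);
INSERT INTO items_staging (id, payload)
VALUES (1, 'new row 1'), (2, 'new row 2');  -- stands in for the real load

-- Swap the new data in.  Because the TRUNCATE runs inside the
-- transaction, no session ever sees the table empty; the trade-off is
-- that TRUNCATE takes an exclusive lock, so readers block until COMMIT.
BEGIN;
TRUNCATE items;
INSERT INTO items SELECT * FROM items_staging;
COMMIT;

DROP TABLE items_staging;

-- With the single-table approach (updating rows in place instead),
-- a plain, non-FULL vacuum between runs lets the dead tuples be
-- reused, provided the free space map is sized to track them:
VACUUM items;
```

Running the TRUNCATE in its own transaction before the copy is the variant Scott mentions: less bloat left behind, at the cost of a window where the table looks empty to users.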