Thread: update/insert, delete/insert efficiency WRT vacuum and MVCC
Is there a difference in PostgreSQL performance between these two different strategies: if(!exec("update foo set bar='blahblah' where name = 'xx'")) exec("insert into foo(name, bar) values('xx','blahblah'"); or exec("delete from foo where name = 'xx'"); exec("insert into foo(name, bar) values('xx','blahblah'"); In my session handler code I can do either, but am curious if it makes any difference. Yes, "name" is unique.
Mark, I don't know how it will exactly works in postgres but my expectations are: Mark Woodward wrote: > Is there a difference in PostgreSQL performance between these two > different strategies: > > > if(!exec("update foo set bar='blahblah' where name = 'xx'")) > exec("insert into foo(name, bar) values('xx','blahblah'"); > or The update code generates new tuple in the datafile and pointer has been changed in the indexfile to the new version of tuple. This action does not generate B-Tree structure changes. If update falls than insert command creates new tuple in the datafile and it adds new item into B-Tree. It should be generate B-Tree node split. > exec("delete from foo where name = 'xx'"); > exec("insert into foo(name, bar) values('xx','blahblah'"); Both commands should generate B-Tree structure modification. I expect that first variant is better, but It should depend on many others things - for examples triggers, other indexes ... REPLACE/UPSERT command solves this problem, but It is still in the TODO list. Zdenek
On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote: > Mark, > I don't know how it will exactly works in postgres but my expectations are: > > Mark Woodward wrote: > >Is there a difference in PostgreSQL performance between these two > >different strategies: > > > > > >if(!exec("update foo set bar='blahblah' where name = 'xx'")) > > exec("insert into foo(name, bar) values('xx','blahblah'"); > >or > > The update code generates new tuple in the datafile and pointer has been > changed in the indexfile to the new version of tuple. This action does > not generate B-Tree structure changes. If update falls than insert > command creates new tuple in the datafile and it adds new item into > B-Tree. It should be generate B-Tree node split. Actually, not true. Both versions will generate a row row and create a new index tuple. The only difference may be that in the update case the may be a ctid link from the old version to the new one, but that's about it... Which is faster will probably depends on what is more common in your DB: row already exists or not. If you know that 99% of the time the row will exist, the update will probably be faster because you'll only execute one query 99% of the time. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > From each according to his ability. To each according to his ability to litigate.
> > >Is there a difference in PostgreSQL performance between these two > > >different strategies: > > > > > > > > >if(!exec("update foo set bar='blahblah' where name = 'xx'")) > > > exec("insert into foo(name, bar) values('xx','blahblah'"); or In pg, this strategy is generally more efficient, since a pk failing insert would create a tx abort and a heap tuple. (so in pg, I would choose the insert first strategy only when the insert succeeds most of the time (say > 95%)) Note however that the above error handling is not enough, because two different sessions can still both end up trying the insert (This is true for all db systems when using this strategy). Andreas
Ühel kenal päeval, T, 2006-07-04 kell 14:53, kirjutas Zeugswetter Andreas DCP SD: > > > >Is there a difference in PostgreSQL performance between these two > > > >different strategies: > > > > > > > > > > > >if(!exec("update foo set bar='blahblah' where name = 'xx'")) > > > > exec("insert into foo(name, bar) values('xx','blahblah'"); or > > In pg, this strategy is generally more efficient, since a pk failing > insert would create > a tx abort and a heap tuple. (so in pg, I would choose the insert first > strategy only when > the insert succeeds most of the time (say > 95%)) > > Note however that the above error handling is not enough, because two > different sessions > can still both end up trying the insert (This is true for all db systems > when using this strategy). I think the recommended strategy is to first try tu UPDATE, if not found then INSERT, if primary key violation on insert, then UPDATE -- ---------------- Hannu Krosing Database Architect Skype Technologies OÜ Akadeemia tee 21 F, Tallinn, 12618, Estonia Skype me: callto:hkrosing Get Skype for free: http://www.skype.com
> On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote: >> Mark, >> I don't know how it will exactly works in postgres but my expectations >> are: >> >> Mark Woodward wrote: >> >Is there a difference in PostgreSQL performance between these two >> >different strategies: >> > >> > >> >if(!exec("update foo set bar='blahblah' where name = 'xx'")) >> > exec("insert into foo(name, bar) values('xx','blahblah'"); >> >or >> >> The update code generates new tuple in the datafile and pointer has been >> changed in the indexfile to the new version of tuple. This action does >> not generate B-Tree structure changes. If update falls than insert >> command creates new tuple in the datafile and it adds new item into >> B-Tree. It should be generate B-Tree node split. > > Actually, not true. Both versions will generate a row row and create a > new index tuple. The only difference may be that in the update case the > may be a ctid link from the old version to the new one, but that's > about it... > > Which is faster will probably depends on what is more common in your DB: > row already exists or not. If you know that 99% of the time the row > will exist, the update will probably be faster because you'll only > execute one query 99% of the time. OK, but the point of the question is that constantly updating a single row steadily degrades performance, would delete/insery also do the same?
Mark Woodward wrote: >>On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote: >> >> >>>Mark, >>>I don't know how it will exactly works in postgres but my expectations >>>are: >>> >>>Mark Woodward wrote: >>> >>> >>>>Is there a difference in PostgreSQL performance between these two >>>>different strategies: >>>> >>>> >>>>if(!exec("update foo set bar='blahblah' where name = 'xx'")) >>>> exec("insert into foo(name, bar) values('xx','blahblah'"); >>>>or >>>> >>>> >>>The update code generates new tuple in the datafile and pointer has been >>>changed in the indexfile to the new version of tuple. This action does >>>not generate B-Tree structure changes. If update falls than insert >>>command creates new tuple in the datafile and it adds new item into >>>B-Tree. It should be generate B-Tree node split. >>> >>> >>Actually, not true. Both versions will generate a row row and create a >>new index tuple. The only difference may be that in the update case the >>may be a ctid link from the old version to the new one, but that's >>about it... >> >>Which is faster will probably depends on what is more common in your DB: >>row already exists or not. If you know that 99% of the time the row >>will exist, the update will probably be faster because you'll only >>execute one query 99% of the time. >> >> > >OK, but the point of the question is that constantly updating a single row >steadily degrades performance, would delete/insery also do the same? > > > If that was the point of the question, you should have said so. And unless I am much mistaken the answer is "of course it will." cheers andrew
> OK, but the point of the question is that constantly updating > a single row steadily degrades performance, would > delete/insery also do the same? Yes, there is currently no difference (so you should do the update). Of course performance only degrades if vaccuum is not setup correctly. Andreas
On Wed, Jul 05, 2006 at 04:59:52PM +0200, Zeugswetter Andreas DCP SD wrote: > > OK, but the point of the question is that constantly updating > > a single row steadily degrades performance, would > > delete/insery also do the same? > Yes, there is currently no difference (so you should do the update). > Of course performance only degrades if vaccuum is not setup correctly. As Martijn pointed out, there are two differences. One almost insignificant having to do with internal linkage. The other that multiples queries are being executed. I would presume with separate query plans, and so on, therefore you should do the update. For the case you are talking about, the difference is: 1) Delete which will always succeed 2) Insert that will probably succeed Vs: 1) Update which if it succeeds, will stop 2) Insert that will probably succeed In the first case, you are always executing two queries. In the second, you can sometimes get away with only one query. Note what other people mentioned, though, that neither of the above is safe against parallel transactions updating or inserting rows with the same key. In both cases, a 'safe' implementation should loop if 2) fails and restart the operation. Cheers, mark -- mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________ . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ | | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada One ring to rule them all, one ring to find them, one ring to bring them all and in the darkness bindthem... http://mark.mielke.cc/
> > Which is faster will probably depends on what is more common in your DB: > > row already exists or not. If you know that 99% of the time the row > > will exist, the update will probably be faster because you'll only > > execute one query 99% of the time. > > OK, but the point of the question is that constantly updating a single row > steadily degrades performance, would delete/insery also do the same? Yes. Delete still creates a dead row. There are programatic ways around this but keeping a delete table that can be truncated at intervals. Joshua D. Drake > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings