Thread: update/insert, delete/insert efficiency WRT vacuum and MVCC

update/insert, delete/insert efficiency WRT vacuum and MVCC

From

"Mark Woodward"

Date:

03 July 2006, 10:09:26

Is there a difference in PostgreSQL performance between these two
different strategies:


if(!exec("update foo set bar='blahblah' where name = 'xx'"))   exec("insert into foo(name, bar)
values('xx','blahblah'");
or
exec("delete from foo where name = 'xx'");
exec("insert into foo(name, bar) values('xx','blahblah'");

In my session handler code I can do either, but am curious if it makes any
difference. Yes, "name" is unique.

Re: update/insert, delete/insert efficiency WRT vacuum and

From

Zdenek Kotala

Date:

04 July 2006, 07:01:58

Mark,
I don't know how it will exactly works in postgres but my expectations are:

Mark Woodward wrote:
> Is there a difference in PostgreSQL performance between these two
> different strategies:
> 
> 
> if(!exec("update foo set bar='blahblah' where name = 'xx'"))
>     exec("insert into foo(name, bar) values('xx','blahblah'");
> or

The update code generates new tuple in the datafile and pointer has been 
changed in the indexfile to the new version of tuple. This action does 
not generate B-Tree structure changes. If update falls than insert 
command creates new tuple in the datafile and it adds new item into 
B-Tree. It should be generate B-Tree node split.

> exec("delete from foo where name = 'xx'");
> exec("insert into foo(name, bar) values('xx','blahblah'");

Both commands should generate B-Tree structure modification.

I expect that first variant is better, but It should depend on many 
others things - for examples triggers, other indexes ...

REPLACE/UPSERT command solves this problem, but It is still in the TODO 
list.
Zdenek

Re: update/insert, delete/insert efficiency WRT vacuum and

From

Martijn van Oosterhout

Date:

04 July 2006, 07:14:48

On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:
> Mark,
> I don't know how it will exactly works in postgres but my expectations are:
>
> Mark Woodward wrote:
> >Is there a difference in PostgreSQL performance between these two
> >different strategies:
> >
> >
> >if(!exec("update foo set bar='blahblah' where name = 'xx'"))
> >    exec("insert into foo(name, bar) values('xx','blahblah'");
> >or
>
> The update code generates new tuple in the datafile and pointer has been
> changed in the indexfile to the new version of tuple. This action does
> not generate B-Tree structure changes. If update falls than insert
> command creates new tuple in the datafile and it adds new item into
> B-Tree. It should be generate B-Tree node split.

Actually, not true. Both versions will generate a row row and create a
new index tuple. The only difference may be that in the update case the
may be a ctid link from the old version to the new one, but that's
about it...

Which is faster will probably depends on what is more common in your DB:
row already exists or not. If you know that 99% of the time the row
will exist, the update will probably be faster because you'll only
execute one query 99% of the time.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> From each according to his ability. To each according to his ability to litigate.

Re: update/insert, delete/insert efficiency WRT vacuum and

From

"Zeugswetter Andreas DCP SD"

Date:

04 July 2006, 09:54:10

> > >Is there a difference in PostgreSQL performance between these two
> > >different strategies:
> > >
> > >
> > >if(!exec("update foo set bar='blahblah' where name = 'xx'"))
> > >    exec("insert into foo(name, bar) values('xx','blahblah'"); or

In pg, this strategy is generally more efficient, since a pk failing
insert would create
a tx abort and a heap tuple. (so in pg, I would choose the insert first
strategy only when
the insert succeeds most of the time (say > 95%))

Note however that the above error handling is not enough, because two
different sessions
can still both end up trying the insert (This is true for all db systems
when using this strategy).

Andreas

Re: update/insert, delete/insert efficiency WRT vacuum

From

Hannu Krosing

Date:

05 July 2006, 07:26:46

Ühel kenal päeval, T, 2006-07-04 kell 14:53, kirjutas Zeugswetter
Andreas DCP SD:
> > > >Is there a difference in PostgreSQL performance between these two 
> > > >different strategies:
> > > >
> > > >
> > > >if(!exec("update foo set bar='blahblah' where name = 'xx'"))
> > > >    exec("insert into foo(name, bar) values('xx','blahblah'"); or
> 
> In pg, this strategy is generally more efficient, since a pk failing
> insert would create
> a tx abort and a heap tuple. (so in pg, I would choose the insert first
> strategy only when 
> the insert succeeds most of the time (say > 95%))
> 
> Note however that the above error handling is not enough, because two
> different sessions
> can still both end up trying the insert (This is true for all db systems
> when using this strategy).

I think the recommended strategy is to first try tu UPDATE, if not found
then INSERT, if primary key violation on insert, then UPDATE


-- 
----------------
Hannu Krosing
Database Architect
Skype Technologies OÜ
Akadeemia tee 21 F, Tallinn, 12618, Estonia

Skype me:  callto:hkrosing
Get Skype for free:  http://www.skype.com

Re: update/insert,

From

"Mark Woodward"

Date:

05 July 2006, 11:25:46

> On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:
>> Mark,
>> I don't know how it will exactly works in postgres but my expectations
>> are:
>>
>> Mark Woodward wrote:
>> >Is there a difference in PostgreSQL performance between these two
>> >different strategies:
>> >
>> >
>> >if(!exec("update foo set bar='blahblah' where name = 'xx'"))
>> >    exec("insert into foo(name, bar) values('xx','blahblah'");
>> >or
>>
>> The update code generates new tuple in the datafile and pointer has been
>> changed in the indexfile to the new version of tuple. This action does
>> not generate B-Tree structure changes. If update falls than insert
>> command creates new tuple in the datafile and it adds new item into
>> B-Tree. It should be generate B-Tree node split.
>
> Actually, not true. Both versions will generate a row row and create a
> new index tuple. The only difference may be that in the update case the
> may be a ctid link from the old version to the new one, but that's
> about it...
>
> Which is faster will probably depends on what is more common in your DB:
> row already exists or not. If you know that 99% of the time the row
> will exist, the update will probably be faster because you'll only
> execute one query 99% of the time.

OK, but the point of the question is that constantly updating a single row
steadily degrades performance, would delete/insery also do the same?

Re: update/insert,

From

Andrew Dunstan

Date:

05 July 2006, 11:36:23

Mark Woodward wrote:

>>On Tue, Jul 04, 2006 at 11:59:27AM +0200, Zdenek Kotala wrote:
>>    
>>
>>>Mark,
>>>I don't know how it will exactly works in postgres but my expectations
>>>are:
>>>
>>>Mark Woodward wrote:
>>>      
>>>
>>>>Is there a difference in PostgreSQL performance between these two
>>>>different strategies:
>>>>
>>>>
>>>>if(!exec("update foo set bar='blahblah' where name = 'xx'"))
>>>>   exec("insert into foo(name, bar) values('xx','blahblah'");
>>>>or
>>>>        
>>>>
>>>The update code generates new tuple in the datafile and pointer has been
>>>changed in the indexfile to the new version of tuple. This action does
>>>not generate B-Tree structure changes. If update falls than insert
>>>command creates new tuple in the datafile and it adds new item into
>>>B-Tree. It should be generate B-Tree node split.
>>>      
>>>
>>Actually, not true. Both versions will generate a row row and create a
>>new index tuple. The only difference may be that in the update case the
>>may be a ctid link from the old version to the new one, but that's
>>about it...
>>
>>Which is faster will probably depends on what is more common in your DB:
>>row already exists or not. If you know that 99% of the time the row
>>will exist, the update will probably be faster because you'll only
>>execute one query 99% of the time.
>>    
>>
>
>OK, but the point of the question is that constantly updating a single row
>steadily degrades performance, would delete/insery also do the same?
>
>  
>


If that was the point of the question, you should have said so.

And unless I am much mistaken the answer is "of course it will."

cheers

andrew

Re: update/insert,

From

"Zeugswetter Andreas DCP SD"

Date:

05 July 2006, 12:00:13

> OK, but the point of the question is that constantly updating
> a single row steadily degrades performance, would
> delete/insery also do the same?

Yes, there is currently no difference (so you should do the update).
Of course performance only degrades if vaccuum is not setup correctly.

Andreas

Re: update/insert,

From

mark@mark.mielke.cc

Date:

05 July 2006, 12:25:57

On Wed, Jul 05, 2006 at 04:59:52PM +0200, Zeugswetter Andreas DCP SD wrote:
> > OK, but the point of the question is that constantly updating 
> > a single row steadily degrades performance, would 
> > delete/insery also do the same?
> Yes, there is currently no difference (so you should do the update).
> Of course performance only degrades if vaccuum is not setup correctly.

As Martijn pointed out, there are two differences. One almost
insignificant having to do with internal linkage. The other that
multiples queries are being executed. I would presume with separate
query plans, and so on, therefore you should do the update.

For the case you are talking about, the difference is:
    1) Delete which will always succeed    2) Insert that will probably succeed

Vs:
    1) Update which if it succeeds, will stop    2) Insert that will probably succeed

In the first case, you are always executing two queries. In the second,
you can sometimes get away with only one query.

Note what other people mentioned, though, that neither of the above is
safe against parallel transactions updating or inserting rows with the
same key.

In both cases, a 'safe' implementation should loop if 2) fails and
restart the operation.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...

                          http://mark.mielke.cc/

Re: update/insert,

From

"Joshua D. Drake"

Date:

05 July 2006, 12:52:14

> > Which is faster will probably depends on what is more common in your DB:
> > row already exists or not. If you know that 99% of the time the row
> > will exist, the update will probably be faster because you'll only
> > execute one query 99% of the time.
>
> OK, but the point of the question is that constantly updating a single row
> steadily degrades performance, would delete/insery also do the same?

Yes. Delete still creates a dead row. There are programatic ways around this
but keeping a delete table that can be truncated at intervals.

Joshua D. Drake


>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings