Re: MERGE vs REPLACE - Mailing list pgsql-hackers

From mark@mark.mielke.cc
Subject Re: MERGE vs REPLACE
Date
Msg-id 20051115184648.GA15498@mark.mielke.cc
Whole thread Raw
In response to Re: MERGE vs REPLACE  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
On Tue, Nov 15, 2005 at 10:27:10AM -0800, Josh Berkus wrote:
> Yes, I guess my hesitation on the full-table-lock strategy is that it 
> doesn't really fulfill the mandate for why people want REPLACE-like 
> statements ... to give them an INSERT-or-UPDATE with *higher* efficiency 
> and concurrency than doing two statements.  That being said, I've 
> personally designed more than a dozen web applications and have not yet 
> been faced with a single circumstance of not knowing whether I wanted to 
> INSERT or UPDATE.  I've even ported MySQL apps and found it easy to 
> re-code them to do "if $id = 0, then insert ..." without even needing to 
> use a pl/pgsql hack.

> So we thus have two seperate use cases.  The first, for bulk loading/ETL is 
> what MERGE fulfills rather neatly and for that full table locking is 
> perfectly OK, even desirable.  You really don't want to MERGE-load the 
> same table on two threads at once.  

Yes.

> The second case is for applications coded for MySQL; this is the REPLACE 
> case.  However, the most common MySQL applications doing this use full 
> table locking (MyISAM) anyway!  So, while full table locking wouldn't gain 
> them any performance over using two statements, it shouldn't lose them 
> anything they're used to having.

The last two times I wished to use it:

Objects with dynamic fields. The table rows are identified by the
object key, a field key, that refer to a text value. I believe I
still have a race in this case, as I allow INSERT, UPDATE, and DELETE.
The DELETE means that an UPDATE, followed by an INSERT, may fail,
if the DELETE occurs between the two. DELETE is fine as is, as it
is a single operation, that won't fail. Combining UPDATE and INSERT
into one, as per the MySQL REPLACE, would eliminate my race caused
by the DELETE. If REPLACE locked the whole table, it would not be a
candidate, as this table is frequently modified by many users at
once. REPLACE using the primary key index to do locking would be
great.

Timestamping the last signin and access time (~30 seconds error
allowed in the access time to eliminate a majority of the updates
using memcached as we all know how much PostgreSQL hates this) for a
table that doesn't have a 1:1 mapping between the fields associated
with the user, and the row that records their last signin/access time.
The signin/access time can be associated with a context, that is a
part of the primary key. Currently, I rely on the code never deleting
rows, and perform update first, and if the update does not change any
rows, I fall back to insert.

As it is, though, I ended up combining the signin and access time
into the same row, to use one table instead of two, and I'm not sure
that MySQL replace really gives me what I want in this situation. I
have never used MERGE, so can't say whether this would do what I want.
I would not want the solution to lock the entire table, as these updates,
although throttled by the the ~30 seconds error allowed, and memcached
queries, would be potentially performed by hundreds of users every
30 seconds. I can extend the error factor to 60 seconds, but that only
cuts the queries in half. In general, however, the current model does
work fine, and doesn't require replacement. UPDATE will be used most
of the time, and be the correct operation.

Unfortunately, there is a race that I see here. If both UPDATE
operations fail at the same time, then one of the INSERTS will fail.
I don't handle this currently, but perhaps I could argue myself into
considering this an ignorable failure.

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada
 One ring to rule them all, one ring to find them, one ring to bring them all                      and in the darkness
bindthem...
 
                          http://mark.mielke.cc/



pgsql-hackers by date:

Previous
From: Marcus Engene
Date:
Subject: bind variables, soft vs hard parse
Next
From: Stephen Frost
Date:
Subject: Re: MERGE vs REPLACE