Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid
Date
Msg-id 4D22EC00.10808@2ndquadrant.com
Whole thread Raw
In response to Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses Re: Re: new patch of MERGE (merge_204) & a question about duplicated ctid
List pgsql-hackers
Heikki Linnakangas wrote:
> You can of course LOCK TABLE as a work-around, if that's what you want.

What I was trying to suggest upthread is that while there are other
possible ways around this problem, the only one that has any hope of
shipping with 9.1 is to do just that.  So from my perspective, the rest
of the discussion about the right way to proceed is moot for now.

For some reason it didn't hit me until you said this that I could do the
locking manually in my test case, without even touching the server-side
code yet.  Attached are a new pair of scripts where each pgbench UPDATE
statement executes an explicit LOCK TABLE.  Here's the result of a
sample run here:

$ pgbench -f update-merge.sql -T 60 -c 16 -j 4 -s 2 pgbench
starting vacuum...end.
transaction type: Custom query
scaling factor: 2
query mode: simple
number of clients: 16
number of threads: 4
duration: 60 s
number of transactions actually processed: 84375
tps = 1405.953672 (including connections establishing)
tps = 1406.137456 (excluding connections establishing)
$ psql -c 'select count(*) as updated FROM pgbench_accounts WHERE NOT
abalance=0' -d pgbench
 updated
---------
   68897
(1 row)

$ psql -c 'select count(*) as inserted FROM pgbench_accounts WHERE aid >
100000' -d pgbench
 inserted
----------
    34497
(1 row)

No assertion crashes, no duplicate key failures.  All the weird stuff I
was running into is gone, so decent evidence the worst of the problems
were all because the heavy lock I expecting just wasn't integrated into
the patch.  Congratulations to Boxuan:  for the first time this is
starting to act like a viable feature addition to me, just one with a
moderately long list of limitations and performance issues.

1400 TPS worth of UPSERT on my modest 8-core desktop (single drive with
cheating fsync) isn't uselessly slow.  If I add "SET TRANSACTION
ISOLATION LEVEL SERIALIZABLE;" just after the BEGIN;, I don't see any
serialization errors, and performance is exactly the same.

Run a straight UPDATE over only the existing range of keys, and I get
7000 TPS instead.  So the locking etc. is reducing performance to 20% of
its normal rate, on this assertion+debug build.  I can run this tomorrow
(err, later today I guess looking at the time) on a proper system with
BBWC and without asseritions to see if the magnitude of the difference
changes, but I don't think that's the main issue here.

Presuming the code quality issues and other little quirks I've
documented (and new ones yet to be discovered) can get resolved here,
and that's a sizeable open question, I could see shipping this with the
automatic heavy LOCK TABLE in there.  Then simple UPSERT could work out
of the box via a straightforward MERGE.  We'd need a big warning
disclaiming that concurrent performance is very limited in this first
release of the feature, but I don't know that this is at the
unacceptable level of slow for smaller web apps and such.

Until proper fine-grained concurrency is implemented, I think it would
be PR suicide to release a version of this without a full table lock
happening automatically though.  The idea Robert advocated well, that it
would be possible for advanced users to use even this rough feature in a
smarter way to avoid conflicts and not suffer the full performance
penalty, is true.  But if you consider the main purpose here to be
making it easier to get smaller MySQL apps and the like ported to
PostgreSQL (which is what I see as goal #1), putting that burden on the
user is just going to reinforce the old "PostgreSQL is so much harder
than MySQL" stereotype.  I'd much prefer to see everyone have a slow but
simple to use UPSERT via MERGE available initially, rather than to worry
about optimizing for the advanced user in a way that makes life harder
for the newbies.  The sort of people who must have optimal performance
already have trigger functions available to them, that they can write
and tweak for best performance.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support        www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books

\set nbranches :scale
\set ntellers 10 * :scale
\set naccounts 100000 * :scale
\setrandom aid 1 :naccounts
\setrandom bid 1 :nbranches
\setrandom tid 1 :ntellers
\setrandom delta -5000 5000
BEGIN;

-- Optional mode change
-- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

LOCK TABLE pgbench_accounts;
MERGE INTO pgbench_accounts t USING (SELECT :aid,1+(:aid / 1000000)::integer,:delta,'') AS s(aid,bid,balance,filler) ON
s.aid=t.aidWHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED THEN INSERT
VALUES(s.aid,s.bid,s.balance,s.filler);
COMMIT;

-- This syntax worked with MERGE v203 patch, but isn't compatible with v204
--MERGE INTO pgbench_accounts t USING (VALUES (:aid,1+(:aid / 1000000)::integer,:delta,'')) AS
s(aid,bid,balance,filler)ON s.aid=t.aid WHEN MATCHED THEN UPDATE SET abalance=abalance + s.balance WHEN NOT MATCHED
THENINSERT VALUES(s.aid,s.bid,s.balance,s.filler); 

Attachment

pgsql-hackers by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: back branches vs. VS 2008
Next
From: Hannu Krosing
Date:
Subject: Re: pg_dump --split patch