improving concurrent transactin commit rate - Mailing list pgsql-hackers

From Sam Mason
Subject improving concurrent transactin commit rate
Date
Msg-id 20090324235242.GO32672@frubble.xen.chris-lamb.co.uk
Responses Re: improving concurrent transactin commit rate  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: improving concurrent transactin commit rate  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: improving concurrent transactin commit rate  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-hackers
Hi,

I had an idea while going home last night and still can't think why it's
not implemented already as it seems obvious.

The conceptual idea is to have at most one outstanding flush for the
log going through the filesystem at any one time.  The effect, as far
as I can think through, would be to trade latency for bandwidth.  In
commit-heavy situations you're almost always going to be limited by the
rotational latency of the log device, while its full bandwidth is
rarely going to be much of a problem.

I don't understand PG well enough to know if/how this could be
implemented; I've had a look through transam/xlog.c and sort of
understand what's going on but will have missed all the subtleties of
its operation.  So, please take what I say below with a little salt!

The way I'm imagining it working is as follows: when a flush gets issued
the code does:

  global Lock l;
  global int writtento = 0, flushedto = 0;

  /* where are we known to have written data up to currently */
  writtento = max(writtento, myrecord);

  /* try and acquire the flush lock */
  if (!conditionalacquire (l)) {
    /* lock already taken, block ourself until they finish by
       acquiring it */
    acquire (l);
    /* if somebody "later" in the queue got unblocked then their
       flush is OK for us and we're winning */
    if (myrecord <= flushedto) {
      goto out;
    }
  }

  /* flush needed, record the latest write's position in the queue */
  local int curat = writtento;

  /* actually perform the flush */
  fdatasync (log_fd);

  /* record where we're done flushing to so others can finish early */
  flushedto = curat;

 out:
  /* send the next process off */
  release (l);

To simplify, I've assumed that access to globals is always atomic;
locking would obviously need to be different in a real implementation.
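
For anyone who wants to play with the idea outside the backend, here is
a minimal sketch of the same scheme using POSIX threads.  It is not
taken from xlog.c and every name in it is hypothetical (flush_lock,
pos_lock, commit_flush, run_clients, simulated_flush).  It assumes a
second mutex around the two counters in place of the "atomic globals"
shortcut, and a short sleep standing in for fdatasync(log_fd).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <time.h>

#define MAX_CLIENTS 64

/* Hypothetical names throughout; this is not PostgreSQL code. */
static pthread_mutex_t flush_lock = PTHREAD_MUTEX_INITIALIZER; /* "l" in the pseudocode */
static pthread_mutex_t pos_lock   = PTHREAD_MUTEX_INITIALIZER; /* guards the counters */
static long writtento   = 0;  /* highest record position written so far */
static long flushedto   = 0;  /* highest record position known to be flushed */
static long flush_count = 0;  /* number of (simulated) physical flushes */

/* Assumption: a 2 ms sleep stands in for fdatasync(log_fd). */
static void simulated_flush(void)
{
    struct timespec ts = { 0, 2 * 1000 * 1000 };
    nanosleep(&ts, NULL);
}

/* Block until the log is flushed at least up to myrecord. */
void commit_flush(long myrecord)
{
    long curat;
    int covered;

    pthread_mutex_lock(&pos_lock);
    if (myrecord > writtento)
        writtento = myrecord;
    pthread_mutex_unlock(&pos_lock);

    if (pthread_mutex_trylock(&flush_lock) != 0) {
        /* A flush is in progress: wait for it to finish. */
        pthread_mutex_lock(&flush_lock);
        pthread_mutex_lock(&pos_lock);
        covered = (myrecord <= flushedto);
        pthread_mutex_unlock(&pos_lock);
        if (covered) {
            /* A flush issued after our write already covered us. */
            pthread_mutex_unlock(&flush_lock);
            return;
        }
    }

    /* Flush needed: note how far the log has been written right now. */
    pthread_mutex_lock(&pos_lock);
    curat = writtento;
    pthread_mutex_unlock(&pos_lock);

    simulated_flush();

    /* Publish the new flushed position so waiters can finish early. */
    pthread_mutex_lock(&pos_lock);
    flushedto = curat;
    flush_count++;
    pthread_mutex_unlock(&pos_lock);

    pthread_mutex_unlock(&flush_lock);
}

static void *client(void *arg)
{
    commit_flush(*(long *) arg);
    return NULL;
}

/* Run n concurrent committers from a clean state and return how many
   simulated flushes were needed to cover all of them. */
long run_clients(int n)
{
    pthread_t tid[MAX_CLIENTS];
    long rec[MAX_CLIENTS];
    int started = 0;
    int i;

    if (n > MAX_CLIENTS)
        n = MAX_CLIENTS;
    writtento = flushedto = flush_count = 0;

    for (i = 0; i < n; i++) {
        rec[i] = i + 1;  /* each client commits its own record */
        if (pthread_create(&tid[started], NULL, client, &rec[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return flush_count;
}
```

Calling run_clients(8) should report somewhere between 1 and 8 flushes
depending on how the threads happen to interleave; the closer they
arrive together, the fewer flushes are needed.  The lock order is always
flush_lock before pos_lock, so the two mutexes cannot deadlock.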

In the case of a single client the time is going to be dominated by the
disk flush anyway; as this is likely to be a somewhat expensive
operation I'm hoping that taking a lock here isn't going to matter
much.  Two clients are going to be worse (I think), as the second flush
request isn't sent off until the first client has finished flushing.
Three or more clients will be a win: the other clients wait while the
first flush completes and are then all covered by a single second
flush.  This would appear to cut the number of flushes from n to 2,
saving n-2 flushes, where n is the number of clients waiting to commit.
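
To make the counting concrete, here is a small sketch of the idealised
case where n clients all ask to commit at the same moment and no flush
is in progress.  The function names are hypothetical, and the model
assumes, as the paragraph above does, that without the lock every client
pays for its own flush.

```c
#include <stdio.h>

/* Assumed baseline: every one of the n clients issues its own flush. */
int flushes_without_batching(int n)
{
    return n;
}

/* With one outstanding flush at a time: the first client flushes alone,
   and a single second flush covers everyone who queued up behind it.
   Assumes n >= 0. */
int flushes_with_batching(int n)
{
    return n < 2 ? n : 2;
}

/* Print the two counts and the difference for 1..max_n clients. */
void print_flush_table(int max_n)
{
    int n;

    printf("clients  unbatched  batched  saved\n");
    for (n = 1; n <= max_n; n++) {
        printf("%7d  %9d  %7d  %5d\n", n,
               flushes_without_batching(n),
               flushes_with_batching(n),
               flushes_without_batching(n) - flushes_with_batching(n));
    }
}
```

In this model nothing is saved for one or two clients, and the saving
grows as n-2 from three clients onwards.  Real arrivals are staggered,
so actual numbers would land somewhere between the two columns.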

What have I missed?

If this has been explored in the literature I'd appreciate any pointers;
I had a search but couldn't find anything---I'm not sure what the
terminology would be for this sort of thing anyway.

--  Sam  http://samason.me.uk/

