On Tue, Oct 11, 2011 at 12:04 AM, Craig Ringer <ringerc@ringerc.id.au> wrote:
> On 11/10/11 12:48, John R Pierce wrote:
>> On 10/10/11 7:44 PM, Craig Ringer wrote:
>>> If blocking writes causes a server failure that persists once writes
>>> have been unblocked, that's a bug IMO. You might have a bit of a backlog
>>> of writes to clear, but after that all should be well, and if it isn't
>>> then something needs fixing.
>>
>> the process is blocked waiting for this disk write to complete,
>> meanwhile, the packets are queuing up and waiting for service.
>>
>> best of luck with all that....
>
> xfs_freeze for long enough to take a snapshot doesn't take long, or it
> shouldn't, anyway.

On average, xfs_freeze takes about 2 seconds for us with 8 EBS volumes
at 60GB each in a software RAID-0 array.

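
For anyone who wants to check the freeze window on their own setup,
here is a minimal sketch of how it could be timed. It is not our
actual script: the function name timed_freeze, the take_snapshots
callback and the injectable run parameter are all hypothetical names
made up for illustration. The only real commands it relies on are
xfs_freeze -f (freeze) and xfs_freeze -u (thaw).

```python
import subprocess
import time
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Run a command and raise CalledProcessError on a non-zero exit.
    subprocess.run(cmd, check=True)


def timed_freeze(
    mount_point: str,
    take_snapshots: Callable[[], None],
    run: Callable[[Sequence[str]], None] = _run_checked,
) -> float:
    """Freeze an XFS filesystem, take snapshots, thaw it again.

    Returns the number of seconds between issuing the freeze and
    finishing the thaw. take_snapshots is a hypothetical callback that
    is assumed to start a snapshot of every underlying volume.
    """
    start = time.monotonic()
    run(["xfs_freeze", "-f", mount_point])  # block new writes
    try:
        take_snapshots()
    finally:
        # Always thaw, even if the snapshot step raised, so the
        # filesystem is never left frozen.
        run(["xfs_freeze", "-u", mount_point])
    return time.monotonic() - start
```

Calling something like timed_freeze("/mnt/pgdata", start_ebs_snapshots)
as root, where both the mount point and start_ebs_snapshots are
placeholders, and logging the returned value next to the Postgres log
timestamps should show whether the stalls line up with the freeze.
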
> Even if it did, that shouldn't cause a server failure
> that persists past when disk I/O is resumed, though it might cause
> individual connections to drop.
<DELETED>
> It is totally unreasonable for Pg to *stay* nonfunctional once disk I/O
> resumes. Existing connections should receive responses they're waiting
> on or die, depending on how long it's been, and new connections should
> be accepted fine.

Exactly. I genuinely expect Postgres to be able to withstand a couple
of seconds of blocked disk I/O, especially since this isn't a
heavy-duty transaction processing system. It's under load, but not a
tremendously high load. During our busier times we average something
in the neighborhood of 300-400 transactions per second, which just
doesn't seem like that much.

As much as I would like Postgres to withstand a 2-second outage,
whether it ought to is honestly beside the point for me right now.
I'd just like to figure out whether the freeze is something that's
actually a problem, or if I should be looking elsewhere for the cause.

--
Sean Laurent
Director of Operations
StudyBlue, Inc.