Re: Analyse without locking? - Mailing list pgsql-performance

From Richard Neill
Subject Re: Analyse without locking?
Date
Msg-id 4B175950.6020703@cam.ac.uk
Whole thread Raw
In response to Analyse without locking?  (Richard Neill <rn214@cam.ac.uk>)
Responses Re: Checkpoint spikes
List pgsql-performance
Dear All,

I'm still puzzled by this one - it looks like it's causing about 5% of
queries to rise in duration from ~300ms to 2-6 seconds.

On the other hand, the system never seems to be I/O bound. (we have at
least 25 MB/sec of write bandwidth, and use a small fraction of that
normally).

Here's the typical checkpoint logs:

2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers
(2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled;
write=149.883 s, sync=5.143 s, total=155.040 s

We're using 8.4.1, on ext4 with SSD. Is it possible that something
exotic is occurring to do with write barriers (on by default in ext4,
and we haven't changed this).

Perhaps a low priority IO process for writing the previous WAL to disk
is blocking a high-priority transaction (which is trying to write to the
new WAL). If the latter is trying to sync, could the large amount of
lower priority IO be getting in the way thanks to write barriers?

If so, can I safely turn off write barriers?

Thanks,

Richard


P.S. Should I rename this thread?




Richard Neill wrote:
> Dear All,
>
> It definitely looks checkpoint-related - the checkpoint timeout is set
> to 5 minutes, and here is a graph of our response time (in ms) over a 1
> hour period. The query is pretty much identical each time.
>
> Any ideas what I could do to make checkpoints not hurt performance like
> this?
>
> Thanks,
>
> Richard
>
>
>
> Tom Lane wrote:
>> Richard Neill <rn214@cam.ac.uk> writes:
>>> Now, I understand that increasing checkpoint_segments is generally a
>>> good thing (subject to some limit), but doesn't that just mean that
>>> instead of say a 1 second outage every minute, it's a 10 second
>>> outage every 10 minutes?
>>
>> In recent PG versions you can spread the checkpoint I/O out over a
>> period of time, so it shouldn't be an "outage" at all, just background
>> load.  Other things being equal, a longer checkpoint cycle is better
>> since it improves the odds of being able to coalesce multiple changes
>> to the same page into a single write.  The limiting factor is your
>> threshold of pain on how much WAL-replay work would be needed to recover
>> after a crash.
>
>
> ------------------------------------------------------------------------
>

pgsql-performance by date:

Previous
From: Scott Carey
Date:
Subject: Re: Query times change by orders of magnitude as DB ages
Next
From: Greg Smith
Date:
Subject: Re: Checkpoint spikes